The bioinformatics community offers a wealth of tools, each honed to perform a specific function. Performing complex tasks will invariably involve passing your data from one of these tools to another (along with suitable parameters) and writing some scripts to connect the pieces. To record this sequence of steps and describe the results, a log or README is usually written. This certainly gets the job done, but I argue there is a better way to create and record workflows involving a mixture of command line tools, scripting languages, and written narrative: Jupyter Notebook. The Jupyter Notebook is a browser-based command shell for interactive computing in several languages: Python, bash, R, Julia, Haskell, Ruby, and more. To provide a feel for what Jupyter Notebook can do, I’ll first present an overview of the user interface. The second part of this blog post will discuss use cases.


Note: you can view this blog post as a notebook here

Overview

Once you’ve got Jupyter Notebook installed (instructions here or try it out in your browser without installing anything here) you can start a notebook server from the terminal with $ jupyter notebook. With a new notebook opened, you’ll see something like this:

Cells

The cell (the box with In [ ]) at the top of the page is the place to provide some sort of input:

  • code
  • terminal commands
  • markdown text
  • mathematical expressions
  • html

Pressing Shift-Enter will compute the input and return the results in an output cell, as shown below:

A useful aspect of this input/output configuration is that outputs are saved in the notebook file: after saving, closing and reopening the notebook, all of your results are displayed immediately without recomputing anything. (Variables held in the kernel's memory are not saved, so cells must be rerun to keep working with them.) In addition, with rich text formatting (Markdown, LaTeX math), you can discuss code, output and plots alike:
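Outputs survive a reopen because an .ipynb file is plain JSON that stores each cell's source next to its most recent outputs. The sketch below builds a minimal notebook by hand with Python's json module, assuming the nbformat 4 layout (cells, metadata, nbformat, nbformat_minor); in practice the nbformat package is the supported way to read and write these files. The file name example.ipynb and the cell contents are hypothetical.

```python
import json

# A minimal notebook in the nbformat 4 layout: a list of cells plus
# top-level metadata and version numbers.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## GC content\nA quick calculation:",
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "seq = 'ATGCGC'\n(seq.count('G') + seq.count('C')) / len(seq)",
            # The saved output travels with the cell, so it is shown again
            # on reopening without rerunning the code.
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": "0.6666666666666666"},
                    "metadata": {},
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

with open("example.ipynb", "w") as fh:
    json.dump(notebook, fh, indent=1)

# Reading the file back recovers the stored result without executing anything.
with open("example.ipynb") as fh:
    saved = json.load(fh)

for cell in saved["cells"]:
    if cell["cell_type"] == "code":
        print(cell["outputs"][0]["data"]["text/plain"])
```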

Terminal commands

If you are using the Python kernel, Jupyter runs any line that starts with ! as a terminal command and any cell that starts with %%bash as a bash script. This lets you list directory contents, do some awk, submit jobs and check on job status, all from one notebook (and keep a record of that activity too). To change directories, use %cd rather than !cd, since each ! line runs in its own subshell. Note that none of this requires a kernel change; other cells are still interpreted as Python.
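A line such as !ls hands its text to the system shell. The sketch below approximates that behaviour in plain Python with subprocess.run, assuming a POSIX shell with awk is available; it illustrates the idea rather than IPython's actual implementation, and the helper name run_shell is hypothetical.

```python
import subprocess


def run_shell(command: str) -> list[str]:
    """Run a shell command and return its standard output as a list of lines,
    roughly what `lines = !command` captures in a notebook."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell handle pipes, globs and quoting
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()


# A small awk pipeline, as one might run with ! in a notebook cell.
lines = run_shell("printf 'chr1 100\\nchr2 250\\n' | awk '{print $2}'")
print(lines)
```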

Multiple programming language support

You’ll notice the word Python at the top of the page. This indicates which kernel is being used. You can change it at any point via Kernel > Change kernel (to install more kernels, follow the instructions here). Switching kernels starts a fresh session, so data has to be handed from one language to the next through files, but this ability still enables you, in one notebook, to:

  • prepare some data with Python
  • run command line tools
  • change kernel
  • analyze and plot the output with R
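Because each kernel has its own memory, data prepared in Python reaches R through a file. A minimal sketch, assuming a CSV hand-off with a hypothetical file name gc_content.csv and made-up values; the R side would then load it with read.csv.

```python
import csv

# Hypothetical per-sequence results prepared in the Python kernel.
records = [
    {"id": "seq1", "gc": 0.52},
    {"id": "seq2", "gc": 0.61},
]

# Write a plain CSV that other kernels (R, Julia) and command line tools can read.
with open("gc_content.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["id", "gc"])
    writer.writeheader()
    writer.writerows(records)

# After switching to an R kernel, the same file could be loaded with:
#   gc <- read.csv("gc_content.csv")
```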

Export and share

As well as the standard .ipynb file type, a notebook can be exported to HTML, PDF and Markdown. These exports make it very convenient to share every computational aspect of an analysis with collaborators.
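Exports are normally produced from the File menu or with the jupyter nbconvert command (for example, jupyter nbconvert --to html analysis.ipynb). To show what a Markdown export involves, here is a toy converter over the notebook JSON: Markdown cells pass through unchanged and code cells, plus their plain-text results, become fenced blocks. It is a simplified illustration under those assumptions, not how nbconvert works internally, and notebook_to_markdown is a hypothetical name.

```python
import json

FENCE = "`" * 3  # three backticks: a Markdown code fence


def as_text(value) -> str:
    """nbformat allows text fields as one string or a list of lines."""
    return "".join(value) if isinstance(value, list) else value


def notebook_to_markdown(nb: dict) -> str:
    """Convert an nbformat 4 notebook dict into a Markdown string."""
    parts = []
    for cell in nb["cells"]:
        source = as_text(cell["source"])
        if cell["cell_type"] == "markdown":
            parts.append(source)
        elif cell["cell_type"] == "code":
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
            # Keep plain-text results so the export shows outputs too.
            for output in cell.get("outputs", []):
                text = output.get("data", {}).get("text/plain")
                if text:
                    parts.append(f"{FENCE}\n{as_text(text)}\n{FENCE}")
    return "\n\n".join(parts)


nb = json.loads("""{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Results"},
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "source": "1 + 1",
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "data": {"text/plain": "2"}, "metadata": {}}]}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 4
}""")
print(notebook_to_markdown(nb))
```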

If your analysis can be made public, you can push your notebook to a GitHub repository, where it can be hosted and rendered for free with nbviewer. This blog post, for example, is based on a Jupyter notebook in a GitHub repository and is rendered here.


Magic, Autocomplete and Help

To draw this feature showcase to a close it is worth mentioning the following utilities:

Autocompletion

Pressing <tab> after typing a few characters will present:

  • available variables that match your characters
  • a list of modules/functions for a package
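Jupyter's completion is provided by IPython's own completer, but Python's standard rlcompleter module demonstrates the same idea: matching the typed prefix against names in a namespace, or against a module's attributes after a dot. This is an analogue rather than the notebook's actual machinery; the helper completions and the variable names are hypothetical.

```python
import math
import rlcompleter


def completions(text: str, namespace: dict) -> list[str]:
    """Collect every match rlcompleter offers for `text`, as if <tab>
    were pressed repeatedly."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    # complete() returns one match per state and None when they run out.
    while (match := completer.complete(text, state)) is not None:
        matches.append(match)
        state += 1
    return matches


# Hypothetical interactive variables.
namespace = {"sequence_id": "seq1", "sequence_length": 1200, "math": math}

print(completions("seq", namespace))      # variables matching the prefix
print(completions("math.sq", namespace))  # functions in a module, e.g. math.sqrt(
```

rlcompleter appends "(" to callables, which is why module functions come back as, for example, math.sqrt(.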

Magic

There are several built-in functions called magic commands, preceded by % (line magics, which act on a single line) or %% (cell magics, which act on the whole cell). Particularly useful are:

%%perl

Executes the contents of the cell as a Perl subprocess (Perl must be installed), without changing kernel.

%who_ls

Lists all interactive variables entered so far.

%%timeit

Times the execution of the cell's code (or, as %timeit, of a single line), which is useful when optimizing code for speed.
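%timeit and %%timeit build on Python's timeit module, so the same kind of measurement can be sketched outside a notebook. The two GC-content functions below are hypothetical examples, and the repeat and number values are arbitrary choices.

```python
import timeit

SEQ = "ATGC" * 10_000  # a hypothetical 40 kb sequence


def gc_loop(seq: str) -> float:
    """GC fraction computed with an explicit loop."""
    gc = 0
    for base in seq:
        if base in "GC":
            gc += 1
    return gc / len(seq)


def gc_count(seq: str) -> float:
    """GC fraction computed with str.count."""
    return (seq.count("G") + seq.count("C")) / len(seq)


NUMBER = 20  # calls per trial
for func in (gc_loop, gc_count):
    # 5 trials of NUMBER calls each; the fastest trial is the least
    # disturbed by other activity on the machine.
    best = min(timeit.repeat(lambda: func(SEQ), repeat=5, number=NUMBER))
    print(f"{func.__name__}: {best / NUMBER * 1e3:.3f} ms per call")
```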

Help

A time-saving feature is immediate access to help: typing a question mark after any object, function or method (for example, len?) displays its docstring within the browser.
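The ? suffix shows an object's docstring. In plain Python the same text is available from help() or inspect.getdoc; the function reverse_complement below is a hypothetical example.

```python
import inspect


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.translate(complement)[::-1]


# In a notebook, `reverse_complement?` would display this docstring.
print(inspect.getdoc(reverse_complement))
print(reverse_complement("ATGC"))  # GCAT
```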

Use Cases

Personal log

The features described above make a compelling case for a one-stop-shop solution to recording your daily computational work. Scripts, results and logs often live in separate files; Jupyter, on the other hand, provides a unifying platform to view all of them.

Education and training

Because it can present a mix of narrative, code and output, a notebook is a great vehicle for teaching. A learner who has the .ipynb file can modify and play around with the code and immediately observe the effects. There are some great examples of using a notebook as an educational medium:

An extensive gallery of interesting IPython Notebooks has been collected and is available here

Learning material is rarely static; it evolves constantly. Adding a notebook to a publicly available GitHub repository is an excellent way to improve material iteratively, as is done for several of the notebooks listed above.

Reproducibility

Providing a self-contained notebook that lists all steps performed in an analysis is a great way to supplement papers. Several papers currently include notebooks in their supplementary material. For example, Yoav Ram and Lilach Hadany provide an excellent publicly available notebook alongside their paper “The probability of improvement in Fisher’s geometric model: A probabilistic approach”, detailing the steps used to generate a figure from the article.

Conclusion

Jupyter notebooks can serve to record daily computational work, provide an excellent teaching platform, and capture and share computational steps in supplementary material. As science becomes increasingly computational, tools such as Jupyter Notebook will help to record and share the many computations a scientist performs on a daily basis.

