Walkthrough on setting up a new Jupyter environment for data analyses with NumPy, pandas, matplotlib, and their counterparts.
Recently I have found myself ever more often analysing data from various sources and drawing conclusions to support technical decisions. Python, Jupyter notebooks, and a few ubiquitous libraries such as NumPy are the tools I have been using for this purpose.
This article is mostly a note to myself in the hope that it's going to be useful for others who wish to pursue such endeavors.
I'll provide the code snippets below as GitHub gists so that copy/paste doesn't mangle them.
First, create a dedicated directory for the notebooks. Most of the time I call it lab, explore, or something alike and place it inside the root of the project where data analysis is necessary.
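As a minimal sketch (the name lab is just my convention; any name works):

```shell
# Create the analysis directory inside the project root and move into it.
mkdir -p lab
cd lab
```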
The following shell script, init.sh, creates a new virtual environment from scratch. I generally prefer to enumerate the packages explicitly instead of relying on requirements.txt; this way I have a better sense of what's actually being installed. Once the project is set up, it's fine to check requirements.txt into version control.
#!/bin/bash
env_name="venv"

# Create and activate the virtual environment
python3 -m venv "${env_name}"
source "${env_name}/bin/activate"

# First run: install the packages explicitly and pin them;
# subsequent runs reuse the pinned requirements.txt
if [ ! -f requirements.txt ]; then
    pip install numpy pandas matplotlib scipy scikit-learn jupyter jupyter-datatables
    pip freeze > requirements.txt
else
    pip install -r requirements.txt
fi

# Register the IPython kernel inside the environment
python -m ipykernel install --prefix "./${env_name}"
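If you want a quick sanity check before running the full install, the environment-creation step can be exercised on its own (the venv name below matches the script above):

```shell
# The core step of init.sh in isolation: create the environment and
# confirm its interpreter exists before installing any packages.
env_name="venv"
python3 -m venv "${env_name}"
test -x "${env_name}/bin/python" && echo "venv ready"
```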
It's equally useful to have a startup script (start.sh) for the newly created environment.
#!/bin/bash
env_name="venv"

# Activate the environment and launch Jupyter from it
source "${env_name}/bin/activate"
"${env_name}/bin/jupyter" notebook
Well, that's it folks.