Walkthrough on setting up a new Jupyter environment for data analyses with NumPy, pandas, matplotlib, and their counterparts.
Recently I have found myself ever more often analysing data from various sources and drawing conclusions to support technical decisions. Python, Jupyter notebooks, and a few ubiquitous libraries such as NumPy are the tools I have been using for this purpose.
This article is mostly a note to myself in the hope that it's going to be useful for others who wish to pursue such endeavors.
I'll provide the code snippets below as GitHub gists so that copy/paste doesn't mangle them.
First, create a dedicated directory for the notebooks. Most of the time I call it lab, explore, or something alike and place it inside the root of the project where data analysis is necessary.
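As a minimal sketch (the name lab is just my convention; any name works):

```shell
# Create the analysis directory inside the project root and move into it.
mkdir -p lab
cd lab
```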
The following shell script, init.sh, creates a new virtual environment from scratch. I generally prefer to enumerate the packages explicitly instead of relying on requirements.txt; this way I have a better sense of what's actually being installed. Once the project is set up, it's fine to check requirements.txt into version control.
#!/bin/bash
env_name="venv"

# Create and activate the virtual environment
python3 -m venv "${env_name}"
source "${env_name}/bin/activate"

# First run: install the packages explicitly and pin them;
# subsequent runs reuse the pinned requirements.txt
if [ ! -f requirements.txt ]; then
    pip install numpy pandas matplotlib scipy scikit-learn jupyter jupyter-datatables
    pip freeze > requirements.txt
else
    pip install -r requirements.txt
fi

# Register the IPython kernel inside the environment
python -m ipykernel install --prefix "./${env_name}"
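If you want a quick sanity check before running the full install, the environment-creation step can be exercised on its own (the venv name below matches the script above):

```shell
# The core step of init.sh in isolation: create the environment and
# confirm its interpreter exists before installing any packages.
env_name="venv"
python3 -m venv "${env_name}"
test -x "${env_name}/bin/python" && echo "venv ready"
```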
It's equally useful to have a startup script (start.sh) for the newly created environment.
#!/bin/bash
env_name="venv"

# Activate the environment and launch Jupyter from it
source "${env_name}/bin/activate"
"${env_name}/bin/jupyter" notebook
Well, that's it folks.