Set up an environment for data analysis

Creating an efficient data analysis environment is a crucial step for anyone diving into the realm of data exploration and interpretation. Whether you’re a seasoned data scientist or just beginning your journey into the world of analytics, having a well-structured setup can significantly enhance your productivity and the quality of your insights. In this blog, we’ll explore the key components and considerations for setting up a robust data analysis environment that maximizes your analytical potential. From selecting the right tools and platforms to configuring your workspace for optimal efficiency, we’ll cover everything you need to know to establish a solid foundation for your data analysis endeavors.

BOOK RECOMMENDATIONS

As an Amazon Associate, I earn from qualifying purchases.

1. Miniconda Installation

First, install Miniconda, a free minimal installer for conda. It is a small bootstrap version of Anaconda that includes only conda, Python, the packages they both depend on, and a small number of other useful packages (such as pip and zlib).

  • Installation on macOS

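Open the Terminal on your Mac and run the four commands below to install Miniconda. The download URL used here is for Apple Silicon (arm64) Macs; on an Intel Mac, use Miniconda3-latest-MacOSX-x86_64.sh instead.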

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

To activate Miniconda once the installation is complete, type the following in your Terminal.

source ~/miniconda3/bin/activate

If everything works correctly, you should now be in the conda base environment. Type python to check that it works.

python

If the installation succeeded, you should see the version of Python installed on your machine, followed by the >>> prompt, which indicates that you are in the standard Python interpreter. You can exit the interpreter by typing the following:

exit()

Well done, you have Miniconda installed. If you are using Windows, here is how to install Miniconda.

  • Installation on Windows

Open the Command Prompt and run the three commands below to install Miniconda.

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o miniconda.exe
start /wait "" miniconda.exe /S
del miniconda.exe

After the installation is complete, open Anaconda Prompt (Miniconda3) from the Start menu or by typing its name in the search bar. If everything works correctly, you should be in the conda base environment. Type python to check that it works.

python

If the installation succeeded, you should see the version of Python installed on your machine, followed by the >>> prompt, which indicates that you are in the standard Python interpreter. You can exit the interpreter by typing the following:

exit()

Well done, you have Miniconda installed.
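On either operating system, it can be worth confirming that the python you just started is the one Miniconda installed rather than a system-wide Python. The following is a minimal sketch that can be typed at the >>> prompt or saved to a file. The function name describe_interpreter is only illustrative, and the "miniconda" path check is a heuristic that assumes the default install location used above.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter that is running."""
    prefix = sys.prefix  # root folder of the current Python installation
    return {
        "version": platform.python_version(),
        "executable": sys.executable,
        "prefix": prefix,
        # Heuristic only: assumes Miniconda lives in a folder named "miniconda3".
        "looks_like_miniconda": "miniconda" in prefix.lower(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable path points somewhere outside your miniconda3 folder, Miniconda is probably not activated in the current Terminal or Command Prompt session.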

At this point Python is running on our machine, as you have seen. For data analysis, however, we need to add some packages, and to follow best practice we should also create a dedicated environment.

2. Miniconda configuration and environment creation

From this step on, it doesn’t matter whether you are on Windows or Mac; just make sure Miniconda is activated. This means that (base) appears at the start of the command line, showing that you are in the conda base environment.

First, let’s configure conda-forge as our highest-priority package channel (so that we can still install packages that may be missing from the “defaults” channel of Miniconda) by running the following:

conda config --add channels conda-forge
conda config --set channel_priority strict

Second, let’s create a new environment called data-env (feel free to choose any name you like). Doing so isolates this Python installation and its packages, so we can install and manage packages independently of those provided by the system or used by other projects. Here is the command:

conda create -y --name data-env python=3.11

The -y flag means that we do not want to be asked for confirmation. You can replace 3.11 with another Python version. This step may take several minutes to finish. If everything went correctly, conda ends by printing instructions for activating the new environment.

Now let’s activate the new environment by running the following command:

conda activate data-env

Do not forget to replace data-env with the name of your environment if you chose a different name.

At this point you may notice that the (base) at the start of the command line has changed to (data-env). This means that you are now in the newly created environment. You can switch between available environments with conda activate env_name, where env_name is the name of the environment you want to switch to.

Every time you open your Terminal/Command Prompt, make sure you activate the environment in which you want to work.
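Because forgetting to activate the environment is an easy mistake, a script or notebook can check it for you. The sketch below reads the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated. The function names active_conda_env and check_env are illustrative, and "data-env" is simply the name chosen in this post.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV") or None


def check_env(expected="data-env", environ=os.environ):
    """Describe whether the expected conda environment is the active one."""
    name = active_conda_env(environ)
    if name is None:
        return "no conda environment is active"
    if name != expected:
        return f"active environment is '{name}', expected '{expected}'"
    return f"OK: running inside '{name}'"


if __name__ == "__main__":
    print(check_env("data-env"))
```

The environ parameter exists only so the functions can be tried with a plain dictionary instead of the real environment variables.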

3. Installing useful packages

Again, make sure you are still in the data-env environment. First of all, let’s install Jupyter Notebook, a web-based application that data analysts use for coding, visualization, documentation, and sharing, by running the command below.

conda install -y notebook

Now let’s install some packages frequently used in data analysis by running the following command.

conda install -y pandas numpy matplotlib beautifulsoup4 lxml html5lib requests sqlalchemy seaborn scipy statsmodels scikit-learn openpyxl

Note that some packages are not available through conda; for those packages, you can use pip install instead of conda install.
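To confirm that the packages listed above are actually available in the active environment, one option is to ask Python whether it can locate each of them. The sketch below uses only the standard library. The names IMPORT_NAMES and missing_packages are illustrative; note that a few packages are imported under a different name than the one used to install them (beautifulsoup4 is imported as bs4, scikit-learn as sklearn).

```python
from importlib.util import find_spec

# Maps the name used with `conda install` to the name used in `import` statements.
IMPORT_NAMES = {
    "notebook": "notebook",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "html5lib": "html5lib",
    "requests": "requests",
    "sqlalchemy": "sqlalchemy",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "openpyxl": "openpyxl",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the sorted install names whose module cannot be found."""
    return sorted(
        package
        for package, module in import_names.items()
        if find_spec(module) is None  # None means Python cannot locate the module
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All packages were found.")
```

Any name reported as not found can be installed again with conda install, or with pip install if conda does not provide it.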

If you already have the packages installed, you can run the commands below to update them. Replace package_name with the name of the package to update.

To avoid environment problems, update a package with the same tool you used to install it: use pip if you installed it with pip, and conda if you installed it with conda.

conda update package_name
pip install --upgrade package_name
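If you no longer remember which tool installed a package, its metadata may give a hint. The sketch below reads the optional INSTALLER file that installers can write next to a package; this file is not always present, so the result should be treated as a hint rather than a guarantee. The function names installed_by and suggest_update are illustrative.

```python
from importlib import metadata


def installed_by(package_name):
    """Return (version, installer) for a package, or None if it is not installed."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    installer = dist.read_text("INSTALLER")  # None when the file is absent
    installer = installer.strip() if installer else "unknown"
    return dist.version, installer


def suggest_update(package_name, installer):
    """Return the matching update command, or None if the installer is unknown."""
    if installer == "pip":
        return f"pip install --upgrade {package_name}"
    if installer == "conda":
        return f"conda update {package_name}"
    return None


if __name__ == "__main__":
    name = "pandas"
    info = installed_by(name)
    if info is None:
        print(f"{name} is not installed in this environment")
    else:
        version, installer = info
        print(f"{name} {version}, installed by: {installer}")
        print("Suggested update command:", suggest_update(name, installer))
```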

4. Open Jupyter notebook

Make sure you are in the environment you created previously (data-env or the name you chose), then just type the following:

jupyter notebook

This will open a new browser window (or a new tab) showing the Notebook Dashboard. That’s it. Enjoy your new Python environment!
