You are currently viewing Data loading with pandas

Data loading with pandas

Pandas is a suitable python library for data analysis, especially while dealing with tabular data. Nowadays, Pandas is widely used in data analysis for data loading, cleaning, visualization… Here is different approaches on how to load data using pandas:

1- Identify data source

You may start data loading process by identifying where the data to load is located (data source). For example, the data you want to load may be stored in files, databases, website or APIs. In this article the tabular data (data in table format) is covered as most of times the data processed in data analysis is in tabular format.

2- Install pandas and select the pandas function to use

After knowing the data source, the second step to take is to install pandas and select the function to use for data reading.

  • text files : install pandas and use pandas function read_csv to load data from the file. 
  • web scraping: install pandas and use the read_html function to read all html tables. 
  • databases: install pandas and use the read_sql function to load the data

If you like doing data analysis using python like me, here is the book I recommend: Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition

3- Worked examples:

First of all intall and import pandas by runing this code:

pip install pandas
import pandas as pd

or if you are using conda run this code:

conda install -c conda-forge pandas
import pandas as pd

  • Text file loading: Let’s load data from this csv file.

To load a csv file we use the read_csv function of the pandas library. The function reads a comma-separated values (csv) file into DataFrame. The function takes the file path as a string argument. The string could be a URL. We saved the resulting DataFrame to df variable.

df = pd.read_csv("https://raw.githubusercontent.com/zarifoudjibril/blog_files/main/dummy_file.csv")

let’s display the df DataFrame:

df

If you run the code on jupyter notebook, you will get the table below:

we can read the first 2 rows by passing 2 to to the head function:

df.head(2)

Here is the result you should obtain:

  • Web scraping: Let’s load the table containing the list of African countries per population from this wikipedia page.

To do so, let’s use the read_html pandas function which reads HTML tables into a list of DataFrame objects. The function take a string as a first mendatory argument. The string can represent a URL or the HTML itself. The output is same in the variable list_df.

list_df = pd.read_html("https://en.wikipedia.org/wiki/List_of_African_countries_by_population")

Let’s first see the number of items in the list by using len function:

len(list_df)

The output is 9. By browsing the wikipedia page we can notice that the table we are willing to load is the first table on the page. Now let’s use the list indexing to fetch our DataFrame and tail function to display the last five rows.

df2 = list_df[0]
df2.tail()

The output looks like this:

  • Database interactions: To show you an example we will fetch some data from this database stored in the SQLite database engine.

To do so let’s first import sqlite3 and establish a connection with the database. Our database is nammed dummybase.db database.

import sqlite3
con = sqlite3.connect("dummybase.db")

After establishing the connection let’s load data from a table of the dummybase database using the read_sql pandas function. The database contains a table named Persons. The function read SQL query or database table into a DataFrame. Let’s also save the resulting DataFrame to the df3 variable.

df3 = pd.read_sql('SELECT * FROM Persons', con)

Now let’s display the df3 DataFrame.

df3

The output is like this:

4- Some useful pandas data loading functions to know

Here are few Pandas functions you may need for some most common files types.

ToUse
Read an Excel file into a pandas DataFrameread_excel()
Convert a JSON string to pandas objectread_json()
Read XML document into a DataFrame objectread_xml()

Feel free to find more information from Pandas documentation if the function you are looking for is not mentionned above.

If you like doing data analysis using python like me, here is the book I recommend: Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition

As an Amazon Associate, I earn from qualifying purchases.

Leave a Reply