
Start using pandas! — Say Goodbye to Excel in 2024!

Excel in 2024 — Nah! Let’s learn and use the pandas library in Python!

Pandas is a powerful open-source data manipulation and analysis library for Python. It is widely used for cleaning, aggregating, and transforming data, and is an essential tool for many data scientists and analysts.

Pandas can handle tasks that are time-consuming in Excel, and, more importantly, it lets you automate repetitive work. This is especially valuable for people who avoid Excel macros, which can freeze, throw errors, and sometimes even crash the whole application when a buggy macro runs. Pandas lets you handle almost any Excel workflow you can imagine with just a few lines of code.

I recommend starting with the Spyder IDE, as its Variable Explorer provides an interface for inspecting the DataFrames you have loaded.

To start using pandas, you will need to install it. You can install pandas using pip, the Python package manager, by running the following command in your terminal:

pip install pandas

Once you have installed pandas, you can start using it in your Python scripts by importing it:

import pandas as pd

The pd alias is a common convention and will be used throughout this tutorial.

Reading data

One of the most basic tasks in pandas is reading data into a DataFrame. A DataFrame is a two-dimensional size-mutable tabular data structure with rows and columns. You can think of it as a spreadsheet or a SQL table.
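To make the spreadsheet analogy concrete, here is a tiny DataFrame built directly in code (the column names and values are made up for illustration):

```python
import pandas as pd

# Keys of the dictionary become column names; the lists become
# the column values, one entry per row.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [34, 28, 45],
})

print(df.shape)  # (rows, columns) -> three rows, two columns
print(df)
```

Each row has an index label (0, 1, 2 by default) and each column behaves like a spreadsheet column with a header.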

There are several ways to read data into a DataFrame. Some of the most common ones are:

  • pd.read_csv(): Read a CSV (comma-separated values) file
  • pd.read_excel(): Read an Excel file
  • pd.read_json(): Read a JSON (JavaScript Object Notation) file
  • pd.read_sql(): Read data from a SQL database
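As a quick sketch of the SQL reader, pd.read_sql() takes a query string and an open database connection. The table and column names below are invented for the example, and an in-memory SQLite database stands in for a real one:

```python
import sqlite3
import pandas as pd

# Create a throwaway in-memory database with a small table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (product TEXT, amount REAL)')
conn.executemany('INSERT INTO sales VALUES (?, ?)',
                 [('apples', 120.0), ('pears', 80.5)])

# read_sql() runs the query and returns the result as a DataFrame.
df = pd.read_sql('SELECT * FROM sales', conn)
print(df)
```

The same pattern works with any DB-API-compatible connection, not just SQLite.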

Here’s an example of how to read an Excel file into a DataFrame using pd.read_excel(), followed by a few common operations:

# First, import pandas and load the Excel file

import pandas as pd

# Replace 'path/to/file.xlsx' with the path to your Excel file
df = pd.read_excel('path/to/file.xlsx')

# You can view the first few rows of the DataFrame using the head() method
print(df.head())

# You can select a specific column by name using the square bracket [] operator
print(df['column_1'])

# You can also select multiple columns by passing a list of column names
print(df[['column_1', 'column_2']])

# You can filter the DataFrame using a boolean mask
# (here, value stands for whatever threshold you want to compare against)
mask = df['column_name'] > value
filtered_df = df[mask]

# You can also use the query() method to filter the DataFrame using a string expression
filtered_df = df.query('column_name > @value')  # @ references the Python variable

# You can group the data by one or more columns and apply a function to each group
grouped_df = df.groupby('column_name')['column_name_2'].mean()

# You can apply multiple functions to a group by using the agg() method
grouped_df = df.groupby('column_name').agg(['mean', 'std'])

This reads the file into a DataFrame and stores it in the df variable. The reader functions have many optional parameters that let you customize how the data is read. For example, pd.read_csv() accepts a sep parameter to specify a different delimiter, and both read_csv() and read_excel() accept a header parameter to specify which row should be used as the column names.
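Here is a small sketch of those parameters in action. An io.StringIO object stands in for a real file path, and the column names are invented for the example:

```python
import io
import pandas as pd

# A semicolon-delimited file with no header row.
raw = io.StringIO("Alice;34\nBob;28")

# sep sets the delimiter; header=None says the first line is data,
# not column names; names supplies the column names instead.
df = pd.read_csv(raw, sep=';', header=None, names=['name', 'age'])
print(df)
```

Passing a real file path in place of the StringIO object works the same way.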

Exploring data

Once you have your data in a DataFrame, you can start exploring it and performing various operations on it. Here are a few common tasks:

  • df.head(): Display the first few rows of the DataFrame
  • df.tail(): Display the last few rows of the DataFrame
  • df.info(): Display information about the DataFrame, such as the data types of each column and the number of non-missing values
  • df.describe(): Generate descriptive statistics for the numerical columns
  • df.columns: Display the column names of the DataFrame
  • df.dtypes: Display the data types of the columns
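Putting those methods together on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol'],
                   'age': [34, 28, 45]})

print(df.head(2))     # first two rows
print(df.tail(1))     # last row
df.info()             # column dtypes and non-null counts (prints directly)
print(df.describe())  # count, mean, std, min, quartiles, max for 'age'
print(df.columns)     # the column names as an Index
print(df.dtypes)      # the dtype of each column
```

Note that describe() only summarizes numerical columns by default, so here it reports statistics for 'age' but not 'name'.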

You can also use the [] operator to select specific columns or rows. For example, to select the 'name' and 'age' columns, you can do:

df[['name', 'age']]

To select a single row, you can use the .loc[] or .iloc[] attributes, which select rows by label or by integer position, respectively. For example:

# First, import pandas and read the data into a DataFrame

import pandas as pd
df = pd.read_csv('path/to/data.csv')

# View the first few rows of the DataFrame
print(df.head())

# Get information about the DataFrame, including the data types and number of rows and columns
df.info()

# Get a statistical summary of the data
print(df.describe())

# Select a specific column
print(df['column_1'])

# Select multiple columns
print(df[['column_1', 'column_2']])

# Select a single row by label with .loc[], or by integer position with .iloc[]
print(df.loc[0])
print(df.iloc[0])

# Select rows that meet a certain condition
# (value stands for whatever threshold you want to compare against)
mask = df['column_name'] > value
filtered_df = df[mask]

# You can also use the query() method to filter the DataFrame using a string expression
filtered_df = df.query('column_name > @value')  # @ references the Python variable

Well, just try these first and get familiar with your data. In upcoming blogs, we will discuss how to handle some complex Excel functions using pandas!


#Artificial Intelligence

#Data Visualization

#Towards Data Science

#Data Analysis
