Pandas is a popular Python library used for data manipulation and analysis. It provides powerful tools for working with tabular data, including data cleaning, filtering, merging, and aggregation.
Here are some common data manipulation tasks that can be performed using Pandas:
- Reading data: Pandas can read data from a variety of sources, including CSV files, Excel files, SQL databases, and JSON files.
import pandas as pd
# Read a CSV file
data = pd.read_csv('data.csv')
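# Read an Excel file (.xlsx files need the openpyxl package installed)
data = pd.read_excel('data.xlsx')
# Read a JSON file
data = pd.read_json('data.json')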
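# Read from a SQL database
import sqlite3
conn = sqlite3.connect('database.db')
# 'my_table' is a placeholder table name; a table literally named 'table' would need quoting because TABLE is a reserved SQL keyword
data = pd.read_sql_query('SELECT * FROM my_table', conn)
conn.close()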
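The reading calls above depend on files that may not exist on your machine. As a minimal sketch that runs without any files, the following reads CSV text from memory and round-trips it through an in-memory SQLite database. The sample rows, the `people` table name, and the variable names are assumptions made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content held in memory, so no file on disk is needed.
csv_text = "name,age,gender\nAda,36,female\nBob,28,male\nCy,41,male\n"
people = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame to an in-memory SQLite database, then query it back.
conn = sqlite3.connect(":memory:")
people.to_sql("people", conn, index=False)
over_30 = pd.read_sql_query(
    "SELECT name, age FROM people WHERE age > ? ORDER BY age",
    conn,
    params=(30,),  # parameters are passed separately rather than formatted into the SQL
)
conn.close()

print(over_30)
```

Passing values through `params` instead of building the query string by hand avoids quoting mistakes and SQL injection.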
- Filtering data: Pandas can filter rows based on one or more conditions.
# Filter rows where column 'age' is greater than 30
filtered_data = data[data['age'] > 30]
# Filter rows where column 'gender' is 'male'
filtered_data = data[data['gender'] == 'male']
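Filters are often combined. The sketch below, using a small made-up DataFrame, shows three common ways to do that: boolean masks joined with `&`, membership tests with `isin()`, and string expressions with `query()`. The column names mirror the examples above; the data itself is hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration.
people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cy", "Dee"],
    "age": [36, 28, 41, 33],
    "gender": ["female", "male", "male", "female"],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
men_over_30 = people[(people["age"] > 30) & (people["gender"] == "male")]

# isin() keeps rows whose value appears in a list of allowed values.
selected = people[people["name"].isin(["Ada", "Dee"])]

# query() expresses a similar filter as a string.
women_over_35 = people.query("age > 35 and gender == 'female'")

print(men_over_30, selected, women_over_35, sep="\n\n")
```

Python's `and` and `or` keywords do not work on boolean Series, which is why the mask version uses `&` and `|`.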
- Sorting data: Pandas can sort data by one or more columns.
# Sort by column 'age' in ascending order
sorted_data = data.sort_values('age')
# Sort by column 'age' in descending order
sorted_data = data.sort_values('age', ascending=False)
# Sort by multiple columns
sorted_data = data.sort_values(['age', 'income'], ascending=[False, True])
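Two details of sorting are easy to miss: `sort_values` returns a new DataFrame rather than changing the original, and the sorted rows keep their old index labels. The following sketch, with hypothetical data that includes one missing age, illustrates both, along with the `na_position` parameter.

```python
import pandas as pd

# Hypothetical sample data; Cy's age is missing.
people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cy", "Dee"],
    "age": [36, 28, None, 33],
})

# Sort by age, oldest first, with missing ages placed at the end.
# reset_index(drop=True) renumbers the rows 0..n-1 after sorting.
sorted_people = (
    people.sort_values("age", ascending=False, na_position="last")
    .reset_index(drop=True)
)

print(sorted_people)
print(people)  # the original DataFrame keeps its row order
```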
- Grouping data: Pandas can group data by one or more columns and apply aggregation operations to each group.
# Group by column 'gender' and calculate the mean of column 'income'
grouped_data = data.groupby('gender')['income'].mean()
# Group by multiple columns and calculate the sum of column 'sales'
grouped_data = data.groupby(['gender', 'region'])['sales'].sum()
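When several aggregates are needed at once, named aggregation with `agg()` is one option. This is a minimal sketch on made-up data; the output column names `mean_income` and `total_sales` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
sales = pd.DataFrame({
    "gender": ["female", "male", "male", "female"],
    "region": ["north", "north", "south", "south"],
    "income": [50000, 40000, 60000, 70000],
    "sales": [10, 20, 30, 40],
})

# Compute two aggregates per group in a single pass.
# as_index=False keeps 'gender' as a regular column instead of the index.
summary = sales.groupby("gender", as_index=False).agg(
    mean_income=("income", "mean"),
    total_sales=("sales", "sum"),
)

print(summary)
```

Each keyword argument names an output column and maps it to a `(source column, function)` pair.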
- Merging data: Pandas can merge data from multiple sources on one or more common columns.
# data1 and data2 are two DataFrames loaded beforehand that share the key column(s)
# Merge two DataFrames on a common column 'id' (an inner join by default)
merged_data = pd.merge(data1, data2, on='id')
# Merge two DataFrames on multiple common columns
merged_data = pd.merge(data1, data2, on=['id', 'date'])
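By default `pd.merge` performs an inner join, so rows without a match in both DataFrames are dropped. The sketch below, using two small hypothetical DataFrames, contrasts that with a left join and uses `indicator=True` to show where each row came from.

```python
import pandas as pd

# Hypothetical sample data: customer 1 has no order, and order id 4 has no customer.
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ada", "Bob", "Cy"]})
orders = pd.DataFrame({"id": [2, 3, 4], "amount": [25.0, 40.0, 15.0]})

# Inner join (the default): only ids present in both DataFrames survive.
inner = pd.merge(customers, orders, on="id")

# Left join: keep every customer; unmatched rows get NaN in 'amount'.
# indicator=True adds a '_merge' column that records each row's origin.
left = pd.merge(customers, orders, on="id", how="left", indicator=True)

print(inner)
print(left)
```

Checking the `_merge` column is a quick way to spot keys that failed to match.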
These examples cover only a few common data manipulation tasks. Pandas provides many more functions for working with tabular data.
