Why Use Categorical Data?
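Saves memory by storing each value as a small integer code that points into one shared list of categories, instead of repeating the full string.
Operations such as sorting, filtering, and grouping are often faster than on object dtype, especially when the column has few distinct values.
Supports an explicit order among categories (e.g., small < medium < large), which enables comparisons and logical sorting.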
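The memory claim can be checked directly. The sketch below uses made-up data (three labels repeated many times) and compares the two dtypes with memory_usage(deep=True); the exact byte counts depend on the pandas and Python versions, so none are quoted here.

```python
import pandas as pd

# Hypothetical data: 3 distinct labels repeated 100,000 times each.
labels = pd.Series(['small', 'medium', 'large'] * 100_000, dtype=object)
as_category = labels.astype('category')

# deep=True also counts the bytes held by the Python string objects.
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The saving comes from repetition: the fewer distinct values relative to the number of rows, the larger the gap.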
Creating Categorical Data
-
Converting an Existing Column
import pandas as pd
df = pd.DataFrame({
'Category': ['A', 'B', 'A', 'C', 'B', 'A']
})
df['Category'] = df['Category'].astype('category')
print(df.dtypes)
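✅ The Category column is now of type category. The memory saving only becomes noticeable on larger columns with many repeated values; on a six-row example it is negligible.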
-
Creating from Scratch
# pd.Categorical returns a Categorical array; wrap it in pd.Series(...) to get a Series.
colors = pd.Categorical(['red', 'blue', 'green', 'red', 'blue'])
print(colors)
-
Categorical Data with Defined Categories
categories = ['small', 'medium', 'large']
sizes = pd.Categorical(['small', 'large', 'medium', 'small'], categories=categories, ordered=True)
print(sizes)
✅ Using ordered=True allows comparison (small < medium < large).
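To make the effect of ordered=True concrete, here is a small sketch using the same size labels. It shows that min() and max() follow the declared order rather than alphabetical order, and that an unordered Categorical rejects the same comparison. The name comparison_allowed is only for illustration.

```python
import pandas as pd

sizes = pd.Categorical(['small', 'large', 'medium', 'small'],
                       categories=['small', 'medium', 'large'], ordered=True)

# Element-wise comparison against a category label yields a boolean array.
is_bigger = sizes > 'small'
print(is_bigger)

# min() and max() follow the declared order, not alphabetical order.
print(sizes.min(), sizes.max())

# Without ordered=True, inequality comparisons raise TypeError.
unordered = pd.Categorical(['small', 'large'])
try:
    unordered > 'small'
    comparison_allowed = True
except TypeError:
    comparison_allowed = False
print(comparison_allowed)
```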
-
Operations on Categorical Data
-
Accessing Categories & Codes
print(sizes.categories) # ['small', 'medium', 'large']
print(sizes.codes) # [0, 2, 1, 0] -> Internal integer representation
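The relationship between codes and categories can be sketched as follows: each code is a position in the categories index, so indexing the categories with the codes rebuilds the original labels. The second part uses a made-up label, 'huge', to show what happens to a value outside the declared categories.

```python
import pandas as pd

sizes = pd.Categorical(['small', 'large', 'medium', 'small'],
                       categories=['small', 'medium', 'large'], ordered=True)

# Each code is an index into the categories; indexing rebuilds the labels.
rebuilt = sizes.categories[sizes.codes]
print(list(rebuilt))

# A value outside the declared categories becomes NaN and gets code -1.
with_unknown = pd.Categorical(['small', 'huge'],
                              categories=['small', 'medium', 'large'])
print(with_unknown.codes)
```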
-
Sorting
sorted_sizes = sizes.sort_values()
print(sorted_sizes)
-
Filtering
filtered_sizes = sizes[sizes > 'small'] # Keeps 'medium' and 'large'
print(filtered_sizes)
-
Changing Categories
sizes = sizes.rename_categories(['S', 'M', 'L'])
print(sizes)
✅ Renames 'small' → 'S', 'medium' → 'M', etc.
-
Adding & Removing Categories
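# After the rename above, the categories are 'S', 'M', 'L' ('small' no longer exists).
sizes = sizes.add_categories(['XL'])
sizes = sizes.remove_categories(['S'])  # Values that were 'S' become NaN
print(sizes)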
-
Use Case: Grouping & Aggregation
df = pd.DataFrame({
'Size': pd.Categorical(['small', 'large', 'medium', 'small', 'large'],
categories=['small', 'medium', 'large'], ordered=True),
'Price': [10, 30, 20, 15, 35]
})
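# Pass observed explicitly; its default has changed across pandas versions.
grouped = df.groupby('Size', observed=False).mean()
print(grouped)
✅ Groups appear in the declared category order (small, medium, large) rather than alphabetical order.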
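Grouping on a categorical column can also report categories that never occur in the data. The sketch below uses a smaller made-up frame with no 'medium' rows to show how the observed parameter of groupby controls this.

```python
import pandas as pd

df = pd.DataFrame({
    'Size': pd.Categorical(['small', 'large', 'small'],
                           categories=['small', 'medium', 'large'], ordered=True),
    'Price': [10, 30, 15],
})

# observed=False keeps 'medium' in the result even though no row uses it.
all_groups = df.groupby('Size', observed=False)['Price'].mean()
# observed=True reports only the categories that actually occur.
seen_groups = df.groupby('Size', observed=True)['Price'].mean()

print(all_groups)
print(seen_groups)
```

With observed=False the unused category shows up with a NaN mean, which is useful for reports that need every category listed.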
When to Use?
Use categorical data when:
The column contains a small, fixed set of possible values that repeat across rows (e.g., gender, product sizes, regions).
You need ordered categories (e.g., low < medium < high).
Memory efficiency and performance improvements matter.
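One rough way to apply the first criterion is to compare the number of distinct values with the number of rows. The helper below, is_low_cardinality, and its 0.5 threshold are hypothetical choices for illustration, not a pandas API or an official rule.

```python
import pandas as pd

def is_low_cardinality(column: pd.Series, max_ratio: float = 0.5) -> bool:
    """Hypothetical helper: True when distinct values are a small share of rows."""
    if len(column) == 0:
        return False
    return column.nunique() / len(column) <= max_ratio

# Made-up columns: regions repeat, user IDs are all unique.
regions = pd.Series(['north', 'south', 'north', 'east',
                     'south', 'north', 'east', 'north'])
user_ids = pd.Series(['u1', 'u2', 'u3', 'u4', 'u5', 'u6'])

print(is_low_cardinality(regions))
print(is_low_cardinality(user_ids))
```

A column of unique identifiers gains little from conversion, since every value would become its own category.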