5 Advanced Python Tricks for Data Geeks
When we analyze data, we often look for tools that save time or simplify our code. Here are 5 Python tricks that are helpful and can speed up your analysis.
I have been using Python for data analysis for a while.
While I think I can manage to do a whole data project from beginning to end, I still do not know many of the tools in Python libraries.
Here are 5 of them that I have used or heard about, which I think may be useful for you too!
1. pandas_profiling: Detailed Report for Your Dataset
Gaining an initial understanding of the data we’re working with is crucial.
This is why we often repeat the same steps each time we examine a new dataset.
This is where pandas_profiling truly excels. With just one line of code, this library generates a comprehensive data summary.
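It saves time and offers valuable insights into the data's structure, missing values, and correlations. Note that the package has since been renamed ydata-profiling, so on a recent install the import name is ydata_profiling rather than pandas_profiling.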
# If you need to install the package first
!pip install pandas-profiling

import pandas as pd
import pandas_profiling

data = pd.read_csv("data.csv")
profile = pandas_profiling.ProfileReport(data)
profile.to_notebook_iframe()
This produces a comprehensive, interactive report covering statistics, distributions, and visualizations.
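For comparison, the repeated manual steps mentioned above might look roughly like this in plain pandas. This is only a sketch: the small DataFrame and its column names are made up for illustration and stand in for whatever data.csv contains.

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv
data = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40000, 52000, 61000, None],
    "city": ["Oslo", "Lima", "Oslo", None],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns
summary = data.describe()

# Number of missing values in each column
missing = data.isna().sum()

# Pairwise correlations between the numeric columns
correlations = data[["age", "income"]].corr()

print(summary)
print(missing)
print(correlations)
```

A profiling report bundles checks like these, plus distributions and plots, into one document, which is where the time saving comes from.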
2. zip: Combine Lists Simultaneously
The zip() function pairs up elements from multiple lists position by position, so you can iterate over them in a single loop. It stops as soon as the shortest list runs out.
It streamlines tasks that require processing multiple lists together.
list1 = ['Data', 'Study', 'Actuarial']
list2 = ['Tricks', 'Tips', 'Journey']
list3 = ['for', '', 'Updates']
list4 = ['Geeks', '', '']

for word1, word2, word3, word4 in zip(list1, list2, list3, list4):
    print(word1, word2, word3, word4)
And the output looks like this:
Data Tricks for Geeks
Study Tips
Actuarial Journey Updates
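One thing worth knowing when the lists differ in length: zip() silently drops the extra elements. The sketch below uses made-up lists (names and scores are illustrative only) to show that behavior, the standard-library alternative itertools.zip_longest, and how zip(*...) can undo the pairing.

```python
from itertools import zip_longest

names = ["Data", "Study", "Actuarial"]
scores = [90, 85]  # deliberately one element shorter than names

# zip() stops at the shortest input, so "Actuarial" is dropped
pairs = list(zip(names, scores))
print(pairs)  # [('Data', 90), ('Study', 85)]

# zip_longest() keeps going and pads the shorter input with fillvalue
padded = list(zip_longest(names, scores, fillvalue=None))
print(padded)  # [('Data', 90), ('Study', 85), ('Actuarial', None)]

# Unpacking the pairs back into zip() "unzips" them into separate tuples
unzipped_names, unzipped_scores = zip(*pairs)
print(unzipped_names)   # ('Data', 'Study')
print(unzipped_scores)  # (90, 85)
```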
3. NumPy Broadcasting: Instead of Loops
Broadcasting in NumPy enables operations between arrays of different shapes without the need for loops.
Because the element-wise work runs inside NumPy's compiled code rather than in a Python loop, it is usually much faster on large arrays.
import numpy as np

arr = np.array([1, 2, 3])
print(arr + 10)
Which prints:
[11 12 13]
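Adding a scalar is the simplest case. Broadcasting also works between arrays of different shapes, as long as the dimensions are compatible. Here is a small sketch with made-up arrays (matrix, row, and col are illustrative names) that also checks the result against the explicit loop it replaces.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# row is stretched across both rows of matrix
by_row = matrix + row
print(by_row)
# [[11 22 33]
#  [14 25 36]]

# col is stretched across all three columns of matrix
by_col = matrix + col
print(by_col)
# [[101 102 103]
#  [204 205 206]]

# The explicit loop that broadcasting replaces for the row case
looped = np.empty_like(matrix)
for i in range(matrix.shape[0]):
    for j in range(matrix.shape[1]):
        looped[i, j] = matrix[i, j] + row[j]

print((looped == by_row).all())  # True
```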
4. apply(): Optimize DataFrame Updates
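The apply() function in pandas lets you apply a function to each row or column of a DataFrame.
It removes the need to write manual loops when transforming DataFrames. Keep in mind that with axis=1 the function is still called once per row, so built-in vectorized operations are usually faster on large DataFrames.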
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [4, 5, 6]})
df['sum'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(df)
And the modified DataFrame looks like this:
    a  b  sum
0  10  4   14
1  20  5   25
2  30  6   36
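For a simple sum like this one, plain column arithmetic gives the same result without calling a Python function on every row. The sketch below reuses the same DataFrame to compare the two, and also shows column-wise apply (the default axis=0). The names sum_apply, sum_vectorized, and col_ranges are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [4, 5, 6]})

# Row-wise apply: the lambda is called once per row
df["sum_apply"] = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized alternative: add the two columns directly
df["sum_vectorized"] = df["a"] + df["b"]

# Column-wise apply (default axis=0): the function receives each column as a Series
col_ranges = df[["a", "b"]].apply(lambda col: col.max() - col.min())

print(df)
print(col_ranges)
```

apply() earns its place when the logic per row or column is too involved for built-in column operations; for plain arithmetic, the vectorized form is usually the better choice.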
5. map(): Faster Data Transformations
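The map() function applies a given function to each element of an iterable, such as a list or tuple, and returns a lazy iterator over the results.
It streamlines element-wise transformations and often makes your code cleaner than a manual loop, although it is not necessarily faster than an equivalent list comprehension.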
numbers = [1, 2, 3, 4]
cubes = list(map(lambda x: x**3, numbers))
print(cubes)
And the result is:
[1, 8, 27, 64]
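A few details of map() are easy to miss, so here is a short sketch using the same numbers list. It shows that map() is lazy in Python 3, that a list comprehension produces the same result, and that map() can take several iterables at once. The variable names are illustrative.

```python
numbers = [1, 2, 3, 4]

# In Python 3, map() returns a lazy iterator, not a list
lazy = map(lambda x: x**3, numbers)
first = next(lazy)   # computes only the first value
rest = list(lazy)    # consumes the remaining values; lazy is now exhausted
print(first, rest)   # 1 [8, 27, 64]

# A list comprehension gives the same cubes
cubes = [x**3 for x in numbers]
print(cubes)         # [1, 8, 27, 64]

# With several iterables, map() passes one element from each to the function
totals = list(map(lambda x, y: x + y, numbers, [10, 20, 30, 40]))
print(totals)        # [11, 22, 33, 44]

# Passing an existing function avoids the lambda entirely
as_text = list(map(str, numbers))
print(as_text)       # ['1', '2', '3', '4']
```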