5 Advanced Python Tricks for Data Geeks

Data Tricks

Apr 14

When we analyze data, a few well-chosen tools can save time and simplify our code. Here are 5 Python tricks that can speed up your analysis.

I have been using Python for data analysis for a while.

While I can manage a whole data project from beginning to end, there are still many tools in Python libraries that I do not know.

Here are 5 of them that I have used or heard about, which I think may be useful for you too!

1. pandas_profiling: Detailed Report for Your Dataset

Gaining an initial understanding of the data we’re working with is crucial.

This is why we often repeat the same steps each time we examine a new dataset.

This is where pandas_profiling truly excels. With just one line of code, this library generates a comprehensive data summary.

It saves time and offers valuable insights into the data's structure, missing values, and correlations.

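# Install the package first if needed (the "!" prefix runs pip from a notebook cell).
# Note: newer releases are published as ydata-profiling and imported as ydata_profiling.
!pip install pandas-profiling

import pandas as pd
import pandas_profiling

# Load the dataset to be profiled
data = pd.read_csv("data.csv")

# Build the report and display it inline in the notebook
profile = pandas_profiling.ProfileReport(data)
profile.to_notebook_iframe()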

This produces a comprehensive, interactive report covering statistics, distributions, and visualizations.
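For comparison, here is a rough sketch of the manual first-look steps that such a report saves you from repeating. The small DataFrame and its column names are made up for illustration and stand in for data.csv; the real report covers much more than these three checks.

```python
import pandas as pd

# Hypothetical dataset standing in for data.csv (values invented for illustration)
data = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40000, 52000, 61000, None],
    "city": ["Warsaw", "Krakow", "Warsaw", None],
})

# Descriptive statistics (count, mean, std, quartiles) for the numeric columns
summary = data.describe()

# Number of missing values in each column
missing = data.isna().sum()

# Pairwise correlation between the numeric columns
correlations = data[["age", "income"]].corr()

print(summary)
print(missing)
print(correlations)
```

Each of these calls answers one question at a time, which is exactly the repetition a single profiling report is meant to replace.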

2. zip: Combine Lists Simultaneously

The zip() function pairs up elements from multiple lists (or any other iterables) by position, so you can iterate over them in parallel. It stops as soon as the shortest input runs out.

It streamlines tasks that require processing multiple lists together.

list1 = ['Data', 'Study', 'Actuarial']
list2 = ['Tricks', 'Tips', 'Journey']
list3 = ['for', '', 'Updates']
list4 = ['Geeks', '', '']

for word1, word2, word3, word4 in zip(list1, list2, list3, list4):
    print(word1, word2, word3, word4)

And the output looks like this:

Data Tricks for Geeks
Study Tips  
Actuarial Journey Updates 
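The lists above were padded with empty strings so that they all have the same length. A minimal sketch of what happens otherwise, and of itertools.zip_longest from the standard library as one way to avoid the manual padding (the variable names here are illustrative):

```python
from itertools import zip_longest

titles = ["Data", "Study", "Actuarial"]
suffixes = ["Tricks", "Tips"]  # deliberately one item shorter

# zip() stops at the shortest input, so "Actuarial" is silently dropped
short = list(zip(titles, suffixes))

# zip_longest() continues to the longest input and fills gaps with fillvalue
padded = list(zip_longest(titles, suffixes, fillvalue=""))

print(short)
print(padded)
```

On Python 3.10 and later, zip(titles, suffixes, strict=True) raises a ValueError instead of truncating, which can be a safer choice when the lists are supposed to match.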

3. NumPy Broadcasting: Instead of Loops

Broadcasting in NumPy enables operations between arrays of different but compatible shapes, or between an array and a single number, without the need for explicit loops.

It enhances efficiency when performing numerical operations on large arrays.

import numpy as np

arr = np.array([1, 2, 3])
print(arr + 10)

Which prints:

[11 12 13]
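Broadcasting also works between arrays, not just between an array and a single number. The sketch below uses a small made-up table to show a 1-D array stretched across rows and a column vector stretched across columns; the data and names are only for illustration.

```python
import numpy as np

# Hypothetical 3x2 table: rows are observations, columns are features
matrix = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# Column means have shape (2,); subtracting broadcasts them across all 3 rows
col_means = matrix.mean(axis=0)
centered = matrix - col_means

# A (3, 1) column vector broadcasts across both columns
row_weights = np.array([[1.0], [2.0], [3.0]])
weighted = matrix * row_weights

print(centered)
print(weighted)
```

Two shapes are compatible when, compared from the last dimension backwards, each pair of sizes is either equal or one of them is 1.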

4. apply(): Optimize DataFrame Updates

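The apply() function in pandas lets you apply a function to each row or column of a DataFrame.

It removes the need to write manual loops when transforming DataFrames. Keep in mind that with axis=1 pandas still calls the function once per row, so a built-in column operation is usually faster when one exists.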

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [4, 5, 6]})
df['sum'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(df)

And the modified DataFrame looks like this:

    a  b  sum
0  10  4   14
1  20  5   25
2  30  6   36
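For simple column arithmetic like this sum, plain column operations give the same result without calling a function per row. A short sketch comparing the two approaches on the same data (the column names sum_apply and sum_vectorized are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [4, 5, 6]})

# Row-wise apply: the lambda is called once for every row
df['sum_apply'] = df.apply(lambda row: row['a'] + row['b'], axis=1)

# Vectorized arithmetic: one operation over the whole columns
df['sum_vectorized'] = df['a'] + df['b']

print(df)
```

apply() remains useful when the per-row logic has no vectorized equivalent, such as calling a custom function with branching.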

5. map(): Faster Data Transformations

The map() function applies a given function to each element of an iterable, such as a list or tuple.

It streamlines element-wise transformations and often makes your code cleaner than a manual loop. In Python 3, map() returns a lazy iterator, which is why the example below wraps it in list().

numbers = [1, 2, 3, 4]
cubes = list(map(lambda x: x**3, numbers))
print(cubes)

And the result is:

[1, 8, 27, 64]
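Two related points are worth a quick sketch: a list comprehension gives the same result and is often considered more readable when a lambda would otherwise be needed, and map() can walk several iterables in parallel. The lists bases and exponents are made up for illustration.

```python
numbers = [1, 2, 3, 4]

# List comprehension equivalent of list(map(lambda x: x**3, numbers))
cubes_comprehension = [x**3 for x in numbers]

# map() accepts several iterables and passes one element from each per call
bases = [2, 3, 4]
exponents = [3, 2, 1]
powers = list(map(pow, bases, exponents))

print(cubes_comprehension)
print(powers)
```

map() tends to read best when the function already exists by name, such as pow or str, rather than being written inline as a lambda.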

#python #datanalysis #datascience #datatricks #data #actuarial #analyst #erykfaracik

Eryk Faracik