07.07.2019       Issue 289 (01.07.2019 - 07.07.2019)       Articles

10 Ways to Speed Up Data Analysis in Python



1. Profiling the pandas dataframe

Profiling is a process that helps us understand our data, and Pandas Profiling is a Python package that does exactly that. It is a simple and fast way to perform exploratory data analysis (EDA) of a pandas DataFrame. The pandas df.describe() and df.info() functions are normally used as a first step in the EDA process. However, they give only a very basic overview of the data and don't help much with large data sets. The Pandas Profiling package, on the other hand, extends the pandas DataFrame with df.profile_report() for quick data analysis. It displays a lot of information with a single line of code, in an interactive HTML report.
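For comparison, here is a minimal sketch of that basic first step using plain pandas only. The small DataFrame is made-up sample data (an assumption for illustration, not the Titanic file used later).

```python
import pandas as pd

# Hypothetical sample data standing in for a real dataset
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})

# describe(): count, mean, std, min, quartiles and max for the numeric columns
summary = df.describe()
print(summary)

# info(): column dtypes, non-null counts and memory usage, printed to stdout
df.info()
```

Note that describe() skips the non-numeric "sex" column by default, which is one reason this overview stays so basic.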

For a given dataset the pandas profiling package computes the following statistics:

Figure: Statistics computed by the Pandas Profiling package.
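The figure itself is not reproduced here. As a rough illustration of the kind of per-column summary a profiling report contains, the sketch below computes a few such figures (dtype, missing values, distinct values) by hand with plain pandas. It is not the package's implementation; the helper name quick_profile and the sample data are hypothetical.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with a few basic statistics."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),          # inferred storage type
        "n_missing": df.isna().sum(),            # number of missing values
        "pct_missing": df.isna().mean() * 100,   # share of missing values, in %
        "n_distinct": df.nunique(),              # distinct non-missing values
    })


# Made-up sample data for illustration
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "sex": ["male", "female", "female", "female"],
})

report = quick_profile(df)
print(report)
```

A real profiling report goes well beyond this, but the sketch shows why a single call that gathers such figures for every column saves time.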


pip install pandas-profiling
conda install -c anaconda pandas-profiling


Let's use the age-old Titanic dataset to demonstrate the capabilities of the versatile Python profiler.

# Importing the necessary packages
import pandas as pd
import pandas_profiling

df = pd.read_csv('titanic/train.csv')

# Deprecated: pre-2.0.0 usage
pandas_profiling.ProfileReport(df)

Edit: A week after this article was published, Pandas-Profiling came out with a major upgrade, version 2.0.0. The syntax has changed a bit: the functionality is now exposed as a method on the pandas DataFrame itself, and the report has become more comprehensive. Below is the latest usage syntax:


To display the report in a Jupyter notebook, run:

# Pandas-Profiling 2.0.0
df.profile_report()

This single line of code is all that you need to display the data profiling report in a Jupyter notebook. The report is quite detailed and includes charts wherever necessary.

The report can also be exported into an interactive HTML file with the following code.

profile = df.profile_report(title='Pandas Profiling Report')
profile.to_file(outputfile="Titanic data profiling.html")

Refer to the documentation for more details and examples.
