Get the Descriptive Statistics in Pandas DataFrame

To get the descriptive statistics for a specific column in your DataFrame:

df["dataframe_column"].describe()

To get the descriptive statistics for an entire DataFrame:

df.describe(include="all")

Steps

Step 1: Collect the Data

To start, collect the data for your DataFrame.

Here is an example of a dataset:

product	price	year
A	22000	2014
B	27000	2015
C	25000	2016
C	29000	2017
D	35000	2018

Step 2: Create the DataFrame

Next, create the DataFrame based on the data collected:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
print(df)

Run the code in Python, and you’ll get the following DataFrame:

  product  price  year
0       A  22000  2014
1       B  27000  2015
2       C  25000  2016
3       C  29000  2017
4       D  35000  2018

Step 3: Get the Descriptive Statistics

To get the descriptive statistics for the “price” column, which contains numerical data:

df["price"].describe()

The full code:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
stats_numeric = df["price"].describe()
print(stats_numeric)
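
Because describe() on a single column returns a pandas Series labeled by statistic name, individual values can be read by label. The following is a minimal sketch that reuses the example data; the names mean_price and median_price are illustrative and not part of the original tutorial.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

stats_numeric = df["price"].describe()

# describe() returns a Series, so each statistic is available by its label
mean_price = stats_numeric["mean"]    # illustrative variable name
median_price = stats_numeric["50%"]   # the 50% row is the median

print(mean_price)
print(median_price)
```

This can be handy when only one or two of the statistics are needed downstream.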

Descriptive Statistics for Categorical Data

To get the descriptive statistics for the “product” column, which contains categorical data:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
stats_categorical = df["product"].describe()
print(stats_categorical)

Here are the results:

count     5
unique    4
top       C
freq      2
Name: product, dtype: object
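
To see where the unique, top, and freq values come from, one option is to count the occurrences of each value directly. This is only a sketch using value_counts() and nunique() on the same "product" column; the variable name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"product": ["A", "B", "C", "C", "D"]})

# Occurrences per distinct value, sorted with the most frequent first
counts = df["product"].value_counts()

print(counts.index[0])           # most frequent value (reported as "top")
print(counts.iloc[0])            # its number of occurrences (reported as "freq")
print(df["product"].nunique())   # number of distinct values (reported as "unique")
```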

Get the Descriptive Statistics for the Entire DataFrame

To get the descriptive statistics for the entire DataFrame:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
stats = df.describe(include="all")
print(stats)

The result:

       product         price         year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top          C           NaN          NaN
freq         2           NaN          NaN
mean       NaN  27600.000000  2016.000000
std        NaN   4878.524367     1.581139
min        NaN  22000.000000  2014.000000
25%        NaN  25000.000000  2015.000000
50%        NaN  27000.000000  2016.000000
75%        NaN  29000.000000  2017.000000
max        NaN  35000.000000  2018.000000
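
The NaN cells appear because numeric and categorical statistics are mixed in one table. If that is not wanted, the include and exclude parameters of describe() can restrict the summary to certain dtypes. The sketch below assumes NumPy is available (it is installed alongside pandas); the variable names numeric_stats and non_numeric_stats are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# Summarize only the numeric columns (price and year)
numeric_stats = df.describe(include=[np.number])

# Summarize everything except the numeric columns (product)
non_numeric_stats = df.describe(exclude=[np.number])

print(numeric_stats)
print(non_numeric_stats)
```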

Breaking Down the Descriptive Statistics

You can further break down the descriptive statistics into the following:

Count:

df["dataframe_column"].count()

Mean:

df["dataframe_column"].mean()

Standard deviation:

df["dataframe_column"].std()

Minimum:

df["dataframe_column"].min()

0.25 Quantile:

df["dataframe_column"].quantile(q=0.25)

0.50 Quantile (Median):

df["dataframe_column"].quantile(q=0.50)

0.75 Quantile:

df["dataframe_column"].quantile(q=0.75)

Maximum:

df["dataframe_column"].max()
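
The three quantile calls can also be combined, since quantile() accepts a list of values for q and then returns a Series. A short sketch on the example "price" column (the variable name quartiles is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# Passing a list to q returns one value per requested quantile
quartiles = df["price"].quantile(q=[0.25, 0.50, 0.75])

print(quartiles)
```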

Putting everything together:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
statistics = {
    "count": df["price"].count(),
    "mean": df["price"].mean(),
    "std": df["price"].std(),
    "min": df["price"].min(),
    "quantile_25": df["price"].quantile(q=0.25),
    "quantile_50": df["price"].quantile(q=0.50),
    "quantile_75": df["price"].quantile(q=0.75),
    "max": df["price"].max(),
}
for stat, value in statistics.items():
    print(f"{stat}: {value}")

Once you run the code in Python, you’ll get the following stats:

count: 5
mean: 27600.0
std: 4878.524367060188
min: 22000
quantile_25: 25000.0
quantile_50: 27000.0
quantile_75: 29000.0
max: 35000
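
As an alternative to building the dictionary by hand, several of these statistics can be requested in one call with agg(). This is a sketch rather than the tutorial's own approach; it passes the names of built-in aggregations as strings, and the variable name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# Each string names a built-in Series aggregation
summary = df["price"].agg(["count", "mean", "std", "min", "median", "max"])

print(summary)
```

The result is a Series indexed by the aggregation names, so a single value can be read with, for example, summary["median"].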

Get the Descriptive Statistics in Pandas DataFrame – Data to Fish (2024)

FAQs

How to find descriptive statistics in Pandas?

The describe() function generates descriptive statistics. It is used to view basic statistical details such as the count, mean, standard deviation, min, max, and percentiles (the 50% row is the median) of a DataFrame or a Series of numeric values. By default, only the numeric columns of a DataFrame are included in the output; pass include="all" to include every column.

How do you generate descriptive statistics for all the columns for the data frame df?

If we apply .describe() to an entire DataFrame, it returns a brand new DataFrame with rows that correspond to the essential descriptive statistics. By default, it will only include the columns with integer and float dtypes, so use df.describe(include="all") to cover every column.

How can we get the statistical summary of data in a Pandas DataFrame?

The describe() function computes a summary of statistics pertaining to the DataFrame columns. This function gives the count, mean, std, min, max, and the quartiles (25%, 50%, 75%), from which the IQR can be computed.

Which method prints the descriptive summary of a Pandas DataFrame?

You can use the pandas DataFrame describe() method.

How do you generate descriptive statistics in Python?

Descriptive or summary statistics in Python (pandas) can be obtained by using the describe() function. The describe() function gives us the count, mean, standard deviation (std), minimum, Q1 (25%), median (50%), Q3 (75%), and maximum values. The IQR (Q3 - Q1) is not part of the output, but it can be derived from the 25% and 75% values.
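
The output of describe() has no IQR row, but the interquartile range can be derived from the 25% and 75% rows. A minimal sketch on the example "price" column (the variable name iqr is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

stats = df["price"].describe()

# IQR = Q3 - Q1, computed from the rows describe() does provide
iqr = stats["75%"] - stats["25%"]

print(iqr)  # 4000.0
```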


How do you display descriptive statistics?

There are several ways of presenting descriptive statistics in your paper. These include graphs, central tendency, dispersion and measures of association tables. Graphs: Quantitative data can be graphically represented in histograms, pie charts, scatter plots, line graphs, sociograms and geographic information systems.

Which Pandas data frame function produces descriptive statistics of the columns in the data frame?

DataFrame.describe generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

What is df.describe() in Python?

Pandas DataFrame describe() Method

The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following information for each column: count (the number of non-empty values), mean, std (standard deviation), min, the 25%, 50%, and 75% percentiles, and max.

How to extract data from DataFrame based on column value?

The Solution

Method 1: Using Boolean Indexing. Boolean indexing is a powerful feature of Pandas that allows you to filter a DataFrame based on a condition. ...
Method 2: Using the query Method. The query method is another way to filter a DataFrame based on a condition. ...
Method 3: Using the groupby Method.
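
The three methods listed above can be sketched on the example data as follows. The threshold of 26000 and the variable names are chosen only for illustration.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# Method 1: Boolean indexing keeps the rows where the condition is True
mask_result = df[df["price"] > 26000]

# Method 2: query() expresses the same condition as a string
query_result = df.query("price > 26000")

# Method 3: groupby() splits rows by a column value before aggregating
mean_by_product = df.groupby("product")["price"].mean()

print(mask_result)
print(mean_by_product)
```

Note that groupby() groups rows rather than filtering them, so it suits per-category summaries more than row selection.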


How to analyse data using pandas DataFrame?

Step-By-Step Guide To Data Analysis With Pandas In Python

1.) Install the package.
2.) Importing the Pandas package.
3.) Import the dataset with read_csv.
4.) Sort Columns based on specific criteria.
5.) Count the occurrences of variables.
6.) Data Filtering.
7.) Null values (NaN)


How do I find the details of a DataFrame in pandas?

Pandas DataFrame info() Method

The info() method prints information about the DataFrame. The information contains the number of columns, column labels, column data types, memory usage, range index, and the number of cells in each column (non-null values). Note: the info() method actually prints the info.
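
A short sketch of info() on the example data follows. Since info() prints rather than returns its report, the example passes an in-memory buffer through the buf parameter so the text can also be kept in a variable; the names buffer and info_text are illustrative.

```python
import io

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# info() writes to stdout by default; buf redirects the report to a buffer
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(info_text)
```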


Which command gives the statistical summary of the data in Python?

The describe() method produces basic summary statistics.

Which of the following methods will you use to get the descriptive statistics of a given dataset?

Descriptive statistics are methods used to summarize and describe the main features of a dataset. Examples include measures of central tendency, such as mean, median, and mode, which provide information about the typical value in the dataset.

How to summarize data in Pandas?

Summarizing Data

mean() : Calculates the mean of numerical columns.
median() : Finds the median of numerical columns.
mode() : Determines the mode of each column.
sum() : Computes the sum of numerical columns.
max() : Identifies the maximum value in each numerical column.
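
The methods listed above can be applied to several columns at once. The sketch below selects the numeric columns explicitly before calling the numeric summaries and uses mode() on the "product" column; the variable name numeric is illustrative.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# Restrict to the numeric columns so each summary is well defined
numeric = df[["price", "year"]]

print(numeric.mean())        # mean of each numeric column
print(numeric.median())      # median of each numeric column
print(df["product"].mode())  # most frequent value(s) in a column
print(numeric.sum())         # sum of each numeric column
print(numeric.max())         # maximum of each numeric column
```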


What is the difference between DF describe and DF info?

The .info() method lets us learn the shape of our data and the dtype of each column. The .describe() method gives us summary statistics for numerical columns in our DataFrame.

What is the formula for descriptive statistics?

The mean is calculated by summing all of the data values and dividing by the total number of data items you have. It is also called the average. If you have data consisting of n observations $(x_1, \ldots, x_n)$, then the mean $\bar{x}$ is given by the formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
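
As a worked check of the mean (the sum of the values divided by the number of values), the example prices can be averaged by hand and compared with pandas. The variable names manual_mean and pandas_mean are illustrative.

```python
import pandas as pd

prices = [22000, 27000, 25000, 29000, 35000]

# Sum of the observations divided by the number of observations
n = len(prices)
manual_mean = sum(prices) / n   # 138000 / 5

# The same quantity computed by pandas
pandas_mean = pd.Series(prices).mean()

print(manual_mean)   # 27600.0
print(pandas_mean)   # 27600.0
```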

