Get the Descriptive Statistics in Pandas DataFrame – Data to Fish (2024)

To get the descriptive statistics for a specific column in your DataFrame:

df["dataframe_column"].describe()

To get the descriptive statistics for an entire DataFrame:

df.describe(include="all")

Steps

Step 1: Collect the Data

To start, collect the data for your DataFrame.

Here is an example of a dataset:

product  price  year
A        22000  2014
B        27000  2015
C        25000  2016
C        29000  2017
D        35000  2018

Step 2: Create the DataFrame

Next, create the DataFrame based on the data collected:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

print(df)

Run the code in Python, and you’ll get the following DataFrame:

  product  price  year
0       A  22000  2014
1       B  27000  2015
2       C  25000  2016
3       C  29000  2017
4       D  35000  2018

Step 3: Get the Descriptive Statistics

To get the descriptive statistics for the “price” column, which contains numerical data:

df["price"].describe()

The full code:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe()

print(stats_numeric)

The resulting descriptive statistics for the “price” column:

count        5.000000
mean     27600.000000
std       4878.524367
min      22000.000000
25%      25000.000000
50%      27000.000000
75%      29000.000000
max      35000.000000
Name: price, dtype: float64

Notice that the output shows six decimal places. You can convert the values to integers using astype(int), which truncates the fractional part rather than rounding it:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe().astype(int)

print(stats_numeric)

Run the code, and you’ll get only integers (the dtype may appear as int32 or int64, depending on your platform):

count        5
mean     27600
std       4878
min      22000
25%      25000
50%      27000
75%      29000
max      35000
Name: price, dtype: int32
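
Because astype(int) drops the fractional part, the standard deviation above becomes 4878 even though 4878.524367 is closer to 4879. The following is a minimal sketch of two alternatives, assuming you would rather round the values; the variable names stats_rounded and stats_two_decimals are illustrative and not part of the original tutorial.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe()

# astype(int) alone truncates: 4878.524367 becomes 4878
stats_truncated = stats_numeric.astype(int)

# Rounding first and casting afterwards gives the nearest integer: 4879
stats_rounded = stats_numeric.round().astype(int)

# Alternatively, keep floats but limit them to two decimal places
stats_two_decimals = stats_numeric.round(2)

print(stats_rounded)
print(stats_two_decimals)
```

Which option fits best depends on whether the statistics are meant for display only or for further calculation; rounding to two decimals keeps more information than casting to integers.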

Descriptive Statistics for Categorical Data

To get the descriptive statistics for the “product” column, which contains categorical data:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_categorical = df["product"].describe()

print(stats_categorical)

Here are the results:

count     5
unique    4
top       C
freq      2
Name: product, dtype: object
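
To see where the unique, top, and freq values come from, you can count the occurrences of each product yourself. This is a small sketch rather than what describe() does internally; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"product": ["A", "B", "C", "C", "D"]})

# Number of occurrences of each distinct value
counts = df["product"].value_counts()

unique_values = df["product"].nunique()  # matches "unique" in describe()
top_value = counts.idxmax()              # matches "top" in describe()
top_freq = counts.max()                  # matches "freq" in describe()

print(counts)
print(unique_values, top_value, top_freq)
```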

Get the Descriptive Statistics for the Entire DataFrame

To get the descriptive statistics for the entire DataFrame:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats = df.describe(include="all")

print(stats)

The result:

       product         price         year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top          C           NaN          NaN
freq         2           NaN          NaN
mean       NaN  27600.000000  2016.000000
std        NaN   4878.524367     1.581139
min        NaN  22000.000000  2014.000000
25%        NaN  25000.000000  2015.000000
50%        NaN  27000.000000  2016.000000
75%        NaN  29000.000000  2017.000000
max        NaN  35000.000000  2018.000000
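
The NaN values appear because numeric statistics (such as mean) do not apply to the text column, and categorical statistics (such as top) do not apply to the numeric columns. If you prefer two tables without NaN padding, one possible approach is to describe the numeric and non-numeric columns separately with the include and exclude parameters. This is a sketch; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Only the numeric columns ("price" and "year")
stats_numeric = df.describe(include=[np.number])

# Everything except the numeric columns (here, only "product")
stats_non_numeric = df.describe(exclude=[np.number])

print(stats_numeric)
print(stats_non_numeric)
```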

Breaking Down the Descriptive Statistics

You can further break the descriptive statistics down into the following:

Count:

df["dataframe_column"].count()

Mean:

df["dataframe_column"].mean()

Standard deviation:

df["dataframe_column"].std()

Minimum:

df["dataframe_column"].min()

0.25 Quantile:

df["dataframe_column"].quantile(q=0.25)

0.50 Quantile (Median):

df["dataframe_column"].quantile(q=0.50)

0.75 Quantile:

df["dataframe_column"].quantile(q=0.75)

Maximum:

df["dataframe_column"].max()

Putting everything together:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

statistics = {
    "count": df["price"].count(),
    "mean": df["price"].mean(),
    "std": df["price"].std(),
    "min": df["price"].min(),
    "quantile_25": df["price"].quantile(q=0.25),
    "quantile_50": df["price"].quantile(q=0.50),
    "quantile_75": df["price"].quantile(q=0.75),
    "max": df["price"].max(),
}

# Print each statistic on its own line
for stat, value in statistics.items():
    print(f"{stat}: {value}")

Once you run the code in Python, you’ll get the following stats:

count: 5
mean: 27600.0
std: 4878.524367060188
min: 22000
quantile_25: 25000.0
quantile_50: 27000.0
quantile_75: 29000.0
max: 35000
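
If you only need a handful of these statistics, a more compact option is to pass a list of function names to agg(). This is a sketch of an alternative, not the approach used in the tutorial above; the variable name selected_stats is illustrative.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Compute several statistics in one call; the result is a Series
# indexed by the function names
selected_stats = df["price"].agg(["count", "mean", "std", "min", "median", "max"])

print(selected_stats)
```

The quantiles other than the median are not available by name this way, so quantile() or describe() is still needed for the 25% and 75% values.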

FAQs

How to find descriptive statistics in Pandas?

The describe() function generates descriptive statistics. It is used to view basic statistical details such as the count, mean, min, max, and percentiles (including the median) of a DataFrame or a Series of numeric values. By default, only the numeric columns of a DataFrame are included in the output; pass include="all" to include every column.

How do you generate descriptive statistics for all the columns for the data frame df?

If we apply .describe() to an entire DataFrame, it returns a new DataFrame whose rows correspond to the descriptive statistics. By default, it includes only the columns with numeric (integer and float) dtypes.

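How can we get the statistical summary of data in a Pandas DataFrame?

The describe() function computes summary statistics for the DataFrame columns. For numeric columns it gives the count, mean, std, min, max, and the 25%, 50%, and 75% percentiles. The IQR is not reported directly, but it can be derived from the 25% and 75% values.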

Which method prints the descriptive summary of a Pandas DataFrame?

You can use the pandas DataFrame describe() method.

How do you generate descriptive statistics in Python?

Descriptive or summary statistics in pandas can be obtained by using the describe() function. The describe() function gives us the count, mean, standard deviation (std), minimum, Q1 (25%), median (50%), Q3 (75%), and maximum values. The IQR (Q3 - Q1) is not part of the output, but it can be computed from Q1 and Q3.
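
As a short sketch, the IQR for the “price” column from the example earlier in this article can be derived from the describe() output like this (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

stats = df["price"].describe()

# IQR = Q3 - Q1, using the 75% and 25% rows of the describe() output
iqr = stats["75%"] - stats["25%"]

print(iqr)
```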

How do you display descriptive statistics?

There are several ways of presenting descriptive statistics in your paper. These include graphs, central tendency, dispersion and measures of association tables. Graphs: Quantitative data can be graphically represented in histograms, pie charts, scatter plots, line graphs, sociograms and geographic information systems.

Which Pandas data frame function produces descriptive statistics of the columns in the data frame?

DataFrame.describe generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

What is df.describe() in Python?

Pandas DataFrame describe() Method

The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following for each column: count (the number of non-empty values), mean, std, min, the 25%, 50%, and 75% percentiles, and max.

How to extract data from DataFrame based on column value?

The Solution
  1. Method 1: Using Boolean Indexing. Boolean indexing is a powerful feature of Pandas that allows you to filter a DataFrame based on a condition. ...
  2. Method 2: Using the query Method. The query method is another way to filter a DataFrame based on a condition. ...
  3. Method 3: Using the groupby Method.
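
The first two methods can be sketched with the example DataFrame from this article. The thresholds and variable names below are illustrative assumptions, and the groupby method is left out because the answer above gives no detail on it.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Method 1: boolean indexing keeps the rows where the condition is True
expensive = df[df["price"] > 26000]

# Method 2: query() expresses the condition as a string
expensive_and_recent = df.query("price > 26000 and year >= 2017")

print(expensive)
print(expensive_and_recent)
```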

How to analyse data using pandas DataFrame?

Step-By-Step Guide To Data Analysis With Pandas In Python
  1. Install the package.
  2. Import the Pandas package.
  3. Import the dataset with read_csv.
  4. Sort columns based on specific criteria.
  5. Count the occurrences of variables.
  6. Data filtering.
  7. Null values (NaN).

How do I find the details of a DataFrame in pandas?

Pandas DataFrame info() Method

The info() method prints information about the DataFrame. The information contains the number of columns, column labels, column data types, memory usage, range index, and the number of cells in each column (non-null values). Note: the info() method actually prints the info.
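
A brief sketch of info() on the example DataFrame from this article follows. Because info() prints its report and returns None, the sketch also shows one way to capture the text, using the buf parameter with an in-memory buffer; the variable names are illustrative.

```python
import io

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Prints the report directly; the return value is None
result = df.info()

# To keep the report as a string, write it to a buffer instead
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(info_text)
```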

Which command gives the statistical summary of the data in Python?

The describe() command produces basic summary statistics.

Which of the following methods will you use to get the descriptive statistics of a given dataset?

Descriptive statistics are methods used to summarize and describe the main features of a dataset. Examples include measures of central tendency, such as mean, median, and mode, which provide information about the typical value in the dataset.

How to summarize data in Pandas?

Summarizing Data
  1. mean() : Calculates the mean of numerical columns.
  2. median() : Finds the median of numerical columns.
  3. mode() : Determines the mode of each column.
  4. sum() : Computes the sum of numerical columns.
  5. max() : Identifies the maximum value in each numerical column.
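
Applied to the example DataFrame from this article, these methods could be used as in the sketch below. Selecting the numeric columns explicitly is a choice made here to keep the text column out of the numeric calculations; the variable names are illustrative.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Restrict the numeric summaries to the numeric columns
numeric = df[["price", "year"]]

means = numeric.mean()             # mean of each numeric column
medians = numeric.median()         # median of each numeric column
modes = df["product"].mode()       # most frequent value(s) of a column
total_price = df["price"].sum()    # sum of a numeric column
maxima = numeric.max()             # maximum of each numeric column

print(means)
print(medians)
print(modes)
print(total_price)
print(maxima)
```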

What is the difference between DF describe and DF info?

The .info() method lets us learn the shape and the column data types of our data. The .describe() method gives us summary statistics for the numerical columns in our DataFrame.

What is the formula for descriptive statistics?

The most familiar descriptive statistic is calculated by summing all of the data values and dividing by the total number of data items you have. It is normally called the mean or the average. If you have a dataset consisting of n observations $(x_1, \ldots, x_n)$, then the mean $\bar{x}$ is given by the formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
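
As a worked check, the sketch below applies the mean formula by hand to the prices from this article and compares the result with pandas. It also computes the sample standard deviation, which divides by n - 1; that formula is not stated in the answer above, but it is what pandas uses by default in std(). The variable names are illustrative.

```python
import math

import pandas as pd

prices = [22000, 27000, 25000, 29000, 35000]
n = len(prices)

# Mean: sum of the values divided by the number of values
mean = sum(prices) / n

# Sample standard deviation: squared deviations from the mean,
# summed, divided by n - 1, then square-rooted
variance = sum((x - mean) ** 2 for x in prices) / (n - 1)
std = math.sqrt(variance)

series = pd.Series(prices)

print(mean, series.mean())
print(std, series.std())
```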

Article information

Author: Kareem Mueller DO
