How to calculate top 5 max values in Pandas

Aggregation of fields is one of the basic necessities of data analysis and data science. Python's Pandas module provides easy ways to aggregate data and calculate metrics. Finding the top 5 maximum values for each group can also be done as part of a group by operation, and the function that makes this possible is nlargest(). The article below uses an example to explain how to calculate the top 5 max values by group in Pandas (Python).
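To show what nlargest() does on its own before combining it with groupby, here is a minimal sketch on a plain Series. The revenue numbers are made up for illustration.

```python
import pandas as pd

# A small, made-up Series of revenue figures (illustrative only)
revenue = pd.Series([120, 450, 300, 80, 510, 275, 90])

# nlargest(n) returns the n largest values sorted in descending order,
# keeping each value's original index label
top5 = revenue.nlargest(5)
print(top5)
```

The original index labels (4, 1, 2, 5, 0 here) are preserved. That makes it possible to trace each top value back to its source row.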

John has store sales data available for analysis. The data has five columns: Geography (country of the store), Department (industry category of the store), StoreID (unique ID of each store), Time Period (month of sales) and Revenue (total sales for the month). John wants to calculate the top 5 maximum revenue values for each Geography.

Example 1: Top 5 max values for each Geography

  • Step 1: Import the necessary modules.
import pandas as pd
  • Step 2: Use the nlargest() function together with a groupby operation. Since we want to group by Geography, pass by="Geography" to groupby(). The Revenue column holds the sales figures, so select "Revenue" as the column on which nlargest(5) picks the top 5 values. Assuming the data is loaded in a DataFrame named df1, the syntax for the current example is:
df1.groupby(by="Geography")["Revenue"].nlargest(5)
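The one-liner assumes df1 already exists. Below is a minimal, self-contained sketch that uses a small hypothetical dataset with the five columns described above. All countries, departments, store IDs, months and revenue figures are invented for illustration. The sketch shows what the result looks like: a Series indexed by Geography plus the original row label of each selected row.

```python
import pandas as pd

# Hypothetical store sales data with the five columns described above
# (all values are invented for illustration)
df1 = pd.DataFrame({
    "Geography": ["US"] * 7 + ["UK"] * 3,
    "Department": ["Grocery", "Apparel"] * 5,
    "StoreID": [f"S{i:02d}" for i in range(10)],
    "Time Period": ["2023-01", "2023-02"] * 5,
    "Revenue": [500, 320, 910, 150, 780, 640, 210, 430, 860, 120],
})

# Top 5 revenue values per Geography; a group with fewer than
# 5 rows (UK here) simply returns all of its rows
top5 = df1.groupby(by="Geography")["Revenue"].nlargest(5)
print(top5)

# Level 1 of the resulting MultiIndex holds the original row labels,
# so the full rows (StoreID, Time Period, ...) can be looked up with .loc
top5_rows = df1.loc[top5.index.get_level_values(1)]
print(top5_rows)
```

Because groupby() sorts group keys by default, UK appears before US in the output.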

Example 2: Top 5 max values for each Month / Time Period

  • Here we want to calculate the top 5 max values within each time period, so the field in the groupby operation is "Time Period":
df1.groupby(by="Time Period")["Revenue"].nlargest(5)
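Below is a minimal sketch of the Time Period grouping on a small hypothetical dataset. The months and revenue figures are invented for illustration. It also flattens the result with reset_index(), which some readers may find easier to work with than the MultiIndex Series.

```python
import pandas as pd

# Hypothetical monthly revenue rows (values invented for illustration)
df1 = pd.DataFrame({
    "Time Period": ["Jan"] * 6 + ["Feb"] * 3,
    "Revenue": [400, 150, 620, 90, 310, 275, 510, 480, 130],
})

# Top 5 revenue values within each Time Period
top5 = df1.groupby(by="Time Period")["Revenue"].nlargest(5)

# reset_index() turns the (Time Period, original row label) MultiIndex into
# columns; the unnamed original row label ends up in a column called "level_1"
flat = top5.reset_index()
print(flat)
```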

Rakesh Tripathi

Consulting Engineer, Software Developer, Infra, Quora