Postgres rank() window function
Use rank() to assign ranks to rows within a result set
The rank()
window function computes a ranking for each row within a partition of the result set. The rank is determined by the order of rows specified in the ORDER BY
clause of the OVER
clause. Rows with equal values for the ranking criteria receive the same rank, with the next rank(s) skipped.
This function is useful in scenarios such as finding the top N rows per group, calculating percentiles, or generating leaderboards.
Function signature
The rank()
function has the following form:
The OVER
clause defines the window frame for the function.
- The
ORDER BY
clause specifies the order in which ranks are assigned to rows. - The
PARTITION BY
clause is optional - if specified, it divides the result set into partitions and ranks are assigned within each partition. Otherwise, ranks are computed for each row over the entire result set.
Example usage
Consider an employees
table with columns for employee ID, name, department, and salary. We can use rank()
to rank employees within each department by their salary.
This query ranks employees within each department based on their salary in descending order. Employees with the same salary within a department receive the same rank.
Advanced examples
Top N per group
You can use rank()
in a subquery to find the top N rows per group.
This query finds the top 2 most expensive products in each category. The subquery ranks products within each category by price, and the outer query filters for rows with a rank less than or equal to 2.
Percentile calculation
You can calculate percentiles using the rank()
function with some arithmetic.
This query calculates the percentile rank for each student based on their score. The percentile is calculated by dividing the rank of each row by the total number of rows and multiplying by 100.
Alternative functions
dense_rank
The dense_rank()
function is similar to rank()
, but it does not skip ranks when there are ties. If multiple rows have the same rank, the next rank will be the next consecutive integer.
This query demonstrates the difference between rank()
and dense_rank()
. While rank()
skips rank 3 due to the tie at rank 2, dense_rank()
assigns consecutive ranks.
row_number
The row_number()
function assigns a unique, sequential integer to each row within the partition of a result set. Unlike rank()
and dense_rank()
, it does not handle ties.
This query assigns a unique row number to each sale within a date, ordered by the sale amount descending. Even though there are ties for the date 2023-01-02
, each row receives a distinct row number.
Additional considerations
Handling ties
The rank()
and dense_rank()
functions handle ties differently. rank()
assigns the same rank to tied rows and skips the next rank(s), while dense_rank()
assigns the same rank to tied rows but does not skip ranks. Choose the appropriate function based on your requirements.
Performance implications
Like other window functions, rank()
performs calculations across a set of rows defined by the OVER
clause. This can be computationally expensive, especially for large datasets or complex window definitions.
To optimize performance:
- Include an
ORDER BY
clause in theOVER
clause to avoid sorting the entire dataset. - Use partitioning (
PARTITION BY
) to divide the data into smaller chunks when possible. - Create appropriate indexes on the columns used in the
OVER
clause.