In our previous lessons, we learned about the `OVER` clause, `PARTITION BY`, and `ORDER BY`. Now, let's explore ranking functions, which are special window functions that assign ranks to rows based on specified ordering.
The three main ranking functions are `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`. Let's explore each one using our bookstore data.
## ROW_NUMBER()
`ROW_NUMBER()` assigns a unique sequential number to each row within a partition. Let's look at a simple query without any `PARTITION BY` to see how it works:
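A minimal sketch of such a query, assuming the lesson's `sale` table with `book_title`, `category`, and `sale_date` columns. The alias `row_number` is taken from the text that follows; some databases (MySQL, for example) may treat it as a reserved word and require quoting or a different alias. With an empty `OVER()`, all rows form a single partition, and the order in which they are numbered is not guaranteed.

```sql
-- Sketch only: assumes the lesson's sale table and its column names
SELECT
    book_title,
    category,
    sale_date,
    -- Empty OVER(): every row belongs to one partition and is numbered 1, 2, 3, ...
    ROW_NUMBER() OVER() AS row_number
FROM
    sale;
```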
Notice how `row_number` starts at 1 and increments by 1 for each row. Let's add a `PARTITION BY` clause to see how it works within a partition.
### Ranking within a Partition
Let's use `PARTITION BY` to assign sequential numbers to each book sold on a given date. We will use the `sale_date` column to partition the data. Our query will look like this:
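A sketch of that query, assuming the same `sale` table. The column list is inferred from the result rows shown next, and `order_counter` is the alias the lesson refers to. Because the window has no `ORDER BY`, the order in which rows are numbered within a date is not guaranteed.

```sql
-- Sketch only: assumes the lesson's sale table and its column names
SELECT
    book_title,
    category,
    sale_date,
    -- Numbering restarts at 1 for every distinct sale_date
    ROW_NUMBER() OVER(PARTITION BY sale_date) AS order_counter
FROM
    sale;
```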
| book_title | category | sale_date | order_counter |
| --- | --- | --- | --- |
| ... | ... | ... | ... |
| Pride and Prejudice | Fiction | 2024-01-16 | 2 |
| Data Science | Technical | 2024-01-16 | 3 |
| Web Development | Technical | 2024-01-17 | 1 |
| The Hobbit | Fiction | 2024-01-17 | 2 |
| SQL Basics | Technical | 2024-01-17 | 3 |
| The Great Gatsby | Fiction | 2024-01-17 | 4 |
Notice how `order_counter` restarts at 1 for each `sale_date`.
### Ranking and Ordering
We can also use `ORDER BY` inside the `OVER` clause to order the rows before the numbers are assigned. Let's rank the books by their revenue:
```sql
SELECT
    book_title,
    category,
    revenue,
    ROW_NUMBER() OVER(ORDER BY revenue DESC) AS revenue_rank
FROM
    sale;
```
Notice how the rows are sorted by revenue and numbered in that order. Looking at the results, we can see:
- Books are ordered by revenue (highest to lowest)
- Each row gets a unique number
- Even though some books have the same revenue (like SQL Basics), they get different numbers (2 and 3)
#### Ranking, Partitioning, and Ordering
We can also combine `PARTITION BY` and `ORDER BY` to number the rows within each partition in a chosen order. Let's rank the books by their revenue for each date. In the query below, `revenue_rank` restarts at 1 for each `sale_date`, and within a day the book with the highest revenue gets rank 1. The outer `ORDER BY` also sorts the final result:
```sql
SELECT
    book_title,
    sale_date,
    revenue,
    ROW_NUMBER() OVER(
        PARTITION BY sale_date
        ORDER BY revenue DESC
    ) AS revenue_rank
FROM
    sale
ORDER BY
    sale_date ASC,
    revenue_rank DESC;
```
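The output is now sorted by `sale_date` in ascending order and, within each date, by `revenue_rank` in descending order.
The same pattern applies to any partition column. If we partition by `category` instead of `sale_date`:
- Within each category, books are ordered by revenue
- Each book gets a unique number within its category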
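A sketch of numbering rows within each category, assuming the lesson's `sale` table has `category` and `revenue` columns as used elsewhere in the lesson. The alias `category_row_number` is a hypothetical name chosen for illustration.

```sql
-- Sketch only: assumes the lesson's sale table and its column names
SELECT
    book_title,
    category,
    revenue,
    -- Numbering restarts at 1 per category; highest revenue gets 1
    ROW_NUMBER() OVER(
        PARTITION BY category
        ORDER BY revenue DESC
    ) AS category_row_number  -- hypothetical alias
FROM
    sale;
```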
## RANK()
`RANK()` is similar to `ROW_NUMBER()`, but it handles ties (i.e. two or more rows with the same value) differently. When values are equal, they get the same rank, and the next rank skips numbers to account for the tie.
Let's rank books by copies sold:
```sql
SELECT
    book_title,
    category,
    copies_sold,
    RANK() OVER(ORDER BY copies_sold DESC) AS sales_rank
FROM
    sale;
```
`RANK()` also works with `PARTITION BY`. When a query computes a `category_rank` partitioned by `category` alongside an `overall_rank` over all rows, notice how:
- `category_rank` restarts at 1 for each category
- `overall_rank` considers all books regardless of category
Each ranking function suits a different use case:
- `ROW_NUMBER()` is useful for pagination (getting rows 1-10, 11-20, etc.), finding the first or last occurrence of something, or whenever you need unique sequential numbers.
- `RANK()` fits competition or sports rankings where multiple participants can tie. For example, in a race, if two runners finish in 20.5 seconds, they both get 1st place. The next runner, finishing in 20.7 seconds, gets 3rd place (not 2nd). This matches how real-world competitions handle ties.
- `DENSE_RANK()` fits grading systems or classification tiers. For example, in a class grading system, if three students score 95%, they all get rank 1. If two students then score 92%, they get rank 2 (not rank 4). This matches how real-world grading systems handle ties.
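The tie handling described above can be compared side by side. The sketch below uses a small inline set of hypothetical rows (`sample_sale`, not the lesson's `sale` table) and the `WITH ... VALUES` form accepted by PostgreSQL and SQLite; other databases may need a different way to build the sample rows. The aliases are illustrative as well.

```sql
-- Hypothetical sample rows, chosen so that two books tie on copies_sold
WITH sample_sale(book_title, copies_sold) AS (
    VALUES
        ('Book A', 50),
        ('Book B', 40),
        ('Book C', 40),
        ('Book D', 30)
)
SELECT
    book_title,
    copies_sold,
    -- Always unique, even for tied rows
    ROW_NUMBER() OVER(ORDER BY copies_sold DESC) AS row_num,
    -- Ties share a rank; the next rank skips ahead
    RANK() OVER(ORDER BY copies_sold DESC) AS sales_rank,
    -- Ties share a rank; the next rank does not skip
    DENSE_RANK() OVER(ORDER BY copies_sold DESC) AS dense_sales_rank
FROM
    sample_sale
ORDER BY
    copies_sold DESC,
    book_title;
```

With these rows, `sales_rank` comes out as 1, 2, 2, 4 and `dense_sales_rank` as 1, 2, 2, 3, while `row_num` runs from 1 to 4. Which of the two tied books receives 2 and which receives 3 is not guaranteed, because the window's `ORDER BY` does not distinguish them.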
In the next lesson, we'll explore window frames and how they affect our calculations.