parent
19c54ab424
commit
925dd9294c
1 changed files with 306 additions and 0 deletions
@ -0,0 +1,306 @@ |
||||
--- |
||||
title: Use of ORDER BY |
||||
description: Learn how ORDER BY changes window function behavior |
||||
order: 111 |
||||
type: lesson-challenge |
||||
setup: | |
||||
```sql |
||||
CREATE TABLE sale ( |
||||
id INTEGER PRIMARY KEY, |
||||
book_title VARCHAR(100), |
||||
category VARCHAR(50), |
||||
sale_date DATE, |
||||
price DECIMAL(10, 2), |
||||
customer_rating INTEGER |
||||
); |
||||
|
||||
INSERT INTO sale (id, book_title, category, sale_date, price, customer_rating) |
||||
VALUES |
||||
(1, 'The Great Gatsby', 'Fiction', '2024-01-15', 24.99, 4), |
||||
(2, 'SQL Basics', 'Technical', '2024-01-15', 39.99, 5), |
||||
(3, '1984', 'Fiction', '2024-01-16', 19.99, 5), |
||||
(4, 'Python Programming', 'Technical', '2024-01-16', 44.99, 4), |
||||
(5, 'Pride and Prejudice', 'Fiction', '2024-01-16', 14.99, 3), |
||||
(6, 'Data Science', 'Technical', '2024-01-17', 49.99, 5), |
||||
(7, 'The Hobbit', 'Fiction', '2024-01-17', 29.99, 4), |
||||
(8, 'Web Development', 'Technical', '2024-01-17', 34.99, 3); |
||||
``` |
||||
--- |
||||
|
||||
In our previous lesson, we learned about `OVER` and `PARTITION BY`. Now let's learn about `ORDER BY` and how it works specifically within window functions. |
||||
|
||||
## ORDER BY in Window Functions |
||||
|
||||
When you add `ORDER BY` to a window function's `OVER` clause, you fundamentally change how that function processes data. Without `ORDER BY`, a window function processes **all rows in the partition simultaneously**. With `ORDER BY`, the function processes rows one at a time in the specified order, building up its result as it goes. |
||||
|
||||
Let's look at an example to understand this better. Let's say we have the following `sale` table: |
||||
|
||||
| id | book_title | category | sale_date | price | customer_rating | |
||||
| --- | ------------------- | --------- | ---------- | ----- | --------------- | |
||||
| 1 | The Great Gatsby | Fiction | 2024-01-15 | 24.99 | 4 | |
||||
| 2 | SQL Basics | Technical | 2024-01-15 | 39.99 | 5 | |
||||
| 3 | 1984 | Fiction | 2024-01-16 | 19.99 | 5 | |
||||
| 4 | Python Programming | Technical | 2024-01-16 | 44.99 | 4 | |
||||
| 5 | Pride and Prejudice | Fiction | 2024-01-16 | 14.99 | 3 | |
||||
| 6 | Data Science | Technical | 2024-01-17 | 49.99 | 5 | |
||||
| 7 | The Hobbit | Fiction | 2024-01-17 | 29.99 | 4 | |
||||
| 8 | Web Development | Technical | 2024-01-17 | 34.99 | 3 | |
||||
|
||||
Let's run the following two queries; one without `ORDER BY` and one with `ORDER BY` and see the difference. |
||||
|
||||
```sql |
||||
-- Without ORDER BY - processes all rows at once |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
SUM(price) OVER() as total_price |
||||
FROM sale; |
||||
|
||||
-- With ORDER BY - processes rows sequentially |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
SUM(price) OVER(ORDER BY price) as running_total |
||||
FROM sale; |
||||
``` |
||||
|
||||
The first query (without `ORDER BY`) returns: |
||||
|
||||
| book_title | price | total_price | |
||||
| ------------------- | ----- | ----------- | |
||||
| The Great Gatsby | 24.99 | 259.92 | |
||||
| SQL Basics | 39.99 | 259.92 | |
||||
| 1984 | 19.99 | 259.92 | |
||||
| Python Programming | 44.99 | 259.92 | |
||||
| Pride and Prejudice | 14.99 | 259.92 | |
||||
| Data Science | 49.99 | 259.92 | |
||||
| The Hobbit | 29.99 | 259.92 | |
||||
| Web Development | 34.99 | 259.92 | |
||||
|
||||
In this result, as you can see, every row shows the same `total_price` because the `SUM` function without `ORDER BY` considers all rows at once. |
||||
|
||||
The second query (with `ORDER BY`) returns: |
||||
|
||||
| book_title | price | running_total | |
||||
| :------------------ | ----: | ------------: | |
||||
| Pride and Prejudice | 14.99 | 14.99 | |
||||
| 1984 | 19.99 | 34.98 | |
||||
| The Great Gatsby | 24.99 | 59.97 | |
||||
| The Hobbit | 29.99 | 89.96 | |
||||
| Web Development | 34.99 | 124.95 | |
||||
| SQL Basics | 39.99 | 164.94 | |
||||
| Python Programming | 44.99 | 209.93 | |
||||
| Data Science | 49.99 | 259.92 | |
||||
|
||||
Notice how the `running_total` is different in each row. This is because the inclusion of `ORDER BY` changed the way the `SUM` function works. It now processes rows sequentially, adding up the `price` of each row as it goes. Let me explain how this works: |
||||
|
||||
1. Since we didn't specify `PARTITION BY`, all rows are in a single partition (the entire table) |
||||
2. Within this partition, the rows are sorted by price (lowest to highest) due to our `ORDER BY price` clause. This also ensures that the the window function i.e. `SUM` in this case, will process the rows sequentially i.e. from the first row to the current row. |
||||
3. For each row, the `SUM` function only considers the current row and all previous rows in the ordered sequence |
||||
4. This creates a cumulative sum (running total) that builds up as we move through the ordered rows |
||||
|
||||
Let's see how the running total builds up for the first few rows. The sorted partition to be processed by the window function `OVER(ORDER BY price)` is: |
||||
|
||||
| book_title | price | |
||||
| :------------------ | ----: | |
||||
| Pride and Prejudice | 14.99 | |
||||
| 1984 | 19.99 | |
||||
| The Great Gatsby | 24.99 | |
||||
| The Hobbit | 29.99 | |
||||
| Web Development | 34.99 | |
||||
| SQL Basics | 39.99 | |
||||
| Python Programming | 44.99 | |
||||
| Data Science | 49.99 | |
||||
|
||||
When the window function processes the first row, it only considers the first row: |
||||
|
||||
| book_title | price | running_total | calculation | |
||||
| :------------------ | ----: | ------------: | :---------- | |
||||
| Pride and Prejudice | 14.99 | 14.99 | 14.99 | |
||||
|
||||
When the window function processes the second row, it considers the first two rows: |
||||
|
||||
| book_title | price | running_total | calculation | |
||||
| :------------------ | ----: | ------------: | :------------ | |
||||
| Pride and Prejudice | 14.99 | 14.99 | 14.99 | |
||||
| 1984 | 19.99 | 34.98 | 14.99 + 19.99 | |
||||
|
||||
When the window function processes the third row, it considers the first three rows: |
||||
|
||||
| book_title | price | running_total | calculation | |
||||
| :------------------ | ----: | ------------: | :-------------------- | |
||||
| Pride and Prejudice | 14.99 | 14.99 | 14.99 | |
||||
| 1984 | 19.99 | 34.98 | 14.99 + 19.99 | |
||||
| The Great Gatsby | 24.99 | 59.97 | 14.99 + 19.99 + 24.99 | |
||||
|
||||
This process continues for each subsequent row, always adding the current price to the previous running total until all rows have been processed. |
||||
|
||||
This behavior of `ORDER BY` in window functions is particularly useful when you may want to process rows in a specific order, such as calculating running totals or processing rows in a specific order. |
||||
|
||||
## FIRST_VALUE() and LAST_VALUE() |
||||
|
||||
`FIRST_VALUE()` and `LAST_VALUE()` are userful when you may want to get the first or last value in a partition. `ORDER BY` is quite useful when using these functions. |
||||
|
||||
Let's understand how these functions work by taking an example. |
||||
|
||||
### Example: Cheapest book we Sold |
||||
|
||||
Let's say we want to get the cheapest book we sold. In order to do this, we can create a partition of all the books (i.e. no `PARTITION BY` clause) and then use the `ORDER BY` clause to sort the rows by price. Here's how we can do it: |
||||
|
||||
```sql |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
FIRST_VALUE(book_title) OVER(ORDER BY price) as cheapest_book |
||||
FROM sale; |
||||
``` |
||||
|
||||
The output will be: |
||||
|
||||
| book_title | price | cheapest_book | |
||||
| ------------------- | ----- | ------------------- | |
||||
| Pride and Prejudice | 14.99 | Pride and Prejudice | |
||||
| 1984 | 19.99 | Pride and Prejudice | |
||||
| The Great Gatsby | 24.99 | Pride and Prejudice | |
||||
| The Hobbit | 29.99 | Pride and Prejudice | |
||||
| Web Development | 34.99 | Pride and Prejudice | |
||||
| SQL Basics | 39.99 | Pride and Prejudice | |
||||
| Python Programming | 44.99 | Pride and Prejudice | |
||||
| Data Science | 49.99 | Pride and Prejudice | |
||||
|
||||
### Example: Most Expensive Book we Sold |
||||
|
||||
To get the most expensive book we sold (i.e. `Data Science`), your first instinct may be to use the opposite of the `FIRST_VALUE()` function i.e. `LAST_VALUE()` function i.e. |
||||
|
||||
```sql |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
LAST_VALUE(book_title) OVER(ORDER BY price) as most_expensive_book |
||||
FROM sale; |
||||
``` |
||||
|
||||
The output from this query however will be: |
||||
|
||||
| book_title | price | most_expensive_book | |
||||
| ------------------- | ----- | ------------------- | |
||||
| Data Science | 49.99 | Data Science | |
||||
| Python Programming | 44.99 | Python Programming | |
||||
| SQL Basics | 39.99 | SQL Basics | |
||||
| Web Development | 34.99 | Web Development | |
||||
| The Hobbit | 29.99 | The Hobbit | |
||||
| The Great Gatsby | 24.99 | The Great Gatsby | |
||||
| 1984 | 19.99 | 1984 | |
||||
| Pride and Prejudice | 14.99 | Pride and Prejudice | |
||||
|
||||
Notice how the `most_expensive_book` is wrong and always same as the current row. This is because, as mentioned earlier, when using `ORDER BY` in a window function, it processes rows sequentially i.e. from the first row to the current row. So the `LAST_VALUE()` function will always return the current row's value. |
||||
|
||||
Alternatively, to get the most expensive book we can use the `ORDER BY price DESC` combined with `FIRST_VALUE()` function. Here's how we can do it: |
||||
|
||||
```sql |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
FIRST_VALUE(book_title) OVER(ORDER BY price DESC) as most_expensive_book |
||||
FROM sale; |
||||
``` |
||||
|
||||
The output from this query will be: |
||||
|
||||
| book_title | price | most_expensive_book | |
||||
| ------------------- | ----- | ------------------- | |
||||
| Data Science | 49.99 | Data Science | |
||||
| Python Programming | 44.99 | Data Science | |
||||
| SQL Basics | 39.99 | Data Science | |
||||
| Web Development | 34.99 | Data Science | |
||||
| The Hobbit | 29.99 | Data Science | |
||||
| The Great Gatsby | 24.99 | Data Science | |
||||
| 1984 | 19.99 | Data Science | |
||||
| Pride and Prejudice | 14.99 | Data Science | |
||||
|
||||
Notice how the `most_expensive_book` is now correctly set to `Data Science`. |
||||
|
||||
> Note: You might notice that these results appear sorted by price even though we didn't include a query-level `ORDER BY` clause. This is because many SQL implementations automatically sort the results to match the window function's `ORDER BY` clause for better readability. However, this behavior isn't guaranteed by the SQL standard - if you need the results in a specific order, you should always include an explicit `ORDER BY` clause at the query level. |
||||
|
||||
### Example: Cheapest + Most Expensive Book |
||||
|
||||
We can get both the cheapest and most expensive books above using a single query. Here's how we can do it: |
||||
|
||||
```sql |
||||
SELECT |
||||
book_title, |
||||
price, |
||||
FIRST_VALUE(book_title) OVER(ORDER BY price ASC) as cheapest_book, |
||||
FIRST_VALUE(book_title) OVER(ORDER BY price DESC) as most_expensive_book |
||||
FROM sale; |
||||
``` |
||||
|
||||
The output from this query will be: |
||||
|
||||
| book_title | price | cheapest_book | most_expensive_book | |
||||
| ------------------- | ----- | ------------------- | ------------------- | |
||||
| Pride and Prejudice | 14.99 | Pride and Prejudice | Data Science | |
||||
| 1984 | 19.99 | Pride and Prejudice | Data Science | |
||||
| The Great Gatsby | 24.99 | Pride and Prejudice | Data Science | |
||||
| The Hobbit | 29.99 | Pride and Prejudice | Data Science | |
||||
| Web Development | 34.99 | Pride and Prejudice | Data Science | |
||||
| SQL Basics | 39.99 | Pride and Prejudice | Data Science | |
||||
| Python Programming | 44.99 | Pride and Prejudice | Data Science | |
||||
| Data Science | 49.99 | Pride and Prejudice | Data Science | |
||||
|
||||
## ORDER BY with PARTITION BY |
||||
|
||||
When combining `ORDER BY` with `PARTITION BY`, we create independent running calculations within each partition. The window function resets its calculations whenever it encounters a new partition. Let's analyze how this works: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
book_title, |
||||
price, |
||||
SUM(price) OVER( |
||||
PARTITION BY category |
||||
ORDER BY price |
||||
) as category_running_total |
||||
FROM sale; |
||||
``` |
||||
|
||||
The output from this query will be the books with their category and the running total for each category. |
||||
|
||||
| category | book_title | price | category_running_total | |
||||
| --------- | :------------------ | ----: | ---------------------: | |
||||
| Fiction | Pride and Prejudice | 14.99 | 14.99 | |
||||
| Fiction | 1984 | 19.99 | 34.98 | |
||||
| Fiction | The Great Gatsby | 24.99 | 59.97 | |
||||
| Fiction | The Hobbit | 29.99 | 89.96 | |
||||
| Technical | Web Development | 34.99 | 34.99 | |
||||
| Technical | SQL Basics | 39.99 | 74.98 | |
||||
| Technical | Python Programming | 44.99 | 119.97 | |
||||
| Technical | Data Science | 49.99 | 169.96 | |
||||
|
||||
Let's see how the running total builds up within each partition: |
||||
|
||||
`Fiction` books (first partition) with items ordered by price: |
||||
|
||||
| category | book_title | price | category_running_total | calculation | |
||||
| -------- | :------------------ | ----: | ---------------------: | :------------ | |
||||
| Fiction | Pride and Prejudice | 14.99 | 14.99 | 14.99 | |
||||
| Fiction | 1984 | 19.99 | 34.98 | 14.99 + 19.99 | |
||||
| Fiction | The Great Gatsby | 24.99 | 59.97 | 34.98 + 24.99 | |
||||
| Fiction | The Hobbit | 29.99 | 89.96 | 59.97 + 29.99 | |
||||
|
||||
`Technical` books (second partition - notice how the total resets and items ordered by price): |
||||
|
||||
| category | book_title | price | category_running_total | calculation | |
||||
| --------- | :----------------- | ----: | ---------------------: | :------------- | |
||||
| Technical | Web Development | 34.99 | 34.99 | 34.99 | |
||||
| Technical | SQL Basics | 39.99 | 74.98 | 34.99 + 39.99 | |
||||
| Technical | Python Programming | 44.99 | 119.97 | 74.98 + 44.99 | |
||||
| Technical | Data Science | 49.99 | 169.96 | 119.97 + 49.99 | |
||||
|
||||
Key points to notice here are: |
||||
|
||||
- Each category starts its own running total from scratch |
||||
- Within each category, rows are ordered by price |
||||
- The running total only considers rows within the same category |
||||
|
||||
In the next lesson, we'll learn about Ranking functions i.e. `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`. |
Loading…
Reference in new issue