parent
9478904e4b
commit
84206d2c56
1 changed files with 279 additions and 0 deletions
@ -0,0 +1,279 @@ |
||||
--- |
||||
title: Grouping and Filtering |
||||
description: Learn about filtering grouped data with HAVING, DISTINCT aggregation, and conditional aggregation |
||||
order: 130 |
||||
type: lesson-challenge |
||||
setup: | |
||||
```sql |
||||
CREATE TABLE sale ( |
||||
id SERIAL PRIMARY KEY, |
||||
sale_date DATE, |
||||
amount DECIMAL(10, 2), |
||||
category TEXT, |
||||
store_location TEXT |
||||
); |
||||
|
||||
INSERT INTO sale (sale_date, amount, category, store_location) |
||||
VALUES |
||||
('2024-01-15', 29.99, 'Fiction', 'Downtown'), |
||||
('2024-01-15', 19.99, 'Non-Fiction', 'Suburb'), |
||||
('2024-01-16', 24.99, 'Fiction', 'Downtown'), |
||||
('2024-01-16', 39.99, 'Technical', 'Suburb'), |
||||
('2024-01-16', 14.99, 'Fiction', 'Downtown'), |
||||
('2024-01-17', 49.99, 'Technical', 'Downtown'), |
||||
('2024-01-17', 27.99, 'Non-Fiction', 'Airport'), |
||||
('2024-01-17', 22.99, 'Fiction', 'Downtown'), |
||||
('2024-01-18', 17.99, 'Non-Fiction', 'Suburb'), |
||||
('2024-01-18', 34.99, 'Technical', 'Airport'); |
||||
``` |
||||
--- |
||||
|
||||
In our previous lesson, we learned about grouping data using `GROUP BY`. While grouping is powerful on its own, it becomes even more useful when combined with filtering. |
||||
|
||||
In this lesson, we'll learn about filtering grouped data using the `HAVING` clause and explore other advanced grouping patterns. |
||||
|
||||
## The HAVING Clause |
||||
|
||||
The `HAVING` clause allows us to filter groups based on aggregate values. Think of it as a `WHERE` clause for grouped data. |
||||
|
||||
The basic syntax is: |
||||
|
||||
```sql |
||||
SELECT column1, aggregate_function(column2) |
||||
FROM table_name |
||||
GROUP BY column1 |
||||
HAVING condition; |
||||
``` |
||||
|
||||
Let's look at some examples using our bookstore `sale` table: |
||||
|
||||
| id | sale_date | amount | category | store_location | |
||||
| --- | ---------- | ------ | ----------- | -------------- | |
||||
| 1 | 2024-01-15 | 29.99 | Fiction | Downtown | |
||||
| 2 | 2024-01-15 | 19.99 | Non-Fiction | Suburb | |
||||
| 3 | 2024-01-16 | 24.99 | Fiction | Downtown | |
||||
| 4 | 2024-01-16 | 39.99 | Technical | Suburb | |
||||
| 5 | 2024-01-16 | 14.99 | Fiction | Downtown | |
||||
| 6 | 2024-01-17 | 49.99 | Technical | Downtown | |
||||
| 7 | 2024-01-17 | 27.99 | Non-Fiction | Airport | |
||||
| 8 | 2024-01-17 | 22.99 | Fiction | Downtown | |
||||
| 9 | 2024-01-18 | 17.99 | Non-Fiction | Suburb | |
||||
| 10 | 2024-01-18 | 34.99 | Technical | Airport | |
||||
|
||||
### Example 1: Finding Categories with High Sales |
||||
|
||||
Let's find categories where total sales exceed $75. Your initial instinct might to be to write the following query: |
||||
|
||||
```sql |
||||
-- ERROR: WHERE cannot use aggregate functions |
||||
SELECT |
||||
category, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
WHERE SUM(amount) > 75; |
||||
|
||||
-- ERROR: SELECT aliases cannot be used in WHERE |
||||
SELECT |
||||
category, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
WHERE total_sales > 75; |
||||
``` |
||||
|
||||
Both of the above queries will result in an error because as we learned in the previous lesson, `WHERE` filters rows before they are grouped so neither `SUM(amount)` nor `total_sales` are available. |
||||
|
||||
To fix this, we can use the `HAVING` clause which filters groups after they are formed. |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
HAVING SUM(amount) > 75; |
||||
``` |
||||
|
||||
The output will be only categories with total sales greater than $75: |
||||
|
||||
| category | total_sales | |
||||
| --------- | ----------- | |
||||
| Fiction | 92.96 | |
||||
| Technical | 124.97 | |
||||
|
||||
### Example 2: Finding Busy Store Locations |
||||
|
||||
Let's find store locations with more than 3 sales: |
||||
|
||||
```sql |
||||
SELECT |
||||
store_location, |
||||
COUNT(*) as sale_count |
||||
FROM sale |
||||
GROUP BY store_location |
||||
HAVING COUNT(*) > 3; |
||||
``` |
||||
|
||||
The output shows locations with more than 3 sales: |
||||
|
||||
| store_location | sale_count | |
||||
| -------------- | ---------- | |
||||
| Downtown | 5 | |
||||
|
||||
## WHERE vs HAVING |
||||
|
||||
While both `WHERE` and `HAVING` are used for filtering, they serve different purposes: |
||||
|
||||
- `WHERE` filters individual rows **before** they're grouped |
||||
- `HAVING` filters groups **after** they're formed |
||||
|
||||
Let's see this difference in action: |
||||
|
||||
### Example: Sales Above $25 |
||||
|
||||
Using `WHERE` (filters rows **before** grouping): |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
WHERE amount > 25 |
||||
GROUP BY category; |
||||
``` |
||||
|
||||
This shows the count and sum of only sales above $25: |
||||
|
||||
| category | sale_count | total_sales | |
||||
| ----------- | ---------- | ----------- | |
||||
| Fiction | 1 | 29.99 | |
||||
| Non-Fiction | 1 | 27.99 | |
||||
| Technical | 3 | 124.97 | |
||||
|
||||
Using `HAVING` (filters groups **after** aggregation). For example, let's find the categories where the average sale amount is above $25: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
HAVING AVG(amount) > 25; |
||||
``` |
||||
|
||||
This shows categories where the average sale amount is above $25: |
||||
|
||||
| category | sale_count | total_sales | |
||||
| --------- | ---------- | ----------- | |
||||
| Technical | 3 | 124.97 | |
||||
|
||||
### Using Both WHERE and HAVING |
||||
|
||||
You can use both `WHERE` and `HAVING` in the same query. For example, let's find categories that had more than `2` sales in `Downtown` stores: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
WHERE store_location = 'Downtown' |
||||
GROUP BY category |
||||
HAVING COUNT(*) > 2; |
||||
``` |
||||
|
||||
The output shows categories with more than 2 sales in Downtown: |
||||
|
||||
| category | sale_count | total_sales | |
||||
| -------- | ---------- | ----------- | |
||||
| Fiction | 4 | 92.96 | |
||||
|
||||
Let's look at another example to find the top 2 locations by total sales: |
||||
|
||||
```sql |
||||
SELECT |
||||
store_location, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY store_location |
||||
ORDER BY total_sales DESC |
||||
LIMIT 2; |
||||
``` |
||||
|
||||
This shows the top 2 locations by total sales: |
||||
|
||||
| store_location | sale_count | total_sales | |
||||
| -------------- | ---------- | ----------- | |
||||
| Downtown | 5 | 142.95 | |
||||
| Suburb | 3 | 77.97 | |
||||
|
||||
## Important Rules and Gotchas |
||||
|
||||
You should keep the following rules in mind when working with `GROUP BY` and `HAVING`: |
||||
|
||||
### HAVING requires GROUP BY |
||||
|
||||
You can't use `HAVING` without `GROUP BY` (except with window functions). For example, the following query will result in an error: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
HAVING SUM(amount) > 75; |
||||
``` |
||||
|
||||
This will result in an error because `HAVING` requires a `GROUP BY` clause. |
||||
|
||||
### NULL in Groups |
||||
|
||||
NULL values form their own group in `GROUP BY`. |
||||
|
||||
### Multiple Conditions |
||||
|
||||
You can combine multiple conditions in `HAVING` with `AND`/`OR`. For example, the query below finds categories with more than 3 sales: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
HAVING COUNT(*) >= 3; |
||||
``` |
||||
|
||||
The output will be: |
||||
|
||||
| category | sale_count | total_sales | |
||||
| ----------- | ---------- | ----------- | |
||||
| Fiction | 4 | 92.96 | |
||||
| Non-Fiction | 3 | 65.97 | |
||||
| Technical | 3 | 124.97 | |
||||
|
||||
We can use `AND` to combine multiple conditions in `HAVING`. For example, the query below finds categories with more than 3 sales and total sales above $100: |
||||
|
||||
```sql |
||||
SELECT |
||||
category, |
||||
COUNT(*) as sale_count, |
||||
SUM(amount) as total_sales |
||||
FROM sale |
||||
GROUP BY category |
||||
HAVING COUNT(*) >= 3 |
||||
AND SUM(amount) > 100; |
||||
``` |
||||
|
||||
The output will be: |
||||
|
||||
| category | sale_count | total_sales | |
||||
| --------- | ---------- | ----------- | |
||||
| Technical | 3 | 124.97 | |
||||
|
||||
### Aggregate Functions |
||||
|
||||
`HAVING` can only use the columns in `GROUP BY` and aggregate functions. |
Loading…
Reference in new issue