From 84206d2c56e467fe2bef0e54358318d6793de134 Mon Sep 17 00:00:00 2001 From: Kamran Ahmed Date: Fri, 3 Jan 2025 17:01:34 +0000 Subject: [PATCH] Add grouping and filtering lesson --- .../lessons/grouping-and-filtering.md | 279 ++++++++++++++++++ 1 file changed, 279 insertions(+) create mode 100644 src/data/courses/sql-mastery/chapters/aggregate-functions/lessons/grouping-and-filtering.md diff --git a/src/data/courses/sql-mastery/chapters/aggregate-functions/lessons/grouping-and-filtering.md b/src/data/courses/sql-mastery/chapters/aggregate-functions/lessons/grouping-and-filtering.md new file mode 100644 index 000000000..5e4f57ebf --- /dev/null +++ b/src/data/courses/sql-mastery/chapters/aggregate-functions/lessons/grouping-and-filtering.md @@ -0,0 +1,279 @@ +--- +title: Grouping and Filtering +description: Learn about filtering grouped data with HAVING, DISTINCT aggregation, and conditional aggregation +order: 130 +type: lesson-challenge +setup: | + ```sql + CREATE TABLE sale ( + id SERIAL PRIMARY KEY, + sale_date DATE, + amount DECIMAL(10, 2), + category TEXT, + store_location TEXT + ); + + INSERT INTO sale (sale_date, amount, category, store_location) + VALUES + ('2024-01-15', 29.99, 'Fiction', 'Downtown'), + ('2024-01-15', 19.99, 'Non-Fiction', 'Suburb'), + ('2024-01-16', 24.99, 'Fiction', 'Downtown'), + ('2024-01-16', 39.99, 'Technical', 'Suburb'), + ('2024-01-16', 14.99, 'Fiction', 'Downtown'), + ('2024-01-17', 49.99, 'Technical', 'Downtown'), + ('2024-01-17', 27.99, 'Non-Fiction', 'Airport'), + ('2024-01-17', 22.99, 'Fiction', 'Downtown'), + ('2024-01-18', 17.99, 'Non-Fiction', 'Suburb'), + ('2024-01-18', 34.99, 'Technical', 'Airport'); + ``` +--- + +In our previous lesson, we learned about grouping data using `GROUP BY`. While grouping is powerful on its own, it becomes even more useful when combined with filtering. + +In this lesson, we'll learn about filtering grouped data using the `HAVING` clause and explore other advanced grouping patterns. + +## The HAVING Clause + +The `HAVING` clause allows us to filter groups based on aggregate values. Think of it as a `WHERE` clause for grouped data. + +The basic syntax is: + +```sql +SELECT column1, aggregate_function(column2) +FROM table_name +GROUP BY column1 +HAVING condition; +``` + +Let's look at some examples using our bookstore `sale` table: + +| id | sale_date | amount | category | store_location | +| --- | ---------- | ------ | ----------- | -------------- | +| 1 | 2024-01-15 | 29.99 | Fiction | Downtown | +| 2 | 2024-01-15 | 19.99 | Non-Fiction | Suburb | +| 3 | 2024-01-16 | 24.99 | Fiction | Downtown | +| 4 | 2024-01-16 | 39.99 | Technical | Suburb | +| 5 | 2024-01-16 | 14.99 | Fiction | Downtown | +| 6 | 2024-01-17 | 49.99 | Technical | Downtown | +| 7 | 2024-01-17 | 27.99 | Non-Fiction | Airport | +| 8 | 2024-01-17 | 22.99 | Fiction | Downtown | +| 9 | 2024-01-18 | 17.99 | Non-Fiction | Suburb | +| 10 | 2024-01-18 | 34.99 | Technical | Airport | + +### Example 1: Finding Categories with High Sales + +Let's find categories where total sales exceed $75. Your initial instinct might to be to write the following query: + +```sql +-- ERROR: WHERE cannot use aggregate functions +SELECT + category, + SUM(amount) as total_sales +FROM sale +GROUP BY category +WHERE SUM(amount) > 75; + +-- ERROR: SELECT aliases cannot be used in WHERE +SELECT + category, + SUM(amount) as total_sales +FROM sale +GROUP BY category +WHERE total_sales > 75; +``` + +Both of the above queries will result in an error because as we learned in the previous lesson, `WHERE` filters rows before they are grouped so neither `SUM(amount)` nor `total_sales` are available. + +To fix this, we can use the `HAVING` clause which filters groups after they are formed. + +```sql +SELECT + category, + SUM(amount) as total_sales +FROM sale +GROUP BY category +HAVING SUM(amount) > 75; +``` + +The output will be only categories with total sales greater than $75: + +| category | total_sales | +| --------- | ----------- | +| Fiction | 92.96 | +| Technical | 124.97 | + +### Example 2: Finding Busy Store Locations + +Let's find store locations with more than 3 sales: + +```sql +SELECT + store_location, + COUNT(*) as sale_count +FROM sale +GROUP BY store_location +HAVING COUNT(*) > 3; +``` + +The output shows locations with more than 3 sales: + +| store_location | sale_count | +| -------------- | ---------- | +| Downtown | 5 | + +## WHERE vs HAVING + +While both `WHERE` and `HAVING` are used for filtering, they serve different purposes: + +- `WHERE` filters individual rows **before** they're grouped +- `HAVING` filters groups **after** they're formed + +Let's see this difference in action: + +### Example: Sales Above $25 + +Using `WHERE` (filters rows **before** grouping): + +```sql +SELECT + category, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +WHERE amount > 25 +GROUP BY category; +``` + +This shows the count and sum of only sales above $25: + +| category | sale_count | total_sales | +| ----------- | ---------- | ----------- | +| Fiction | 1 | 29.99 | +| Non-Fiction | 1 | 27.99 | +| Technical | 3 | 124.97 | + +Using `HAVING` (filters groups **after** aggregation). For example, let's find the categories where the average sale amount is above $25: + +```sql +SELECT + category, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +GROUP BY category +HAVING AVG(amount) > 25; +``` + +This shows categories where the average sale amount is above $25: + +| category | sale_count | total_sales | +| --------- | ---------- | ----------- | +| Technical | 3 | 124.97 | + +### Using Both WHERE and HAVING + +You can use both `WHERE` and `HAVING` in the same query. For example, let's find categories that had more than `2` sales in `Downtown` stores: + +```sql +SELECT + category, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +WHERE store_location = 'Downtown' +GROUP BY category +HAVING COUNT(*) > 2; +``` + +The output shows categories with more than 2 sales in Downtown: + +| category | sale_count | total_sales | +| -------- | ---------- | ----------- | +| Fiction | 4 | 92.96 | + +Let's look at another example to find the top 2 locations by total sales: + +```sql +SELECT + store_location, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +GROUP BY store_location +ORDER BY total_sales DESC +LIMIT 2; +``` + +This shows the top 2 locations by total sales: + +| store_location | sale_count | total_sales | +| -------------- | ---------- | ----------- | +| Downtown | 5 | 142.95 | +| Suburb | 3 | 77.97 | + +## Important Rules and Gotchas + +You should keep the following rules in mind when working with `GROUP BY` and `HAVING`: + +### HAVING requires GROUP BY + +You can't use `HAVING` without `GROUP BY` (except with window functions). For example, the following query will result in an error: + +```sql +SELECT + category, + SUM(amount) as total_sales +FROM sale +HAVING SUM(amount) > 75; +``` + +This will result in an error because `HAVING` requires a `GROUP BY` clause. + +### NULL in Groups + +NULL values form their own group in `GROUP BY`. + +### Multiple Conditions + +You can combine multiple conditions in `HAVING` with `AND`/`OR`. For example, the query below finds categories with more than 3 sales: + +```sql +SELECT + category, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +GROUP BY category +HAVING COUNT(*) >= 3; +``` + +The output will be: + +| category | sale_count | total_sales | +| ----------- | ---------- | ----------- | +| Fiction | 4 | 92.96 | +| Non-Fiction | 3 | 65.97 | +| Technical | 3 | 124.97 | + +We can use `AND` to combine multiple conditions in `HAVING`. For example, the query below finds categories with more than 3 sales and total sales above $100: + +```sql +SELECT + category, + COUNT(*) as sale_count, + SUM(amount) as total_sales +FROM sale +GROUP BY category +HAVING COUNT(*) >= 3 +AND SUM(amount) > 100; +``` + +The output will be: + +| category | sale_count | total_sales | +| --------- | ---------- | ----------- | +| Technical | 3 | 124.97 | + +### Aggregate Functions + +`HAVING` can only use the columns in `GROUP BY` and aggregate functions.