Add challenges for window functions

feat/course
Kamran Ahmed 1 week ago
parent 84934fb167
commit b6a9b63da7
  1. 96
      src/data/courses/sql-mastery/chapters/window-functions/lessons/author-category-sales.md
  2. 75
      src/data/courses/sql-mastery/chapters/window-functions/lessons/basic-sales-metrics.md
  3. 85
      src/data/courses/sql-mastery/chapters/window-functions/lessons/bestseller-comparison.md
  4. 104
      src/data/courses/sql-mastery/chapters/window-functions/lessons/mom-sales.md
  5. 106
      src/data/courses/sql-mastery/chapters/window-functions/lessons/price-tier-rankings.md
  6. 98
      src/data/courses/sql-mastery/chapters/window-functions/lessons/top-authors.md

@ -0,0 +1,96 @@
---
title: Author Category Sales
description: Practice using window functions with joined tables to analyze sales by category
order: 140
type: challenge
setup: |
```sql
CREATE TABLE author (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE book (
id INT PRIMARY KEY,
title VARCHAR(100),
author_id INT,
category VARCHAR(50),
copies_sold INT
);
INSERT INTO author (id, name) VALUES
(1, 'John Smith'),
(2, 'Emma Wilson'),
(3, 'Michael Brown'),
(4, 'Sarah Davis');
INSERT INTO book (id, title, author_id, category, copies_sold) VALUES
(1, 'SQL Basics', 1, 'Programming', 1500),
(2, 'Advanced SQL', 1, 'Programming', 2500),
(3, 'Data Science', 2, 'Data Analysis', 2000),
(4, 'Web Development', 3, 'Programming', 1800),
(5, 'Statistics 101', 2, 'Data Analysis', 1200),
(6, 'Python Basics', 4, 'Programming', 3000),
(7, 'Machine Learning', 2, 'Data Analysis', 2200);
```
---
The bookstore wants to see how each book's sales compare to the average sales in its category. This will help identify which authors are performing above or below average in each category.
Given the following data:
**author table:**
| id | name |
| --- | ------------- |
| 1 | John Smith |
| 2 | Emma Wilson |
| 3 | Michael Brown |
| 4 | Sarah Davis |
**book table:**
| id | title | author_id | category | copies_sold |
| --- | ---------------- | --------- | ------------- | ----------- |
| 1 | SQL Basics | 1 | Programming | 1500 |
| 2 | Advanced SQL | 1 | Programming | 2500 |
| 3 | Data Science | 2 | Data Analysis | 2000 |
| 4 | Web Development | 3 | Programming | 1800 |
| 5 | Statistics 101 | 2 | Data Analysis | 1200 |
| 6 | Python Basics | 4 | Programming | 3000 |
| 7 | Machine Learning | 2 | Data Analysis | 2200 |
Write a query that shows:
- Author name
- Book title
- Category
- Copies sold
- Average copies sold in that category
Order the results by category (ascending) and copies sold (descending).
## Expected Output
| author_name | title | category | copies_sold | category_avg |
| ------------- | ---------------- | ------------- | ----------- | ------------ |
| Emma Wilson | Machine Learning | Data Analysis | 2200 | 1800 |
| Emma Wilson | Data Science | Data Analysis | 2000 | 1800 |
| Emma Wilson | Statistics 101 | Data Analysis | 1200 | 1800 |
| Sarah Davis | Python Basics | Programming | 3000 | 2200 |
| John Smith | Advanced SQL | Programming | 2500 | 2200 |
| Michael Brown | Web Development | Programming | 1800 | 2200 |
| John Smith | SQL Basics | Programming | 1500 | 2200 |
## Solution
```sql
SELECT
a.name as author_name,
b.title,
b.category,
b.copies_sold,
AVG(b.copies_sold) OVER (
PARTITION BY b.category
) as category_avg
FROM book b
INNER JOIN author a ON a.id = b.author_id
ORDER BY b.category, b.copies_sold DESC;
```

@ -0,0 +1,75 @@
---
title: Basic Sales Metrics
description: Practice using basic window functions to analyze book sales
order: 120
type: challenge
setup: |
```sql
CREATE TABLE sale (
id INT PRIMARY KEY,
book_title VARCHAR(100),
category VARCHAR(50),
sale_date DATE,
quantity INT,
price DECIMAL(10,2)
);
INSERT INTO sale (id, book_title, category, sale_date, quantity, price) VALUES
(1, 'SQL Basics', 'Programming', '2024-01-15', 3, 29.99),
(2, 'Advanced SQL', 'Programming', '2024-01-15', 2, 49.99),
(3, 'Data Science', 'Data Analysis', '2024-01-15', 4, 39.99),
(4, 'Web Development', 'Programming', '2024-01-16', 1, 34.99),
(5, 'Statistics 101', 'Data Analysis', '2024-01-16', 2, 24.99),
(6, 'Python Basics', 'Programming', '2024-01-16', 5, 29.99),
(7, 'Machine Learning', 'Data Analysis', '2024-01-16', 2, 44.99);
```
---
The bookstore manager wants to compare each book's price with the average price in its category. This will help identify which books might be overpriced or underpriced within their categories.
Given the following data in table `sale`:
| id | book_title | category | sale_date | quantity | price |
| --- | ---------------- | ------------- | ---------- | -------- | ----- |
| 1 | SQL Basics | Programming | 2024-01-15 | 3 | 29.99 |
| 2 | Advanced SQL | Programming | 2024-01-15 | 2 | 49.99 |
| 3 | Data Science | Data Analysis | 2024-01-15 | 4 | 39.99 |
| 4 | Web Development | Programming | 2024-01-16 | 1 | 34.99 |
| 5 | Statistics 101 | Data Analysis | 2024-01-16 | 2 | 24.99 |
| 6 | Python Basics | Programming | 2024-01-16 | 5 | 29.99 |
| 7 | Machine Learning | Data Analysis | 2024-01-16 | 2 | 44.99 |
Write a query that shows:
- Book title
- Category
- Price
- Average price for the book's category
Order the results by category (ascending) and price (descending).
## Expected Output
| book_title | category | price | category_avg |
| ---------------- | ------------- | ----- | ------------ |
| Machine Learning | Data Analysis | 44.99 | 36.66 |
| Data Science | Data Analysis | 39.99 | 36.66 |
| Statistics 101 | Data Analysis | 24.99 | 36.66 |
| Advanced SQL | Programming | 49.99 | 36.24 |
| Web Development | Programming | 34.99 | 36.24 |
| Python Basics | Programming | 29.99 | 36.24 |
| SQL Basics | Programming | 29.99 | 36.24 |
## Solution
```sql
SELECT
book_title,
category,
price,
ROUND(AVG(price) OVER (
PARTITION BY category
), 2) as category_avg
FROM sale
ORDER BY category, price DESC;
```

@ -0,0 +1,85 @@
---
title: Bestseller Comparison
description: Practice using FIRST_VALUE and NTH_VALUE to analyze bestselling books
order: 130
type: challenge
setup: |
```sql
CREATE TABLE book (
id INT PRIMARY KEY,
title VARCHAR(100),
category VARCHAR(50),
copies_sold INT,
price DECIMAL(10,2)
);
INSERT INTO book (id, title, category, copies_sold, price) VALUES
(1, 'SQL Basics', 'Programming', 1500, 29.99),
(2, 'Advanced SQL', 'Programming', 2500, 49.99),
(3, 'Data Science', 'Data Analysis', 2000, 39.99),
(4, 'Web Development', 'Programming', 1800, 34.99),
(5, 'Statistics 101', 'Data Analysis', 1200, 24.99),
(6, 'Python Basics', 'Programming', 3000, 29.99),
(7, 'Machine Learning', 'Data Analysis', 2200, 44.99);
```
---
The bookstore wants to analyze their bestsellers in each category. For each book, they want to see how it compares to the top seller and second-best seller in its category.
Given the following data in table `book`:
| id | title | category | copies_sold | price |
| --- | ---------------- | ------------- | ----------- | ----- |
| 1 | SQL Basics | Programming | 1500 | 29.99 |
| 2 | Advanced SQL | Programming | 2500 | 49.99 |
| 3 | Data Science | Data Analysis | 2000 | 39.99 |
| 4 | Web Development | Programming | 1800 | 34.99 |
| 5 | Statistics 101 | Data Analysis | 1200 | 24.99 |
| 6 | Python Basics | Programming | 3000 | 29.99 |
| 7 | Machine Learning | Data Analysis | 2200 | 44.99 |
Write a query that shows:
- Book title
- Category
- Copies sold
- Title of the bestselling book in that category
- Title of the second-best selling book in that category
Order the results by category (ascending) and copies sold (descending).
## Expected Output
| title | category | copies_sold | category_bestseller | second_bestseller |
| ---------------- | ------------- | ----------- | ------------------- | ----------------- |
| Machine Learning | Data Analysis | 2200 | Machine Learning | Data Science |
| Data Science | Data Analysis | 2000 | Machine Learning | Data Science |
| Statistics 101 | Data Analysis | 1200 | Machine Learning | Data Science |
| Python Basics | Programming | 3000 | Python Basics | Advanced SQL |
| Advanced SQL | Programming | 2500 | Python Basics | Advanced SQL |
| Web Development | Programming | 1800 | Python Basics | Advanced SQL |
| SQL Basics | Programming | 1500 | Python Basics | Advanced SQL |
## Solution
```sql
SELECT
title,
category,
copies_sold,
FIRST_VALUE(title) OVER (
PARTITION BY category
ORDER BY copies_sold DESC
) as category_bestseller,
NTH_VALUE(title, 2) OVER (
PARTITION BY category
ORDER BY copies_sold DESC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) as second_bestseller
FROM book
ORDER BY category, copies_sold DESC;
```
> Note: The `RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` clause ensures NTH_VALUE has access to all rows in the partition, not just the rows up to the current row.

@ -0,0 +1,104 @@
---
title: Month-over-Month Sales
description: Practice using LAG to compare book sales between months
order: 170
type: challenge
setup: |
```sql
CREATE TABLE sale (
id INT PRIMARY KEY,
title VARCHAR(100),
category VARCHAR(50),
sale_month DATE,
copies_sold INT,
revenue DECIMAL(10,2)
);
INSERT INTO sale (id, title, category, sale_month, copies_sold, revenue) VALUES
(1, 'SQL Basics', 'Programming', '2024-01-01', 150, 4497.00),
(2, 'SQL Basics', 'Programming', '2024-02-01', 180, 5398.20),
(3, 'SQL Basics', 'Programming', '2024-03-01', 160, 4798.40),
(4, 'Data Science', 'Data Analysis', '2024-01-01', 120, 4798.80),
(5, 'Data Science', 'Data Analysis', '2024-02-01', 140, 5598.60),
(6, 'Data Science', 'Data Analysis', '2024-03-01', 130, 5198.70),
(7, 'Python Basics', 'Programming', '2024-01-01', 200, 5998.00),
(8, 'Python Basics', 'Programming', '2024-02-01', 190, 5698.10),
(9, 'Python Basics', 'Programming', '2024-03-01', 220, 6597.80);
```
---
The bookstore manager wants to analyze month-over-month performance for each book. They need to see how sales and revenue change each month compared to the previous month.
Given the following data in table `sale`:
| id | title | category | sale_month | copies_sold | revenue |
| --- | ------------- | ------------- | ---------- | ----------- | ------- |
| 1 | SQL Basics | Programming | 2024-01-01 | 150 | 4497.00 |
| 2 | SQL Basics | Programming | 2024-02-01 | 180 | 5398.20 |
| 3 | SQL Basics | Programming | 2024-03-01 | 160 | 4798.40 |
| 4 | Data Science | Data Analysis | 2024-01-01 | 120 | 4798.80 |
| 5 | Data Science | Data Analysis | 2024-02-01 | 140 | 5598.60 |
| 6 | Data Science | Data Analysis | 2024-03-01 | 130 | 5198.70 |
| 7 | Python Basics | Programming | 2024-01-01 | 200 | 5998.00 |
| 8 | Python Basics | Programming | 2024-02-01 | 190 | 5698.10 |
| 9 | Python Basics | Programming | 2024-03-01 | 220 | 6597.80 |
Write a query that shows:
- Book title
- Sale month
- Current month's copies sold
- Previous month's copies sold
- Sales change (current month minus previous month)
- Sales change percentage (rounded to 1 decimal place)
Order the results by title (ascending) and sale month (ascending).
> **Hint:** Use `LAG()` to access the previous month's sales, and `NULLIF()` to avoid division by zero when calculating percentages. For the `pct_change` column, you can use the following formula:
>
> ```
> (copies_sold - prev_month_copies_sold) / prev_month_copies_sold * 100
> ```
## Expected Output
| title | sale_month | current_sales | prev_sales | sales_change | pct_change |
| ------------- | ---------- | ------------- | ---------- | ------------ | ---------- |
| Data Science | 2024-01-01 | 120 | null | null | null |
| Data Science | 2024-02-01 | 140 | 120 | 20 | 16.7 |
| Data Science | 2024-03-01 | 130 | 140 | -10 | -7.1 |
| Python Basics | 2024-01-01 | 200 | null | null | null |
| Python Basics | 2024-02-01 | 190 | 200 | -10 | -5.0 |
| Python Basics | 2024-03-01 | 220 | 190 | 30 | 15.8 |
| SQL Basics | 2024-01-01 | 150 | null | null | null |
| SQL Basics | 2024-02-01 | 180 | 150 | 30 | 20.0 |
| SQL Basics | 2024-03-01 | 160 | 180 | -20 | -11.1 |
## Solution
```sql
SELECT
title,
sale_month,
copies_sold as current_sales,
LAG(copies_sold) OVER (
PARTITION BY title
ORDER BY sale_month
) as prev_sales,
copies_sold - LAG(copies_sold) OVER (
PARTITION BY title
ORDER BY sale_month
) as sales_change,
ROUND(
((copies_sold - LAG(copies_sold) OVER (
PARTITION BY title
ORDER BY sale_month
)) * 100.0 / NULLIF(LAG(copies_sold) OVER (
PARTITION BY title
ORDER BY sale_month
), 0)),
1
) as pct_change
FROM sale
ORDER BY title, sale_month;
```

@ -0,0 +1,106 @@
---
title: Price Tier Rankings
description: Practice using RANK to analyze book sales across different price tiers
order: 160
type: challenge
setup: |
```sql
CREATE TABLE book (
id INT PRIMARY KEY,
title VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10,2),
copies_sold INT
);
INSERT INTO book (id, title, category, price, copies_sold) VALUES
(1, 'SQL Basics', 'Programming', 29.99, 1500),
(2, 'Advanced SQL', 'Programming', 49.99, 2500),
(3, 'Data Science', 'Data Analysis', 39.99, 2000),
(4, 'Web Development', 'Programming', 34.99, 1800),
(5, 'Statistics 101', 'Data Analysis', 24.99, 1200),
(6, 'Python Basics', 'Programming', 29.99, 3000),
(7, 'Machine Learning', 'Data Analysis', 44.99, 2200),
(8, 'Database Design', 'Programming', 49.99, 1700),
(9, 'R Programming', 'Data Analysis', 34.99, 1600);
```
---
The bookstore wants to analyze how books perform within different price ranges. They want to identify the bestsellers in each price tier:
- Budget: Less than $30
- Standard: $30 to $40
- Premium: Over $40
Given the following data in table `book`:
| id | title | category | price | copies_sold |
| --- | ---------------- | ------------- | ----- | ----------- |
| 1 | SQL Basics | Programming | 29.99 | 1500 |
| 2 | Advanced SQL | Programming | 49.99 | 2500 |
| 3 | Data Science | Data Analysis | 39.99 | 2000 |
| 4 | Web Development | Programming | 34.99 | 1800 |
| 5 | Statistics 101 | Data Analysis | 24.99 | 1200 |
| 6 | Python Basics | Programming | 29.99 | 3000 |
| 7 | Machine Learning | Data Analysis | 44.99 | 2200 |
| 8 | Database Design | Programming | 49.99 | 1700 |
| 9 | R Programming | Data Analysis | 34.99 | 1600 |
Write a query that shows:
- Book title
- Price tier (Budget/Standard/Premium)
- Price
- Copies sold
- Rank within price tier (based on copies sold)
Order the results by price tier (ascending) and rank within tier (ascending).
## Expected Output
| title | price_tier | price | copies_sold | tier_rank |
| ---------------- | ---------- | ----- | ----------- | --------- |
| Python Basics | Budget | 29.99 | 3000 | 1 |
| SQL Basics | Budget | 29.99 | 1500 | 2 |
| Statistics 101 | Budget | 24.99 | 1200 | 3 |
| Advanced SQL | Premium | 49.99 | 2500 | 1 |
| Machine Learning | Premium | 44.99 | 2200 | 2 |
| Database Design | Premium | 49.99 | 1700 | 3 |
| Data Science | Standard | 39.99 | 2000 | 1 |
| Web Development | Standard | 34.99 | 1800 | 2 |
| R Programming | Standard | 34.99 | 1600 | 3 |
> **Hint:** You can use a `CASE` statement inside the `PARTITION BY` clause for partitioning the data into different price tiers i.e.
>
> ```sql
> PARTITION BY CASE
> WHEN price < 30 THEN 'Budget'
> WHEN price <= 40 THEN 'Standard'
> ELSE 'Premium'
> END
> ```
## Solution
```sql
SELECT
title,
CASE
WHEN price < 30 THEN 'Budget'
WHEN price <= 40 THEN 'Standard'
ELSE 'Premium'
END as price_tier,
price,
copies_sold,
RANK() OVER (
PARTITION BY
CASE
WHEN price < 30 THEN 'Budget'
WHEN price <= 40 THEN 'Standard'
ELSE 'Premium'
END
ORDER BY copies_sold DESC
) as tier_rank
FROM book
ORDER BY price_tier, tier_rank;
```

@ -0,0 +1,98 @@
---
title: Top Authors
description: Practice using RANK to find the most successful authors
order: 150
type: challenge
setup: |
```sql
CREATE TABLE author (
id INT PRIMARY KEY,
name VARCHAR(100),
country VARCHAR(50)
);
CREATE TABLE book (
id INT PRIMARY KEY,
title VARCHAR(100),
author_id INT,
copies_sold INT
);
INSERT INTO author (id, name, country) VALUES
(1, 'John Smith', 'USA'),
(2, 'Emma Wilson', 'UK'),
(3, 'Michael Brown', 'Canada'),
(4, 'Sarah Davis', 'USA'),
(5, 'David Miller', 'UK'),
(6, 'Lisa Chen', 'Canada');
INSERT INTO book (id, title, author_id, copies_sold) VALUES
(1, 'SQL Basics', 1, 1500),
(2, 'Advanced SQL', 1, 2500),
(3, 'Data Science', 2, 3000),
(4, 'Web Development', 3, 1800),
(5, 'Python Basics', 4, 2800),
(6, 'Machine Learning', 2, 4000),
(7, 'Database Design', 5, 1200),
(8, 'Cloud Computing', 6, 900),
(9, 'JavaScript Guide', 4, 2200);
```
---
The bookstore wants to identify their top-selling authors. They need a report showing the top 3 authors based on total books sold.
Given the following data:
**author table:**
| id | name | country |
| --- | ------------- | ------- |
| 1 | John Smith | USA |
| 2 | Emma Wilson | UK |
| 3 | Michael Brown | Canada |
| 4 | Sarah Davis | USA |
| 5 | David Miller | UK |
| 6 | Lisa Chen | Canada |
**book table:**
| id | title | author_id | copies_sold |
| -- | ---------------- | --------- | ----------- |
| 1 | SQL Basics | 1 | 1500 |
| 2 | Advanced SQL | 1 | 2500 |
| 3 | Data Science | 2 | 3000 |
| 4 | Web Development | 3 | 1800 |
| 5 | Python Basics | 4 | 2800 |
| 6 | Machine Learning | 2 | 4000 |
| 7 | Database Design | 5 | 1200 |
| 8 | Cloud Computing | 6 | 900 |
| 9 | JavaScript Guide | 4 | 2200 |
Write a query that shows:
- Author name
- Author's country
- Total copies sold
- Author's rank (based on total copies sold)
Only include the top 3 authors, ordered by their rank (ascending).
## Expected Output
| author_name | country | total_copies | author_rank |
| ------------ | ------- | ------------ | ----------- |
| Emma Wilson | UK | 7000 | 1 |
| Sarah Davis | USA | 5000 | 2 |
| John Smith | USA | 4000 | 3 |
## Solution
```sql
SELECT
a.name as author_name,
a.country,
SUM(b.copies_sold) as total_copies,
RANK() OVER (ORDER BY SUM(b.copies_sold) DESC) as author_rank
FROM author a
INNER JOIN book b ON b.author_id = a.id
GROUP BY a.id, a.name, a.country
ORDER BY author_rank
LIMIT 3;
```
Loading…
Cancel
Save