Add challenges for window functions

3 months ago · b6a9b63da7
parent 84934fb167
commit b6a9b63da7
6 changed files with 564 additions and 0 deletions
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/author-category-sales.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/author-category-sales.md
@ -0,0 +1,96 @@
+---
+title: Author Category Sales
+description: Practice using window functions with joined tables to analyze sales by category
+order: 140
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE author (
+      id INT PRIMARY KEY,
+      name VARCHAR(100)
+  );
+
+  CREATE TABLE book (
+      id INT PRIMARY KEY,
+      title VARCHAR(100),
+      author_id INT,
+      category VARCHAR(50),
+      copies_sold INT
+  );
+
+  INSERT INTO author (id, name) VALUES
+      (1, 'John Smith'),
+      (2, 'Emma Wilson'),
+      (3, 'Michael Brown'),
+      (4, 'Sarah Davis');
+
+  INSERT INTO book (id, title, author_id, category, copies_sold) VALUES
+      (1, 'SQL Basics', 1, 'Programming', 1500),
+      (2, 'Advanced SQL', 1, 'Programming', 2500),
+      (3, 'Data Science', 2, 'Data Analysis', 2000),
+      (4, 'Web Development', 3, 'Programming', 1800),
+      (5, 'Statistics 101', 2, 'Data Analysis', 1200),
+      (6, 'Python Basics', 4, 'Programming', 3000),
+      (7, 'Machine Learning', 2, 'Data Analysis', 2200);
+  ```
+---
+
+The bookstore wants to see how each book's sales compare to the average sales in its category. This will help identify which authors are performing above or below average in each category.
+
+Given the following data:
+
+**author table:**
+| id  | name          |
+| --- | ------------- |
+| 1   | John Smith    |
+| 2   | Emma Wilson   |
+| 3   | Michael Brown |
+| 4   | Sarah Davis   |
+
+**book table:**
+| id  | title            | author_id | category      | copies_sold |
+| --- | ---------------- | --------- | ------------- | ----------- |
+| 1   | SQL Basics       | 1         | Programming   | 1500        |
+| 2   | Advanced SQL     | 1         | Programming   | 2500        |
+| 3   | Data Science     | 2         | Data Analysis | 2000        |
+| 4   | Web Development  | 3         | Programming   | 1800        |
+| 5   | Statistics 101   | 2         | Data Analysis | 1200        |
+| 6   | Python Basics    | 4         | Programming   | 3000        |
+| 7   | Machine Learning | 2         | Data Analysis | 2200        |
+
+Write a query that shows:
+- Author name
+- Book title
+- Category
+- Copies sold
+- Average copies sold in that category
+
+Order the results by category (ascending) and copies sold (descending).
+
+## Expected Output
+
+| author_name    | title            | category      | copies_sold | category_avg |
+| ------------- | ---------------- | ------------- | ----------- | ------------ |
+| Emma Wilson   | Machine Learning | Data Analysis | 2200        | 1800         |
+| Emma Wilson   | Data Science     | Data Analysis | 2000        | 1800         |
+| Emma Wilson   | Statistics 101   | Data Analysis | 1200        | 1800         |
+| Sarah Davis   | Python Basics    | Programming   | 3000        | 2200         |
+| John Smith    | Advanced SQL     | Programming   | 2500        | 2200         |
+| Michael Brown | Web Development  | Programming   | 1800        | 2200         |
+| John Smith    | SQL Basics       | Programming   | 1500        | 2200         |
+
+## Solution
+
+```sql
+SELECT 
+    a.name as author_name,
+    b.title,
+    b.category,
+    b.copies_sold,
+    AVG(b.copies_sold) OVER (
+        PARTITION BY b.category
+    ) as category_avg
+FROM book b
+INNER JOIN author a ON a.id = b.author_id
+ORDER BY b.category, b.copies_sold DESC;
+```
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/basic-sales-metrics.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/basic-sales-metrics.md
@ -0,0 +1,75 @@
+---
+title: Basic Sales Metrics
+description: Practice using basic window functions to analyze book sales
+order: 120
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE sale (
+      id INT PRIMARY KEY,
+      book_title VARCHAR(100),
+      category VARCHAR(50),
+      sale_date DATE,
+      quantity INT,
+      price DECIMAL(10,2)
+  );
+
+  INSERT INTO sale (id, book_title, category, sale_date, quantity, price) VALUES
+      (1, 'SQL Basics', 'Programming', '2024-01-15', 3, 29.99),
+      (2, 'Advanced SQL', 'Programming', '2024-01-15', 2, 49.99),
+      (3, 'Data Science', 'Data Analysis', '2024-01-15', 4, 39.99),
+      (4, 'Web Development', 'Programming', '2024-01-16', 1, 34.99),
+      (5, 'Statistics 101', 'Data Analysis', '2024-01-16', 2, 24.99),
+      (6, 'Python Basics', 'Programming', '2024-01-16', 5, 29.99),
+      (7, 'Machine Learning', 'Data Analysis', '2024-01-16', 2, 44.99);
+  ```
+---
+
+The bookstore manager wants to compare each book's price with the average price in its category. This will help identify which books might be overpriced or underpriced within their categories.
+
+Given the following data in table `sale`:
+
+| id  | book_title       | category      | sale_date  | quantity | price |
+| --- | ---------------- | ------------- | ---------- | -------- | ----- |
+| 1   | SQL Basics       | Programming   | 2024-01-15 | 3        | 29.99 |
+| 2   | Advanced SQL     | Programming   | 2024-01-15 | 2        | 49.99 |
+| 3   | Data Science     | Data Analysis | 2024-01-15 | 4        | 39.99 |
+| 4   | Web Development  | Programming   | 2024-01-16 | 1        | 34.99 |
+| 5   | Statistics 101   | Data Analysis | 2024-01-16 | 2        | 24.99 |
+| 6   | Python Basics    | Programming   | 2024-01-16 | 5        | 29.99 |
+| 7   | Machine Learning | Data Analysis | 2024-01-16 | 2        | 44.99 |
+
+Write a query that shows:
+
+- Book title
+- Category
+- Price
+- Average price for the book's category
+
+Order the results by category (ascending) and price (descending).
+
+## Expected Output
+
+| book_title       | category      | price | category_avg |
+| ---------------- | ------------- | ----- | ------------ |
+| Machine Learning | Data Analysis | 44.99 | 36.66        |
+| Data Science     | Data Analysis | 39.99 | 36.66        |
+| Statistics 101   | Data Analysis | 24.99 | 36.66        |
+| Advanced SQL     | Programming   | 49.99 | 36.24        |
+| Web Development  | Programming   | 34.99 | 36.24        |
+| Python Basics    | Programming   | 29.99 | 36.24        |
+| SQL Basics       | Programming   | 29.99 | 36.24        |
+
+## Solution
+
+```sql
+SELECT
+    book_title,
+    category,
+    price,
+    ROUND(AVG(price) OVER (
+        PARTITION BY category
+    ), 2) as category_avg
+FROM sale
+ORDER BY category, price DESC;
+```
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/bestseller-comparison.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/bestseller-comparison.md
@ -0,0 +1,85 @@
+---
+title: Bestseller Comparison
+description: Practice using FIRST_VALUE and NTH_VALUE to analyze bestselling books
+order: 130
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE book (
+      id INT PRIMARY KEY,
+      title VARCHAR(100),
+      category VARCHAR(50),
+      copies_sold INT,
+      price DECIMAL(10,2)
+  );
+
+  INSERT INTO book (id, title, category, copies_sold, price) VALUES
+      (1, 'SQL Basics', 'Programming', 1500, 29.99),
+      (2, 'Advanced SQL', 'Programming', 2500, 49.99),
+      (3, 'Data Science', 'Data Analysis', 2000, 39.99),
+      (4, 'Web Development', 'Programming', 1800, 34.99),
+      (5, 'Statistics 101', 'Data Analysis', 1200, 24.99),
+      (6, 'Python Basics', 'Programming', 3000, 29.99),
+      (7, 'Machine Learning', 'Data Analysis', 2200, 44.99);
+  ```
+---
+
+The bookstore wants to analyze their bestsellers in each category. For each book, they want to see how it compares to the top seller and second-best seller in its category.
+
+Given the following data in table `book`:
+
+| id  | title            | category      | copies_sold | price |
+| --- | ---------------- | ------------- | ----------- | ----- |
+| 1   | SQL Basics       | Programming   | 1500        | 29.99 |
+| 2   | Advanced SQL     | Programming   | 2500        | 49.99 |
+| 3   | Data Science     | Data Analysis | 2000        | 39.99 |
+| 4   | Web Development  | Programming   | 1800        | 34.99 |
+| 5   | Statistics 101   | Data Analysis | 1200        | 24.99 |
+| 6   | Python Basics    | Programming   | 3000        | 29.99 |
+| 7   | Machine Learning | Data Analysis | 2200        | 44.99 |
+
+Write a query that shows:
+
+- Book title
+- Category
+- Copies sold
+- Title of the bestselling book in that category
+- Title of the second-best selling book in that category
+
+Order the results by category (ascending) and copies sold (descending).
+
+## Expected Output
+
+| title            | category      | copies_sold | category_bestseller | second_bestseller |
+| ---------------- | ------------- | ----------- | ------------------- | ----------------- |
+| Machine Learning | Data Analysis | 2200        | Machine Learning    | Data Science      |
+| Data Science     | Data Analysis | 2000        | Machine Learning    | Data Science      |
+| Statistics 101   | Data Analysis | 1200        | Machine Learning    | Data Science      |
+| Python Basics    | Programming   | 3000        | Python Basics       | Advanced SQL      |
+| Advanced SQL     | Programming   | 2500        | Python Basics       | Advanced SQL      |
+| Web Development  | Programming   | 1800        | Python Basics       | Advanced SQL      |
+| SQL Basics       | Programming   | 1500        | Python Basics       | Advanced SQL      |
+
+## Solution
+
+```sql
+SELECT
+    title,
+    category,
+    copies_sold,
+    FIRST_VALUE(title) OVER (
+        PARTITION BY category
+        ORDER BY copies_sold DESC
+    ) as category_bestseller,
+    NTH_VALUE(title, 2) OVER (
+        PARTITION BY category
+        ORDER BY copies_sold DESC
+        RANGE BETWEEN
+            UNBOUNDED PRECEDING AND
+            UNBOUNDED FOLLOWING
+    ) as second_bestseller
+FROM book
+ORDER BY category, copies_sold DESC;
+```
+
+> Note: The `RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` clause ensures NTH_VALUE has access to all rows in the partition, not just the rows up to the current row.
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/mom-sales.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/mom-sales.md
@ -0,0 +1,104 @@
+---
+title: Month-over-Month Sales
+description: Practice using LAG to compare book sales between months
+order: 170
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE sale (
+      id INT PRIMARY KEY,
+      title VARCHAR(100),
+      category VARCHAR(50),
+      sale_month DATE,
+      copies_sold INT,
+      revenue DECIMAL(10,2)
+  );
+
+  INSERT INTO sale (id, title, category, sale_month, copies_sold, revenue) VALUES
+      (1, 'SQL Basics', 'Programming', '2024-01-01', 150, 4497.00),
+      (2, 'SQL Basics', 'Programming', '2024-02-01', 180, 5398.20),
+      (3, 'SQL Basics', 'Programming', '2024-03-01', 160, 4798.40),
+      (4, 'Data Science', 'Data Analysis', '2024-01-01', 120, 4798.80),
+      (5, 'Data Science', 'Data Analysis', '2024-02-01', 140, 5598.60),
+      (6, 'Data Science', 'Data Analysis', '2024-03-01', 130, 5198.70),
+      (7, 'Python Basics', 'Programming', '2024-01-01', 200, 5998.00),
+      (8, 'Python Basics', 'Programming', '2024-02-01', 190, 5698.10),
+      (9, 'Python Basics', 'Programming', '2024-03-01', 220, 6597.80);
+  ```
+---
+
+The bookstore manager wants to analyze month-over-month performance for each book. They need to see how sales and revenue change each month compared to the previous month.
+
+Given the following data in table `sale`:
+
+| id  | title         | category      | sale_month | copies_sold | revenue |
+| --- | ------------- | ------------- | ---------- | ----------- | ------- |
+| 1   | SQL Basics    | Programming   | 2024-01-01 | 150         | 4497.00 |
+| 2   | SQL Basics    | Programming   | 2024-02-01 | 180         | 5398.20 |
+| 3   | SQL Basics    | Programming   | 2024-03-01 | 160         | 4798.40 |
+| 4   | Data Science  | Data Analysis | 2024-01-01 | 120         | 4798.80 |
+| 5   | Data Science  | Data Analysis | 2024-02-01 | 140         | 5598.60 |
+| 6   | Data Science  | Data Analysis | 2024-03-01 | 130         | 5198.70 |
+| 7   | Python Basics | Programming   | 2024-01-01 | 200         | 5998.00 |
+| 8   | Python Basics | Programming   | 2024-02-01 | 190         | 5698.10 |
+| 9   | Python Basics | Programming   | 2024-03-01 | 220         | 6597.80 |
+
+Write a query that shows:
+
+- Book title
+- Sale month
+- Current month's copies sold
+- Previous month's copies sold
+- Sales change (current month minus previous month)
+- Sales change percentage (rounded to 1 decimal place)
+
+Order the results by title (ascending) and sale month (ascending).
+
+> **Hint:** Use `LAG()` to access the previous month's sales, and `NULLIF()` to avoid division by zero when calculating percentages. For the `pct_change` column, you can use the following formula:
+>
+> ```
+> (copies_sold - prev_month_copies_sold) / prev_month_copies_sold * 100
+> ```
+
+## Expected Output
+
+| title         | sale_month | current_sales | prev_sales | sales_change | pct_change |
+| ------------- | ---------- | ------------- | ---------- | ------------ | ---------- |
+| Data Science  | 2024-01-01 | 120           | null       | null         | null       |
+| Data Science  | 2024-02-01 | 140           | 120        | 20           | 16.7       |
+| Data Science  | 2024-03-01 | 130           | 140        | -10          | -7.1       |
+| Python Basics | 2024-01-01 | 200           | null       | null         | null       |
+| Python Basics | 2024-02-01 | 190           | 200        | -10          | -5.0       |
+| Python Basics | 2024-03-01 | 220           | 190        | 30           | 15.8       |
+| SQL Basics    | 2024-01-01 | 150           | null       | null         | null       |
+| SQL Basics    | 2024-02-01 | 180           | 150        | 30           | 20.0       |
+| SQL Basics    | 2024-03-01 | 160           | 180        | -20          | -11.1      |
+
+## Solution
+
+```sql
+SELECT
+    title,
+    sale_month,
+    copies_sold as current_sales,
+    LAG(copies_sold) OVER (
+        PARTITION BY title
+        ORDER BY sale_month
+    ) as prev_sales,
+    copies_sold - LAG(copies_sold) OVER (
+        PARTITION BY title
+        ORDER BY sale_month
+    ) as sales_change,
+    ROUND(
+        ((copies_sold - LAG(copies_sold) OVER (
+            PARTITION BY title
+            ORDER BY sale_month
+        )) * 100.0 / NULLIF(LAG(copies_sold) OVER (
+            PARTITION BY title
+            ORDER BY sale_month
+        ), 0)),
+        1
+    ) as pct_change
+FROM sale
+ORDER BY title, sale_month;
+```
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/price-tier-rankings.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/price-tier-rankings.md
@ -0,0 +1,106 @@
+---
+title: Price Tier Rankings
+description: Practice using RANK to analyze book sales across different price tiers
+order: 160
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE book (
+      id INT PRIMARY KEY,
+      title VARCHAR(100),
+      category VARCHAR(50),
+      price DECIMAL(10,2),
+      copies_sold INT
+  );
+
+  INSERT INTO book (id, title, category, price, copies_sold) VALUES
+      (1, 'SQL Basics', 'Programming', 29.99, 1500),
+      (2, 'Advanced SQL', 'Programming', 49.99, 2500),
+      (3, 'Data Science', 'Data Analysis', 39.99, 2000),
+      (4, 'Web Development', 'Programming', 34.99, 1800),
+      (5, 'Statistics 101', 'Data Analysis', 24.99, 1200),
+      (6, 'Python Basics', 'Programming', 29.99, 3000),
+      (7, 'Machine Learning', 'Data Analysis', 44.99, 2200),
+      (8, 'Database Design', 'Programming', 49.99, 1700),
+      (9, 'R Programming', 'Data Analysis', 34.99, 1600);
+  ```
+---
+
+The bookstore wants to analyze how books perform within different price ranges. They want to identify the bestsellers in each price tier:
+
+- Budget: Less than $30
+- Standard: $30 to $40
+- Premium: Over $40
+
+Given the following data in table `book`:
+
+| id  | title            | category      | price | copies_sold |
+| --- | ---------------- | ------------- | ----- | ----------- |
+| 1   | SQL Basics       | Programming   | 29.99 | 1500        |
+| 2   | Advanced SQL     | Programming   | 49.99 | 2500        |
+| 3   | Data Science     | Data Analysis | 39.99 | 2000        |
+| 4   | Web Development  | Programming   | 34.99 | 1800        |
+| 5   | Statistics 101   | Data Analysis | 24.99 | 1200        |
+| 6   | Python Basics    | Programming   | 29.99 | 3000        |
+| 7   | Machine Learning | Data Analysis | 44.99 | 2200        |
+| 8   | Database Design  | Programming   | 49.99 | 1700        |
+| 9   | R Programming    | Data Analysis | 34.99 | 1600        |
+
+Write a query that shows:
+
+- Book title
+- Price tier (Budget/Standard/Premium)
+- Price
+- Copies sold
+- Rank within price tier (based on copies sold)
+
+Order the results by price tier (ascending) and rank within tier (ascending).
+
+## Expected Output
+
+| title            | price_tier | price | copies_sold | tier_rank |
+| ---------------- | ---------- | ----- | ----------- | --------- |
+| Python Basics    | Budget     | 29.99 | 3000        | 1         |
+| SQL Basics       | Budget     | 29.99 | 1500        | 2         |
+| Statistics 101   | Budget     | 24.99 | 1200        | 3         |
+| Advanced SQL     | Premium    | 49.99 | 2500        | 1         |
+| Machine Learning | Premium    | 44.99 | 2200        | 2         |
+| Database Design  | Premium    | 49.99 | 1700        | 3         |
+| Data Science     | Standard   | 39.99 | 2000        | 1         |
+| Web Development  | Standard   | 34.99 | 1800        | 2         |
+| R Programming    | Standard   | 34.99 | 1600        | 3         |
+
+> **Hint:** You can use a `CASE` statement inside the `PARTITION BY` clause for partitioning the data into different price tiers i.e.
+>
+> ```sql
+> PARTITION BY CASE
+>     WHEN price < 30 THEN 'Budget'
+>     WHEN price <= 40 THEN 'Standard'
+>     ELSE 'Premium'
+> END
+> ```
+
+## Solution
+
+```sql
+SELECT
+    title,
+    CASE
+        WHEN price < 30 THEN 'Budget'
+        WHEN price <= 40 THEN 'Standard'
+        ELSE 'Premium'
+    END as price_tier,
+    price,
+    copies_sold,
+    RANK() OVER (
+        PARTITION BY
+            CASE
+                WHEN price < 30 THEN 'Budget'
+                WHEN price <= 40 THEN 'Standard'
+                ELSE 'Premium'
+            END
+        ORDER BY copies_sold DESC
+    ) as tier_rank
+FROM book
+ORDER BY price_tier, tier_rank;
+```
--- a/src/data/courses/sql-mastery/chapters/window-functions/lessons/top-authors.md
+++ b/src/data/courses/sql-mastery/chapters/window-functions/lessons/top-authors.md
@ -0,0 +1,98 @@
+---
+title: Top Authors
+description: Practice using RANK to find the most successful authors
+order: 150
+type: challenge
+setup: |
+  ```sql
+  CREATE TABLE author (
+      id INT PRIMARY KEY,
+      name VARCHAR(100),
+      country VARCHAR(50)
+  );
+
+  CREATE TABLE book (
+      id INT PRIMARY KEY,
+      title VARCHAR(100),
+      author_id INT,
+      copies_sold INT
+  );
+
+  INSERT INTO author (id, name, country) VALUES
+      (1, 'John Smith', 'USA'),
+      (2, 'Emma Wilson', 'UK'),
+      (3, 'Michael Brown', 'Canada'),
+      (4, 'Sarah Davis', 'USA'),
+      (5, 'David Miller', 'UK'),
+      (6, 'Lisa Chen', 'Canada');
+
+  INSERT INTO book (id, title, author_id, copies_sold) VALUES
+      (1, 'SQL Basics', 1, 1500),
+      (2, 'Advanced SQL', 1, 2500),
+      (3, 'Data Science', 2, 3000),
+      (4, 'Web Development', 3, 1800),
+      (5, 'Python Basics', 4, 2800),
+      (6, 'Machine Learning', 2, 4000),
+      (7, 'Database Design', 5, 1200),
+      (8, 'Cloud Computing', 6, 900),
+      (9, 'JavaScript Guide', 4, 2200);
+  ```
+---
+
+The bookstore wants to identify their top-selling authors. They need a report showing the top 3 authors based on total books sold.
+
+Given the following data:
+
+**author table:**
+| id  | name          | country |
+| --- | ------------- | ------- |
+| 1   | John Smith    | USA     |
+| 2   | Emma Wilson   | UK      |
+| 3   | Michael Brown | Canada  |
+| 4   | Sarah Davis   | USA     |
+| 5   | David Miller  | UK      |
+| 6   | Lisa Chen     | Canada  |
+
+**book table:**
+| id | title            | author_id | copies_sold |
+| -- | ---------------- | --------- | ----------- |
+| 1  | SQL Basics       | 1         | 1500        |
+| 2  | Advanced SQL     | 1         | 2500        |
+| 3  | Data Science     | 2         | 3000        |
+| 4  | Web Development  | 3         | 1800        |
+| 5  | Python Basics    | 4         | 2800        |
+| 6  | Machine Learning | 2         | 4000        |
+| 7  | Database Design  | 5         | 1200        |
+| 8  | Cloud Computing  | 6         | 900         |
+| 9  | JavaScript Guide | 4         | 2200        |
+
+Write a query that shows:
+- Author name
+- Author's country
+- Total copies sold
+- Author's rank (based on total copies sold)
+
+Only include the top 3 authors, ordered by their rank (ascending).
+
+## Expected Output
+
+| author_name  | country | total_copies | author_rank |
+| ------------ | ------- | ------------ | ----------- |
+| Emma Wilson  | UK      | 7000         | 1           |
+| Sarah Davis  | USA     | 5000         | 2           |
+| John Smith   | USA     | 4000         | 3           |
+
+## Solution
+
+```sql
+SELECT 
+    a.name as author_name,
+    a.country,
+    SUM(b.copies_sold) as total_copies,
+    RANK() OVER (ORDER BY SUM(b.copies_sold) DESC) as author_rank
+FROM author a
+INNER JOIN book b ON b.author_id = a.id
+GROUP BY a.id, a.name, a.country
+ORDER BY author_rank
+LIMIT 3;
+```