[SQL Basic] Let’s learn some advanced JOIN queries!
#20. Marketing/Data Analysis for Beginners
We already learned about the JOIN
function in lesson 13 and 14. In this tutorial, we will expand on what we’ve learned and combine multiple tables together to filter and view useful information.
#Glossary:
📌Lesson 3: SELECT / FROM / WHERE
📌Lesson 4: ORDER BY
📌Lesson 5: AND / OR / IN
📌Lesson 6: LIKE
📌Lesson 7: AS /DISTINCT / IS NULL
📌Lesson 9: COUNT / SUM
📌Lesson 10: AVG / MAX / MIN
📌Lesson 11: GROUP BY / HAVING
📌Lesson 12: CASE
📌Lesson 13: INNER JOIN
📌Lesson 14: OUTER LEFT JOIN / OUTER RIGHT JOIN
📌Lesson 15: SUBQUERIES
📌Lesson 16: DATE_FORMAT / DATEDIFF
📌Lesson 17: CONCAT/LENGTH/SUBSTR
📌Lesson 18: TRUNCATE/ROUND/MOD
📌Lesson 19: Advanced CASE
#Table of Contents
Query 1. Check the overdue fees owed by customers and the number of times they were late returning movies by combining 5 different tables.
Let’s review JOIN
first. Do you remember what this function does? JOIN
allows you to combine two or more tables using a common column (Join Key).
Different kinds of JOIN:
Depending on the information you want to see, different JOIN
functions can be used. In this example, we will user INNER JOIN
to combine two tables and view only rentals that are past due.
Join Review: INNER JOIN (Lesson 13), LEFT/RIGHT JOIN (Lesson 14)
Q1. Check the overdue fees owed by customers and the number of times they were late returning movies by combining 5 different tables.
Let’s say we want to give the customers with the least amount of late fees a coupon in our imaginary DVD store. To find who they are, we have to combine multiple tables that contain different levels of data. We specified the columns we will use to combine all five tables below:
To calculate an accurate fee for late returns, we need to know how long the movie was rented and how much it costs to rent. We also need to factor in when it was rented, and when it was returned.
The main information we want extracted from all these tables is the customer ID number, customer’s full name, email address, the number of times they were late in returning a movie, and how much they owe for being late. This tables several table combinations because we need to calculate the number of overdue payments and the rent by movie.
Here is a visual break down of what we need from each table:
Here is a breakdown of what the image means:
- First, we will join the rental table and the customer table using the
customer_id
column - Next, we will join the rental table and the payment table using the
rental_id
column - Then we will join the rental table and the inventory table using the
inventory_id
column - Last, we will join the inventory table and the film table using the
film_id
column
Now that we checked out the order we will combine the tables and the columns we need, let’s look at our query in full.
If it looks too complicated, don’t worry. We’re going to break it down into 6 simple steps. :)
1. Use JOIN
in the FROM
function to join 5 tables: rental table, customer table, payment table, inventory table, film table.
We usually start with the SELECT
function, but this time we will start with the FROM
function. The reason we’re doing this is because we want to define a nickname (alias) for all our tables first, which will make writing our SELECT
function later on much more intuitive and easy.
So let’s write the JOIN
query that combines our five tables in the FROM
function. We want all data including loan/return information, overdue information, payment information, movie information, and customer information. So we will combine the tables with INNER JOIN
, which does not generate any NULL
values.
To save time and energy so we don’t have to type out the name of the table every time we need to call on it, we will give every table a nickname. Here is the alias cheat-sheet:
Rental table alias ▶ ️ r
Customer table alias ▶ c
Payment table alias ▶ p
Film table alias ▶ f
Inventory table alias ▶ i
Combine the tables in the order of JOIN
above, and note the common column that combines each table after ON
. Great! We’ve combined all the tables to get the information we need.
2. Add an AND
function to the JOIN
query to get the data that corresponds to the overdue period.
We already joined the film table, but we also want a calculation result from the film table. We can do this using the AND
function.
To get an accurate time period of how long a customer was late in returning a movie, we’re going to use the DATEDIFF
function to calculate the difference between the day the movie was rented out and the day it was returned. If you forgot how to use DATEDIFF, check out our lesson 16 again.
⭐️ Don’t forget to utilize the ‘r’ alias for the rental table here.
3. Specify the columns we want to view for customer information (full name, email) in the SELECT
function
Using the aliases we already established, we can clarify the columns we want to see in the SELECT function.
Start with the customer ID, which is from the customer table (alias c). Next, connect the first and last name of the customers using CONCAT
. Put a space between the names using quotations and rename the column as customer_name. Last, add in the customer email.
⭐️ Keep in mind that all these columns are from the customer table, so we need to add the alias c.
before each column name.
4. Count of the number of delayed movie returns per customer and view them with the SELECT
function!
With information from the rental table, we can find the number of delayed returns of movies per customer. Using the COUNT
function, count all the rental IDs from the rental table (alias r). Rename this column overdue_amount.
🔑 Because we only joined the long rental periods through the JOIN
function, we will get the number of only rentals that are overdue.
GROUP BY
customer ID to organize your results. Don’t forget to use the alias of the customer table (c.
)
5. Find the total late fee amount owed per customer for their late returns in the SELECT
function.
To calculate the late fee, we need to calculate the difference between the actual rent paid for each movie that is past due and the specified standard rent. Use SUM
to calculate the amount from the payment table (p.amount
) subtracted (-
) by the rental rate from the films table (f.rental_rate
).
The GROUP BY
function is also critical to this calculation because it helps better organize the total sum of past due fees per customer ID.
6. Order the total amount of late fees and late payment calculated per customer in the order of low fees to high fees owed.
In the ORDER BY
function, sort by the overdue_number column we created in step 4. Don’t forget to close the query with a semicolon ( ;
).
We can now hit F5 and view our data!
Here is the full query if you want to try copy + pasting it~
SELECT c.customer_id, CONCAT(c.first_name, “, “, c.last_name) AS customer_name, c.email, COUNT(r.rental_id) as overdue_number, SUM(p.amount-f.rental_rate) as overdue_amount FROM rental r INNER JOIN customer c ON r.customer_id = c.customer_id INNER JOIN payment p ON r.rental_id = p.rental_id INNER JOIN inventory i ON r.inventory_id = i.inventory_id INNER JOIN film f ON i.film_id = f.film_id AND f.rental_duration < DATEDIFF(r.return_date, r.rental_date) GROUP BY c.customer_id ORDER BY overdue_number;
Though it might seem complicated at first, this is a good way to join several tables and calculate information we need from all of them. Using the JOIN
function, we’re able to accurately view the information of each customer and how much they owe our store in late fees. We can also see how many times they were late in returning a movie!
⭐️BONUS
Let’s say we want to give a warning to our customers that they owe some fees, or reward the ones with the smallest number of late returns with a coupon. We can just click and drag over the emails we want from our list and copy/paste it into a mass email!
If you like more visual guidance, check out our video tutorial below:
# Tutorial Video
# [SQL Basic Series] — Marketing/Data Analysis for Beginners
1. What is SQL? — Database Introduction
2. Downloading SQLGate and Connecting to a Database
3. What kind of data is in the ‘film’ table? — SELECT/FROM/WHERE
4. How to sort film lists by price (rental rate)? — ORDER BY
5. Can I rent a 3 hour movie for $0.99? — AND/OR/IN
6. How can I find a movie that’s thrilling? — LIKE
7. How to rename columns and see unique values? — AS/DISTINCT/IS NULL
8. Let’s see Sample Database tables!
9. What is the total length of all the films added together? — COUNT/SUM
10. What is the average film length? — AVG/MIN/MAX
11. How many movies have the same rating? — GROUP BY/HAVING
12. Let’s designate a value based on the rental duration. — CASE
13. Let’s find out the movie genres by joining tables! — INNER JOIN
14. Let’s check movie rentals by the inventory list! — LEFT/RIGHT JOIN
15. Let’s find customers that have rented movies priced $9.99! — Subquery
16. How to work with Date Functions in SQL — DATE_FORMAT, DATEDIFF
17. How to work with String Functions in SQL — CONCAT, LENGTH, SUBSTR
18. Let’s use some Numeric SQL functions — TRUNCATE, ROUND, MOD
19. Let’s learn some advanced CASE queries!
20. Let’s learn some advanced JOIN queries!