Intriguing SQL Riddles to Sharpen Your Skills

Chapter 1: Introduction to SQL Riddles

SQL, while appearing straightforward, can often lead to unexpected outcomes. Its various dialects allow users to interact with databases using syntax that resembles English. However, things are not always as they seem. Occasionally, I stumble upon queries that yield surprising results, revealing subtle aspects of the language. In this article, I've gathered three recent puzzles, presenting them as riddles for added intrigue. Challenge yourself to find the solution before checking the end of each section!

Additionally, I've included brief common table expressions (CTEs) to create the necessary tables for each example, so you won’t have to query live production tables. For a deeper understanding of SQL, I highly recommend setting up your own database and tables for experimentation. You can read more about that in this post.

Keep in mind that all examples are in Postgres, and results may differ in other SQL dialects. The data and topics in these queries are merely illustrative.

Section 1.1: Riddle 1 - Timestamp Precision

Consider a table named purchases that holds purchase IDs, amounts, and timestamps of transactions. Here’s a glimpse of how the data might appear:

In CTE format, this would look like:

WITH purchases(id, amount, dt) AS (

VALUES

(1::bigint, 0.99::float, '2023-02-15 00:00:00 GMT'::timestamp),

(2, 9.99, '2023-02-15 07:15:00 GMT'),

(3, 15.99, '2023-02-15 23:01:15 GMT'),

(4, 7.99, '2023-02-16 14:22:09 GMT')

)

Now, let's determine the total of purchases made on February 15. We can execute a query as follows:

SELECT

SUM(amount) AS sum

FROM purchases

WHERE

dt = '2023-02-15'

Surprisingly, we receive the following output. What occurred?

There were three transactions on February 15: IDs 1, 2, and 3, which should sum to $26.97. Instead, only the initial purchase was counted.

Hint: If you adjust the filter to 2023-02-16, no rows will be returned.

Answer: The dt column is a timestamp that includes both date and time. When we only specify the date in our WHERE clause, Postgres reformats the date string to 2023-02-15 00:00:00, matching only the first transaction. To select all transactions from February 15, we should cast the timestamp to a date:

SELECT

SUM(amount) AS sum

FROM purchases

WHERE

DATE(dt) = '2023-02-15'

With this adjustment, we now obtain the expected result.

Video Description: In this tutorial, we dive into a SQL take-home challenge, exploring essential concepts and practical solutions.

Section 1.2: Riddle 2 - Filters: Dependent vs. Independent

Next, we explore a table named users. Our goal is to filter out any rows meeting one of three conditions. For instance, we want to return only tenured and active users—those who have logged in during the last 28 days, have made prior posts, and are not new accounts.

As shown in the table below, our query should ideally return only user 8, who has False values for no_login_l28, has_never_posted, and is_new_account.

WITH users(id, no_login_l28, has_never_posted, is_new_account) AS (

VALUES

(1, True, True, True),

(2, True, True, False),

(3, True, False, True),

(4, True, False, False),

(5, False, True, True),

(6, False, True, False),

(7, False, False, True),

(8, False, False, False)

)

How should we structure our WHERE clause?

Options:

Multiple AND NOT
Multiple OR NOT
NOT + grouped OR
NOT + grouped AND

Hint: Consider how conditions in a filter are evaluated.

Answer: Option 1 is the correct choice. Although it initially confused me, the independent evaluation of conditions means that this filter correctly removes users with False values across all three columns.

Video Description: Test your logic with a series of challenging riddles that will put your critical thinking skills to the test!

Section 1.3: Riddle 3 - Left Joins Mimicking Inner Joins

In our final riddle, we examine two tables: customers and reviews. The customers table includes customer IDs along with their total spending.

WITH customers(id, total_spend) AS (

VALUES

(100, 1583.49),

(200, 8739.03),

(300, 431.00),

(400, 1.00),

(500, 22.27)

reviews(id, customer_id, reported_as_spam) AS (

VALUES

(1, 100, False),

(2, 100, False),

(3, 400, True),

(4, 400, True),

(5, 500, False)

)

Let’s say we want to investigate the correlation between a customer’s total spending and the number of non-spam reviews they provide. A left join between reviews and customers would typically look like this:

SELECT

c.id,

c.total_spend,

COALESCE(COUNT(r.id), 0) AS n_reviews

FROM customers c

LEFT JOIN reviews r

ON c.id = r.customer_id

WHERE

NOT r.reported_as_spam

GROUP BY

1, 2

ORDER BY

1

But wait—why have users 200, 300, and 400 vanished?

Hint: If we create a CTE for reviews with spam reviews filtered out and join on that CTE, will we see the same result?

Answer: The issue arises because WHERE clauses are processed after joins. The left join should maintain null values for users without reviews, but the subsequent WHERE filter excludes them. To correct this, we can add the filter within the LEFT JOIN:

SELECT

c.id,

c.total_spend,

COALESCE(COUNT(r.id), 0) AS n_reviews

FROM customers c

LEFT JOIN reviews r

ON c.id = r.customer_id

AND NOT r.reported_as_spam

GROUP BY

1, 2

ORDER BY

1

With this adjustment, we achieve the expected results.

Chapter 2: Conclusion

In this article, we explored three SQL challenges that can produce unexpected results: timestamp specificity, the difference between dependent and independent filters, and the nuances of left joins acting like inner joins. While I provided simple examples for clarity, these types of SQL intricacies can often arise in larger, more complex queries.

Identifying these issues can be daunting, particularly in intricate queries. When faced with confusion, I recommend breaking down the query into its individual components to verify each result. If you're still uncertain, creating simple CTEs with test data is a great way to validate your expectations.

Happy querying!

Matt

arsalandywriter.com

Intriguing SQL Riddles to Sharpen Your Skills

Chapter 1: Introduction to SQL Riddles

Section 1.1: Riddle 1 - Timestamp Precision

Section 1.2: Riddle 2 - Filters: Dependent vs. Independent

Section 1.3: Riddle 3 - Left Joins Mimicking Inner Joins

Chapter 2: Conclusion

Share the page:

Recent Post:

The Unsolved Mystery of the Long Island Serial Killer

Navigating Midlife: Embracing Change and Finding Passion

Taylor Series Expansions: A Guide to Function Approximation

Engaging Reads for Adults: 12 Must-Read Book Recommendations

Unlock the Power of the Swiss Ball for Enhanced Workouts

Unlocking Insights with Google Cloud Natural Language API

Exploring Time Travel Through Meditation: My Journey Begins

# How to Identify and Evade a 'Writing Pyramid Scheme'