Why Is My Query Slow? 3 possible causes
Making queries faster is like going to the DMV — we all procrastinate until things slow down so much it’s painful.
Assume you have a query that's in crisis mode, and we need to figure out what we can do now to make it faster.
Let’s also assume you don’t control the schema — no adding indexes, no changing data types, no tweaking where the data lives.
So what can you do? Is this where the article turns into clustering strategies, B-trees, and half the audience starts to nod off?
Not yet. I do plan to write in-depth articles about performance tuning later, but 1) that doesn’t feel like a very welcoming “first post,” and 2) it’ll take time to do it justice.
Instead, let’s look at a few practical query patterns you can watch for. Everything I touch on here has legitimate uses — I just want to help you understand why they can have performance impacts. To do that, I’m going to explain them in a very non-technical way.
I like to think of queries as if it's me, not the engine, running them. (Credit to Brett Ozar for using this kind of analogy — it resonated with me, and I hope it does with you.)
Imagine the database you're querying is a room, the tables are drawers, and the records are individual paper files.
1. Unnecessary Aggregations
It’s very common to see queries that select a lot of columns and include a DISTINCT
or a GROUP BY
. It seems harmless — maybe there are duplicates, and deduping “just in case” feels safe.
Sometimes, you do need this because the source data is flawed — but let’s assume that’s not the case here.
File drawer analogy: If I ask you to get me every file in the room, it sounds like a decent amount of work — but doable. But if I add, “And while you’re at it, make sure not to bring me any duplicates,” you now have to stop, compare contents, and then decide what to bring. That’s extra effort.
2. Using UNION Without ALL
Aggregations like GROUP BY
or DISTINCT
are fairly easy to spot. But there's a sneakier culprit: the UNION
.
Quick recap: a UNION
stacks two queries with the same column structure — unlike a JOIN
, which merges them side-by-side.
UNION
brings data together vertically, while JOIN
brings data together horizontally.
Here’s the catch: UNION
(by itself) adds an implicit DISTINCT
step — meaning it removes duplicates between the two result sets. This comes at a cost.
As a general rule: use UNION ALL
unless you specifically need to remove duplicates. UNION ALL
just appends the rows without deduplication — often faster, especially with large datasets.
3. Calculations in Predicates
This one’s subtle, but important. In general, human-readable SQL is good SQL — but not always.
Let’s start with a quick definition: a predicate is the part of your query where you filter results — typically the WHERE
clause. When I say "don’t use calculations on columns in predicates," I mean: try to keep functions on the right-hand side of the equality, not on the column itself.
For example, suppose you have a date column (DATECOL
) and want records from this quarter.
Here’s a common, but less efficient, approach:
WHERE DATE_TRUNC('QUARTER', DATECOL) = DATE_TRUNC('QUARTER', CURRENT_DATE)
What’s the problem? Even though this is readable, it transforms the column, which can block the query engine from using any underlying data organization (like partitions or indexes) efficiently.
If you want to nerd out on this, the concept is called SARGability — short for "Search ARGument-able."
Here’s a better, more performance-friendly version:
WHERE DATECOL >= DATE_TRUNC('QUARTER', CURRENT_DATE)
Now we’re saying: "Tell me when the current quarter starts, and fetch anything dated after that." Much more engine-friendly.
Wrap-Up
This is a taste of my teaching style, and the kind of problem-solving I enjoy. I came into data engineering from a background in fine arts and economics — and like many, I wrestled with imposter syndrome early on. What helped me most was finding intuitive ways to understand SQL — less about compilers and more about how the engine thinks.
I’ll be sharing more beginner-friendly SQL content, as well as deeper dives into performance tuning soon.
Have a query you’d like help with? Or want to structure your SQL for performance? Message me after May 21st — I’d love to help.
Note: I’ll be unavailable for mentoring between now and May 20th, but feel free to reach out — I’ll reply as soon as I return!