SQL is a fundamental topic in data engineering interviews at all levels.
Below, we've compiled a list of example SQL interview questions indicative of the types of questions you can expect in real data engineering interviews.
We created this list with insights from data engineers, data scientists, and machine learning engineers at Meta, Google, Dropbox, Amazon, and top startups.
Early in your data career, your SQL interviews will likely be focused on basic querying, syntax, and data retrieval, such as:
- WHERE clauses to filter rows based on specific conditions.
- INNER JOIN and LEFT JOIN to combine rows from related tables.
- COUNT(), SUM(), AVG(), MIN(), and MAX() to aggregate values.
- GROUP BY to aggregate data into summary rows.
- ORDER BY to sort query results.
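To see several of these working together, here is a minimal sketch against a hypothetical orders table (the table and column names are made up for illustration):
-- Total spend per customer across completed orders, biggest spenders first
SELECT customer_id,
       COUNT(*) AS order_count,
       SUM(amount) AS total_spend
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY total_spend DESC;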
Here are some examples of common SQL questions you might hear:
You are given a LinkedIn table of applicants and their skills. You are tasked with finding candidates with experience in Python, Tableau, and PostgreSQL.
The table contains candidate_id and skill columns.
SELECT candidate_id
FROM candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3;
Filter for rows where the skill is Python, Tableau, or PostgreSQL, then GROUP BY candidate_id and use HAVING COUNT(DISTINCT skill) = 3 to keep only candidates who have all three skills.
You are given two tables with Facebook Pages and their respective likes.
You are tasked with getting the pages with no likes.
The pages table has page_id and page_name columns, while the page_likes table has user_id, page_id, and liked_date columns.
SELECT page_id
FROM pages
WHERE page_id NOT IN (SELECT page_id FROM page_likes);
SELECT the page_id values that are not in the page_likes table.
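One caveat worth raising in an interview: if page_likes.page_id can contain NULLs, NOT IN returns no rows at all. A NOT EXISTS anti-join is a safer equivalent:
SELECT p.page_id
FROM pages p
WHERE NOT EXISTS (
  SELECT 1
  FROM page_likes pl
  WHERE pl.page_id = p.page_id
);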
You are given a Tesla parts_assembly table with the following columns: part, finish_date, and assembly_step. You are tasked with finding parts that have begun the production process but are not yet finished.
Assuming the unfinished parts lack a finish_date, the answer is:
SELECT part, assembly_step
FROM parts_assembly
WHERE finish_date IS NULL;
You are given a Facebook table with users who posted at least twice in 2021.
Find the number of days between each user’s first post of the year and the last post in 2021.
The output should be the days between each user's first and last posts. The posts table has the following columns: user_id, post_id, post_content, and post_date.
SELECT
user_id,
DATE_PART('day',
MAX(post_date) - MIN(post_date)) AS days_between_posts
FROM posts
WHERE EXTRACT(YEAR FROM post_date) = 2021
GROUP BY user_id
HAVING COUNT(*) >= 2;
- The query groups by user_id to calculate values for each group.
- MIN(post_date) gives the date of the first post, and MAX(post_date) gives the date of the last post.
- DATE_PART('day', ...) calculates the days between the first and last posts.
- HAVING COUNT(*) >= 2 ensures that only users with at least 2 posts in 2021 are returned.
Given a Microsoft messages table with the following columns: message_id, sender_id, receiver_id, content, and sent_date, find the two power users who sent the most messages in Microsoft Teams in August 2022.
The output should be ordered in descending order based on the number of messages.
SELECT sender_id, COUNT(sender_id) AS message_count
FROM messages
WHERE sent_date BETWEEN '2022-08-01' AND '2022-08-31'
GROUP BY sender_id
ORDER BY message_count DESC
LIMIT 2;
- SELECT the sender_id and the count of messages as message_count.
- Filter for sent_date between the first and last day of August 2022.
- GROUP the results by sender_id.
- ORDER BY message_count in descending order and LIMIT to 2 results.
Find duplicate job listings in a LinkedIn job_listings table with the following columns: job_id, company_id, title, and description.
A duplicate has the same title and description. Return the number of companies that have posted duplicate jobs.
SELECT COUNT(DISTINCT company_id) AS company_count
FROM (
SELECT company_id
FROM job_listings
GROUP BY company_id, title, description
HAVING COUNT(*) > 1
) AS duplicates;
- The inner query groups the listings by company_id, title, and description, keeping groups with more than one row.
- COUNT(DISTINCT company_id) counts the number of companies with duplicate job listings.
You are given two tables from Robinhood's stock trading app: trades and users.
The trades table has the following columns: order_id, user_id, quantity, status ('Completed', 'Cancelled'), date, and price.
The users table has user_id, city, email, and signup_date columns.
Write a query to find the three cities with the highest number of orders arranged in descending order.
SELECT u.city, COUNT(t.order_id) AS total_orders
FROM trades t
INNER JOIN users u ON t.user_id = u.user_id
WHERE t.status = 'Completed'
GROUP BY u.city
ORDER BY total_orders DESC
LIMIT 3;
- Join the users table and the trades table on user_id.
- Filter for orders with 'Completed' status.
Given the events table for Facebook app analytics, calculate each app's click-through rate (CTR) for 2022, rounded to two decimal places.
The events table has the following columns: app_id, event_type, and timestamp.
SELECT
  app_id,
  ROUND(
    -- Clicks divided by impressions, scaled to a percentage
    (100.0 * COUNT(CASE WHEN event_type = 'click' THEN 1 END) /
    COUNT(CASE WHEN event_type = 'impression' THEN 1 END)),
    2
  ) AS ctr
FROM events
WHERE EXTRACT(YEAR FROM timestamp) = 2022
GROUP BY app_id;
Given two TikTok tables, texts and emails, find the users who confirmed their sign-up on the second day.
The texts table contains the following columns: text_id, email_id, signup_action ('Confirmed', 'Not confirmed'), and action_date.
The emails table has email_id, user_id, and signup_date columns.
SELECT e.user_id
FROM emails e
INNER JOIN texts t ON e.email_id = t.email_id
WHERE t.action_date = e.signup_date + INTERVAL '1 day'
  AND t.signup_action = 'Confirmed';
- Join the texts and emails tables on email_id.
- Filter for rows where action_date is exactly one day after signup_date and the sign-up action is 'Confirmed'.
As your data engineering career advances, your SQL interviews will evolve to include more complex data manipulation, transformation, and preprocessing tasks.
Expect to be tested on larger datasets and more advanced SQL concepts.
At this stage, you should be comfortable answering SQL questions involving:
- Window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), and LAG().
- CASE WHEN for conditional logic within queries.
- WITH clauses to define temporary result sets.
- Set operators like UNION, INTERSECT, and EXCEPT.
- String functions like CONCAT(), SUBSTRING(), and REPLACE().
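Most of these appear in the questions below, but LEAD() does not, so here is a minimal sketch of LAG() and LEAD(), which fetch a value from the previous or next row in the window (the daily_sales table is hypothetical):
-- Compare each day's revenue with the previous and next day per store
SELECT store_id,
       sale_date,
       revenue,
       LAG(revenue) OVER (PARTITION BY store_id ORDER BY sale_date) AS prev_day_revenue,
       LEAD(revenue) OVER (PARTITION BY store_id ORDER BY sale_date) AS next_day_revenue
FROM daily_sales;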
Here are some intermediate-level SQL questions to help you practice:
You are given a table of Uber transactions with columns: user_id, spend, and transaction_date. Find the third transaction made by every user. The output should include the user_id, spend, and transaction_date.
SELECT
user_id,
spend,
transaction_date
FROM (
SELECT
user_id,
spend,
transaction_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS rn
FROM transactions
) AS ranked_transactions
WHERE rn = 3;
- ROW_NUMBER() assigns a row number (rn) to each transaction within each user_id, ordered by transaction_date.
- The outer query filters for rn = 3, corresponding to each user's third transaction.
Given an employee table with columns: employee_id, name, salary, department_id, and manager_id, find the second highest salary.
WITH ranked_salaries AS (
  SELECT
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
  FROM employee
)
SELECT DISTINCT salary
FROM ranked_salaries
WHERE rank = 2;
- Use a CTE (the WITH clause) and the DENSE_RANK() window function to rank salaries from highest to lowest.
- Filter for the second-highest salary (rank = 2). DENSE_RANK() is used instead of RANK() so that a tie for the top salary doesn't cause rank 2 to be skipped.
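A common follow-up is to solve this without window functions. One equivalent approach, returning the second-highest distinct salary, uses a subquery:
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);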
You are given two tables: activities with columns activity_id, user_id, activity_type ('send', 'open', 'chat'), time_spent, and activity_date, and age_breakdown with user_id and age_bucket. Calculate the percentage of time spent sending vs. opening snaps, grouped by age group, rounded to two decimal places.
SELECT
ab.age_bucket,
ROUND(
100.0 * SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END),
2
) AS send_perc,
ROUND(
100.0 * SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END),
2
) AS open_perc
FROM activities a
JOIN age_breakdown ab ON a.user_id = ab.user_id
WHERE a.activity_type IN ('send', 'open')
GROUP BY ab.age_bucket;
- Use a JOIN to combine the activities and age_breakdown tables on user_id.
- Use CASE statements to calculate the percentage of time spent on each activity type.
- Group the results by age_bucket.
You are given a tweets table with columns: user_id, tweet_date, and tweet_count. Calculate the 3-day rolling average of tweets for each user. The output should include user_id, tweet_date, and the rolling average rounded to two decimal places.
SELECT
user_id,
tweet_date,
ROUND(
AVG(tweet_count) OVER (
PARTITION BY user_id
ORDER BY tweet_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS rolling_avg_3d
FROM tweets
ORDER BY user_id, tweet_date;
- The AVG() window function calculates the rolling average of tweet_count for each user.
- PARTITION BY user_id ensures the rolling average is calculated separately for each user.
- ROWS BETWEEN 2 PRECEDING AND CURRENT ROW defines the window for the rolling average, considering the current row and the two previous rows.
The product_spend table contains data about customers' spending in different Amazon categories.
You are tasked with finding the two highest-grossing items per category in 2022.
The table has the following columns: category, product, user_id, spend, and transaction_date. Output the category, product, and total_spend.
WITH ranking_cte AS (
SELECT
category,
product,
SUM(spend) AS total_spend,
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS rank
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY category, product
)
SELECT
category,
product,
total_spend
FROM ranking_cte
WHERE rank <= 2
ORDER BY category, rank;
- The WITH clause computes the ranking of each product within its category based on total spend.
- RANK() assigns a rank, with the highest-spending product ranked first.
- Transactions are filtered to 2022 with the EXTRACT() function, and the final query selects the top two products per category.
You are given two tables: employee with columns employee_id, name, salary, department_id, and manager_id, and department with columns department_id and department_name.
Find the top three employees by salary in each department. Output the department_name, name, and salary.
WITH salary_cte AS (
SELECT
d.department_name,
e.name,
e.salary,
DENSE_RANK() OVER (PARTITION BY d.department_name ORDER BY e.salary DESC) AS rank
FROM employee e
INNER JOIN department d ON d.department_id = e.department_id
)
SELECT department_name, name, salary
FROM salary_cte
WHERE rank <= 3;
- DENSE_RANK() ranks employees by salary in each department, ensuring that ties don't create gaps in the ranking.
- The query joins the employee and department tables on department_id.
You are given three tables: artists (with columns artist_id, artist_name, label_owner), songs (with columns song_id, artist_id, and name), and global_song_rank (with columns day, song_id, and rank).
Find the top five artists whose songs appear most frequently in the Top 10 of the global chart. Output the artist_name and artist_rank.
WITH songs_cte AS (
SELECT a.artist_name,
DENSE_RANK() OVER (ORDER BY COUNT(s.song_id) DESC) AS artist_rank
FROM artists a
INNER JOIN songs s ON a.artist_id = s.artist_id
INNER JOIN global_song_rank g ON g.song_id = s.song_id
WHERE g.rank <= 10
GROUP BY a.artist_name
)
SELECT artist_name, artist_rank
FROM songs_cte
WHERE artist_rank <= 5;
Use DENSE_RANK() to rank the artists based on how frequently their songs appear in the Top 10.
You are given emails and texts tables from TikTok. The emails table contains information about user sign-ups, while the texts table contains information about user activations. The columns are as follows: the emails table has email_id, user_id, and signup_date; the texts table has text_id, email_id, and signup_action.
Calculate the activation rate of users, rounded to two decimal places.
SELECT
ROUND(
COUNT(DISTINCT CASE WHEN t.signup_action = 'Confirmed' THEN e.user_id END) * 1.0 /
COUNT(DISTINCT e.user_id),
2
) AS confirm_rate
FROM emails e
LEFT JOIN texts t ON e.email_id = t.email_id;
- Use a LEFT JOIN to ensure all users from the emails table are included.
- Count only confirmed users with COUNT(DISTINCT CASE WHEN ...).
You are given two tables: customer_contracts (with columns customer_id, product_id, and amount) and products (with columns product_id, product_category, and product_name).
Find the customer IDs of customers who have purchased at least one product from every product category.
SELECT
cc.customer_id
FROM customer_contracts cc
JOIN products p ON cc.product_id = p.product_id
GROUP BY cc.customer_id
HAVING COUNT(DISTINCT p.product_category) = (SELECT COUNT(DISTINCT product_category) FROM products);
- Join the customer_contracts and products tables on product_id.
- Group by customer_id and use HAVING to filter for customers who have purchased at least one product from each category.
Senior data engineers are trusted to handle extremely large datasets.
Large companies like Meta, Microsoft, TikTok, and Google need data engineers who can clean, synthesize, and interpret large amounts of information.
In combination with machine learning, senior engineers help to make sense of trends and make predictions for the future.
Before stepping into a senior SQL interview in your data engineer loops, consider brushing up on concepts like:
- Window functions like RANK, DENSE_RANK, ROW_NUMBER, and LAG for complex calculations and analytics.
- Transaction control with COMMIT, ROLLBACK, and SAVEPOINT (see the sketch after this list).
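Transaction control doesn't come up in the practice questions below, so here is a minimal sketch (the accounts table and the amounts are hypothetical):
BEGIN;

-- Debit one account and credit another
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

SAVEPOINT after_debit;

UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- Undo only the credit, keeping the debit
ROLLBACK TO SAVEPOINT after_debit;

-- Persist whatever remains (or ROLLBACK; to discard the whole transaction)
COMMIT;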
Here are some questions you can practice:
You are given the Wayfair user_transactions table with the following columns: transaction_id, product_id, spend, and transaction_date. Compute the year-on-year growth rate for the total spend of each product, grouping the results by product ID. Include the year in ascending order, product ID, current year's spend, previous year's spend, and year-on-year growth percentage, rounded to 2 decimal places.
WITH yearly_spend AS (
SELECT
EXTRACT(YEAR FROM transaction_date) AS year,
product_id,
SUM(spend) AS curr_year_spend
FROM user_transactions
GROUP BY year, product_id
),
spend_with_prev AS (
SELECT
ys1.year,
ys1.product_id,
ys1.curr_year_spend,
LAG(ys1.curr_year_spend) OVER (PARTITION BY ys1.product_id ORDER BY ys1.year) AS prev_year_spend
FROM yearly_spend ys1
)
SELECT
year,
product_id,
curr_year_spend,
prev_year_spend,
ROUND(
CASE
WHEN prev_year_spend IS NULL THEN NULL
ELSE (curr_year_spend - prev_year_spend) * 100.0 / prev_year_spend
END, 2) AS yoy_rate
FROM spend_with_prev
ORDER BY product_id, year;
- The yearly_spend CTE extracts the year from transaction_date and calculates the total spend (curr_year_spend) for each product by year.
- The spend_with_prev CTE uses the LAG() window function to retrieve the previous year's spend for each product, partitioning the data by product_id and ordering it by year so the current year's spend can be compared with the previous year's.
- The final SELECT computes the YoY growth rate. The CASE statement ensures that if there is no previous year's spend, the YoY rate is shown as NULL. The ROUND() function rounds the YoY growth rate to 2 decimal places. Results are ordered by product_id and year in ascending order.
Write a query to find the maximum number of Amazon Prime and non-Prime batches that can be stored in Amazon’s 500,000 square feet warehouse, prioritizing Prime items.
You are given an inventory table with item_id, item_type, item_category, and square_footage columns. Return the item_type with Prime-eligible items listed first, followed by non-Prime items, along with the maximum number of batches that can be stocked.
WITH summary AS (
-- Summarize total square footage and item count for both prime_eligible and not_prime items
SELECT item_type,
SUM(square_footage) AS total_sqft,
COUNT(*) AS item_count
FROM inventory
GROUP BY item_type
),
prime_occupied_area AS (
-- Calculate the number of prime batches we can store and how much space they occupy
SELECT
item_type,
total_sqft,
FLOOR(500000 / total_sqft) AS prime_batch_count,
FLOOR(500000 / total_sqft) * item_count AS prime_item_count
FROM summary
WHERE item_type = 'prime_eligible'
)
-- Output the maximum number of prime and non-prime items that can be stored
SELECT
item_type,
CASE
-- For prime_eligible, calculate maximum items using the full available warehouse space
WHEN item_type = 'prime_eligible' THEN (FLOOR(500000 / total_sqft) * item_count)
-- For not_prime, calculate remaining space and determine maximum items fitting in that space
WHEN item_type = 'not_prime' THEN FLOOR((500000 - (SELECT prime_batch_count * total_sqft FROM prime_occupied_area)) / total_sqft) * item_count
END AS item_count
FROM summary
ORDER BY item_type DESC;
- The summary CTE totals square footage and item counts for the prime_eligible and not_prime items, grouping by item_type from the inventory table.
- SUM(square_footage) calculates the total square footage taken by all items of a given item_type (e.g., total space used by all prime_eligible items).
- The prime_occupied_area CTE calculates the space occupied by Prime-eligible items and determines how many full Prime item batches can be stored in the warehouse, given the total warehouse size of 500,000 square feet.
- The final SELECT computes the item counts, ordering by item_type in descending order so prime_eligible rows appear first.
You are given a search_frequency table with the following columns: searches and num_users. Find the median number of searches made by a user last year.
WITH searches_expanded AS (
SELECT searches
FROM search_frequency
CROSS JOIN GENERATE_SERIES(1, num_users)
)
SELECT
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY searches)::numeric, 1) AS median
FROM searches_expanded;
- The searches_expanded CTE uses CROSS JOIN GENERATE_SERIES(1, num_users), which generates a series of numbers from 1 to num_users (inclusive). For each row in the search_frequency table, it produces one row per user, expanding the frequency counts into individual observations.
- PERCENTILE_CONT(0.5) computes the median of the searches column.
- WITHIN GROUP (ORDER BY searches) specifies the ordering of the data to compute the percentile. It orders the searches column before calculating the median.
- ::numeric casts the result of PERCENTILE_CONT to a numeric type because the ROUND() function expects a numeric type.
You are provided with advertiser and daily_pay tables from Facebook. The advertiser table has information about advertisers, while the daily_pay table has information about their payments. Write a query to update the payment status for Facebook advertisers using data from the daily_pay table. The results should display the user ID and their updated payment status, sorted by user ID.
Advertisers' payment statuses can be classified into these categories, as reflected in the CASE logic below: NEW (a first-time payer with no prior status), EXISTING (an advertiser with a status who paid again), CHURN (an advertiser with no payment in the current period), and RESURRECT (a previously churned advertiser who paid again).
SELECT
COALESCE(advertiser.user_id, daily_pay.user_id) AS user_id,
CASE
WHEN daily_pay.paid IS NULL THEN 'CHURN'
WHEN daily_pay.paid IS NOT NULL AND advertiser.status IN ('NEW', 'EXISTING', 'RESURRECT') THEN 'EXISTING'
WHEN daily_pay.paid IS NOT NULL AND advertiser.status = 'CHURN' THEN 'RESURRECT'
WHEN daily_pay.paid IS NOT NULL AND advertiser.status IS NULL THEN 'NEW'
END AS new_status
FROM advertiser
FULL OUTER JOIN daily_pay
ON advertiser.user_id = daily_pay.user_id
ORDER BY user_id;
- The query combines the advertiser and daily_pay tables using a FULL OUTER JOIN on user_id to ensure the inclusion of all advertisers and payments, even if there are no corresponding matches in the other table.
- The user_id is taken from either table using COALESCE.
- When paid is NULL, the status is set to CHURN; when a payment exists but the advertiser has no previous status (NULL), the updated status is NEW.
- Results are sorted by user_id.
You are given a pizza_toppings table with the following columns: topping_name and ingredient_cost.
Write a query to find all possible three-topping combinations and their total cost, sorted by the highest total cost.
Ingredients should be listed in alphabetical order.
WITH pizza_combinations AS (
-- Generate combinations of 3 toppings
SELECT
t1.topping_name AS topping1,
t2.topping_name AS topping2,
t3.topping_name AS topping3,
t1.ingredient_cost AS cost1,
t2.ingredient_cost AS cost2,
t3.ingredient_cost AS cost3
FROM pizza_toppings t1
JOIN pizza_toppings t2 ON t1.topping_name < t2.topping_name
JOIN pizza_toppings t3 ON t2.topping_name < t3.topping_name
),
pizza_costs AS (
-- Calculate total cost and format the pizza name
SELECT
CONCAT(topping1, ',', topping2, ',', topping3) AS pizza,
(cost1 + cost2 + cost3) AS total_cost
FROM pizza_combinations
)
SELECT pizza, total_cost
FROM pizza_costs
ORDER BY total_cost DESC, pizza;
- The pizza_combinations CTE performs a self-join to generate all possible unique combinations of three different toppings. The < comparison keeps toppings in alphabetical order and avoids repeating the same combination in a different order.
- The pizza_costs CTE concatenates the toppings into a pizza name and calculates the total cost.
- The final SELECT retrieves the pizza name and total cost, ordering results first by the highest total cost and then alphabetically by pizza name.
You are given employee and salary tables. You are tasked with finding out how the average salary in each department compares to the company's overall average salary for March 2024.
Provide the comparison results as 'higher', 'lower', or 'same' for each department.
Include the department ID, payment month (formatted as MM-YYYY), and the comparison result in the output.
WITH company_avg AS (
-- Calculate the company's overall average salary for March 2024
SELECT AVG(amount) AS avg_salary
FROM salary
WHERE EXTRACT(MONTH FROM payment_date) = 3 AND EXTRACT(YEAR FROM payment_date) = 2024
),
department_avg AS (
-- Calculate the average salary per department for March 2024
SELECT e.department_id, AVG(s.amount) AS avg_salary
FROM employee e
JOIN salary s ON e.employee_id = s.employee_id
WHERE EXTRACT(MONTH FROM s.payment_date) = 3 AND EXTRACT(YEAR FROM s.payment_date) = 2024
GROUP BY e.department_id
),
comparison AS (
-- Compare each department's average salary with the company's overall average salary
SELECT d.department_id, '03-2024' AS payment_date,
CASE
WHEN d.avg_salary > c.avg_salary THEN 'higher'
WHEN d.avg_salary < c.avg_salary THEN 'lower'
ELSE 'same'
END AS comparison
FROM department_avg d CROSS JOIN company_avg c
)
SELECT department_id, payment_date, comparison
FROM comparison
ORDER BY department_id;
- The company_avg CTE calculates the company's overall average salary for March 2024.
- The department_avg CTE calculates the average salary per department for March 2024.
- The comparison CTE compares each department's average salary with the company's average salary.
- The CROSS JOIN ensures that each department's average is compared against the company's overall average.
- The CASE statement determines whether the department's salary is higher, lower, or the same compared to the company's average.
You are given a monthly_cards_issued table from JPMorgan Chase. The table has the following columns: issue_month, issue_year, card_name, and issued_amount.
Find the name of the credit card and the number of cards issued during its launch month. The launch month is the earliest record in the monthly_cards_issued table.
The results should be ordered from the largest issued amount.
WITH ranked_cards AS (
-- Assign a rank to each card based on the issue date (earliest issue_month and issue_year)
SELECT card_name, issue_month, issue_year, issued_amount,
ROW_NUMBER() OVER (PARTITION BY card_name ORDER BY issue_year, issue_month) AS rn
FROM monthly_cards_issued
)
-- Select only the launch month (rank = 1) records
SELECT card_name, issued_amount
FROM ranked_cards
WHERE rn = 1
ORDER BY issued_amount DESC;
- The ranked_cards CTE assigns a row number to each card based on the issue date, using ROW_NUMBER() to rank the earliest issue dates.
- PARTITION BY card_name ensures row numbers are assigned independently for each card.
- The WHERE rn = 1 clause filters for the first (launch) month of each card.
- The final SELECT retrieves the card name and the number of cards issued during the launch, ordered by the highest issued amount.
Finally, let’s look at performance optimization queries.
Write a statement that creates a partial index for recent orders. A partial index is defined with a WHERE clause. This means the index only includes rows that satisfy a certain condition, unlike a full index, which includes every table row. Note that PostgreSQL requires the index predicate to be immutable, so NOW() - INTERVAL '1 month' cannot appear in it; instead, use a fixed cutoff date (the literal below is illustrative) and rebuild the index periodically as the window moves:
CREATE INDEX idx_recent_orders
ON orders (customer_id, order_date)
WHERE order_date >= '2025-01-01';
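To confirm the planner actually uses the partial index, you can run EXPLAIN on a query whose filter matches the index predicate (a hypothetical lookup):
EXPLAIN
SELECT customer_id, order_date
FROM orders
WHERE customer_id = 42
  AND order_date >= '2025-01-01';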
Write a query to analyze the query execution of question 25 (the credit card launch query above).
Use the EXPLAIN ANALYZE command:
EXPLAIN ANALYZE
WITH ranked_cards AS (
-- Assign a rank to each card based on the issue date (earliest issue_month and issue_year)
SELECT
card_name,
issue_month,
issue_year,
issued_amount,
ROW_NUMBER() OVER (PARTITION BY card_name ORDER BY issue_year, issue_month) AS rn
FROM
monthly_cards_issued
)
-- Select only the launch month (rank = 1) records
SELECT
card_name,
issued_amount
FROM
ranked_cards
WHERE
rn = 1 -- Filter to only include the launch month (the earliest)
ORDER BY
issued_amount DESC; -- Order by issued amount in descending order
The output obtained is:
Sort (cost=21.30..21.31 rows=1 width=520) (actual time=0.047..0.048 rows=2 loops=1)
Sort Key: ranked_cards.issued_amount DESC
Sort Method: quicksort Memory: 25kB
-> Subquery Scan on ranked_cards (cost=16.39..21.29 rows=1 width=520) (actual time=0.037..0.044 rows=2 loops=1)
Filter: (ranked_cards.rn = 1)
Rows Removed by Filter: 7
-> WindowAgg (cost=16.39..19.54 rows=140 width=536) (actual time=0.035..0.042 rows=9 loops=1)
-> Sort (cost=16.39..16.74 rows=140 width=528) (actual time=0.028..0.029 rows=9 loops=1)
Sort Key: monthly_cards_issued_174.card_name, monthly_cards_issued_174.issue_year, monthly_cards_issued_174.issue_month
Sort Method: quicksort Memory: 25kB
-> Seq Scan on monthly_cards_issued_174 (cost=0.00..11.40 rows=140 width=528) (actual time=0.014..0.016 rows=9 loops=1)
Planning Time: 0.239 ms
Execution Time: 0.079 ms
The output from the EXPLAIN ANALYZE query provides insight into how PostgreSQL executes the query, showing each step, the cost estimates, and actual execution times:
- The ranked_cards sub-query took around 0.037 to 0.044 milliseconds.
- The ROW_NUMBER() window function calculation took 0.035 to 0.042 milliseconds.
Write an optimized version of the query in Question 24 (the department salary comparison above).
WITH
-- Calculate the company's overall average salary for March 2024
company_avg AS (
SELECT
AVG(amount) AS avg_salary
FROM
salary
WHERE
payment_date >= '2024-03-01'
AND payment_date < '2024-04-01'
),
-- Calculate the average salary per department for March 2024
department_avg AS (
SELECT
e.department_id,
AVG(s.amount) AS avg_salary
FROM
employee e
JOIN
salary s ON e.employee_id = s.employee_id
WHERE
s.payment_date >= '2024-03-01'
AND s.payment_date < '2024-04-01'
GROUP BY
e.department_id
),
-- Compare each department's average salary with the company's average salary
comparison AS (
SELECT
d.department_id,
'03-2024' AS payment_date,
CASE
WHEN d.avg_salary > c.avg_salary THEN 'higher'
WHEN d.avg_salary < c.avg_salary THEN 'lower'
ELSE 'same'
END AS comparison
FROM
department_avg d
CROSS JOIN
company_avg c
)
-- Select the results
SELECT
department_id,
payment_date,
comparison
FROM
comparison
ORDER BY
department_id;
- By replacing EXTRACT(MONTH FROM payment_date) with date range filtering, the database can use any potential index on the payment_date column.
- Useful indexes to consider are on payment_date, employee.employee_id, and salary.employee_id.
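For instance, the supporting indexes might look like this (index names are illustrative, and employee.employee_id is likely already covered by a primary key):
CREATE INDEX idx_salary_payment_date ON salary (payment_date);
CREATE INDEX idx_salary_employee_id ON salary (employee_id);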
Hopefully, these questions have given you a glimpse into what to expect in your data engineering interviews.
Good luck with your upcoming interview!