Top 25 SQL Data Science Interview Questions

Data Science
Exponent TeamExponent TeamLast updated

SQL is a fundamental topic in data science interviews at all levels. Querying, manipulating, and interpreting data are key skills for any data scientist.

For junior roles, you will likely be tested on basic querying and aggregation concepts. For more senior interviews, you should be familiar with advanced concepts like window functions, CTEs, and complex data transformations.

Below, we've compiled a list of example SQL interview questions indicative of the types of questions you can expect in real data science interviews. We created this list with input from data analysts, data scientists, and machine learning engineers at Meta, Google, Dropbox, Amazon, and top consulting firms.

Each question includes a sample answer and explanation of the underlying SQL concepts.

👋
This guide contains excerpts from Exponent's complete data science interview course and SQL interview course.

Sneak peek:
- Watch a senior SQL developer answer: "Explain SQL stored procedures."
- Watch a Meta data scientist answer: "Find employees who earn more than their managers."
- Watch a venture capitalist answer: "Calculate e-commerce orders from yesterday."

This guide was written and compiled by Derrick Mwiti, a senior data scientist and course instructor.

Key Takeaways

  • Junior Data Scientists: Expect SQL questions on basic querying, syntax, and data retrieval using functions like WHERE, INNER JOIN, ORDER BY, GROUP BY, and aggregations like COUNT(), AVG(), and MAX().
  • Mid-level Candidates: Know different types of joins, nested and correlated subqueries, CASE WHEN, date and time manipulation, WITH clauses, UNION, INTERSECT, EXCEPT, and string functions like CONCAT() and REPLACE().
  • Senior Candidates: Be comfortable with window functions, pivot tables, data transformation, normalization, and handling complex data use-cases.

Beginner SQL

Early in your data career, your SQL interviews will likely be focused on basic querying, syntax, and data retrieval, such as:

Here are some examples of common SQL questions you might hear:

1. Employees Earning More Than Managers

You are given an employee table with the following columns: employee_id, name, salary, department_id, and manager_id.

Identify the employees who earn more than their direct managers. The expected output is: employee_id, employee_name.

Here is a sample solution:

SQL
SELECT e.employee_id, e.name AS employee_name FROM employee e JOIN employee m ON e.manager_id = m.employee_id WHERE e.salary > m.salary;
  • First, the employee table is aliased as e for employees and m for managers.
  • Managers and employees are joined by the formula e.manager_id = m.employee_id.
  • Employees whose salary exceeds that of their manager are filtered out by the WHERE clause e.salary > m.salary.
  • The employees who meet the condition have their employee_name and employee_id retrieved using SELECT.
ℹ️
Practice this question: Employee Earnings

2. Average Unmatched Bookings on Lyft

You work for Lyft, a ride-sharing company and are tasked with finding the average number of unmatched bookings per user. The expected output is: user_id, user_name, email, and avg_unmatched_bookings (rounded to the nearest 2 decimal places).

You have a users table with user_id, user_name, email. You also have a bookings table with booking_id, user_id, driver_id, booking_time, status.

Here's a sample solution:

SQL
SELECT u.user_id, u.user_name, u.email, COALESCE(ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1 ELSE 0 END), 2), 0) AS avg_unmatched_bookings FROM users u LEFT JOIN bookings b ON u.user_id = b.user_id GROUP BY u.user_id, u.user_name, u.email;
  • First, use the LEFT JOIN to combine the users table with the bookings table on the user_id column. This ensures that even users with no bookings are included in the result.
  • Then, use the CASE statement within the AVG() function to compute the average number of bookings for each user. Unmatched bookings are counted as 1, or 0 otherwise.
  • Use the COALESCE function together with the ROUND function to ensure there are no null values. Then, the average is rounded to two decimal places.
ℹ️
Practice this question: Lyft Ride Requests

3. Successful Posts Per User Type

Show the total amount of successful posts a particular user_type made in November 2023. Then, order your output by descending success rate.

Your output should include: user_type, post_success (number of successful posts), post_attempt (number of total posts), and post_success_rate (ranging from 0.00 to 1.00).

You've been given a post table with post_id, post_date, user_id, interface, and is_successful_post. You also have a post_user table with user_id, user_type, and age.

A sample solution looks like:

SQL
SELECT pu.user_type, COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) AS post_success, COUNT(p.post_id) AS post_attempt, ROUND(COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) * 1.0 / COUNT(p.post_id), 2) AS post_success_rate FROM post p JOIN post_user pu ON p.user_id = pu.user_id WHERE p.post_date >= '2023-11-01' AND p.post_date < '2023-12-01' GROUP BY pu.user_type ORDER BY post_success_rate DESC;
  • Use the user_id column to join the post table with the post_user table to get the user type for each post.
  • Use the WHERE function to filter posts from November 2023.
  • Count the number of successful posts using COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END).
  • Round the answer to two decimal places using ROUND. To guarantee that the division function is performed in floating-point, multiply the result by 1.0.
  • Arrange the results using GROUP BY pu.user_type.
  • Sort post_success_rate in descending order with DESC.

4. Average Duration of a Marketing Campaign

Calculate the average duration of marketing campaigns, given three tables.

First, you have a campaign table with an upsell_campaign_id, date_start, and date_end. Then, you have a user table with user_id, name, and is_eligible_for_upsell_campaign. Finally, there's a transaction table withtransaction_id, user_id, product_id, transaction_date, and quantity.

Here is a sample solution:

SQL
SELECT AVG(DATEDIFF(date_end, date_start)) AS avg_campaign_length FROM campaign;
  • Find the difference between the start date and end date of a campaign using DATEDIFF.
  • Then, calculate the average using AVG.

5. Total Viewership on Laptops and Mobile Devices

Imagine you work at Netflix. Calculate the total viewership from laptops compared to mobile devices.

There are three types of devices that viewers use:phone, tablet, and laptop. Mobile viewership is computed by summing the tablet and phone viewership. You also have a viewership table with user_id, device_type, and view_time.

Here's a sample solution:

SQL
SELECT SUM(CASE WHEN device_type = 'laptop' THEN 1 ELSE 0 END) AS laptop_views, SUM(CASE WHEN device_type IN ('phone', 'tablet') THEN 1 ELSE 0 END) AS mobile_views FROM viewership;
  • Use SUM with CASE statements to conditionally count the number of viewership records for each device_type.

6. Average Star Rating of Product by Month

Compute the average star rating for each product grouped by the month given a reviews table with these columns: review_id, user_id, submit_date, product_id, and stars.

Display the month as a numerical value, product ID, and average star rating rounded to two decimal places. Sort the results by the first month followed by the product ID.

For example:

SQL
SELECT EXTRACT(MONTH FROM submit_date) AS mth, product_id, ROUND(AVG(stars), 2) AS avg_stars FROM reviews GROUP BY EXTRACT(MONTH FROM submit_date), product_id ORDER BY mth, product_id;
  • First, EXTRACT the month from the submit_date column.
  • Then, ROUND computes the average to 2 decimal places.
  • Finally, GROUP BY and ORDER BY order the data by month and product ID.

7. Discount Product Catalog

You work for Amazon. The company likes to provide seasonal discounts. This year, electronics are discounted at 10%, clothing at 20%, groceries at 5%, and books at 15%.

You are given a products table with name, type, and price. Generate a discounted product catalog that displays the name, type, and discounted_price for each item. Round the discounted prices to the nearest cent.

SQL
SELECT name, type, ROUND(price * CASE WHEN type = 'Electronic' THEN 0.90 WHEN type = 'Clothing' THEN 0.80 WHEN type = 'Grocery' THEN 0.95 WHEN type = 'Book' THEN 0.85 ELSE 1.00 END, 2) AS discounted_price FROM products;
  • CASE applies the appropriate discount based on the product. 
  • ROUND (..., 2)  rounds the price to the nearest cent. 
  • SELECT picks the name, type, and discounted price.

8. Differences in Credit Cards Issued

You work for a bank like Capital One that wants to launch a new credit card. You are given a table with the number of credit cards issued each month that includes card_name, issued_amount, issue_month, and issue_year.

Output the name of each credit card and the difference in the total number of credit cards between the highest and lowest issuance month. Arrange the output based on the highest disparity.

SQL
SELECT card_name, MAX(issued_amount) - MIN(issued_amount) AS difference FROM monthly_cards_issued GROUP BY card_name ORDER BY difference DESC;
  • MAX(issued_amount) - MIN(issued_amount) computes the difference between the highest and lowest issuance amounts for each card.
  • GROUP BY card_name groups the results by card name to calculate the difference for each card.
  • ORDER BY difference DESC sorts the results by the difference in descending order.

9. Transactions By City

Fetch the number of transactions made by users in specific cities. Then, order by descending number of transactions.

You're given 2 tables. First, a users table with id, first_name, last_name, user_city, and email. You also have a transactions table with id, customer_id, product_id, currency_code, date, and amount.

Here's a sample solution:

SQL
SELECT u.user_city, COUNT(t.id) AS number_of_transactions FROM users u JOIN transactions t ON u.id = t.customer_id GROUP BY u.user_city ORDER BY number_of_transactions DESC;
  • The users table is joined with the transactions table on the customer_id to get the transactions for each user.
  • COUNT(t.id) counts the number of transactions for each user city.
  • The results are grouped by user_city to aggregate the transaction counts per city.
  • The results are ordered by the number_of_transactions in descending order.
ℹ️
Practice this question: Sales by Customer City

Intermediate SQL

Data scientists use SQL to clean, transform, and preprocess data. As your data science career advances, so too will your SQL interviews.

Your roles will begin to require you to be able to process, manipulate, and summarize larger datasets.

For this reason, mid-level data scientists should expect to get asked SQL questions about:

  • Using different types of joins in more complex scenarios.
  • Writing nested and correlated subqueries.
  • Using functions like ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), and LAG().
  • Using CASE WHEN for conditional logic in queries.
  • Manipulating and calculating dates and times.
  • Using WITH clauses to define temporary result sets.
  • Using UNION, INTERSECT, and EXCEPT to combine results from multiple queries.
  • Manipulating string data using functions like CONCAT(), SUBSTRING(), REPLACE().
ℹ️
At every interview level, you should first understand the data, ask clarifying questions, discuss your approach out loud, and then implement your code.

Here are some questions you can practice:

10. Revenue by Country

Given a table called purchase_orders, with quantity and price columns, compute the total_revenue. Group the total revenue by country, then order by the total revenue.

Here's a sample solution:

SQL
SELECT SUM(quantity * price) AS total_revenue FROM purchase_orders GROUP BY country ORDER BY total_revenue DESC;
  • The table doesn’t have a total revenue column. Compute it by multiplying the quantity and price columns.
  • SUM them to find the total revenue.

11. Categorize Orders by Size

Given a table of orders with quantity, categorize each order into "small" and "large" orders with a new size column using the CASE WHEN function.

Large orders have quantity >= 2.

SQL
SELECT *, CASE WHEN quantity >= 2 THEN 'large' ELSE 'small' END AS size FROM orders;
  • CASE WHEN is similar to if-then-else statements. It's used to evaluate one or more conditions and returns a value based on its evaluation.

12. Orders Per Month

Given a table of orders with timestamps in the order_date column, aggregate and return the number of orders per month.

SQL
SELECT DATE(order_date, 'start of month') AS order_month, COUNT(*) AS no_orders FROM orders GROUP BY order_month;
  • For SQLite, use DATE(column, 'start of month').
  • For PostgreSQL, use DATE_TRUNC('month', column).
  • For MySQL, use EXTRACT(MONTH FROM column).

13. Retention Based on Signup and Churn Dates

Imagine you work at a company with monthly subscriptions. Given a table of users, compute the average retention based on the month of the signup_date and churn_date. Group the results by signup_month.

SQL
SELECT date_trunc('month', signup_date) AS signup_month, AVG(datediff('day', signup_date, churn_date)) AS avg_retention FROM users GROUP BY signup_month;
  • Find the difference in days between the signup_date and the churn_date using datediff while also finding the signup month.

14. Orders with Customer Data from March

You're given tables for customers and orders. There's a foreign key customer_id, which references customers.id. Return all orders with their associated customer data from March 2022.

Here's a sample solution:

SQL
SELECT orders.id as order_id, customers.* FROM orders INNER JOIN customers ON orders.customer_id = customers.id WHERE orders.order_date BETWEEN '2022-03-01' and '2022-04-01';
  • orders.id as order_id renames the id column from the orders table to order_id for clarity.customers.* selects all columns from the customers table.
  • INNER JOIN joins the orders and customers tables. The join condition is that the customer_id in the orders table matches the id in the customers table.
  • WHERE and BETWEEN filter orders between March and April of 2022.

15. Customers with At Least One Order Using Subqueries

Given customers and orders tables with id and customer_id columns, use subqueries to return all customers who have at least one order.

Here's a sample solution:

SQL
SELECT * FROM customers WHERE (SELECT COUNT(*) FROM orders WHERE customers.id = orders.customer_id) > 0;
  • The subquery SELECT COUNT() counts the number of orders for each customer.
  • The WHERE clause ensures that only customers with at least one order are counted.

Alternatively:

SQL
SELECT * FROM customers WHERE id IN (SELECT customer_id FROM orders);
  • The subquery SELECT customer_id FROM orders retrieves all customer_id values from the orders table.
  • The main query selects all columns from the customers table where the id matches any of the customer_id values returned by the subquery.
ℹ️

16. Customers with At Least One Order Using CTEs

Solve the same problem as above. This time, use Common Table Expressions (CTEs) instead of subqueries.

CTEs are used for breaking down complex queries into more manageable parts and are similar to subqueries, however they are defined before the main query. Subqueries, on the other hand, are defined within the main query.

CTEs can be referenced multiple times in a single query, whereas subqueries can't. However, subqueries are more efficient because the database engine easily optimizes them.

Here's a sample solution:

SQL
WITH customer_orders AS ( SELECT customer_id, COUNT(*) as num_orders FROM orders GROUP BY customer_id ) SELECT customers.* FROM customers JOIN customer_orders ON customers.id = customer_orders.customer_id WHERE customer_orders.num_orders > 0;
  • The WITH keyword is used to define the CTE. The AS keyword is used to define the query for populating it. 
  • The name of the CTE is customer_orders. The CTE is then used in the main query as any other table.
  • The customer_orders CTE in the main query finds customers with at least one order.

17. Orders Ranked by Total Price

Given an orders table with price and quantity columns, write a query to return a list of orders ranked by total price.

A sample answer might look like:

SQL
SELECT orders.*, SUM(quantity * price) AS total_price, RANK() OVER (ORDER BY SUM(quantity * price) DESC) AS rank FROM orders GROUP BY id, customer_id;
  • Multiply the price and quantity to get the total price.
  • Use the RANK window function. It's general syntax is:RANK() OVER ( [PARTITION BY partition_expression] [ORDER BY order_expression] )
  • The PARTITION BY and ORDER BY clauses determine how the rows will be partitioned and ordered for processing by the RANK function.
  • The RANK function assigns the same rank when there is a tie.
  • DENSE RANK doesn’t skip rank values for ties. If two rows have the same value, they will receive the same rank, but the next row will get the next rank instead of skipping the rank value like in RANK.

Advanced SQL

Senior data scientists are trusted to handle extremely large datasets.

Large companies like Meta, Microsoft, TikTok, and Google need data scientists who can clean, synthesize, and interpret large amounts of information.

In combination with machine learning, senior scientists help to make sense of trends and make predictions for the future.

Before stepping into a senior SQL interview in your data science loops, consider brushing up on concepts like:

  • Window functions like RANK, DENSE RANK, ROW_NUMBER, and LAG for complex calculations and analytics.
  • Efficient queries and and execution plans.
  • Using recursive CTEs for hierarchical or recursive data structures.
  • Transforming rows into columns with pivot tables.
  • Implementing intricate data transformations and aggregations.
  • Understanding and using transactions, including COMMIT, ROLLBACK, and SAVEPOINT.
  • Understanding normalization, indexing, and schema design principles.

Here are some questions you can practice:

18. Top Earning Employee by Department

Write a SQL query to fetch the top-earning employee by department, ordered by department name.

You've been given the following schema.

employees                             projects
+---------------+---------+           +---------------+---------+
| id            | int     |<----+  +->| id            | int     |
| first_name    | varchar |     |  |  | title         | varchar |
| last_name     | varchar |     |  |  | start_date    | date    |
| salary        | int     |     |  |  | end_date      | date    |
| department_id | int     |--+  |  |  | budget        | int     |
+---------------+---------+  |  |  |  +---------------+---------+
                             |  |  |
departments                  |  |  |  employees_projects
+---------------+---------+  |  |  |  +---------------+---------+
| id            | int     |<-+  |  +--| project_id    | int     |
| name          | varchar |     +-----| employee_id   | int     |
+---------------+---------+           +---------------+---------+

Your answer should be in the following format:

department_name | employee_id | first_name | last_name | salary 
----------------+-------------+------------+-----------+--------
varchar         | int         | varchar    | varchar   | int

Here's a sample solution:

SQL
WITH ranked_employees AS ( SELECT e.id AS employee_id, e.first_name, e.last_name, e.salary, d.name AS department_name, ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS rank FROM employees e JOIN departments d ON e.department_id = d.id ) SELECT department_name, employee_id, first_name, last_name, salary FROM ranked_employees WHERE rank = 1 ORDER BY department_name;
  • The CTE ranked_employees ranks employees within each department based on their salary in descending order.
  • The ROW_NUMBER() function is used to assign a rank to each employee within their department.
  • The main query selects the top-ranked employee (rank = 1) from each department. This results in only one top earner in each department.
  • The employees table is joined with the departments table to get department _name. The result is then ordered by department name.
ℹ️
Practice this question: Top Salaries by Department

19. Group Users by Number of Tweets in 2022

Given a tweets table with tweet_id, user_id, msg, and tweet_date, group the users by the number of tweets they posted in 2022. Count the number of users in each group.

SQL
WITH tweet_cte AS ( SELECT user_id, COUNT(*) as tweet_bucket FROM tweets WHERE EXTRACT(year from tweet_date) = 2022 GROUP BY user_id ) SELECT tweet_bucket, COUNT(*) as users_num FROM tweet_cte GROUP BY tweet_bucket;
  • The tweet_cte counts tweets per user for 2022, resulting in user_id and tweet_bucket.
  • The main query then groups and counts the users by tweet_bucket.

20. Monthly Active Users in July 2022

You have a table of Facebook user_actions. Obtain the number of monthly active users in July 2022, including the month in numerical format "1, 2, 3."

The user_actions table has user_id, event_id, event_type, and event_date.

Here's a sample solution:

SQL
WITH june_actions AS ( SELECT DISTINCT user_id FROM user_actions WHERE event_date BETWEEN '2022-06-01' AND '2022-06-30' AND event_type IN ('sign-in', 'like', 'comment') ), july_actions AS ( SELECT DISTINCT user_id FROM user_actions WHERE event_date BETWEEN '2022-07-01' AND '2022-07-31' AND event_type IN ('sign-in', 'like', 'comment') ) SELECT 7 AS month, COUNT(*) AS monthly_active_users FROM july_actions WHERE user_id IN (SELECT user_id FROM june_actions);
  • The june_actions and july_actions CTEs select distinct user_ids who performed relevant actions in June and July 2022, respectively.
  • The main query counts the number of users who appear in both june_actions and july_actions, thus identifying users active in both months.
ℹ️
Watch an Adobe data scientist answer, "Analyze monthly customer transactions."

21. Success Rate of Post When Previous Post Failed

Given post and post_user tables, find the success rate of posts (%) when the user's previous post had failed.

The user table contains post_id, post_date, user_id, interface, and is_successful_post. The post_user table contains user_id, user_type, and age.

Your output should have the following columns: user_id and next_post_sc_rate (success rate of the post when the user’s previous post had failed). Order results by increasing next_post_sc_rate.

Here's a sample solution:

SQL
WITH post_seq AS ( SELECT p.user_id, p.post_id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY post_date) AS post_seq_id, is_successful_post FROM post as p ), post_pairings AS ( SELECT ps.user_id, ps.post_seq_id AS fail_post_id, ps.post_seq_id + 1 AS next_post_id FROM post_seq AS ps WHERE ps.is_successful_post = 0 ) SELECT pp.user_id, ROUND(SUM(p2.is_successful_post) * 1.0 / COUNT(p2.is_successful_post), 2) AS next_post_sc_rate FROM post_pairings AS pp JOIN post AS p2 ON pp.next_post_id = p2.post_id GROUP BY 1 ORDER BY next_post_sc_rate ASC;
  • The post_seq CTE assigns a sequential ID post_seq_id to each post per user based on the post_date.
  • The post_pairings CTE identifies pairs of posts where the previous post was unsuccessful. The next_post_id is the sequential ID of the post following the unsuccessful post.
  • The final SELECT joins the post_pairings with the post table to get details about the next post. It computes the success rate of the next posts following an unsuccessful post. 
  • SUM() calculates the ratio of successful next posts to the total number of next posts.
  • ROUND(..., 2) rounds the success rate to two decimal places.
  • GROUP BY groups by user_id and orders the results by next_post_sc_rate in ascending order.

22. Identify Repeat Payments

Identify any payments made at the same merchant using the same credit card with the same amount within 10 minutes of each other. Such transactions should be considered as repeat payments.

The first transaction of such payments shouldn't be considered as a repeat payment. This means that if there are three transactions by a merchant with the same credit card, for the same amount within 10 minutes, only 2 payments will be considered as repeated.

The transactions table contains the following columns: transaction_id, merchant_id, credit_card_id, amount, and transaction_timestamp.

Here's a sample solution:

SQL
WITH PreviousTransactions AS ( SELECT transaction_id, merchant_id, credit_card_id, amount, transaction_timestamp, LAG(transaction_timestamp) OVER ( PARTITION BY merchant_id, credit_card_id, amount ORDER BY transaction_timestamp ) AS previous_transaction FROM transactions ), RepeatPayments AS ( SELECT transaction_id, merchant_id, credit_card_id, amount, transaction_timestamp FROM PreviousTransactions WHERE previous_transaction IS NOT NULL AND transaction_timestamp &lt;= previous_transaction + INTERVAL '10 MINUTE' ) SELECT COUNT(*) AS payment_count FROM RepeatPayments;
  • The PreviousTransactions CTE uses LAG() to get the timestamp of the previous transaction for each partition of merchant_id, credit_card_id, and amount. It orders them by transaction_timestamp to ensure that LAG() correctly identifies the previous transaction.
  • RepeatPayments filters transactions where there is a valid previous_transaction and the transaction_timestamp of the current transaction is within 10 minutes of the previous_transaction.

23. Top Customers by Order Count in Last 5 Years

Given orders, departments, and customers tables, identify the customers who placed the most orders for each of the last 5 years.

The orders table has order_id, customer_id, order_date, order_amount, and department_id. The departments table has department_id and department_name. And the customers table has customer_id, first_name, and last_name.

Output a list of the 5 top customers. Your output should have the following columns: customer_id, first_name, last_name, and years.

SQL
WITH orders_per_year AS ( SELECT c.customer_id, c.first_name, c.last_name, EXTRACT(YEAR FROM o.order_date) AS years, -- PostgreSQL function for extracting the year COUNT(o.order_id) AS total_orders, rank() OVER (PARTITION BY EXTRACT(YEAR FROM o.order_date) ORDER BY COUNT(o.order_id) DESC) AS order_rankings FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE EXTRACT(YEAR FROM o.order_date) IN (2019, 2020, 2021, 2022, 2023) GROUP BY EXTRACT(YEAR FROM o.order_date), c.customer_id, c.first_name, c.last_name ) SELECT first_name, last_name, total_orders, years FROM orders_per_year WHERE order_rankings = 1 ORDER BY years ASC;

24. Calculate Total Server Uptime

A cloud service provider like AWS wants to optimize the usage of its fleet of servers. Calculate the total time that the fleet of servers was running in days. The total time a server fleet is running is the sum of the server's uptime.

A server can start and stop several times. The server_utilization table has the following columns: server_id, status_time, and session_status (start, stop).

The expected output is total_uptime_days.

SQL
WITH SessionDurations AS ( SELECT server_id, status_time AS start_time, LEAD(status_time) OVER (PARTITION BY server_id ORDER BY status_time) AS stop_time, session_status FROM server_utilization ), FilteredSessions AS ( SELECT server_id, start_time, stop_time FROM SessionDurations WHERE session_status = 'start' AND stop_time IS NOT NULL ), CalculatedDurations AS ( SELECT server_id, EXTRACT(EPOCH FROM (stop_time - start_time)) AS uptime_seconds FROM FilteredSessions ) SELECT ROUND(SUM(uptime_seconds) / (24 * 60 * 60), 0) AS total_uptime_days FROM CalculatedDurations;
  • In the SessionDurations CTE, LEAD pairs each start time with the next stop time for the same server_id.
  • In FilteredSessions CTE, we filter rows where session_status = 'start' and stop_time is NOT NULL.
  • The CalculatedDurations CTE gets the duration of each session in seconds. 
  • The final SELECT sums the uptime_seconds and converts them to days by dividing by 24 * 60 * 60.

25. Top 2 Players by Highest Score

You work for a leading game development company where players can team up and compete. Each player's performance in different game sessions is recorded as distinct score entries in the database.

You're provided a players table with player_id, player_name, and team_id columns and a scores table with score_id, player_id, and game_score.

Return the top 2 players from each team based on their single highest score across all sessions. If multiple players share the same highest score, include all of them, which may result in more than two top players for some teams.

Here's an example solution:

SQL
WITH PlayerMaxScores AS ( SELECT p.team_id, p.player_name, MAX(s.game_score) AS max_score FROM players p JOIN scores s ON p.player_id = s.player_id GROUP BY p.team_id, p.player_name ), RankedPlayers AS ( SELECT team_id, player_name, max_score, DENSE_RANK() OVER (PARTITION BY team_id ORDER BY max_score DESC) AS rank FROM PlayerMaxScores ) SELECT team_id, player_name, max_score FROM RankedPlayers WHERE rank &lt;= 2 ORDER BY team_id, max_score DESC, player_name;
  • The PlayerMaxScores CTE aggregates the maximum score for each player. 
  • The DENSE_RANK() window function in the RankedPlayers CTE assigns a rank to each player within their team based on their maximum score.
  • DENSE_RANK() ensures that players with the same score get the same rank. 
  • The final SELECT picks the top two players from each team.

SQL Interview Tips

It is impossible to cover all the possible questions since SQL is a broad and varied technology!

Hopefully, these questions have given you a glimpse into what to expect in your data science interviews.

Good luck with your upcoming interview!

Learn everything you need to ace your data science interviews.

Exponent is the fastest-growing tech interview prep platform. Get free interview guides, insider tips, and courses.

Create your free account