SQL is a fundamental topic in data science interviews at all levels. Querying, manipulating, and interpreting data are key skills for any data scientist.
For junior roles, you will likely be tested on basic querying and aggregation concepts. For more senior interviews, you should be familiar with advanced concepts like window functions, CTEs, and complex data transformations.
Below, we've compiled a list of example SQL interview questions indicative of the types of questions you can expect in real data science interviews. We created this list with input from data analysts, data scientists, and machine learning engineers at Meta, Google, Dropbox, Amazon, and top consulting firms.
Each question includes a sample answer and explanation of the underlying SQL concepts.
This guide was written and compiled by Derrick Mwiti, a senior data scientist and course instructor.
At a glance, junior-level interviews test concepts like WHERE, INNER JOIN, ORDER BY, GROUP BY, and aggregations like COUNT(), AVG(), and MAX(). More advanced interviews cover CASE WHEN, date and time manipulation, WITH clauses, UNION, INTERSECT, EXCEPT, and string functions like CONCAT() and REPLACE().
Early in your data career, your SQL interviews will likely focus on basic querying, syntax, and data retrieval, such as:

- WHERE clauses to filter rows based on specific conditions.
- Joins, including INNER JOIN and LEFT JOIN.
- Aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX().
- GROUP BY to aggregate data into summary rows.
- ORDER BY to sort query results.

Here are some examples of common SQL questions you might hear:
You are given an employee table with the following columns: employee_id, name, salary, department_id, and manager_id.

Identify the employees who earn more than their direct managers. The expected output is: employee_id, employee_name.
Here is a sample solution:
SELECT
e.employee_id,
e.name AS employee_name
FROM
employee e
JOIN
employee m ON e.manager_id = m.employee_id
WHERE
e.salary > m.salary;
- The employee table is self-joined, aliased as e for employees and m for managers.
- The join condition e.manager_id = m.employee_id matches each employee to their direct manager.
- The WHERE clause e.salary > m.salary keeps only employees who earn more than their managers.
- The SELECT returns employee_id and employee_name.

You work for Lyft, a ride-sharing company, and are tasked with finding the average number of unmatched bookings per user. The expected output is: user_id, user_name, email, and avg_unmatched_bookings (rounded to two decimal places).
You have a users table with user_id, user_name, and email columns. You also have a bookings table with booking_id, user_id, driver_id, booking_time, and status.
Here's a sample solution:
SELECT
u.user_id,
u.user_name,
u.email,
COALESCE(ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1 ELSE 0 END), 2), 0) AS avg_unmatched_bookings
FROM users u
LEFT JOIN bookings b ON u.user_id = b.user_id
GROUP BY u.user_id, u.user_name, u.email;
- We use a LEFT JOIN to combine the users table with the bookings table on the user_id column. This ensures that even users with no bookings are included in the result.
- A CASE expression inside the AVG() function computes the average number of unmatched bookings for each user: unmatched bookings count as 1, and everything else counts as 0.
- The COALESCE function, together with ROUND, guards against null values, and the average is rounded to two decimal places.

Show the total number of successful posts each user_type made in November 2023. Then, order your output by descending success rate.
Your output should include: user_type, post_success (number of successful posts), post_attempt (number of total posts), and post_success_rate (ranging from 0.00 to 1.00).

You've been given a post table with post_id, post_date, user_id, interface, and is_successful_post. You also have a post_user table with user_id, user_type, and age.
A sample solution looks like:
SELECT
pu.user_type,
COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) AS post_success,
COUNT(p.post_id) AS post_attempt,
ROUND(COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) * 1.0 / COUNT(p.post_id), 2) AS post_success_rate
FROM
post p
JOIN
post_user pu ON p.user_id = pu.user_id
WHERE
p.post_date >= '2023-11-01' AND p.post_date < '2023-12-01'
GROUP BY
pu.user_type
ORDER BY
post_success_rate DESC;
- The user_id column joins the post table with the post_user table to get the user type for each post.
- The WHERE clause filters posts to November 2023.
- Successful posts are counted with COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END).
- The success rate is rounded with ROUND; multiplying by 1.0 guarantees the division is performed in floating point rather than truncated as integer division.
- Results are grouped with GROUP BY pu.user_type and sorted by post_success_rate in descending order with DESC.

Calculate the average duration of marketing campaigns, given three tables.
First, you have a campaign table with upsell_campaign_id, date_start, and date_end. Then, you have a user table with user_id, name, and is_eligible_for_upsell_campaign. Finally, there's a transaction table with transaction_id, user_id, product_id, transaction_date, and quantity.
Here is a sample solution:
SELECT AVG(DATEDIFF(date_end, date_start)) AS avg_campaign_length
FROM campaign;
- DATEDIFF computes the length of each campaign in days.
- AVG averages those durations across all campaigns.

Imagine you work at Netflix. Calculate the total viewership from laptops compared to mobile devices.

There are three types of devices that viewers use: phone, tablet, and laptop. Mobile viewership is computed by summing the tablet and phone viewership. You also have a viewership table with user_id, device_type, and view_time.
Here's a sample solution:
SELECT
SUM(CASE WHEN device_type = 'laptop' THEN 1 ELSE 0 END) AS laptop_views,
SUM(CASE WHEN device_type IN ('phone', 'tablet') THEN 1 ELSE 0 END) AS mobile_views
FROM
viewership;
- SUM with CASE expressions conditionally counts the viewership records for each device_type: laptop views in one column, and phone and tablet views combined as mobile views in the other.

Compute the average star rating for each product, grouped by month, given a reviews table with these columns: review_id, user_id, submit_date, product_id, and stars.
Display the month as a numerical value, the product ID, and the average star rating rounded to two decimal places. Sort the results by month first, then by product ID.
For example:
SELECT
EXTRACT(MONTH FROM submit_date) AS mth,
product_id,
ROUND(AVG(stars), 2) AS avg_stars
FROM
reviews
GROUP BY
EXTRACT(MONTH FROM submit_date), product_id
ORDER BY
mth, product_id;
- EXTRACT pulls the month from the submit_date column.
- ROUND computes the average to two decimal places.
- GROUP BY and ORDER BY group and sort the data by month and product ID.

You work for Amazon. The company likes to provide seasonal discounts. This year, electronics are discounted at 10%, clothing at 20%, groceries at 5%, and books at 15%.

You are given a products table with name, type, and price. Generate a discounted product catalog that displays the name, type, and discounted_price for each item. Round the discounted prices to the nearest cent.
SELECT
name,
type,
ROUND(price *
CASE
WHEN type = 'Electronic' THEN 0.90
WHEN type = 'Clothing' THEN 0.80
WHEN type = 'Grocery' THEN 0.95
WHEN type = 'Book' THEN 0.85
ELSE 1.00
END, 2) AS discounted_price
FROM
products;
- CASE applies the appropriate discount multiplier based on the product type.
- ROUND(..., 2) rounds the price to the nearest cent.
- SELECT picks the name, type, and discounted price.

You work for a bank like Capital One that wants to launch a new credit card. You are given a monthly_cards_issued table with the number of credit cards issued each month; it includes card_name, issued_amount, issue_month, and issue_year.
Output the name of each credit card and the difference in the total number of credit cards between the highest and lowest issuance month. Arrange the output based on the highest disparity.
SELECT
card_name,
MAX(issued_amount) - MIN(issued_amount) AS difference
FROM
monthly_cards_issued
GROUP BY
card_name
ORDER BY
difference DESC;
- MAX(issued_amount) - MIN(issued_amount) computes the difference between the highest and lowest issuance amounts for each card.
- GROUP BY card_name groups the results so the difference is calculated per card.
- ORDER BY difference DESC sorts the results by the difference in descending order.

Fetch the number of transactions made by users in specific cities. Then, order by descending number of transactions.

You're given two tables. First, a users table with id, first_name, last_name, user_city, and email. You also have a transactions table with id, customer_id, product_id, currency_code, date, and amount.
Here's a sample solution:
SELECT
u.user_city,
COUNT(t.id) AS number_of_transactions
FROM
users u
JOIN
transactions t ON u.id = t.customer_id
GROUP BY
u.user_city
ORDER BY
number_of_transactions DESC;
- The users table is joined with the transactions table on customer_id to get the transactions for each user.
- COUNT(t.id) counts the number of transactions for each user city.
- Results are grouped by user_city to aggregate the transaction counts per city, then sorted by number_of_transactions in descending order.

Data scientists use SQL to clean, transform, and preprocess data. As your data science career advances, so too will your SQL interviews.
Your roles will begin to require you to be able to process, manipulate, and summarize larger datasets.
For this reason, mid-level data scientists should expect to get asked SQL questions about:
- Window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), and LAG().
- CASE WHEN for conditional logic in queries.
- WITH clauses to define temporary result sets.
- UNION, INTERSECT, and EXCEPT to combine results from multiple queries.
- String functions like CONCAT(), SUBSTRING(), and REPLACE() (see the sketches after this list).
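String functions and set operators don't appear in the worked questions below, so here are two minimal sketches. The first assumes a hypothetical users table with first_name, last_name, email, and phone columns:

SELECT
 CONCAT(first_name, ' ', last_name) AS full_name, -- join strings together
 SUBSTRING(email, 1, 5) AS email_prefix,          -- first five characters of the email
 REPLACE(phone, '-', '') AS digits_only           -- strip dashes from the phone number
FROM users;

The second assumes two hypothetical tables, customers_2022 and customers_2023, each with an email column:

SELECT email FROM customers_2022
INTERSECT -- keep only emails that appear in both years
SELECT email FROM customers_2023;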
Here are some questions you can practice:
Given a table called purchase_orders, with quantity, price, and country columns, compute the total_revenue per country. Order the results by total revenue.
Here's a sample solution:
SELECT
 country,
 SUM(quantity * price) AS total_revenue
FROM purchase_orders
GROUP BY country
ORDER BY total_revenue DESC;
- quantity is multiplied by price for each row.
- SUM aggregates those row totals into the total revenue for each country.

Given a table of orders with a quantity column, categorize each order as "small" or "large" in a new size column using a CASE WHEN expression. Large orders have quantity >= 2.
SELECT
*,
CASE
WHEN quantity >= 2 THEN 'large'
ELSE 'small'
END AS size
FROM
orders;
- CASE WHEN works like an if-then-else statement: it evaluates one or more conditions and returns a value based on the first condition that matches.

Given a table of orders with timestamps in the order_date column, aggregate and return the number of orders per month.
SELECT
DATE(order_date, 'start of month') AS order_month,
COUNT(*) AS no_orders
FROM orders
GROUP BY order_month;
- In SQLite, DATE(column, 'start of month') truncates a date to the first day of its month.
- In PostgreSQL, the equivalent is DATE_TRUNC('month', column), as sketched below.
- EXTRACT(MONTH FROM column) is another option that returns the month as a number.
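For instance, here's a sketch of the same orders-per-month query written for PostgreSQL instead of SQLite:

SELECT
 DATE_TRUNC('month', order_date) AS order_month, -- truncate each timestamp to the first of its month
 COUNT(*) AS no_orders
FROM orders
GROUP BY order_month;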
Imagine you work at a company with monthly subscriptions. Given a table of users, compute the average retention based on the month of the signup_date and churn_date. Group the results by signup_month.
SELECT
 date_trunc('month', signup_date) AS signup_month,
 -- datediff with a unit argument is Snowflake/Databricks syntax;
 -- in PostgreSQL you could write AVG(churn_date - signup_date) instead
 AVG(datediff('day', signup_date, churn_date)) AS avg_retention
FROM users
GROUP BY signup_month;
- datediff computes the number of days between signup_date and churn_date, while date_trunc finds the signup month.

You're given tables for customers and orders. There's a foreign key customer_id in orders, which references customers.id. Return all orders with their associated customer data from March 2022.
Here's a sample solution:
SELECT orders.id AS order_id, customers.*
FROM orders
INNER JOIN customers
 ON orders.customer_id = customers.id
WHERE orders.order_date >= '2022-03-01'
 AND orders.order_date < '2022-04-01';
- orders.id AS order_id renames the id column from the orders table to order_id for clarity.
- customers.* selects all columns from the customers table.
- INNER JOIN joins the orders and customers tables, matching customer_id in the orders table to id in the customers table.
- The WHERE clause keeps orders placed on or after March 1 and before April 1, i.e., during March 2022.

Given customers and orders tables with id and customer_id columns, use subqueries to return all customers who have at least one order.
Here's a sample solution:
SELECT *
FROM customers
WHERE (SELECT COUNT(*)
FROM orders
WHERE customers.id = orders.customer_id) > 0;
- The correlated subquery SELECT COUNT(*) counts the number of orders for each customer.
- The outer WHERE clause keeps only customers with at least one order.

Alternatively:
SELECT *
FROM customers
WHERE id IN (SELECT customer_id FROM orders);
- SELECT customer_id FROM orders retrieves all customer_id values from the orders table.
- The IN condition keeps customers whose id matches any of the customer_id values returned by the subquery.
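A third variant, sketched here, uses EXISTS, which many engines can satisfy without counting every matching order:

SELECT *
FROM customers
WHERE EXISTS (
 SELECT 1
 FROM orders
 WHERE orders.customer_id = customers.id
);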
Solve the same problem as above. This time, use Common Table Expressions (CTEs) instead of subqueries.

CTEs break complex queries into more manageable parts. They are similar to subqueries, but a CTE is defined before the main query, while a subquery is defined within it. A CTE can also be referenced multiple times in a single query, whereas a subquery must be repeated in full each place it's used. Neither is universally more efficient: how well each is optimized depends on the database engine.
Here's a sample solution:
WITH customer_orders AS (
SELECT customer_id, COUNT(*) as num_orders
FROM orders
GROUP BY customer_id
)
SELECT customers.*
FROM customers
JOIN customer_orders
ON customers.id = customer_orders.customer_id
WHERE customer_orders.num_orders > 0;
- The WITH keyword defines the CTE, named customer_orders, and the AS keyword introduces the query that populates it.
- The CTE is then used in the main query like any other table: joining customers to customer_orders and filtering on num_orders > 0 finds customers with at least one order.

Given an orders table with price and quantity columns, write a query to return a list of orders ranked by total price.
A sample answer might look like:
SELECT
 id,
 customer_id,
 SUM(quantity * price) AS total_price,
 RANK() OVER (ORDER BY SUM(quantity * price) DESC) AS price_rank
FROM
 orders
GROUP BY
 id, customer_id;
- price and quantity are multiplied and summed to get the total price for each order.
- The ranking comes from the RANK window function. Its general syntax is: RANK() OVER ( [PARTITION BY partition_expression] [ORDER BY order_expression] )
- The PARTITION BY and ORDER BY clauses determine how the rows are partitioned and ordered before the RANK function processes them.
- RANK assigns the same rank when there is a tie and then skips the following rank values. DENSE_RANK doesn't skip rank values for ties: rows with the same value receive the same rank, and the next row gets the next consecutive rank.
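To make the difference concrete, here's a minimal sketch with made-up scores and a tie at 90:

SELECT
 score,
 RANK() OVER (ORDER BY score DESC) AS rnk,            -- 1, 1, 3: the rank after a tie is skipped
 DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk -- 1, 1, 2: no gap after the tie
FROM (VALUES (90), (90), (85)) AS t(score);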
Senior data scientists are trusted to handle extremely large datasets.
Large companies like Meta, Microsoft, TikTok, and Google need data scientists who can clean, synthesize, and interpret large amounts of information.
In combination with machine learning, senior scientists help to make sense of trends and make predictions for the future.
Before stepping into a senior SQL interview in your data science loops, consider brushing up on concepts like:
- Advanced window functions like RANK, DENSE_RANK, ROW_NUMBER, and LAG for complex calculations and analytics.
- Transaction control with COMMIT, ROLLBACK, and SAVEPOINT (see the sketch after this list).
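Transaction control doesn't come up in the questions below, so here's a minimal, hypothetical sketch (the accounts table and amounts are made up) of how the three statements fit together:

BEGIN;                                   -- start a transaction
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT transfer_midpoint;             -- mark a point we can roll back to
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
ROLLBACK TO SAVEPOINT transfer_midpoint; -- undo only the second update
COMMIT;                                  -- persist the first update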
Here are some questions you can practice:
Write a SQL query to fetch the top-earning employee by department, ordered by department name.
You've been given the following schema.
employees projects
+---------------+---------+ +---------------+---------+
| id | int |<----+ +->| id | int |
| first_name | varchar | | | | title | varchar |
| last_name | varchar | | | | start_date | date |
| salary | int | | | | end_date | date |
| department_id | int |--+ | | | budget | int |
+---------------+---------+ | | | +---------------+---------+
| | |
departments | | | employees_projects
+---------------+---------+ | | | +---------------+---------+
| id | int |<-+ | +--| project_id | int |
| name | varchar | +-----| employee_id | int |
+---------------+---------+ +---------------+---------+
Your answer should be in the following format:
department_name | employee_id | first_name | last_name | salary
----------------+-------------+------------+-----------+--------
varchar | int | varchar | varchar | int
Here's a sample solution:
WITH ranked_employees AS (
SELECT
e.id AS employee_id,
e.first_name,
e.last_name,
e.salary,
d.name AS department_name,
ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
FROM
employees e
JOIN
departments d ON e.department_id = d.id
)
SELECT
department_name,
employee_id,
first_name,
last_name,
salary
FROM
ranked_employees
WHERE
salary_rank = 1
ORDER BY
department_name;
- The ranked_employees CTE ranks employees within each department by salary in descending order; the ROW_NUMBER() function assigns the rank.
- The employees table is joined with the departments table to get department_name, and the final result is ordered by department name.

Given a tweets table with tweet_id, user_id, msg, and tweet_date, group the users by the number of tweets they posted in 2022. Count the number of users in each group.
WITH tweet_cte AS (
SELECT user_id, COUNT(*) as tweet_bucket
FROM tweets
WHERE EXTRACT(year from tweet_date) = 2022
GROUP BY user_id
)
SELECT tweet_bucket, COUNT(*) as users_num
FROM tweet_cte
GROUP BY tweet_bucket;
- tweet_cte counts tweets per user for 2022, producing user_id and tweet_bucket.
- The outer query then counts the number of users in each tweet_bucket.

You have a table of Facebook user_actions. Obtain the number of monthly active users in July 2022, including the month in numerical format "1, 2, 3."

The user_actions table has user_id, event_id, event_type, and event_date.
Here's a sample solution:
WITH june_actions AS (
SELECT DISTINCT user_id
FROM user_actions
WHERE event_date BETWEEN '2022-06-01' AND '2022-06-30'
AND event_type IN ('sign-in', 'like', 'comment')
),
july_actions AS (
SELECT DISTINCT user_id
FROM user_actions
WHERE event_date BETWEEN '2022-07-01' AND '2022-07-31'
AND event_type IN ('sign-in', 'like', 'comment')
)
SELECT 7 AS month, COUNT(*) AS monthly_active_users
FROM july_actions
WHERE user_id IN (SELECT user_id FROM june_actions);
- The june_actions and july_actions CTEs select the distinct user_ids who performed relevant actions in June and July 2022, respectively.
- The final query keeps only the July users who also appear in june_actions, identifying users active in both months.

Given post and post_user tables, find the success rate of posts (%) when the user's previous post had failed.
The post table contains post_id, post_date, user_id, interface, and is_successful_post. The post_user table contains user_id, user_type, and age.

Your output should have the following columns: user_id and next_post_sc_rate (the success rate of a post when the user's previous post had failed). Order results by increasing next_post_sc_rate.
Here's a sample solution:
WITH post_seq AS (
SELECT
p.user_id,
p.post_id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY post_date) AS post_seq_id,
is_successful_post
FROM post as p
),
post_pairings AS (
SELECT
ps.user_id,
ps.post_seq_id AS fail_post_id,
ps.post_seq_id + 1 AS next_post_id
FROM post_seq AS ps
WHERE ps.is_successful_post = 0
)
SELECT
 pp.user_id,
 ROUND(SUM(ps2.is_successful_post) * 1.0 / COUNT(ps2.is_successful_post), 2) AS next_post_sc_rate
FROM post_pairings AS pp
JOIN post_seq AS ps2
 ON pp.user_id = ps2.user_id
 AND pp.next_post_id = ps2.post_seq_id
GROUP BY pp.user_id
ORDER BY next_post_sc_rate ASC;
- The post_seq CTE assigns a sequential ID, post_seq_id, to each post per user based on post_date.
- The post_pairings CTE identifies the posts that follow an unsuccessful post: next_post_id is the sequential ID of the post immediately after the failed one. Because it's a per-user sequence number rather than a post_id, the final query joins back to post_seq on both user_id and post_seq_id.
- The final SELECT looks up the outcome of each following post. SUM() of successes divided by the total count gives the success rate, and ROUND(..., 2) rounds it to two decimal places.
- GROUP BY groups by user_id, and the results are ordered by next_post_sc_rate in ascending order.

Identify any payments made at the same merchant using the same credit card with the same amount within 10 minutes of each other. Such transactions should be considered as repeat payments.
The first transaction of such payments shouldn't be considered as a repeat payment. This means that if there are three transactions by a merchant with the same credit card, for the same amount within 10 minutes, only 2 payments will be considered as repeated.
The transactions table contains the following columns: transaction_id, merchant_id, credit_card_id, amount, and transaction_timestamp.
Here's a sample solution:
WITH PreviousTransactions AS (
SELECT
transaction_id,
merchant_id,
credit_card_id,
amount,
transaction_timestamp,
LAG(transaction_timestamp) OVER (
PARTITION BY merchant_id, credit_card_id, amount
ORDER BY transaction_timestamp
) AS previous_transaction
FROM transactions
),
RepeatPayments AS (
SELECT
transaction_id,
merchant_id,
credit_card_id,
amount,
transaction_timestamp
FROM PreviousTransactions
WHERE previous_transaction IS NOT NULL
AND transaction_timestamp <= previous_transaction + INTERVAL '10 MINUTE'
)
SELECT
COUNT(*) AS payment_count
FROM RepeatPayments;
- The PreviousTransactions CTE uses LAG() to get the timestamp of the previous transaction within each partition of merchant_id, credit_card_id, and amount, ordered by transaction_timestamp so that LAG() correctly identifies the preceding transaction.
- RepeatPayments keeps transactions that have a valid previous_transaction and whose transaction_timestamp falls within 10 minutes of it.

Given orders, departments, and customers tables, identify the customers who placed the most orders for each of the last 5 years.
The orders table has order_id, customer_id, order_date, order_amount, and department_id. The departments table has department_id and department_name. The customers table has customer_id, first_name, and last_name.

Output a list of the 5 top customers, one per year. Your output should have the following columns: customer_id, first_name, last_name, and years.
WITH orders_per_year AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
EXTRACT(YEAR FROM o.order_date) AS years, -- PostgreSQL function for extracting the year
COUNT(o.order_id) AS total_orders,
rank() OVER (PARTITION BY EXTRACT(YEAR FROM o.order_date) ORDER BY COUNT(o.order_id) DESC) AS order_rankings
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE EXTRACT(YEAR FROM o.order_date) IN (2019, 2020, 2021, 2022, 2023)
GROUP BY EXTRACT(YEAR FROM o.order_date), c.customer_id, c.first_name, c.last_name
)
SELECT
 customer_id,
 first_name,
 last_name,
 years
FROM orders_per_year
WHERE order_rankings = 1
ORDER BY years ASC;
A cloud service provider like AWS wants to optimize the usage of its fleet of servers. Calculate the total time that the fleet of servers was running, in days. The total running time is the sum of each server's uptime.
A server can start and stop several times. The server_utilization table has the following columns: server_id, status_time, and session_status ('start' or 'stop').

The expected output is total_uptime_days.
WITH SessionDurations AS (
SELECT
server_id,
status_time AS start_time,
LEAD(status_time) OVER (PARTITION BY server_id ORDER BY status_time) AS stop_time,
session_status
FROM server_utilization
),
FilteredSessions AS (
SELECT
server_id,
start_time,
stop_time
FROM SessionDurations
WHERE session_status = 'start' AND stop_time IS NOT NULL
),
CalculatedDurations AS (
SELECT
server_id,
EXTRACT(EPOCH FROM (stop_time - start_time)) AS uptime_seconds
FROM FilteredSessions
)
SELECT
ROUND(SUM(uptime_seconds) / (24 * 60 * 60), 0) AS total_uptime_days
FROM
CalculatedDurations;
- In the SessionDurations CTE, LEAD pairs each status_time with the next status_time for the same server_id, so every 'start' row is matched with the stop time that follows it.
- In the FilteredSessions CTE, we keep rows where session_status = 'start' and stop_time IS NOT NULL.
- The CalculatedDurations CTE computes the duration of each session in seconds.
- The final SELECT sums uptime_seconds and converts the total to days by dividing by 24 * 60 * 60.
.You work for a leading game development company where players can team up and compete. Each player's performance in different game sessions is recorded as distinct score entries in the database.
You're provided a players table with player_id, player_name, and team_id columns, and a scores table with score_id, player_id, and game_score.
Return the top 2 players from each team based on their single highest score across all sessions. If multiple players share the same highest score, include all of them, which may result in more than two top players for some teams.
Here's an example solution:
WITH PlayerMaxScores AS (
SELECT
p.team_id,
p.player_name,
MAX(s.game_score) AS max_score
FROM
players p
JOIN
scores s ON p.player_id = s.player_id
GROUP BY
p.team_id, p.player_name
),
RankedPlayers AS (
SELECT
team_id,
player_name,
max_score,
DENSE_RANK() OVER (PARTITION BY team_id ORDER BY max_score DESC) AS score_rank
FROM
PlayerMaxScores
)
SELECT
team_id,
player_name,
max_score
FROM
RankedPlayers
WHERE
score_rank <= 2
ORDER BY
team_id, max_score DESC, player_name;
- The PlayerMaxScores CTE aggregates the maximum score for each player.
- The DENSE_RANK() window function in the RankedPlayers CTE ranks each player within their team by that maximum score; players with the same score receive the same rank.
- The final SELECT keeps the top two ranks from each team, which can return more than two players when scores tie.

It's impossible to cover every possible question since SQL is a broad and varied technology!
Hopefully, these questions have given you a glimpse into what to expect in your data science interviews.
Good luck with your upcoming interview!