Top 15 SQL Interview Questions and Answers (with Examples)

🗄️

Hey there! Learn how to answer system design questions with in-depth video examples and fundamental SQL concepts in our System Design Interview Course.

Sneak Peek: The three most common system design questions:
- Design Instagram. Watch an answer to this question here.
- How would you build TinyURL? Watch answer here.
- Design YouTube. Watch a sample answer to this question here.

Software companies collect an enormous amount of data, which is then used for different analytics purposes to help future-proof their solutions. This data needs to be stored somewhere, and databases, especially relational databases, are often the answer.

Just like every application has programming language requirements, databases also have programming languages used for creating, manipulating, and accessing data. Some widely used database languages are MySQL, OracleSQL, NoSQL, and PostgreSQL. As you might have guessed, the base of all these languages is SQL, which stands for Structured Query Language, and is sometimes pronounced "sequel".

SQL has been around for over forty years, and it still captures most of the database plane due to its maintainability and adaptation of new features.

The structured query language is not limited to creating databases, storing, and retrieving data. You can also perform operations on the data, such as performing a logical operation or searching data.

SQL is a requisite technology for data scientists, machine learning engineers, and data analysts, as well as for data engineers and database administrators. If you're interviewing for a role in this area, it's almost certain that you'll be asked questions about SQL, and in this article, you're going to go over some of the most commonly asked interview questions and answers in SQL interviews.

Why is SQL so Important?

SQL is widely adapted by different organizations working on handling a huge amount of data. It's a necessary skill required for all kinds of analytics roles. There are a number of reasons that SQL is so important for data handling.

Enables working with a huge amount of data: As technology advances, so does the need to store a large amount of data. If you want to analyze this data, hardware resource limitations often mean that you can't read the entire data set. SQL enables you to SQL query the subset of data you want to work with, making it a faster and more reliable way to analyze.
An essential skill for data analysts: Analyzing a huge amount of data and drawing relevant information from it is the primary goal of data analysts. SQL is the most in-demand skill to become a good data analyst, as it helps them to manipulate a huge amount of data easily.
Necessary for data science: Data science is all about knowing your data well and creating statistical models to make predictions or recommendations. POC (Proof of Concept) and experimentation normally use file-based storage, such as CSV files and Excel files. When you move to real-world use cases, though, things change; you need an actual relational database, and to manage that, you need SQL.
Widely used: Learning SQL alone can open doors for working with other database technologies. Languages like Oracle SQL, T-SQL, NoSQL, and PostgreSQL use the same underlying base, but have developed their own feature set on top of it. With a solid command of SQL, you can master others as well.

What is RDBMS?

RDBMS, which stands for Relational Database Management System, is a type of data storage mechanism that stores data in tables that are made up of rows and columns.

Each column in a table is called a feature, while each row in a relational database management system is referred to as a record. These records can be manipulated with different relational operators, such as =, >=, and <>, to get the data in the desired form. Many famous databases, such as SQL, Microsoft SQL, Oracle SQL, and MySQL, are types of RDBMS.

What is Normalization?

Normalization is the process of removing the data redundancy and enhancing the data integrity in the database tables. It helps in removing duplicate rows and organizing rows in the tables. Using relationships, normalization rules break up large tables into smaller ones.

SQL normalization serves the dual purposes of removing unnecessary (duplicated) data and ensuring logical data storage.

What is a View?

In SQL, views are a subset of virtual tables. The rows and columns of a view are identical to those in a database's actual table. Views are built by choosing fields from one or more database tables. A view may include all table rows, or may filter rows by user-defined criteria and return only the rows that match that criteria.

The most important use-case for a view is for security. It creates a searchable object that can be queried instead of the table itself. It will only return data that's available from the creation of the view.

Below is an example of how to create a new view

CREATE VIEW view_name AS  
SELECT column_lists FROM table_name  
WHERE condition;

We can create a view by using the following syntax:

What is a Primary Key?

A primary key in the database table is a field, composed of a column or combination of columns, that uniquely identifies each row or record. A single-column key is simply called a primary key, while a combination of columns is a composite key, which is a type of primary key.

The primary key ensures that each record has a unique identifier. A primary key has some characteristics associated with it:

Each value in a primary key field should be unique.
Primary keys are not allowed to have NULL, or missing, values.
A table must always have one primary key.

For example, in an insurance company, each customer must have a customer ID and no two customers are allowed to have the same ID. For this reason, customer ID can be treated as a primary key.

A primary key can be defined as follows:

/* Create a table Student with ID as primary key */
/*Specifying the PRIMARY KEY while creating the table*/
CREATE TABLE Students(
ID INT NOT NULL
Name VARCHAR(255)
PRIMARY KEY (ID)
);

/* Alter table add ID as primary key*/
/*Specifying the PRIMARY KEY when the table is already created*/
ALTER TABLE Students   
ADD PRIMARY KEY (ID);

What are Constraints?

Constraints are the way to specify rules for the data in a database table. They ensure the consistency of the data by specifying what type or types of data can be entered in the table. If there is a violation of constraint while performing any operation on the database, the whole operation is aborted.

Constraints can be defined at the time of table creation, or if the table is already created, they can be defined using the ALTER TABLE command. Let's check the list of these constraints that are applied on column level or table level:

NOT NULL: Ensures no null values are entered in a column.
UNIQUE: Ensures the value provided by the user is unique to the column.
CHECK: Checks that values in a column satisfy a condition or conditions that have been specified by the database administrator.
DEFAULT: Assigns a default value for a field if no value is specified.
PRIMARY KEY: Uniquely identifies each row in a table. It is the combination of NOT NULL and UNIQUE constraints.
FOREIGN KEY: Ensures the link between tables by identifying the relationship among them. In this relationship, the foreign key of one table, referred to as the child table, points to the primary key of another table, referred to as the parent table.
INDEX: Creating this constraint ensures faster access to the data. When you create an index on a SQL table, you create a pointer to where the data is stored. By using this index when you need to access the data, access becomes faster.

What is a Unique Key?

A unique key is a column or set of columns that can only have distinct entries, which results in the unique identification of each row/record.

This key is similar to the primary key, except that it can have a single null value, as one null value is still a unique value. A table can have more than one unique key, but only one primary key.

Unique keys are among the most popular concepts for SQL interview questions.

A unique key can be defined as follows:

/* Create a table Student with ID as unique key */
/* Specify UNIQUE key at the time of table creation*/
CREATE TABLE Students(
ID INT NOT NULL UNIQUE
Name VARCHAR(255)
);

/* Alter table add ID as unique key*/
/* Specify UNIQUE key when the tabe is already created*/
ALTER TABLE Students   
ADD UNIQUE (ID);

What are the Different Subsets of SQL?

There are many commands that the SQL language uses to manage the entire database. These commands can be classified into four different categories:

Data Definition Language (DDL): DDL consists of commands used to create or modify the structure of database objects in a database, such as creating a new table in a database or dropping some rows from a table. These commands are handled by someone who manages the entire database, and are not available to the end user. The CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME commands are part of this.
Data Query Language (DQL): DQL commands are used to select and query the data from the database. DQL commands allow you to specify conditions that limit what kind of data you want returned from the database. The SELECT statement is one example of DQL. End users have access to DQL for querying the data.
Data Manipulation Language (DML): DML commands are used to manipulate the data present in the database. This manipulation of data includes inserting data in the table, updating the data in the table, and deleting the data from the table. This language comprises statements such as INSERT, UPDATE, DELETE, and LOCK. Access to these commands is granted only to developers, and are used to manage the databases of a specific project.
Data Control Language (DCL): DCL deals with access rights and permissions of a database. Not all the developers in a project are provided with the facility to make changes in the database tables—only the database administrators can do so. GRANT and REVOKE are two significant commands used for data control.

To learn more about the subsets of SQL, you can refer to this link.

What is a Self-Join?

A self-join is a regular join that merges a table to itself based on a relation between its columns. For the table to join with itself, it needs to have multiple aliases. The syntax of self-join looks like this:

SELECT a.column_name, b.column_name...
FROM table_name a, table_name b
WHERE a.common_field = b.common_field;

What are the UNION, MINUS, and INTERSECT Commands?

UNION, MINUS, and INTERSECT are operators between two or more SELECT statements. They work as follows:

The UNION command combines the results from two or more SELECT statements. It combines all the columns from one statement to the columns obtained from another statement.

/* Fetch the results from the students and subjects table and get the union*/
SELECT name FROM Students
UNION
SELECT name FROM Subjects;

The MINUS command returns all the non-duplicated data from the result of two SELECT statements, ie, it checks the statement results and removes the common data entries. It works the same as applying MINUS operator on a set, for example {a, b, c} - {a, b} = {c}.

/* Fetch names from students table that are not part of subjects table*/
SELECT name FROM Students   
MINUS     
SELECT name FROM Subjects;

The INTERSECT command, as the name implies, returns the common entries from two or more SELECT statements.

/* Get the common name entries from students and subjects table */
SELECT name FROM Students   
INTERSECT    
SELECT name FROM Subjects;

List the Different Types of Relationships in SQL

A relational database is a combination of multiple tables that have some kind of relationship. There are four different types of relationships between tables in a database:

One-to-one relationship: In this kind of relationship, one row or record from the first table can only be associated with one record in the second table, and vice versa.

One-to-many relationship: This relationship implies that one row from the first table is associated with one or many rows of the second table.

Many-to-one relationship: Multiple rows from one table can be associated with a single row on another table.

Many-to-many relationship: Finally, in a many-to-many relationship, multiple rows from one table can be associated with the numerous rows of another table.

What is a UNIQUE Constraint?

A UNIQUE constraint ensures uniqueness in a column, meaning that every value in the table should be unique, without any duplicates. This is similar to the constraints of a primary key, except that it can have one NULL value, as a single null value is still unique.

UNIQUE KEY can be achieved by specifying the UNIQUE constraint. You can make a column unique with the following statement:

/* Create a table Student with ID as unique column */
/* Specifying UNIQUE constraint at the time of table creation*/
CREATE TABLE Students(
ID INT UNIQUE
Name VARCHAR(255)
);

/* Alter table add ID as unique column*/
/* Specifying UNIQUE constraint when the table is already created*
ALTER TABLE Students   
ADD UNIQUE (ID);

What is a Cursor and How Do You Use It?

A cursor points to a single row in a set of multiple rows, allowing you to retrieve rows one at a time and manipulate their data. It is usually used with the SELECT statement in places where you need to perform logic to the selected data row by row.

Using a cursor in SQL:

DECLARE a cursor by specifying its name with the datatype, which must be followed by a SELECT statement that defines the output of the cursor:

DECLARE cursor_name CURSOR  
FOR select_statement;

OPEN the cursor to store the data that was retrieved as the result of the SELECT statement:

OPEN cursor_name;

FETCH the result and move on to the next row. Some database operations, such as INSERT, UPDATE, and DELETE, can be applied for each row, after which the cursor moves to the next row:

FETCH NEXT FROM cursor INTO list_of_variable;

CLOSE the cursor when you are finished with all the operations:

CLOSE cursor_name;

DEALLOCATE the cursor, which destroys the cursor, releasing all the resources it had been using:

DEALLOCATE cursor_name;

What are Joins in SQL?

Joins are SQL statements that are used to combine two or more database tables based on related columns in these tables.

There are four different types of join that are very much important when working on a real-world use case.

INNER JOIN: This is the most common type of join, and is used to retrieve the common entries from the tables on which this join is applied:

SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

LEFT JOIN: This type of join retrieves all the records from the left table, as well as any matching records from the right table:

SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

RIGHT JOIN: This retrieves all the records from the right table, as well as the matching records from the left table:

SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

FULL JOIN: A full join retrieves all the records when there is a match in either the left or the right table, as well as the remaining rows from the tables:

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

What is an Index?

An index is a data structure that helps you quickly look up data in the database tables.

It's similar in concept to the index of a book, allowing you to find any chapter quickly and easily. Indexes in SQL enhance the speed of accessing the data, as they eliminate the need to read through the whole table to locate the information needed. With an index, you just need to look at the index of the data you want to access.

CREATE INDEX index_name
ON Table_Name (Column_Name);

There are several types of index that you can define for a table. All these indexes can be used individually and combined, making it a multicolumn index. However, it is preferred to have only one index over multiple columns. Using these indexes totally depends on the use case you are working on.

Unique index: This guarantees that no duplicate values can exist in the index.
Clustered and non-clustered indexes: Clustered indexes are indexes in which the order of the rows in the index and the database coincide—that is, it specifies the physical arrangement of a database's table records. A non-clustered index does not define the physical order of database tables, and is stored separately from the main data table. Because it's stored separately, a table can have multiple non-clustered indexes.
Column store index: These are regular indexes to store and retrieve data from a table.
Filtered index: This type of index is created when a column has very few distinct values.
Hash index: A hash index is a bucket of different slots, each of which contains a pointer that points to each row in the bucket.

To look at these indexes in more detail, you can refer to this guide.

What is the ACID Property in a Database?

A single logical order of data is a transaction. It keeps a database consistent before and after the transaction. It makes sure that database systems can properly process data transactions.

Atomicity, Consistency, Isolation, and Durability all make up the ACID attribute.

Atomicity: This requirement assures that every statement or action inside the transaction must be successfully carried out. The entire transaction fails if even one step of it fails. The database state remains unaltered. COMMIT, ROLLBACK, and AUTO-COMMIT can be called.
Consistency: The database only changes state when a transaction is successfully committed. It shields data against crashes.
Isolation: It guarantees that statements are mutually transparent. To manage concurrency in a database, isolation is primarily intended.
Durability: This attribute ensures that a transaction, once committed, endures indefinitely regardless of system failure, power outages, or crashes.

What is OLTP?

Online Transaction Processing (OLTP) are data processing systems that execute transaction-focused tasks and enable many people to perform real-time operations on many databases over the internet. OLTP is architectured to prioritize concurrency and decentralization—they're highly available, and often support millions of transactions a day. Crucially, these OLTP systems use relational databases at their core. You can read more about OLTP systems and their potential benefits here.

Conclusion

SQL is an essential skill when working with databases, which are the core components of many real-world applications. SQL also plays an essential role in data science and machine learning, and helps professionals perform data preprocessing and feature engineering tasks with even basic SQL queries.

SQL is a robust technology with many functions, features, and concepts. Make sure you're interview-ready by reviewing the most frequently asked SQL interview questions. Exponent can help you prepare. Exponent is a learning platform that helps you prepare for tech interviews in product management and engineering roles.

Credit: This article was written by Gourav Singh Bais. Gourav is an Applied Machine Learning Engineer and is skilled in developing Machine Learning/Deep learning pipelines, retraining systems, and transforming Data Science prototypes to production-grade solutions.