Software companies collect an enormous amount of data, which is then used for different analytics purposes to help future-proof their solutions. This data needs to be stored somewhere, and databases, especially relational databases, are often the answer.
Just like every application has programming language requirements, databases also have programming languages used for creating, manipulating, and accessing data. Some widely used database languages are MySQL, OracleSQL, NoSQL, and PostgreSQL. As you might have guessed, the base of all these languages is SQL, which stands for Structured Query Language, and is sometimes pronounced "sequel".
SQL has been around for over forty years, and it still captures most of the database plane due to its maintainability and adaptation of new features.
The structured query language is not limited to creating databases, storing, and retrieving data. You can also perform operations on the data, such as performing a logical operation or searching data.
SQL is a requisite technology for data scientists, machine learning engineers, and data analysts, as well as for data engineers and database administrators. If you're interviewing for a role in this area, it's almost certain that you'll be asked questions about SQL, and in this article, you're going to go over some of the most commonly asked interview questions and answers in SQL interviews.
SQL is widely adapted by different organizations working on handling a huge amount of data. It's a necessary skill required for all kinds of analytics roles. There are a number of reasons that SQL is so important for data handling.
RDBMS, which stands for Relational Database Management System, is a type of data storage mechanism that stores data in tables that are made up of rows and columns.
Each column in a table is called a feature, while each row in a relational database management system is referred to as a record. These records can be manipulated with different relational operators, such as =
, >=
, and <>
, to get the data in the desired form. Many famous databases, such as SQL, Microsoft SQL, Oracle SQL, and MySQL, are types of RDBMS.
Normalization is the process of removing the data redundancy and enhancing the data integrity in the database tables. It helps in removing duplicate rows and organizing rows in the tables. Using relationships, normalization rules break up large tables into smaller ones.
SQL normalization serves the dual purposes of removing unnecessary (duplicated) data and ensuring logical data storage.
In SQL, views are a subset of virtual tables. The rows and columns of a view are identical to those in a database's actual table. Views are built by choosing fields from one or more database tables. A view may include all table rows, or may filter rows by user-defined criteria and return only the rows that match that criteria.
The most important use-case for a view is for security. It creates a searchable object that can be queried instead of the table itself. It will only return data that's available from the creation of the view.
Below is an example of how to create a new view
CREATE VIEW view_name AS
SELECT column_lists FROM table_name
WHERE condition;
We can create a view by using the following syntax:
A primary key in the database table is a field, composed of a column or combination of columns, that uniquely identifies each row or record. A single-column key is simply called a primary key, while a combination of columns is a composite key, which is a type of primary key.
The primary key ensures that each record has a unique identifier. A primary key has some characteristics associated with it:
NULL
, or missing, values.For example, in an insurance company, each customer must have a customer ID and no two customers are allowed to have the same ID. For this reason, customer ID can be treated as a primary key.
A primary key can be defined as follows:
/* Create a table Student with ID as primary key */
/*Specifying the PRIMARY KEY while creating the table*/
CREATE TABLE Students(
ID INT NOT NULL
Name VARCHAR(255)
PRIMARY KEY (ID)
);
/* Alter table add ID as primary key*/
/*Specifying the PRIMARY KEY when the table is already created*/
ALTER TABLE Students
ADD PRIMARY KEY (ID);
Constraints are the way to specify rules for the data in a database table. They ensure the consistency of the data by specifying what type or types of data can be entered in the table. If there is a violation of constraint while performing any operation on the database, the whole operation is aborted.
Constraints can be defined at the time of table creation, or if the table is already created, they can be defined using the ALTER TABLE
command. Let's check the list of these constraints that are applied on column level or table level:
NOT NULL
: Ensures no null values are entered in a column.UNIQUE
: Ensures the value provided by the user is unique to the column.CHECK
: Checks that values in a column satisfy a condition or conditions that have been specified by the database administrator.DEFAULT
: Assigns a default value for a field if no value is specified.PRIMARY KEY
: Uniquely identifies each row in a table. It is the combination of NOT NULL
and UNIQUE
constraints.FOREIGN KEY
: Ensures the link between tables by identifying the relationship among them. In this relationship, the foreign key of one table, referred to as the child table, points to the primary key of another table, referred to as the parent table.INDEX
: Creating this constraint ensures faster access to the data. When you create an index on a SQL table, you create a pointer to where the data is stored. By using this index when you need to access the data, access becomes faster.A unique key is a column or set of columns that can only have distinct entries, which results in the unique identification of each row/record.
This key is similar to the primary key, except that it can have a single null value, as one null value is still a unique value. A table can have more than one unique key, but only one primary key.
Unique keys are among the most popular concepts for SQL interview questions.
A unique key can be defined as follows:
/* Create a table Student with ID as unique key */
/* Specify UNIQUE key at the time of table creation*/
CREATE TABLE Students(
ID INT NOT NULL UNIQUE
Name VARCHAR(255)
);
/* Alter table add ID as unique key*/
/* Specify UNIQUE key when the tabe is already created*/
ALTER TABLE Students
ADD UNIQUE (ID);
There are many commands that the SQL language uses to manage the entire database. These commands can be classified into four different categories:
CREATE
, DROP
, ALTER
, TRUNCATE
, COMMENT
, and RENAME
commands are part of this.SELECT
statement is one example of DQL. End users have access to DQL for querying the data.INSERT
, UPDATE
, DELETE
, and LOCK
. Access to these commands is granted only to developers, and are used to manage the databases of a specific project.GRANT
and REVOKE
are two significant commands used for data control.To learn more about the subsets of SQL, you can refer to this link.
A self-join is a regular join that merges a table to itself based on a relation between its columns. For the table to join with itself, it needs to have multiple aliases. The syntax of self-join looks like this:
SELECT a.column_name, b.column_name...
FROM table_name a, table_name b
WHERE a.common_field = b.common_field;
UNION
, MINUS
, and INTERSECT
are operators between two or more SELECT
statements. They work as follows:
The UNION
command combines the results from two or more SELECT statements. It combines all the columns from one statement to the columns obtained from another statement.
/* Fetch the results from the students and subjects table and get the union*/
SELECT name FROM Students
UNION
SELECT name FROM Subjects;
The MINUS
command returns all the non-duplicated data from the result of two SELECT
statements, ie, it checks the statement results and removes the common data entries. It works the same as applying MINUS
operator on a set, for example {a, b, c} - {a, b} = {c}.
/* Fetch names from students table that are not part of subjects table*/
SELECT name FROM Students
MINUS
SELECT name FROM Subjects;
The INTERSECT
command, as the name implies, returns the common entries from two or more SELECT statements.
/* Get the common name entries from students and subjects table */
SELECT name FROM Students
INTERSECT
SELECT name FROM Subjects;
A relational database is a combination of multiple tables that have some kind of relationship. There are four different types of relationships between tables in a database:
One-to-one relationship: In this kind of relationship, one row or record from the first table can only be associated with one record in the second table, and vice versa.
One-to-many relationship: This relationship implies that one row from the first table is associated with one or many rows of the second table.
Many-to-one relationship: Multiple rows from one table can be associated with a single row on another table.
Many-to-many relationship: Finally, in a many-to-many relationship, multiple rows from one table can be associated with the numerous rows of another table.
A UNIQUE
constraint ensures uniqueness in a column, meaning that every value in the table should be unique, without any duplicates. This is similar to the constraints of a primary key, except that it can have one NULL
value, as a single null value is still unique.
UNIQUE KEY
can be achieved by specifying the UNIQUE
constraint. You can make a column unique with the following statement:
/* Create a table Student with ID as unique column */
/* Specifying UNIQUE constraint at the time of table creation*/
CREATE TABLE Students(
ID INT UNIQUE
Name VARCHAR(255)
);
/* Alter table add ID as unique column*/
/* Specifying UNIQUE constraint when the table is already created*
ALTER TABLE Students
ADD UNIQUE (ID);
A cursor points to a single row in a set of multiple rows, allowing you to retrieve rows one at a time and manipulate their data. It is usually used with the SELECT
statement in places where you need to perform logic to the selected data row by row.
Using a cursor in SQL:
DECLARE
a cursor by specifying its name with the datatype, which must be followed by a SELECT
statement that defines the output of the cursor:
DECLARE cursor_name CURSOR
FOR select_statement;
OPEN
the cursor to store the data that was retrieved as the result of the SELECT
statement:
OPEN cursor_name;
FETCH
the result and move on to the next row. Some database operations, such as INSERT
, UPDATE
, and DELETE
, can be applied for each row, after which the cursor moves to the next row:
FETCH NEXT FROM cursor INTO list_of_variable;
CLOSE
the cursor when you are finished with all the operations:
CLOSE cursor_name;
DEALLOCATE
the cursor, which destroys the cursor, releasing all the resources it had been using:
DEALLOCATE cursor_name;
Joins are SQL statements that are used to combine two or more database tables based on related columns in these tables.
There are four different types of join that are very much important when working on a real-world use case.
INNER JOIN
: This is the most common type of join, and is used to retrieve the common entries from the tables on which this join is applied:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
LEFT JOIN
: This type of join retrieves all the records from the left table, as well as any matching records from the right table:
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
RIGHT JOIN
: This retrieves all the records from the right table, as well as the matching records from the left table:
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
FULL JOIN
: A full join retrieves all the records when there is a match in either the left or the right table, as well as the remaining rows from the tables:
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
An index is a data structure that helps you quickly look up data in the database tables.
It's similar in concept to the index of a book, allowing you to find any chapter quickly and easily. Indexes in SQL enhance the speed of accessing the data, as they eliminate the need to read through the whole table to locate the information needed. With an index, you just need to look at the index of the data you want to access.
CREATE INDEX index_name
ON Table_Name (Column_Name);
There are several types of index that you can define for a table. All these indexes can be used individually and combined, making it a multicolumn index. However, it is preferred to have only one index over multiple columns. Using these indexes totally depends on the use case you are working on.
To look at these indexes in more detail, you can refer to this guide.
A single logical order of data is a transaction. It keeps a database consistent before and after the transaction. It makes sure that database systems can properly process data transactions.
Atomicity, Consistency, Isolation, and Durability all make up the ACID attribute.
COMMIT
, ROLLBACK
, and AUTO-COMMIT
can be called.Online Transaction Processing (OLTP) are data processing systems that execute transaction-focused tasks and enable many people to perform real-time operations on many databases over the internet. OLTP is architectured to prioritize concurrency and decentralization—they're highly available, and often support millions of transactions a day. Crucially, these OLTP systems use relational databases at their core. You can read more about OLTP systems and their potential benefits here.
SQL is an essential skill when working with databases, which are the core components of many real-world applications. SQL also plays an essential role in data science and machine learning, and helps professionals perform data preprocessing and feature engineering tasks with even basic SQL queries.
SQL is a robust technology with many functions, features, and concepts. Make sure you're interview-ready by reviewing the most frequently asked SQL interview questions. Exponent can help you prepare. Exponent is a learning platform that helps you prepare for tech interviews in product management and engineering roles.
Credit: This article was written by Gourav Singh Bais. Gourav is an Applied Machine Learning Engineer and is skilled in developing Machine Learning/Deep learning pipelines, retraining systems, and transforming Data Science prototypes to production-grade solutions.
Exponent is the fastest-growing tech interview prep platform. Get free interview guides, insider tips, and courses.
Create your free account