NoSQL databases have gained enormous popularity and a large base of advocates. MongoDB is the most popular NoSQL database, with a user base that continues to expand at a pace rivaling established relational databases (RDBs).
MongoDB is known to be the next-generation document storage system due to its high performance, flexibility, and scalability when working with large sets of distributed data.
Below, we'll go over the most common MongoDB interview questions to help you prepare for your system design, NoSQL database, or MongoDB interview. We'll start with the basics, simulating a real-world interview, and then gradually increase the complexity.
MongoDB is an open-source document-oriented NoSQL database that was created in 2007 by Dwight Merriman, Eliot Horowitz, and Kevin Ryan. Rather than tables, rows, and columns, MongoDB organizes data into collections of JSON-like documents.
MongoDB is widely considered one of the best NoSQL databases because of key features such as:

- Flexible, schema-less JSON-like documents
- Rich ad hoc queries and secondary indexes
- High availability through replication (replica sets)
- Horizontal scalability through sharding
- A powerful aggregation framework
NoSQL stands for "Not Only SQL." A NoSQL database can handle structured, semi-structured, and unstructured data at large scale. NoSQL database types include:

- Document databases (e.g., MongoDB, CouchDB)
- Key-value stores (e.g., Redis, Riak)
- Column-family (wide-column) stores (e.g., Cassandra, HBase)
- Graph databases (e.g., Neo4j)
A Document in MongoDB is an object that represents a single record, analogous to a row in a SQL table. A MongoDB document has a key/value structure.
A Collection is a set of related documents; it acts as the equivalent of a table in a relational database (RDB). A MongoDB database is simply a group of collections that hold similar or partially similar documents.
db.createCollection(name,options) // create collection in MongoDB
db.collection.drop() //drop collection in MongoDB
db.collection.insertOne({data}) // insert a single document
db.collection.insertMany([{data1}, {data2}]) // insert many documents at once (takes an array)
Syntax to update one document in MongoDB is:
db.collection.updateOne({filter}, {update})
Syntax to update many documents in MongoDB is:
db.collection.updateMany({filter}, {update})
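As a quick sketch (the users collection and its fields are hypothetical), the update document typically uses an operator such as $set:

db.users.updateOne({ name: "Ada" }, { $set: { status: "active" } }) // updates the first matching document
db.users.updateMany({ status: "inactive" }, { $set: { archived: true } }) // updates every matching document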
MongoDB stores documents in BSON, the binary-encoded serialization of JSON. BSON supports additional data types beyond those of JSON; however, it generally uses more space than JSON.
Some BSON data types are:

- String
- Double, 32-bit Integer, 64-bit Integer, and Decimal128
- Object (embedded document) and Array
- ObjectId
- Boolean
- Date and Timestamp
- Binary data
- Null
The combination of a database name with a collection or index name, joined by the . separator, is called a namespace.
[database-name].[collection-or-index-name]
By default, MongoDB doesn't support primary key-foreign key relationships. Every document in MongoDB contains an _id field that uniquely identifies it. However, relationships can still be modeled, either by embedding one document inside another or by referencing another document's _id.
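As a sketch in the mongo shell (collection and field names are hypothetical), the two approaches look like this:

// Embedding: the address lives inside the user document
db.users.insertOne({ _id: 1, name: "Ada", address: { city: "London" } })
// Referencing: the order stores the user's _id, much like a foreign key
db.orders.insertOne({ userId: 1, total: 40 })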
MongoDB provides the following types of data models:

- Embedded data model: related data is stored together in a single document.
- Normalized (referenced) data model: related data lives in separate documents that refer to each other through the _id field.

In a relational database, data points that are related to one another are stored together and remain accessible through those relationships.
The relational model, an easy-to-understand method of representing data in tables, is the foundation of a relational database management system.
Here are some general guidelines for schema design in a document-oriented database, highlighting considerations to keep in mind when modeling data relationships:

- Embed related data when it is read together and the relationship is one-to-one or one-to-few.
- Reference related data when the "many" side is large or unbounded, to avoid oversized documents.
- Design the schema around the application's most common query patterns rather than around the data alone.
In MongoDB, aggregation is a multi-stage data processing pipeline used to run a series of complex operations on a collection of documents.
Each stage in the aggregation pipeline will receive input documents, transform them, and then forward the results as input to the next step down the pipeline until our goal is achieved. The stages in a pipeline can filter, sort, group, reshape and modify documents that pass through the pipeline.
Joins across collections can also be performed within a pipeline using the $lookup aggregation stage. Some of the aggregation stages in MongoDB are:
- $match: filters documents by a condition
- $count: counts the documents that reach the stage
- $lookup: joins in documents from another collection
- $unwind: deconstructs an array field into one document per element
- $sort: orders documents by one or more fields
- $project: reshapes documents by including, excluding, or computing fields
- $limit: passes through only the first n documents
- $merge: writes the pipeline's results into a collection
- $facet: runs several sub-pipelines on the same input within a single stage
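As a small sketch of how stages chain together (the orders collection and its fields are hypothetical):

db.orders.aggregate([
  { $match: { status: "shipped" } },                               // filter documents
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },  // group and sum per customer
  { $sort: { total: -1 } },                                        // sort by total, descending
  { $limit: 5 }                                                    // keep only the top five
])

Each stage receives the output of the previous one, exactly as described above.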
MongoDB eliminates the need for manual database creation: a database is created automatically the first time a document is saved into one of its collections.
We use dot notation in MongoDB to retrieve the array elements and fields of an embedded document.
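For instance (collection and field names are hypothetical), dot notation reaches into embedded documents and arrays like this:

db.users.find({ "address.city": "London" })   // field of an embedded document
db.users.find({ "scores.0": { $gt: 90 } })    // first element of the scores array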
MongoDB is not a relational database; however, joins have been possible since MongoDB 3.2. The $lookup
aggregation stage works in the same way as a left outer join:
{
$lookup:
{
from: <foreign collection>,
localField: <field from local collection's documents>,
foreignField: <field from foreign collection's documents>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run> ],
as: <output array field>
}
}
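As a concrete sketch of the simpler localField/foreignField form (collection and field names are hypothetical), joining each order to its customer:

db.orders.aggregate([
  { $lookup: {
      from: "customers",           // foreign collection
      localField: "customerId",    // field on the orders documents
      foreignField: "_id",         // field on the customers documents
      as: "customer"               // name of the output array field
  } }
])

Each resulting order document gains a customer array holding the matching customer documents; it is empty when there is no match, just as with a left outer join.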
Indexes are special data structures that hold a subset of the collection's data. The index can store the value of a field or set of fields, sorted by field value.
We use indexes to ensure the efficient execution of queries.
Without indexes, MongoDB must perform a full collection scan to find the documents that match a query. If a suitable index exists, MongoDB scans only that index, limiting the number of documents it must examine. In addition, MongoDB can return sorted results efficiently by using the ordering in the index.
The syntax for creating an index in MongoDB is:
db.people.createIndex( { fieldName: 1 } )  // creates an ascending index
db.people.createIndex( { fieldName: -1 } ) // creates a descending index
Following are the various kinds of indexes in MongoDB:

- Single field index
- Compound index (on multiple fields)
- Multikey index (on array fields)
- Text index (for text search)
- Geospatial indexes (2d and 2dsphere)
- Hashed index (on hashes of field values, often used for sharding)
- Wildcard index (on fields matching a pattern)
When an index is too large to be stored in RAM, MongoDB reads the data from disk, which is substantially slower than from RAM.
For a query to be covered, all the fields used in the query must be part of an index, and all the fields returned by the query must be in that same index.
Because every field the query touches is part of the index, MongoDB only needs to scan that index to match the query conditions and return the result. Since indexes are usually held in RAM, reading from an index is much faster than scanning the documents themselves.
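As a sketch in the mongo shell (collection and field names are hypothetical); note that _id must be excluded from the projection, since it is not part of the index:

db.users.createIndex({ email: 1, status: 1 })
// Covered: the filter and the projection use only indexed fields, and _id is excluded
db.users.find({ email: "ada@example.com" }, { status: 1, _id: 0 })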
Data replication is a horizontal scaling technique provided by most databases to ensure high availability, data protection, and increased fault tolerance. Essentially, replication refers to copying the same data from one database to another, creating a cluster of synchronized databases or nodes. If one of the nodes goes down, the application will still be available to users because the other nodes in the cluster are available and can respond to user requests.
Another advantage of data replication across multiple databases or nodes is database load balancing. With replication, read and write requests can be distributed across all available nodes in the cluster instead of exhausting a single node.
In MongoDB, a replica set is a cluster of replicated nodes. The master node in a replica set is the primary node, the only node that can perform write operations. The other nodes in the replica set are called secondary nodes, and they can perform only read operations. Any updates to the primary node are then replicated to the other nodes to ensure data consistency.
The recommended minimum for a replica set is three nodes: one primary and two secondaries. If the primary goes down, one of the secondaries is chosen to take over the primary's role through a process called a replica set election.
A MongoDB replica set can hold up to 50 nodes.
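As a sketch in the mongo shell (the replica set name and hostnames are hypothetical), a three-member replica set can be initiated from one of its members:

rs.initiate({
  _id: "rs0",                             // replica set name
  members: [
    { _id: 0, host: "mongo1:27017" },     // will typically become the primary
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})

Each mongod must have been started with the matching --replSet rs0 option for this to succeed.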
The maximum size of a single MongoDB document is 16MB with a maximum nested depth of 100 levels.
In MongoDB, Sharding is a horizontal scaling technique for partitioning or breaking up data records across multiple machines, placing a subset of that data on each shard. Each partition is referred to as a database shard. Each shard can be a different replica set on its own. Sharding is mainly used in highly available systems to handle big data and large workloads.
The shard key, which controls how evenly the collection's documents are distributed throughout the cluster's shards, can either be a single indexed field or several fields covered by a compound index.
The optimum shard key enables MongoDB to support common query patterns while distributing documents uniformly across the cluster.
Sharding benefits include:

- Increased read/write throughput, since the load is spread across shards
- Increased storage capacity beyond what a single machine can hold
- High availability, especially when each shard is itself a replica set
- The ability to scale out on commodity hardware instead of scaling up
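As a sketch in the mongo shell, run against a mongos router (the database, collection, and shard key here are hypothetical):

sh.enableSharding("shop")                              // allow sharding for the database
db.orders.createIndex({ customerId: "hashed" })        // the shard key must be indexed
sh.shardCollection("shop.orders", { customerId: "hashed" }) // distribute documents by hashed customerId

A hashed shard key like this trades range-query locality for a more uniform distribution of documents across shards.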
GridFS is a MongoDB file system specification for dealing with large files that exceed the 16MB document size limit, such as images, audio files, and video files. GridFS stores and retrieves large files by breaking them into chunks and holding each chunk in a separate document. Each chunk can be up to 255 kB in size.
Journaling is temporary storage that keeps write-operation logs in a journal subdirectory that MongoDB creates on your machine until they are flushed to the core data directory. Instead of immediately writing data to disk, MongoDB first logs the write operation and the index modifications in an on-disk journal file, then writes them to the core data directory at intervals.
One advantage of journaling is that the records or journals are saved in consecutive tracks, meaning accessing data from the disk will be faster than accessing randomly distributed records (read about disk seek time ). Creating safe backups in case of system failure is another benefit.
In general, Journaling in MongoDB increases database durability and availability.
In MongoDB, data is lazily pushed to disk. The data that was immediately written to the journal is updated. However, writing the data from the journal to disk is done lazily.
MongoDB writes updates to the disk every 60 seconds by default. However, this can be altered using the parameters commitIntervalMs and syncPeriodSecs.
MongoDB's two main storage engines have been WiredTiger and MMAPv1. WiredTiger has been the default since MongoDB 3.2; MMAPv1 was deprecated in 4.0 and removed in 4.2.
With the legacy MMAPv1 engine, the cache could not be configured; MongoDB used memory-mapped files to automatically use free memory on the system. With WiredTiger, the size of the internal cache can be tuned via the storage.wiredTiger.engineConfig.cacheSizeGB setting.
MongoDB provides several utilities for backing up and restoring databases in bulk. These utilities are:

- mongodump: exports a database's contents as BSON files
- mongorestore: restores data from a mongodump backup
- mongoexport: exports data in JSON or CSV format
- mongoimport: imports JSON, CSV, or TSV data
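As a sketch from the command line (the database name and backup paths are hypothetical):

mongodump --db shop --out /backup           # dump the shop database as BSON files
mongorestore --db shop /backup/shop         # restore it from the dumped files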
When multiple clients attempt to read or write the same data simultaneously, it becomes crucial to protect data consistency and avoid conflicts. MongoDB handles concurrent operations using multi-granular locking with reader-writer locks at the database or collection level that provide concurrent readers with shared access to a resource but exclusive access to a single write operation.
For example, suppose one write operation acquires the database lock. In that case, all other write operations to the same database (even if they are to a separate collection) are blocked, waiting for the lock to be released.
There are four modes of locking:

- R: shared (read) lock
- W: exclusive (write) lock
- r: intent shared lock
- w: intent exclusive lock
Yes. More than one database can be locked during operations such as db.copyDatabase() and db.repairDatabase().
By default, MongoDB writes operations are atomic (i.e., provide an "all-or-nothing" proposition) only at the level of a single document.
However, for use cases that demand atomicity of reads and writes to multiple documents, MongoDB 4.0+ supports multi-document ACID transactions even on distributed sharded clusters or replica sets.
So, a transaction is a process of modifying multiple documents as part of a single logical operation that will only succeed if every operation within the transaction has been executed correctly.
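As a sketch in the mongo shell (the bank database and its accounts are hypothetical), a multi-document transaction wraps several writes in an all-or-nothing unit:

const session = db.getMongo().startSession()
session.startTransaction()
try {
  const accounts = session.getDatabase("bank").accounts
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -100 } }) // debit one account
  accounts.updateOne({ _id: "bob" }, { $inc: { balance: 100 } })    // credit the other
  session.commitTransaction()  // both updates become visible together
} catch (e) {
  session.abortTransaction()   // neither update is applied
}

Transactions require a replica set or sharded cluster; they are not available on a standalone mongod.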
Poor schema or query design, improper index usage, or flaws in the query itself can result in very slow queries. Because test datasets are often small, these performance issues are hard to detect during development. Additionally, manually evaluating the performance of each query is a tedious task.
MongoDB offers a handy tool called Profiler that can evaluate operations based on specific criteria and log information about how all database operations are executed. The database profiler stores this data in a capped collection called “system.profile”.
The profiler provides three profiling levels:

- 0: profiling is off (the default)
- 1: logs only operations slower than the slowms threshold
- 2: logs all operations
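As a sketch in the mongo shell, enabling the profiler for slow operations and inspecting what it captured:

db.setProfilingLevel(1, { slowms: 100 })              // log operations slower than 100 ms
db.system.profile.find().sort({ ts: -1 }).limit(5)    // inspect the most recent entries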
Capped collections are fixed-size collections. They support high-throughput operations by inserting and retrieving documents based on insertion order.
Capped collections are like circular buffers in the way they work. Capped collections automatically make room for new documents by overwriting their oldest entries.
Cassandra, CouchDB, Redis, Riak, and HBase are popular alternatives to MongoDB.
Capped collections ensure insertion order preservation. As a result, queries can return documents in insertion order without needing an index, so capped collections can sustain higher insertion speed without that indexing overhead.
The syntax to create a capped collection is as follows:
db.createCollection(<collection_name>, {
capped: Boolean,
autoIndexId: Boolean,
size: Number,
max : Number,
})
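For example, a capped collection suitable for log entries might be created like this (the name and limits are hypothetical; size is in bytes):

db.createCollection("logs", { capped: true, size: 1048576, max: 1000 }) // at most 1 MB or 1000 documents

Once either limit is reached, the oldest entries are overwritten to make room for new ones.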
MongoDB does not support 32-bit systems because a 32-bit process can address only about 2GB of memory, while MongoDB relies on large amounts of RAM for its caches. That limit makes 32-bit builds unsuitable for production use.