MongoDB Transactions: A Walkthrough

In this blog I will walk through transactions in MongoDB. It is an interesting topic: most NoSQL databases are now adopting transactions, and MongoDB is no exception.

The WiredTiger storage engine plays a vital role in implementing transactions in MongoDB. Interestingly, WiredTiger had transaction capabilities even before MongoDB integrated it.

What is a transaction?

The main objective of a transaction is to provide strong data consistency to the end user. MongoDB achieves this through the ACID properties.

ACID stands for Atomicity, Consistency, Isolation, and Durability.

  • Atomicity – Ensures that either all the operations of a transaction are reflected in the database or none are. By default in MongoDB, an operation on a single document is atomic, even if the operation modifies multiple embedded documents within that document.
  • Consistency – To maintain the consistency of the database, transactions should execute in isolation from each other. In MongoDB, reads and writes go to the primary by default. With write concern w: 1 (the default in MongoDB 4.x), a write is acknowledged as soon as the primary applies it, whether or not the data has been replicated to the other members. To keep acknowledged data consistent across the replica set, use write concern w: "majority".
  • Isolation – Concurrent transactions should not interfere with each other: a transaction never sees the uncommitted changes of another. A transaction can run CRUD operations against a single collection or several collections. Outside a transaction, when a single write operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not.
  • Durability – Once a transaction completes successfully, its changes to the database are permanent, even if there is a system failure.
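
The write concern point in the Consistency bullet above can be sketched for the mongo shell as follows. The helper name insertWithMajority and the 5000 ms wtimeout value are assumptions made for this example; insertOne and its writeConcern option are regular collection features.

```javascript
// Hypothetical helper: insert one document and wait until a majority of the
// replica set members have acknowledged the write.
function insertWithMajority(coll, doc) {
  // wtimeout (in milliseconds) is an illustrative value; it keeps the call
  // from blocking indefinitely if a majority cannot be reached.
  return coll.insertOne(doc, {
    writeConcern: { w: "majority", wtimeout: 5000 }
  });
}
```

In the shell this could be called with a collection handle such as db.mydbops and the document to insert.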

Why does a NoSQL database (MongoDB) need transactions?

In non-relational databases like MongoDB, data is stored in collections as documents made up of key-value pairs. We can use embedded documents (subdocuments) to store different kinds of data in the same document. An insert or update operation then affects multiple fields in a single document rather than rows in multiple tables.

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. The single-document atomicity obviates the need for multi-document transactions for many practical use cases.

When we read and write multiple documents (in one or more collections), we need a transaction to keep the data consistent. MongoDB provides multi-document transactions for this.

Transactions were left out of NoSQL systems by design, mainly for the following reasons.

  • They add implementation complexity and affect performance.
  • NoSQL systems encourage denormalising data and storing it in flat structures. In that model, having many collections and multi-document transactions becomes an anti-pattern.
  • Storage engines lacked a robust implementation to guarantee ACID.

MongoDB supports multi-document transactions under the following conditions.

  • The deployment must be a replica set (a single-node replica set also works).
  • Only the WiredTiger storage engine is supported.
  • MongoDB 4.2 introduces a new feature called distributed transactions, which adds support for multi-document transactions on sharded clusters.

Multi-document transactions

  • Multi-document transactions can be used across multiple operations, collections, databases, and documents. They provide an “all-or-nothing” proposition.
  • When a transaction commits, all data changes made in the transaction are saved. 
  • If any operation in the transaction fails, the transaction aborts and all data changes made in the transaction are discarded without ever becoming visible.
  • Until a transaction commits, no write operations in the transaction are visible outside the transaction.
  • In MongoDB 4.0, multi-document transactions are available on replica sets only; MongoDB 4.2 extends them to sharded clusters.
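
The all-or-nothing behaviour described above could be wrapped in a small helper. This is only a sketch: runInTransaction and the work callback are hypothetical names chosen for this example, while startTransaction, commitTransaction, and abortTransaction are the session methods of the mongo shell.

```javascript
// Hypothetical helper: run work(session) inside a transaction so that its
// changes are either all committed or all discarded.
function runInTransaction(session, work) {
  session.startTransaction({
    readConcern: { level: "snapshot" },
    writeConcern: { w: "majority" }
  });
  let result;
  try {
    // Every read and write in work() must go through the session.
    result = work(session);
  } catch (err) {
    // Any failure discards all changes made so far in the transaction.
    session.abortTransaction();
    throw err;
  }
  // No failure: make all changes visible at once.
  session.commitTransaction();
  return result;
}
```

Inside work, collections would be obtained from the session, for example with s.getDatabase('test').getCollection('mydbops'), so that each operation is part of the transaction.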

Limitations of transactions

  • Transactions cannot write to capped collections.
  • MongoDB does not support dropping a collection or a database inside a transaction.
  • In MongoDB 4.0, a transaction is limited to 16 MB in total, so large bulk inserts or updates cannot be processed within a single transaction. MongoDB 4.2 removes this overall limit by writing as many oplog entries as the transaction needs.
  • User creation commands cannot be run inside a transaction.
  • Collections in the config, admin, and local databases cannot be read or written inside a transaction.
  • The system.* collections cannot be written to inside a transaction.
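
Some of these limits can be checked before a transaction is even started. The sketch below encodes only the namespace rules from the list above (the config, admin, and local databases and the system.* collections). The function name isNamespaceAllowedInTransaction is made up for this example, and the check does not cover capped collections or size limits.

```javascript
// Hypothetical pre-check based only on the namespace rules listed above.
function isNamespaceAllowedInTransaction(dbName, collName) {
  const restrictedDbs = ["config", "admin", "local"];
  // Databases that cannot be read or written inside a transaction.
  if (restrictedDbs.indexOf(dbName) !== -1) {
    return false;
  }
  // system.* collections cannot be written to inside a transaction.
  if (collName.indexOf("system.") === 0) {
    return false;
  }
  return true;
}
```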

Transaction scenario I (Atomicity)

I have configured a single-node replica set. First we create the mydbops collection in the test database and insert some documents into it.

mydbops_trx:PRIMARY> use test
switched to db test
mydbops_trx:PRIMARY> db.mydbops.insertMany([{"_id": 1, "NAME": "GEORGE", "DEPT": "DBA"},{"_id": 2, "NAME": "TITUS",  "DEPT": "LINUX ADMIN"}])
{ "acknowledged" : true, "insertedIds" : [ 1, 2 ] }

Now we are going to insert a document using a transaction in session 1.

mydbops_trx:PRIMARY> var session1 = db.getMongo().startSession();
mydbops_trx:PRIMARY> var session1mydbopsColl = session1.getDatabase('test').getCollection('mydbops');
mydbops_trx:PRIMARY> session1.startTransaction({readConcern: {level: 'snapshot'}, writeConcern: {w: 'majority'}});
mydbops_trx:PRIMARY> session1mydbopsColl.insert({"_id": 3, "NAME": "HARAN", "DEPT": "DBA"});
WriteResult({ "nInserted" : 1 })
mydbops_trx:PRIMARY>

The newly inserted document is visible only inside session 1.

mydbops_trx:PRIMARY> session1mydbopsColl.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
{ "_id" : 3, "NAME" : "HARAN", "DEPT" : "DBA" }

We can’t see the newly inserted document from outside the session (a new session).

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

Let’s check again after committing the transaction.

mydbops_trx:PRIMARY> session1.commitTransaction()
mydbops_trx:PRIMARY> session1.endSession()
mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
{ "_id" : 3, "NAME" : "HARAN", "DEPT" : "DBA" }

Now we are able to see the newly inserted document from other sessions too.

Initiate two concurrent transactions

We are going to create two sessions and initiate transactions on each session.

Session1: (Before committing transaction)

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

mydbops_trx:PRIMARY> var session1 = db.getMongo().startSession();
mydbops_trx:PRIMARY> var session1mydbopsColl = session1.getDatabase('test').getCollection('mydbops');
mydbops_trx:PRIMARY> session1.startTransaction({readConcern: {level: 'snapshot'}, writeConcern: {w: 'majority'}});

Update the document with “_id”: 1 to change its “NAME” field value to JOHNSON.

mydbops_trx:PRIMARY> session1mydbopsColl.updateOne({"_id": 1}, {"$set": {"NAME": "JOHNSON"}} );
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

We can see the updated document only from inside the session.

mydbops_trx:PRIMARY> session1mydbopsColl.find()
{ "_id" : 1, "NAME" : "JOHNSON", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

We are not able to see the updated document from outside the transaction.

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
mydbops_trx:PRIMARY>

Session2: (Before committing transaction)

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
mydbops_trx:PRIMARY> var session2 = db.getMongo().startSession();
mydbops_trx:PRIMARY> var session2mydbopsColl = session2.getDatabase('test').getCollection('mydbops');
mydbops_trx:PRIMARY> session2.startTransaction({readConcern: {level: 'snapshot'}, writeConcern: {w: 'majority'}});

Insert one new document into the mydbops collection.

mydbops_trx:PRIMARY> session2mydbopsColl.insert({"_id": 3, "NAME": "HARAN", "DEPT": "DBA"});
WriteResult({ "nInserted" : 1 })

Inside the session2 transaction, we see only the changes made in session2.

mydbops_trx:PRIMARY> session2mydbopsColl.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
{ "_id" : 3, "NAME" : "HARAN", "DEPT" : "DBA" }

We are not able to see the changes made in session1 and session2 from outside the transactions.

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

Session1: (After committing transaction)

mydbops_trx:PRIMARY> session1.commitTransaction()
mydbops_trx:PRIMARY> session1.endSession()

We are able to see the session1 changes from outside the transaction after committing the session1 transaction.

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "JOHNSON", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

Session2: (After committing transaction)

mydbops_trx:PRIMARY> session2.commitTransaction()
mydbops_trx:PRIMARY> session2.endSession()

After the commit, all the changes made in session1 and session2 are visible from outside those transactions (Atomicity).

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "JOHNSON", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }
{ "_id" : 3, "NAME" : "HARAN", "DEPT" : "DBA" }

Conflicts in transactions (write conflicts)

When two or more concurrent transactions modify (update or delete) the same document, a conflict occurs. MongoDB can detect a conflict immediately, even if the transactions are not yet committed.
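
The idea of detecting a conflict before anything is committed can be illustrated with a toy in-memory model. This is not how MongoDB or WiredTiger is implemented: ToyConflictTracker and its methods are invented for this illustration, and the model ignores snapshot timestamps, so it only covers conflicts between transactions that are open at the same time.

```javascript
// Toy model only: remembers which open transaction holds an uncommitted
// write on each document _id.
function ToyConflictTracker() {
  this.pendingWriter = {};   // document _id -> id of the transaction holding it
}

// Record a write; fail right away if another open transaction wrote first.
ToyConflictTracker.prototype.write = function (txnId, docId) {
  const owner = this.pendingWriter[docId];
  if (owner !== undefined && owner !== txnId) {
    const err = new Error("WriteConflict");
    err.codeName = "WriteConflict";
    throw err;
  }
  this.pendingWriter[docId] = txnId;
};

// Commit or abort: either way the transaction's pending writes are released.
ToyConflictTracker.prototype.finish = function (txnId) {
  for (const docId of Object.keys(this.pendingWriter)) {
    if (this.pendingWriter[docId] === txnId) {
      delete this.pendingWriter[docId];
    }
  }
};
```

In this model the second writer to a document fails immediately, without waiting for the first transaction to commit.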

In session1 we have deleted one document (_id: 1) from the mydbops collection but have not yet committed.

mydbops_trx:PRIMARY> db.mydbops.find()
{ "_id" : 1, "NAME" : "GEORGE", "DEPT" : "DBA" }
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

mydbops_trx:PRIMARY> var session1 = db.getMongo().startSession();
mydbops_trx:PRIMARY> var session1mydbopsColl = session1.getDatabase('test').getCollection('mydbops');
mydbops_trx:PRIMARY> session1.startTransaction({readConcern: {level: 'snapshot'}, writeConcern: {w: 'majority'}});

mydbops_trx:PRIMARY> session1mydbopsColl.deleteOne({"_id": 1});
{ "acknowledged" : true, "deletedCount" : 1 }

mydbops_trx:PRIMARY> session1mydbopsColl.find()
{ "_id" : 2, "NAME" : "TITUS", "DEPT" : "LINUX ADMIN" }

In session2, which has its own open transaction, we now try to update the NAME field of the same document (_id: 1) in the mydbops collection. Let’s see what happens.

mydbops_trx:PRIMARY> session2mydbopsColl.updateOne({"_id": 1}, {"$set": {"NAME": "JOHNSON"}}  );
2020-07-16T00:30:56.513+0530 E  QUERY    [js] uncaught exception: WriteCommandError({
"errorLabels" : [
"TransientTransactionError"
],
"operationTime" : Timestamp(1594839650, 1),
"ok" : 0,
"errmsg" : "WriteConflict",
"code" : 112,
"codeName" : "WriteConflict",
"$clusterTime" : {
"clusterTime" : Timestamp(1594839650, 1),
"signature" : {
"hash" : BinData(0,"e0pNzplcdMSFE0Z2ZYmHwU7mEKk="),
"keyId" : NumberLong("6846834767892054020")
}}}) :

We get a WriteConflict error labelled “TransientTransactionError”. The session2 transaction is aborted and has to be retried, typically once the session1 transaction has committed or aborted.
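
Retrying on this error label can be automated. Below is a minimal sketch, assuming a caller-supplied txnFunc that starts the transaction, does its work, and commits (aborting on error). The names runTransactionWithRetry and txnFunc and the maxAttempts cap are assumptions made for this example; the errorLabels field is the one shown in the error output above.

```javascript
// Hypothetical helper: re-run txnFunc(session) while it fails with an error
// labelled TransientTransactionError, up to maxAttempts times in total.
function runTransactionWithRetry(txnFunc, session, maxAttempts) {
  for (let attempt = 1; ; attempt++) {
    try {
      // txnFunc is expected to start and commit its own transaction.
      return txnFunc(session);
    } catch (err) {
      const labels = err.errorLabels || [];
      const transient = labels.indexOf("TransientTransactionError") !== -1;
      if (!transient || attempt >= maxAttempts) {
        throw err;   // not retryable, or out of attempts
      }
      // Transient error: loop and run the whole transaction again.
    }
  }
}
```

The whole transaction is retried, not just the failed statement, because the server has already discarded the transaction's earlier changes.
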

I hope this blog gives a basic idea of MongoDB transactions. We will look at transactions in more detail in an upcoming blog.
