5 Criteria to choose between RDBMS and NoSQL Databases
Choosing the right database is very important for your application. This article will give you criteria and comparisons to decide the best one.
In my career, I have always faced the dilemma of choosing the right database for the next project/service. I always had to prove my choice to my seniors. Based on those experiences I have come up with certain criteria that help me make this decision faster and make it easily understandable.
Also, this helps you answer most of the database-related questions in your Interviews across various rounds.
Before we start, just a simple refresher:
RDBMS = Relational Database e.g. Postgresql, Mysql. Arranges data in tabular format with rows and columns.
NoSQL = Non-Relational Database e.g. MongoDB, Cassandra. Arranges data in special formats like key-value pairs, documents, or columnar format.
Let's get into the criteria to use to choose a perfect database for your use case.
Criteria
1.Structure of the Data
RDBMS:
When the structure of data is clear and entities can be described with some relationships between them.
You need to design everything (schema, relationships, and constraints) ahead of time.
- The best practice is to design a highly optimized schema so that data redundancy can be avoided.
Supports ALTER table and column operations but these operations are a little inefficient (may involve locking the whole table).
- some of the databases support online schema change - background copy and rename.
NoSQL:
When the data is loosely coupled and entities are heavily read or written together.
No need to define the schema ahead of time.
Model the data according to the need of the application - read/write optimized. Generally, there are different databases for different requirements.
But no table-level joins are supported. Most of them do not support row/table level constraints as well. This needs additional work at the application level.
2.Querying capabilities
RDBMS
Supports various types of queries like point, range, and text search across single or multiple columns with the help of the different types of indexes.
Supports join capabilities across multiple tables.
I feel it wins here with a natural query language - SQL. (Debatable but yeah I feel this is a very important feature.)
NoSQL
Each NoSQL database has specific querying capabilities based on the type of data storage.
Limited with index supports.
Supports aggregates using Map-Reduce.
Limited with join capabilities, so if you need data together, good practices suggest keeping those data items together for access.
3.Need for transactions
RDBMS
It has been known for this feature :D. Supports ACID properties for transactions.
Transactions are supported for single as well as multiple rows.
NoSQL
It is known to drop this feature in favor of better scalability.
Transactions are mostly supported at single row level.
There are some exceptions like the google cloud store which supports transactions at the entity group level as well as cross-entity group level.
- This involves 2 phase commit so this doesn't scale well & has a limitation of 5 entity groups.
4.Consistency of data (read what you have just written)
RDBMS
Supports strong consistency if only one node is involved.
When multiple nodes are involved it mostly uses master-replica architecture and with it,
if async replication is used it may cause data loss if the master fails before complete replication.
if sync replication is used it can cause latency for writes.
NoSQL
Mostly involves multiple nodes with a peer-to-peer architecture having parallel transaction log replication, but this causes eventual consistency.
- Of course, there are exceptions, but this is the most common expectation.
5.Scalability
RDBMS
Supports read scaling with multiple read replicas.
The only way to scale writes is vertical scaling (using bigger/better machines).
- There are ways to shard the data for write scaling, but sharding is not supported out of the box and needs additional setup.
NoSQL
Naturally lends itself to read-write scaling with support for horizontal scaling (adding more machines linearly).
- This is mostly achieved by in-built smart data partitioning with load balancing and loosening some of the ACID guarantees.
Based on the above criteria, the decision can be seen as:
Decision Summary
RDBMS when:
involves structured data with infrequent schema changes
a lot of querying on each and every field is required with support for joins
optimized storage is required with the normalized data schema
transactions are needed across multiple rows
strong consistency is required for almost all the operations
strict schema enforcement is required
general use cases (OLTP) - payments systems required consistency and transactions, user management systems, content management systems
NoSQL when:
involves unstructured/semi-structured data with schema changes
scalability for large volumes and velocity of data is required
general use cases - caching/transient data storage (shopping cart), big data applications such as social media analytics & IoT systems, managing complex relationships between data, such as social networks or recommendation systems.
I hope this has given you some clarity in terms of applying different criteria and looking at the practical trade-offs while choosing the database. Most of these points have their own intricacies and gotchas, but I hope we can discuss them in more detail in the next posts.