by Ultra Tendency

Maintaining data integrity across ever-growing volumes of data is a critical challenge for businesses seeking greater reliability and performance, so understanding the basic principles of data integrity and transactions is essential. From ACID transactions to BASE, this blog post looks at how traditional relational databases, more modern NoSQL databases and data lakes approach the data integrity problem.
ACID transaction guarantees
Transactions
A transaction groups several database operations into a single unit of work that should either complete successfully as a whole or abort as a whole. Each database type and vendor implements transactions differently, trading speed for safety or vice versa. One of the most common frameworks for describing transaction guarantees is known by the acronym ACID.
ACID
ACID stands for Atomicity, Consistency, Isolation and Durability, and is the most common set of guarantees provided by relational databases that tout themselves as ACID compliant. In practice, what each vendor means by this term differs slightly, especially concerning isolation.
Atomicity
The word atom derives from the Greek atomos, meaning uncuttable. Accordingly, a transaction should behave as indivisible from the perspective of the database and of any client accessing it: either all of the steps in a transaction succeed together or, if any fault occurs, they abort and roll back together.
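As a minimal sketch of atomicity, assuming Python's built-in sqlite3 module and a hypothetical accounts table (neither comes from the text above), a transfer between two accounts can be wrapped in a single transaction. If the check in the middle fails, the debit that was already applied is rolled back with it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the committed balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds: both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # aborts: the debit is rolled back
print(ok, failed, balances(conn))
```

After the failed transfer the balances are exactly what the first transfer left behind, so no client ever observes a debit without its matching credit.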
Consistency
Consistency is the idea that certain invariants should hold throughout the database’s life span. These include constraints, data integrity rules and consistency between partitions, and together they guarantee that the database adheres to a set of predefined rules.
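One way such an invariant can be expressed is as a declarative constraint that the database itself enforces. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical rule that balances may never be negative; other databases offer similar but not identical constraint mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical invariant for this illustration: a balance is never negative.
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

try:
    with conn:  # the transaction is rolled back when the constraint fails
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
    violated = False
except sqlite3.IntegrityError:
    violated = True

balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(violated, balance)
```

The update that would break the rule is rejected and the stored balance is unchanged.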
Isolation
Isolation means that concurrently executing transactions are kept separate from each other, so that none of them affects the results of the others. Ideally, the resulting database state would be the same as if all transactions had been run serially.
Durability
Durability is the guarantee that data, once a transaction has committed, will be preserved even if the database subsequently crashes or suffers other failures, providing reliability and stability.
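A common building block for durability is an append-only log that is forced to disk before a commit is acknowledged. The following is a simplified sketch of that idea, not how any particular database implements it; the record format and the function names `append_committed` and `recover` are assumptions made for illustration.

```python
import json
import os
import tempfile

def append_committed(log_path, record):
    """Append one committed record and force it to stable storage."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # push Python's buffer to the operating system
        os.fsync(log.fileno())  # ask the OS to write the data to the device

def recover(log_path):
    """Rebuild the in-memory state by replaying the log from the start."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "commit.log")
    append_committed(log_path, {"key": "alice", "value": 70})
    append_committed(log_path, {"key": "bob", "value": 80})
    append_committed(log_path, {"key": "alice", "value": 65})
    # Simulates a restart: the state is rebuilt from the file alone.
    recovered = recover(log_path)

print(recovered)
```

Because every acknowledged write is in the log, replaying it after a crash restores the last committed value of each key.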
Race Conditions
A race condition occurs when multiple transactions try to read or write the same data concurrently, which can lead to unpredictable results. Here are some examples of race conditions:
Dirty Reads
If a client reads data that another client has changed but not yet committed, the read may return uncommitted data, which may be inconsistent with the final committed state or may never be committed at all.

Dirty Writes
When two clients try to modify the same data, one of them may overwrite the other’s write before it has been committed.

Read Skew
If a client reads from the database while another client is updating it, it can see results from different points in time, some from before and some from after the update, so that its reads are inconsistent with one another.

Write Skew
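When two clients read the same data and then each write or update based on what they read, one of them can act on older values that the other transaction’s write has since made outdated. Each write looks valid on its own, but the combined result is inconsistent with the newer data.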
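Write skew is easier to see with a concrete case. The sketch below simulates, in plain Python with no real database, a hypothetical rule that at least one of two doctors must stay on call. Each simulated transaction decides from its own snapshot and writes a different key, so neither overwrites the other, yet together they break the rule.

```python
def request_leave(snapshot, doctor):
    """Decide from a snapshot; return the write to apply, or None if refused."""
    if sum(snapshot.values()) >= 2:   # someone else would still be on call
        return (doctor, False)
    return None

def apply_write(state, write):
    if write is not None:
        doctor, status = write
        state[doctor] = status

# Concurrent case: both transactions take their snapshot before either writes.
on_call = {"alice": True, "bob": True}
snapshot_a, snapshot_b = dict(on_call), dict(on_call)
apply_write(on_call, request_leave(snapshot_a, "alice"))
apply_write(on_call, request_leave(snapshot_b, "bob"))
concurrent_on_call = sum(on_call.values())

# Serial case: the second transaction sees the first one's committed write.
on_call = {"alice": True, "bob": True}
apply_write(on_call, request_leave(dict(on_call), "alice"))
apply_write(on_call, request_leave(dict(on_call), "bob"))
serial_on_call = sum(on_call.values())

print(concurrent_on_call, serial_on_call)  # 0 1
```

Run concurrently, nobody is left on call; run serially, the second request is refused and the rule holds.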
Lost Updates
When two clients perform a read-modify-write cycle on the same data, one of them can overwrite the value without taking into account the change made by the other client, so that change is lost.
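The interleaving behind a lost update can be replayed deterministically in a few lines. This is a simulation in plain Python rather than a real database, and the `compare_and_set` helper is a hypothetical stand-in for the atomic conditional writes that many databases offer as one possible remedy.

```python
counter = {"value": 10}

# Both clients read before either writes: an unlucky but legal interleaving.
read_a = counter["value"]
read_b = counter["value"]
counter["value"] = read_a + 1   # client A writes 11
counter["value"] = read_b + 1   # client B also writes 11, erasing A's increment
lost_update_result = counter["value"]

def compare_and_set(store, key, expected, new):
    """Write `new` only if the stored value is still `expected`."""
    if store[key] != expected:
        return False
    store[key] = new
    return True

# Same interleaving, but each write is conditional on what the client read.
counter = {"value": 10}
read_a = counter["value"]
read_b = counter["value"]
a_ok = compare_and_set(counter, "value", read_a, read_a + 1)   # succeeds
b_ok = compare_and_set(counter, "value", read_b, read_b + 1)   # fails: value changed
if not b_ok:
    read_b = counter["value"]                                  # client B re-reads
    b_ok = compare_and_set(counter, "value", read_b, read_b + 1)
safe_result = counter["value"]

print(lost_update_result, safe_result)  # 11 12
```

With unconditional writes one increment disappears; with the conditional write the second client notices the change, re-reads and applies its increment on top.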
Phantom Reads
A phantom read occurs when a transaction runs the same query twice and gets a different set of rows each time, because another client has inserted or deleted matching rows in the meantime.
Isolation Levels
Weak Isolation Levels
In practice, each database vendor implements isolation differently. The strongest level, Serializable Isolation, comes with significant performance costs, so many systems forgo ideal isolation in favour of weaker safety guarantees in order to achieve higher speeds and support a broader range of use cases.
Read Committed
Read Committed is one of the simplest and most popular isolation levels. It prevents a reader from seeing any uncommitted data, and it locks any data that is being written until the writing transaction commits or aborts. This prevents Dirty Reads and Dirty Writes.
Snapshot Isolation
In Snapshot Isolation, each transaction reads from a consistent snapshot of the database taken at the start of the transaction. Writes are still locked against other writes, but readers never block writers and writers never block readers. This provides all of the guarantees of Read Committed and also prevents Read Skew.
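Snapshot reads are often implemented by keeping several versions of each value. The toy store below is a sketch under that assumption; the class `VersionedStore` and its methods are hypothetical, and concerns such as write locking and garbage collection of old versions are left out.

```python
class VersionedStore:
    """Toy multi-version store: readers see the state as of their snapshot."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), oldest first
        self._clock = 0       # timestamp of the latest commit

    def commit(self, writes):
        """Apply all writes of one transaction under a single new timestamp."""
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def begin(self):
        """Return a snapshot timestamp for a new reading transaction."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before the snapshot."""
        value = None
        for commit_ts, candidate in self._versions.get(key, []):
            if commit_ts <= snapshot_ts:
                value = candidate
        return value

store = VersionedStore()
store.commit({"a": 500, "b": 500})

snapshot = store.begin()
a_seen = store.read("a", snapshot)       # reader sees a = 500
store.commit({"a": 400, "b": 600})       # a transfer commits in between
b_seen = store.read("b", snapshot)       # still 500, not the newer 600

later = store.begin()
print(a_seen + b_seen, store.read("a", later) + store.read("b", later))  # 1000 1000
```

The reader that started before the transfer sees both old values and the later reader sees both new ones, so neither observes the mixed state that Read Skew describes.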
Serializable Isolation
Serializable Isolation is the strongest isolation level. While there are several different implementations, the end result should match the result of running all of the transactions serially, even if they actually ran in parallel.
Serial Execution
The simplest way of implementing Serializable Isolation is to forgo parallelism and run every transaction serially. This limits throughput and is therefore best reserved for use cases where the entire database fits in memory and every transaction is small enough to run quickly on a single CPU core. In-memory databases like Redis often use serial execution for simplicity and performance.
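Serial execution can be sketched as a single worker thread that takes whole transactions from a queue and runs them one at a time. The `SerialExecutor` class below is a hypothetical illustration in Python, not a description of how Redis or any other product is built, and it omits error handling for transactions that raise.

```python
import queue
import threading

class SerialExecutor:
    """Runs every submitted transaction on one thread, one after another."""

    def __init__(self):
        self._state = {}
        self._tasks = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            transaction, done, result = self._tasks.get()
            result.append(transaction(self._state))  # no other transaction runs now
            done.set()

    def submit(self, transaction):
        """Queue a function of the state and wait for its result."""
        done, result = threading.Event(), []
        self._tasks.put((transaction, done, result))
        done.wait()
        return result[0]

def increment(state):
    # A read-modify-write that would be unsafe if transactions interleaved.
    state["counter"] = state.get("counter", 0) + 1
    return state["counter"]

executor = SerialExecutor()

def client():
    for _ in range(100):
        executor.submit(increment)

clients = [threading.Thread(target=client) for _ in range(8)]
for thread in clients:
    thread.start()
for thread in clients:
    thread.join()

final = executor.submit(lambda state: state["counter"])
print(final)  # 800
```

Many clients submit work concurrently, but because only the worker thread touches the state, no increment is lost and no locks are needed inside a transaction.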
Two Phase Locking
Two Phase Locking uses a system of locks similar to the ones used in the weak isolation levels discussed previously, but with stricter rules: a transaction may only read data that no other client is writing, and may only write data that no other client is reading or writing. Writers therefore block not only other writers, as before, but also all readers, and readers block writers, which prevents the race conditions described above. The biggest drawbacks of Two Phase Locking are reduced throughput and very high latencies on systems with high concurrency.
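The compatibility rules between readers and writers can be sketched as a small lock table with shared and exclusive modes. The `LockTable` class below is a hypothetical, non-blocking illustration: a real implementation would make the caller wait instead of returning False, and would also need deadlock detection.

```python
class LockTable:
    """Toy lock manager: many shared holders or one exclusive holder per key."""

    def __init__(self):
        self._shared = {}     # key -> set of transaction ids holding a shared lock
        self._exclusive = {}  # key -> transaction id holding the exclusive lock

    def acquire_shared(self, txn, key):
        """Readers are refused while another transaction holds the write lock."""
        owner = self._exclusive.get(key)
        if owner is not None and owner != txn:
            return False
        self._shared.setdefault(key, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, key):
        """Writers are refused while anyone else is reading or writing."""
        owner = self._exclusive.get(key)
        if owner is not None and owner != txn:
            return False
        if self._shared.get(key, set()) - {txn}:
            return False
        self._exclusive[key] = txn
        return True

    def release_all(self, txn):
        """Second phase: all locks are released together at commit or abort."""
        for holders in self._shared.values():
            holders.discard(txn)
        for key in [k for k, owner in self._exclusive.items() if owner == txn]:
            del self._exclusive[key]

locks = LockTable()
both_read = locks.acquire_shared("t1", "x") and locks.acquire_shared("t2", "x")
write_blocked = not locks.acquire_exclusive("t2", "x")  # t1 is still reading
locks.release_all("t1")                                 # t1 commits
write_granted = locks.acquire_exclusive("t2", "x")
read_blocked = not locks.acquire_shared("t1", "x")      # t2 is now writing
locks.release_all("t2")                                 # t2 commits
read_granted = locks.acquire_shared("t1", "x")
print(both_read, write_blocked, write_granted, read_blocked, read_granted)
```

Locks are only acquired while a transaction runs and only released when it ends, which is what gives the technique its two phases.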
Serializable Snapshot Isolation
Serializable Snapshot Isolation is a fairly recent algorithm used in some modern databases. It is an optimistic concurrency control algorithm, in contrast to pessimistic algorithms such as the aforementioned Two Phase Locking, because it allows all transactions to proceed without blocking. When a transaction commits, the database checks whether its execution was serializable and, if not, aborts it so that it can be retried. This approach can provide higher parallelism than other strong isolation techniques, but it adds bookkeeping overhead, and applications need to be able to handle aborted transactions.
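The optimistic pattern of proceeding without locks, validating at commit time and retrying on conflict can be sketched as follows. This is a much simplified version check on the keys a transaction has read, not the actual Serializable Snapshot Isolation algorithm; the classes and the rule that at least one of two doctors must remain on call are assumptions made for illustration.

```python
class ConflictError(Exception):
    """Raised at commit time when something the transaction read has changed."""

class OptimisticStore:
    def __init__(self, initial):
        self._data = {key: (0, value) for key, value in initial.items()}  # (version, value)

    def begin(self):
        return Transaction(self)

    def values(self):
        return {key: value for key, (_, value) in self._data.items()}

class Transaction:
    def __init__(self, store):
        self._store = store
        self._read_versions = {}  # key -> version seen at first read
        self._writes = {}         # buffered until commit

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        version, value = self._store._data[key]
        self._read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self._writes[key] = value

    def commit(self):
        # Validation: abort if any key we read was changed by a later commit.
        for key, seen in self._read_versions.items():
            if self._store._data[key][0] != seen:
                raise ConflictError(key)
        for key, value in self._writes.items():
            version = self._store._data[key][0]
            self._store._data[key] = (version + 1, value)

def request_leave(txn, doctor):
    # Hypothetical rule: a doctor may leave only if another one stays on call.
    if sum(txn.read(name) for name in ("alice", "bob")) >= 2:
        txn.write(doctor, False)

store = OptimisticStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
request_leave(t1, "alice")
request_leave(t2, "bob")      # neither transaction blocks the other
t1.commit()
try:
    t2.commit()
    aborted = False
except ConflictError:
    aborted = True
    retry = store.begin()     # the application retries on fresh data
    request_leave(retry, "bob")
    retry.commit()

print(aborted, store.values())
```

The second transaction is rejected because its decision rested on data that changed before it committed, and the retry correctly leaves one doctor on call.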
NoSQL – Sacrificing ACID in favour of BASE
Today, the need to process enormous amounts of data demands massive scalability from databases. In a 24/7 connected world, higher availability and fault tolerance may be preferred over stronger safety guarantees, so some databases sacrifice ACID properties in favour of a softer set of guarantees known as BASE.
BASE properties
- Basically Available: Even in the presence of failures, the system should remain available.
- Soft State: Even when no new input is given, the state of the system might change as it works towards eventual consistency.
- Eventually Consistent: While consistency cannot be guaranteed at any specific moment in time, the system will converge to a consistent state given no further updates.
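These three properties can be illustrated with two replicas that accept writes independently and reconcile later. The sketch below assumes a last-write-wins rule based on timestamps, which is only one of several possible conflict resolution strategies; the `Replica` class and the `anti_entropy` function are hypothetical.

```python
class Replica:
    """Toy replica that keeps the value with the highest timestamp per key."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a, b):
    """Background sync: exchange entries in both directions, newest wins."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.write(key, value, timestamp)

r1, r2 = Replica(), Replica()
r1.write("cart", "book", timestamp=1)       # accepted by replica 1 only
r2.write("cart", "book,pen", timestamp=2)   # accepted by replica 2 only

before = (r1.read("cart"), r2.read("cart"))  # replicas disagree for a while
anti_entropy(r1, r2)                         # state changes without client input
after = (r1.read("cart"), r2.read("cart"))   # replicas have converged

print(before, after)
```

Both replicas stay writable throughout, their state changes during the background sync without any new client request, and once updates stop they hold the same value.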
Many NoSQL databases, such as HBase and Cassandra, use these BASE principles to provide levels of performance and scalability that would be incompatible with strict ACID compliance.
While BASE helps achieve the higher performance and availability that some products require, it also brings higher application complexity: applications must tolerate data being inconsistent at any specific moment in time. Adopting BASE principles means that developers must design their systems to handle eventual consistency.
Modern Data Lakes
Recently, products like Databricks’ Delta Lake and Apache Iceberg have been working to bring the big data world of mostly BASE data lakes into the realm of higher consistency with ACID properties.
These technologies offer greater versatility for a broader range of applications and are easier to use, but they may be less specialised and perform worse in certain scenarios than traditional BASE data lakes.