Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

2.5 Consistency and Version Stamps

Lesson 14 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

2.5.1 The CAP Theorem

Proposed by Eric Brewer, the CAP Theorem states that a distributed system can only provide two of the three following guarantees at once:

  1. Consistency: Every read receives the most recent write or an error.
  2. Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write.
  3. Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network.

2.5.5 Deep Dive: Apache Cassandra Architecture

Cassandra is a peer-to-peer distributed NoSQL database that avoids the "Single Point of Failure" of master-slave systems.

  • Gossip Protocol: Nodes periodically talk to their neighbors to share state information about the cluster (e.g., which nodes are up/down).
  • Hinted Handoff: If a node is down during a write, a neighbor stores a "hint" and delivers it as soon as the node comes back online.
  • LSM Trees: Unlike RDBMS which use B-Trees, Cassandra uses Log-Structured Merge Trees which are optimized for high-speed writes at the cost of slower reads.

2.5.6 Comparison: Normalization vs Denormalization

FeatureNormalization (SQL)Denormalization (NoSQL)
Data IntegrityHigh (Primary/Foreign Keys).Lower (Must be managed by App).
StorageEfficient (Minimal redundancy).Inefficient (Data is duplicated).
Read SpeedSlower (Requires Joins).Faster (Data is in one record).
OptimizationOptimized for Updates.Optimized for Reads.

2.5.2 Relaxing Consistency: ACID vs BASE

Traditional databases follow ACID properties (Atomicity, Consistency, Isolation, Durability). Big Data systems often follow BASE:

  • Basically Available: The system works even if parts are down.
  • Soft State: The state of the system might change over time, even without input.
  • Eventually Consistent: Given enough time, all copies of the data will eventually be the same.

2.5.3 Version Stamps (Managing Conflict)

When multiple users update the same record on different servers, conflicts occur. Version Stamps help detect these.

  • Content Hash: A unique string generated based on the data.
  • Timestamp: The time of the update.
  • Vector Clock: A sophisticated list of version numbers from all servers that allows detecting whether one update "happened before" another.

Conflict Resolution:

  • Last Write Wins (LWW): The newest timestamp is kept (simple but risky).
  • Application-Specific: Merging the changes (e.g., Git merge or shopping cart merger).

2.5.4 Quorum-Based Consistency

Many NoSQL systems (like Cassandra) allow you to tune your consistency using three numbers:

  • N: The number of replicas (nodes where data is stored).
  • W: The number of nodes that must confirm a Write was successful.
  • R: The number of nodes that must respond to a Read.

The Rule: If W + R > N, you are guaranteed Strong Consistency because the read and write sets will always overlap. If W + R <= N, you have Eventual Consistency.

2.5.7 Comparative Matrix: Serialization Formats

FeatureXMLJSONBSONAvro / Protobuf
ReadabilityHigh (Human)High (Human)Low (Binary)Low (Binary)
Parsing SpeedSlowFastVery FastBlazing Fast
Space EfficiencyPoor (Text tags)ModerateGoodExcellent
Schema RequiredNo (Optional XSD)NoNoYes
Use CaseLegacy APIsREST APIsMongoDBInternal RPC

2.5.8 Conflict Resolution: The Shopping Cart Problem

Imagine two servers both think you added an item to your cart.

  • The Problem: If they don't sync instantly, one update might overwrite the other.
  • The NoSQL Solution: Use Vector Clocks to keep both versions and ask the user to "merge" them (or merge them automatically by taking the union of both lists).