Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

2.3 Graph and Schemaless Databases

Lesson 12 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

2.3.1 Graph Databases

Graph databases store the relationships (edges) between data points (nodes) as first-class data. In a relational DB, modeling multi-hop relationships (like "Friends of Friends") requires self-joins or recursive queries, and the cost of those joins grows quickly with every extra hop.

Core Elements:

  1. Nodes (Entities): e.g., A person, a product.
  2. Edges (Relationships): e.g., "Person A knows Person B," "Person A ordered Product X."
  3. Properties: Key-value pairs stored on nodes or edges (e.g., "Relationship started on 2024-01-01").
  • Best for: Social networks, recommendation engines, fraud detection.
  • Example: Neo4j.
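
The three core elements can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Neo4j stores data internally; the class name PropertyGraph, the "KNOWS" relationship type and the sample people are all assumptions made up for this example.

```python
class PropertyGraph:
    """Tiny in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = {}   # source_id -> list of (rel_type, target_id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target, **props):
        self.edges.setdefault(source, []).append((rel_type, target, props))

    def neighbors(self, node_id, rel_type):
        """Follow outgoing edges of one relationship type."""
        return {t for (r, t, _) in self.edges.get(node_id, []) if r == rel_type}


def friends_of_friends(graph, person):
    """Two hops along KNOWS edges, excluding the person and direct friends."""
    direct = graph.neighbors(person, "KNOWS")
    two_hops = set()
    for friend in direct:
        two_hops |= graph.neighbors(friend, "KNOWS")
    return two_hops - direct - {person}


g = PropertyGraph()
for name in ["alice", "bob", "carol", "dave", "erin"]:
    g.add_node(name, kind="person")          # node with a property

g.add_edge("alice", "KNOWS", "bob", since="2024-01-01")   # edge with a property
g.add_edge("alice", "KNOWS", "dave")
g.add_edge("bob", "KNOWS", "carol")
g.add_edge("bob", "KNOWS", "alice")
g.add_edge("dave", "KNOWS", "carol")
g.add_edge("dave", "KNOWS", "erin")

print(friends_of_friends(g, "alice"))
```

Each hop is a direct lookup from a node to its edges, so adding a third or fourth hop is just one more loop rather than one more join.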

2.3.2 Graph Algorithms for Big Data

Graph databases allow for complex mathematical analysis of relationships:

  • PageRank: Developed by Google's founders, Larry Page and Sergey Brin, to rank web pages. It scores the importance of a node based on the number and quality of links pointing to it.
  • Centrality Measures: Identifying the most influential person in a social network or the most critical node in a power grid.
  • Pathfinding (Dijkstra's algorithm): Finding the shortest or "cheapest" path between two nodes, used extensively in logistics and GPS navigation software.
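
Two of the algorithms above are small enough to sketch directly. The code below is a simplified teaching version, not a production implementation: the sample graphs (links, roads) are invented, the damping factor 0.85 is simply the commonly quoted default, and spreading the rank of nodes with no outgoing links evenly over all nodes is one of several possible conventions.

```python
import heapq


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict: node -> list of nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a node with no outgoing links shares its rank with everyone.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


def dijkstra(graph, source, target):
    """Cheapest path on a dict: node -> {neighbor: non-negative weight}."""
    queue = [(0, source, [source])]          # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []                  # target not reachable


# Hypothetical web graph: pages A, B and D all link to C.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))             # page with the highest score

# Hypothetical delivery network with travel costs.
roads = {
    "Warehouse": {"HubA": 4, "HubB": 1},
    "HubB": {"HubA": 2, "Customer": 7},
    "HubA": {"Customer": 1},
}
print(dijkstra(roads, "Warehouse", "Customer"))
```

In the delivery example the direct-looking route through HubA costs 5, while the detour through HubB and then HubA costs 4, which is the path Dijkstra's algorithm returns.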

2.3.3 Graph Use Case: Supply Chain Visibility

In a global supply chain, a "Part" might go through 10 factories and 5 shipping companies. A Graph DB lets a company answer questions such as "If Factory X burns down, which 500 final products will be delayed?" with a single traversal of the dependency edges, instead of chaining many joins.
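
The "what is affected downstream?" question is a reachability query. A minimal sketch using breadth-first search is shown below; the supply chain data, the node names and the function affected_products are all hypothetical and far smaller than a real network.

```python
from collections import deque

# Hypothetical supply chain: "A": ["B"] means B depends on the output of A.
supplies = {
    "Factory X": ["Engine Part"],
    "Factory Y": ["Battery Cell"],
    "Engine Part": ["Engine"],
    "Battery Cell": ["Battery Pack"],
    "Engine": ["Truck", "Generator"],
    "Battery Pack": ["Scooter"],
}
final_products = {"Truck", "Generator", "Scooter"}


def affected_products(supplies, failed_node, final_products):
    """Walk every downstream edge from the failed node and collect final products."""
    seen = {failed_node}
    queue = deque([failed_node])
    while queue:
        current = queue.popleft()
        for downstream in supplies.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen & final_products


print(affected_products(supplies, "Factory X", final_products))
```

Here a failure at Factory X delays the Truck and the Generator, while the Scooter, which depends only on Factory Y, is unaffected.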

2.3.4 Schemaless: Freedom and Chaos

NoSQL databases are often called Schemaless, but it's more accurate to say they are Schema-on-Read.

  • Schema-on-Write (RDBMS): You must define columns before saving data. If data doesn't fit, it's rejected.
  • Schema-on-Read (NoSQL): You save whatever you want. The application code must provide the structure when it reads the data.
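
A small sketch of what "the application code must provide the structure" can look like in practice. The stored documents, the field names (name, full_name, phone, age) and the function read_student are assumptions invented for this example; a real application would choose its own rules.

```python
import json

# Hypothetical documents saved at different times with no enforced schema.
raw_documents = [
    '{"name": "Asha", "phone": "98765-43210", "age": 21}',
    '{"full_name": "Ravi Kumar", "phone": 9812345678}',
    '{"name": "Meena", "age": "19"}',
]


def read_student(raw):
    """Apply the schema at read time: pick field names, fix types, fill gaps."""
    doc = json.loads(raw)
    name = doc.get("name") or doc.get("full_name") or "unknown"

    phone = doc.get("phone")
    if phone is not None:
        # Phone may be a number or a formatted string; keep digits only.
        phone = "".join(ch for ch in str(phone) if ch.isdigit())

    age = doc.get("age")
    if age is not None:
        age = int(age)                       # "19" and 19 both become 19

    return {"name": name, "phone": phone, "age": age}


students = [read_student(raw) for raw in raw_documents]
for student in students:
    print(student)
```

The database accepted all three documents without complaint; every rule about what a "student" looks like lives in read_student instead.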

The Trade-off:

Pros of Schemaless | Cons of Schemaless
------------------ | ------------------
Faster development (no migrations). | Risk of "Dirty Data" (multiple formats for one field).
Handles varying data (e.g., product specs). | Logic moves from DB to Application code.
No downtime for structural changes. | Reporting tools (BI) struggle without a fixed schema.

2.3.5 Materialized Views

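In NoSQL, complex queries can be slow because many NoSQL stores avoid joins, so combining or aggregating data across collections has to be done at query time by the application. To solve this, we use Materialized Views.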

  • Concept: We pre-calculate the result of a query and save it in a separate table/collection.
  • Maintenance: When the source data changes, we update the materialized view in the background (asynchronous update), so the view can be slightly out of date until the next refresh.
  • Example: Summing total sales for every city every 15 minutes, rather than calculating it every time a dashboard is loaded.
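
The idea can be sketched with two in-memory structures. Everything here is a simplified assumption: the orders list stands in for the source collection, sales_by_city stands in for the materialized view, and refresh_view would in a real system be triggered by a scheduler (for example every 15 minutes) rather than called by hand.

```python
from collections import defaultdict

# Source collection (hypothetical order documents).
orders = [
    {"city": "Delhi", "amount": 1200},
    {"city": "Mumbai", "amount": 800},
    {"city": "Delhi", "amount": 300},
]

# The materialized view: pre-calculated totals stored separately from the source.
sales_by_city = {}


def refresh_view():
    """Recompute the totals from the source data and replace the stored view."""
    totals = defaultdict(int)
    for order in orders:
        totals[order["city"]] += order["amount"]
    sales_by_city.clear()
    sales_by_city.update(totals)


def dashboard_total(city):
    """The dashboard only reads the view; it never scans the orders."""
    return sales_by_city.get(city, 0)


refresh_view()
before = dashboard_total("Delhi")            # 1500

orders.append({"city": "Delhi", "amount": 500})
stale = dashboard_total("Delhi")             # still 1500: view not refreshed yet

refresh_view()
fresh = dashboard_total("Delhi")             # 2000 after the background refresh

print(before, stale, fresh)
```

The middle read shows the cost of the approach: between refreshes the dashboard is fast but may show slightly old numbers.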