Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Indexing OLAP Data

Lesson 18 of 32 in the free Data Warehousing and Data Mining(Elective-II) notes on Siksha Sarovar, written by Rohit Jangra.

---

8. Indexing OLAP Data

Standard B-Tree indexes (used in OLTP) are often inefficient for Data Warehouses because they handle high cardinality (many unique values) well, but DW dimensions often have low cardinality (few unique values, like 'Male/Female' or 'States').

8.1 Bitmap Indexing

  • Definition: An indexing method that uses a bit vector (string of 0s and 1s) for each unique value in a column.
  • Mechanism:
  • If a column has 3 values (Red, Green, Blue), the index creates 3 bitmaps.
  • If Row 1 is Red, the 'Red' bitmap has a 1 at position 1. 'Green' and 'Blue' have 0.
  • Advantages:
  • Efficient for Low Cardinality: Perfect for Gender, Status, Region.
  • Fast Boolean Operations: Queries like WHERE Gender='Male' AND Region='West' are resolved by performing bitwise AND operations (extremely fast for CPUs).
  • Compression: Runs of zeros compress very well.

8.2 Join Indexing

  • Problem: In a Star schema, joining the large Fact table to Dimension tables is expensive.
  • Definition: A Join index pre-joins the foreign key in the Fact table with the attribute in the Dimension table.
  • Mechanism: It stores pairs of RowIDs: (RowID of Fact Table, RowID of Dimension Table).
  • Example: A join index on Sales_Fact(Customer_ID) and Customer_Dim(City). The index points directly from the City attribute to the relevant Fact table rows, bypassing the lookup in the Dimension table.
  • Benefit: Speeds up queries that filter on dimension attributes.

8.3 Bitmapped Join Index

  • Definition: A hybrid index combining Bitmap and Join indexes.
  • Mechanism: It creates a bitmap index on the Fact table's foreign key, but logically linked to the Dimension table's attribute.
  • Use Case: Best for filtering queries on Dimension attributes with low cardinality against a massive Fact table.
  • Query: "Count sales for Product Category 'Electronics'."
  • Index: A bitmap where bit 1 corresponds to a Fact row belonging to Electronics.
  • Performance: The database counts the 1s in the bitmap without even touching the Product Dimension table or scanning the Fact table rows.

8.4 Indexing Summary Table

Index TypeBest Use CaseMechanismPerformance Gain
B-TreeHigh Cardinality (Unique IDs).Tree structure of values to RowIDs.Good for point lookups. Bad for warehousing.
BitmapLow Cardinality (Gender, Status).Bit vectors (0/1) for each value.Massive speedup for AND/OR queries.
Join IndexStar Schema Joins.Pre-joined RowID pairs.Avoids runtime table joins.
Bitmapped JoinStar Schema filtering on Dimensions.Bitmap on Fact linked to Dimension attribute.Fastest for analytical filtering.

---

Summary of Unit II

Unit II focuses on the operational layer of Data Warehousing.

  1. OLAP provides the interactive interface for analysis, transforming static data into dynamic cubes.
  2. OLAP Operations (Slice, Dice, Roll-up, Drill-down) provide the navigational tools for business users to explore data hierarchies.
  3. Server Architectures (ROLAP, MOLAP, HOLAP) offer trade-offs between the scalability of relational storage and the speed of multidimensional arrays.
  4. Implementation relies heavily on strategies to manage data volume, particularly through Efficient Computation (Iceberg cubes, partial pre-computation) to handle the exponential nature of aggregations.
  5. Query Processing and Indexing (specifically Bitmap and Join indexes) ensure that analytical queries return results in seconds, even when scanning billions of rows.