---
8. Indexing OLAP Data
Standard B-Tree indexes (used in OLTP) are often inefficient for Data Warehouses because they handle high cardinality (many unique values) well, but DW dimensions often have low cardinality (few unique values, like 'Male/Female' or 'States').
8.1 Bitmap Indexing
- Definition: An indexing method that uses a bit vector (string of 0s and 1s) for each unique value in a column.
- Mechanism:
- If a column has 3 values (Red, Green, Blue), the index creates 3 bitmaps.
- If Row 1 is Red, the 'Red' bitmap has a 1 at position 1. 'Green' and 'Blue' have 0.
- Advantages:
- Efficient for Low Cardinality: Perfect for Gender, Status, Region.
- Fast Boolean Operations: Queries like
WHERE Gender='Male' AND Region='West'are resolved by performing bitwise AND operations (extremely fast for CPUs). - Compression: Runs of zeros compress very well.
8.2 Join Indexing
- Problem: In a Star schema, joining the large Fact table to Dimension tables is expensive.
- Definition: A Join index pre-joins the foreign key in the Fact table with the attribute in the Dimension table.
- Mechanism: It stores pairs of RowIDs: (RowID of Fact Table, RowID of Dimension Table).
- Example: A join index on
Sales_Fact(Customer_ID)andCustomer_Dim(City). The index points directly from the City attribute to the relevant Fact table rows, bypassing the lookup in the Dimension table. - Benefit: Speeds up queries that filter on dimension attributes.
8.3 Bitmapped Join Index
- Definition: A hybrid index combining Bitmap and Join indexes.
- Mechanism: It creates a bitmap index on the Fact table's foreign key, but logically linked to the Dimension table's attribute.
- Use Case: Best for filtering queries on Dimension attributes with low cardinality against a massive Fact table.
- Query: "Count sales for Product Category 'Electronics'."
- Index: A bitmap where bit 1 corresponds to a Fact row belonging to Electronics.
- Performance: The database counts the 1s in the bitmap without even touching the Product Dimension table or scanning the Fact table rows.
8.4 Indexing Summary Table
| Index Type | Best Use Case | Mechanism | Performance Gain |
|---|---|---|---|
| B-Tree | High Cardinality (Unique IDs). | Tree structure of values to RowIDs. | Good for point lookups. Bad for warehousing. |
| Bitmap | Low Cardinality (Gender, Status). | Bit vectors (0/1) for each value. | Massive speedup for AND/OR queries. |
| Join Index | Star Schema Joins. | Pre-joined RowID pairs. | Avoids runtime table joins. |
| Bitmapped Join | Star Schema filtering on Dimensions. | Bitmap on Fact linked to Dimension attribute. | Fastest for analytical filtering. |
---
Summary of Unit II
Unit II focuses on the operational layer of Data Warehousing.
- OLAP provides the interactive interface for analysis, transforming static data into dynamic cubes.
- OLAP Operations (Slice, Dice, Roll-up, Drill-down) provide the navigational tools for business users to explore data hierarchies.
- Server Architectures (ROLAP, MOLAP, HOLAP) offer trade-offs between the scalability of relational storage and the speed of multidimensional arrays.
- Implementation relies heavily on strategies to manage data volume, particularly through Efficient Computation (Iceberg cubes, partial pre-computation) to handle the exponential nature of aggregations.
- Query Processing and Indexing (specifically Bitmap and Join indexes) ensure that analytical queries return results in seconds, even when scanning billions of rows.