Aim
To create a pandas DataFrame containing e-commerce order data and perform row and column selection using the label-based indexer loc[] and the position-based indexer iloc[].
CO Mapping: CO1, CO2, CO3
Theory
A DataFrame is a two-dimensional, size-mutable, labelled data structure — conceptually a dictionary of Series that share a common row index. Every cell is therefore addressable in two independent ways:
loc[rows, cols]— label-based selection. Rows are chosen by their index labels and columns by their names. Slices are inclusive of both endpoints (loc[1:3]returns labels 1, 2 and 3), because labels have no natural "one past the end".iloc[rows, cols]— integer-position selection. Rows and columns are chosen by 0-based positions, and slices follow normal Python half-open semantics (iloc[0:3]returns positions 0, 1, 2 only).
This distinction is the single most common source of off-by-one bugs in analytics code. It matters even more after operations such as sort_values(), dropna() or boolean filtering, which leave "holes" in the index: loc[5] then means the row labelled 5, wherever it now sits, while iloc[5] means the sixth physical row. Subsetting is the pandas equivalent of SQL's SELECT ... WHERE ... and is the first step of virtually every analysis, so fluency here pays off in every later practical.
Dataset
The student creates this 5-row order table (built in the snippet as the ecommerce DataFrame with a default RangeIndex 0–4):
| Index | OrderID | Customer | Category | Sales |
|---|---|---|---|---|
| 0 | 101 | Aman | Electronics | 25000 |
| 1 | 102 | Riya | Books | 1200 |
| 2 | 103 | Kabir | Fashion | 3400 |
| 3 | 104 | Sneha | Electronics | 18000 |
| 4 | 105 | Vikas | Books | 800 |
Procedure
- Import pandas as
pd. - Build the
ecommerceDataFrame from a column dictionary; pandas assigns the default integer index 0–4. - Print the full DataFrame and confirm the index labels equal the positions (they coincide here, which is exactly why both indexers must be tested consciously).
- Run
ecommerce.loc[1:3, ["OrderID", "Category", "Sales"]]— a label slice plus an explicit column list. - Run
ecommerce.iloc[0:3, 0:3]— a position slice on both axes. - Count the rows returned by each call and note which endpoint was included.
- (Extension) Execute
ecommerce.sort_values("Sales").loc[1:3]and the same withilocto see the two indexers diverge once row order changes.
Interpretation of Results
loc[1:3] returns three rows (labels 1, 2, 3 → orders 102, 103, 104) while iloc[0:3] also returns three rows but a different set (positions 0, 1, 2 → orders 101, 102, 103). The overlap of two rows with different boundaries is the visual proof of inclusive-vs-exclusive slicing. Reading the output analytically: Electronics orders carry the largest ticket sizes (25000, 18000) while Books orders are small-value — a subset like loc[:, ["Category", "Sales"]] is how an analyst isolates just the fields needed for a category-revenue question instead of dragging the whole table around.
Common Mistakes
- Assuming
loc[1:3]excludes 3 like a list slice — it is endpoint-inclusive. - Using
ilocwith column names (iloc[0:3, ["Sales"]]raisesIndexError); positions only. - Chained indexing such as
df[df.Sales > 1000]["Category"] = ..., which triggersSettingWithCopyWarningand silently fails to write — always use a singleloccall for assignment.
🎯 Viva Questions
- Why is
locendpoint-inclusive? Labels have no natural successor, so pandas includes both endpoints of a label slice. - What happens to
loc/ilocafter sorting?locfollows the (now shuffled) labels;ilocfollows the new physical order. - How do you select rows 0–2 of only the first three columns by position?
df.iloc[0:3, 0:3]. - What does
df.loc[df["Sales"] > 5000]return? All rows whose boolean mask is True — boolean indexing works insideloc. - Difference between
df["Sales"]anddf[["Sales"]]? The first is a Series, the second a one-column DataFrame. - Which indexer should be used with a string index like customer names?
loc, because positions are then meaningless to the reader.