Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.


Practical 1: DataFrame Selection with loc[] and iloc[]

Lesson 1 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To create a pandas DataFrame containing e-commerce order data and perform row and column selection using the label-based indexer loc[] and the position-based indexer iloc[].

CO Mapping: CO1, CO2, CO3

Theory

A DataFrame is a two-dimensional, size-mutable, labelled data structure — conceptually a dictionary of Series that share a common row index. Every cell is therefore addressable in two independent ways:

  • loc[rows, cols] — label-based selection. Rows are chosen by their index labels and columns by their names. Slices are inclusive of both endpoints (loc[1:3] returns labels 1, 2 and 3), because labels have no natural "one past the end".
  • iloc[rows, cols] — integer-position selection. Rows and columns are chosen by 0-based positions, and slices follow normal Python half-open semantics (iloc[0:3] returns positions 0, 1, 2 only).
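The two slicing rules above can be sketched on a small toy table. The scores DataFrame and its string labels are hypothetical and are not part of the practical's dataset; string labels are used only to make the "no one past the end" point visible.

```python
import pandas as pd

# Hypothetical toy data (not the practical's dataset): string labels a..d.
scores = pd.DataFrame(
    {"marks": [71, 85, 64, 90]},
    index=["a", "b", "c", "d"],
)

# Label slice: both endpoints "b" and "d" are included.
by_label = scores.loc["b":"d"]

# Position slice: normal Python half-open rule, stop position 3 is excluded.
by_position = scores.iloc[1:3]

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```

Both slices start at the same row, yet the label slice returns one row more because its stop label is part of the result.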

This distinction is a common source of off-by-one bugs in analytics code. It matters even more after operations that change the index: sort_values() reorders the labels, while dropna() and boolean filtering leave "holes" in them. loc[5] then means the row labelled 5, wherever it now sits (and raises KeyError if that label was dropped), while iloc[5] means the sixth physical row. Subsetting is the pandas equivalent of SQL's SELECT ... WHERE ... and is the first step of virtually every analysis, so fluency here pays off in every later practical.
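A minimal sketch of how labels and positions drift apart after filtering and sorting. The df table and its value column are made up for illustration only.

```python
import pandas as pd

# Hypothetical data with the default labels 0..5.
df = pd.DataFrame({"value": [10, 40, 20, 50, 30, 60]})

# Boolean filtering drops rows, so the surviving labels have gaps.
filtered = df[df["value"] >= 30]
print(list(filtered.index))           # [1, 3, 4, 5]

label_hit = filtered.loc[3, "value"]  # row labelled 3
position_hit = filtered.iloc[3, 0]    # fourth physical row (label 5)
print(label_hit, position_hit)        # 50 60

# Sorting keeps every label but changes the physical order.
sorted_df = df.sort_values("value")
print(list(sorted_df.index))          # [0, 2, 4, 1, 3, 5]
print(sorted_df.loc[1, "value"])      # 40, the row labelled 1
print(sorted_df.iloc[1, 0])           # 20, the second physical row
```

If positions and labels need to agree again, reset_index(drop=True) rebuilds a fresh 0-based index, at the cost of discarding the original labels.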

Dataset

The student creates this 5-row order table (built in the snippet as the ecommerce DataFrame with a default RangeIndex 0–4):

Index  OrderID  Customer  Category     Sales
0      101      Aman      Electronics  25000
1      102      Riya      Books        1200
2      103      Kabir     Fashion      3400
3      104      Sneha     Electronics  18000
4      105      Vikas     Books        800
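The original snippet is not reproduced on this page, so the following is one plausible way to build the table, assuming the column names shown in the header row and a plain column dictionary.

```python
import pandas as pd

# Column dictionary: each key becomes a column, each list a column's values.
ecommerce = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Aman", "Riya", "Kabir", "Sneha", "Vikas"],
    "Category": ["Electronics", "Books", "Fashion", "Electronics", "Books"],
    "Sales": [25000, 1200, 3400, 18000, 800],
})

print(ecommerce)

# No index was passed, so pandas assigns the default labels 0..4.
print(list(ecommerce.index))  # [0, 1, 2, 3, 4]
```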

Procedure

  1. Import pandas as pd.
  2. Build the ecommerce DataFrame from a column dictionary; pandas assigns the default integer index 0–4.
  3. Print the full DataFrame and confirm the index labels equal the positions (they coincide here, which is exactly why both indexers must be tested consciously).
  4. Run ecommerce.loc[1:3, ["OrderID", "Category", "Sales"]] — a label slice plus an explicit column list.
  5. Run ecommerce.iloc[0:3, 0:3] — a position slice on both axes.
  6. Count the rows returned by each call and note which endpoint was included.
  7. (Extension) Execute ecommerce.sort_values("Sales").loc[1:3] and ecommerce.sort_values("Sales").iloc[1:3] to see the two indexers diverge once row order changes: loc still follows the labels, iloc follows the new positions.
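Steps 4 to 7 can be sketched as follows. The DataFrame is rebuilt here so the block runs on its own; the variable names label_slice, position_slice and by_sales are illustrative choices, not names from the practical.

```python
import pandas as pd

ecommerce = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Aman", "Riya", "Kabir", "Sneha", "Vikas"],
    "Category": ["Electronics", "Books", "Fashion", "Electronics", "Books"],
    "Sales": [25000, 1200, 3400, 18000, 800],
})

# Step 4: label slice (1 and 3 both included) plus an explicit column list.
label_slice = ecommerce.loc[1:3, ["OrderID", "Category", "Sales"]]

# Step 5: position slice on both axes (stop positions excluded).
position_slice = ecommerce.iloc[0:3, 0:3]

# Step 6: count the rows and check which orders came back.
print(len(label_slice), list(label_slice["OrderID"]))        # 3 [102, 103, 104]
print(len(position_slice), list(position_slice["OrderID"]))  # 3 [101, 102, 103]

# Step 7 (extension): sort by Sales so labels and positions no longer match.
by_sales = ecommerce.sort_values("Sales")
print(list(by_sales.index))             # [4, 1, 2, 3, 0]
print(list(by_sales.loc[1:3].index))    # [1, 2, 3]  from label 1 to label 3
print(list(by_sales.iloc[1:3].index))   # [1, 2]     positions 1 and 2 only
print(list(by_sales.iloc[0:3].index))   # [4, 1, 2]  the three smallest orders
```

After sorting, loc[1:3] still spans from label 1 to label 3 inclusive, while the iloc slices simply count rows from the top of the reordered table.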

Interpretation of Results

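loc[1:3] returns three rows (labels 1, 2, 3, i.e. orders 102, 103, 104), and iloc[0:3] also returns three rows, but a different set (positions 0, 1, 2, i.e. orders 101, 102, 103). The row counts are the evidence of inclusive versus exclusive slicing: the label slice 1:3 includes its stop label 3, whereas the position slice 0:3 stops before position 3, so iloc[1:3] would return only two rows. The columns differ as well: the loc call returns OrderID, Category and Sales, while iloc[0:3, 0:3] returns the first three columns, OrderID, Customer and Category. Reading the output analytically: Electronics orders carry the largest ticket sizes (25000, 18000), while Books orders are small-value (1200, 800). A subset like loc[:, ["Category", "Sales"]] is how an analyst isolates just the fields needed for a category-revenue question instead of carrying the whole table around.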
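As a rough illustration of the category-revenue question, the two-column subset can be fed straight into an aggregation. The groupby step is an addition for illustration and is not part of this practical's procedure.

```python
import pandas as pd

ecommerce = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Aman", "Riya", "Kabir", "Sneha", "Vikas"],
    "Category": ["Electronics", "Books", "Fashion", "Electronics", "Books"],
    "Sales": [25000, 1200, 3400, 18000, 800],
})

# Keep every row (:) but only the two columns the question needs.
subset = ecommerce.loc[:, ["Category", "Sales"]]

# Assumed follow-up step: total Sales per Category.
revenue = subset.groupby("Category")["Sales"].sum()
print(revenue)
```

With this data the totals are 43000 for Electronics, 3400 for Fashion and 2000 for Books.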

Common Mistakes

  1. Assuming loc[1:3] excludes 3 like a list slice — it is endpoint-inclusive.
  2. Using iloc with column names: iloc[0:3, ["Sales"]] raises IndexError because iloc does not accept labels, only positions.
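  3. Chained indexing for assignment, such as df[df.Sales > 1000]["Category"] = .... The write lands on a temporary copy, so df itself is not changed; depending on the pandas version, a SettingWithCopyWarning is usually emitted. Always use a single loc call for assignment, for example df.loc[df["Sales"] > 1000, "Category"] = ....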
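Mistakes 2 and 3 and their fixes can be sketched as below. The replacement label "High value" is a made-up value used only to show that the write reaches the DataFrame, and sales_pos is an illustrative variable name.

```python
import pandas as pd

ecommerce = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Aman", "Riya", "Kabir", "Sneha", "Vikas"],
    "Category": ["Electronics", "Books", "Fashion", "Electronics", "Books"],
    "Sales": [25000, 1200, 3400, 18000, 800],
})

# Mistake 2: a list of column names passed to iloc is rejected.
try:
    ecommerce.iloc[0:3, ["Sales"]]
except IndexError as exc:
    print("IndexError:", exc)

# Fix: translate the name into a position first, or use loc instead.
sales_pos = ecommerce.columns.get_loc("Sales")
first_three_sales = ecommerce.iloc[0:3, [sales_pos]]

# Mistake 3 fix: row mask and column label in ONE loc call,
# so the assignment targets the original DataFrame.
ecommerce.loc[ecommerce["Sales"] > 1000, "Category"] = "High value"
print(ecommerce[["Sales", "Category"]])
```

Only the 800 order keeps its original category, because it is the only row whose mask value is False.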

🎯 Viva Questions

  1. Why is loc endpoint-inclusive? Labels have no natural successor, so pandas includes both endpoints of a label slice.
  2. What happens to loc/iloc after sorting? loc follows the (now shuffled) labels; iloc follows the new physical order.
  3. How do you select rows 0–2 of only the first three columns by position? df.iloc[0:3, 0:3].
  4. What does df.loc[df["Sales"] > 5000] return? All rows whose boolean mask is True — boolean indexing works inside loc.
  5. Difference between df["Sales"] and df[["Sales"]]? The first is a Series, the second a one-column DataFrame.
  6. Which indexer should be used with a string index like customer names? loc, because the labels carry the meaning; iloc still works, but only by position.
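Answers 4 to 6 can be checked with a short sketch on the practical's table. Using Customer as the index via set_index is an assumption made here to illustrate a string index; the practical itself keeps the default integer index.

```python
import pandas as pd

ecommerce = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Aman", "Riya", "Kabir", "Sneha", "Vikas"],
    "Category": ["Electronics", "Books", "Fashion", "Electronics", "Books"],
    "Sales": [25000, 1200, 3400, 18000, 800],
})

# Q4: a boolean mask inside loc keeps the rows where the mask is True.
high_value = ecommerce.loc[ecommerce["Sales"] > 5000]
print(list(high_value["OrderID"]))        # [101, 104]

# Q5: single brackets give a Series, double brackets a one-column DataFrame.
as_series = ecommerce["Sales"]
as_frame = ecommerce[["Sales"]]
print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame

# Q6: with customer names as the index, loc selects by name.
by_customer = ecommerce.set_index("Customer")
riya_to_sneha = by_customer.loc["Riya":"Sneha", "Sales"]
print(list(riya_to_sneha.index))          # ['Riya', 'Kabir', 'Sneha']
```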