Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Practical 7: NGO Aid DataFrame Filtering

Lesson 7 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To create a DataFrame aid storing aid (toys, books, uniforms, shoes) sent by NGOs to different states, and to display (a) Books and Uniforms only and (b) Shoes only, using column-subset selection with a list of column names.

CO Mapping: CO1, CO2, CO5

Theory

Column projection — choosing which variables to look at — is the vertical twin of the row filtering practised in earlier practicals. In relational terms it is SQL's SELECT col1, col2 as opposed to WHERE. pandas offers a compact idiom: indexing a DataFrame with a list of column names, aid[["State", "Books", "Uniform"]], returns a new DataFrame containing exactly those columns, in exactly the order listed.

Two syntactic subtleties matter:

  • Single brackets vs double brackets. aid["Shoes"] (one pair) returns a Series; aid[["Shoes"]] (a one-element list) returns a one-column DataFrame. The distinction changes what methods are available and how the output prints — a recurring viva favourite.
  • Keep the identifier column. Selecting aid[["Books", "Uniform"]] alone would produce numbers with no state attached. Including "State" in every projection keeps each row interpretable — a discipline analysts call never separating measurements from their keys.

Projection is not merely cosmetic. Narrowing a wide table to the two or three columns a question actually needs reduces memory, speeds later operations, and — most importantly — removes the temptation to eyeball irrelevant columns and drift off the analytical question.

Dataset

The 4-state, 4-item aid table built as the aid DataFrame:

IndexStateToysBooksUniformShoes
0Delhi120300160110
1UP8025014095
2Haryana6519013070
3Rajasthan9021015085

Procedure

  1. Import pandas as pd.
  2. Build the aid DataFrame from a column dictionary with State, Toys, Books, Uniform and Shoes; pandas assigns the default index 0–3.
  3. Print the full DataFrame as the reference record.
  4. For part (a), run aid[["State", "Books", "Uniform"]] — a three-column projection that pairs each state with its Books and Uniform counts.
  5. For part (b), run aid[["State", "Shoes"]] — a two-column projection isolating Shoes.
  6. Confirm both outputs still have 4 rows: projection changes the columns, never the rows.

Interpretation of Results

Both projections keep all four states — the row count is untouched because no filtering condition was applied. Reading part (a): Books dominate the aid mix everywhere (Delhi's 300 books against 160 uniforms), and Delhi receives the most of both items while Haryana receives the least (190 books, 130 uniforms). Part (b) shows the Shoes column ranges from 70 (Haryana) to 110 (Delhi) — the same Delhi-first, Haryana-last ordering, suggesting aid volume scales with the state rather than with the item type. That is the kind of pattern projection makes visible: with only the relevant columns on screen, cross-state comparison becomes a single glance instead of a hunt through a wide table.

Common Mistakes

  1. Writing aid["State", "Books"] with one pair of brackets — pandas treats the tuple as a single column label and raises KeyError; the inner list is mandatory.
  2. Dropping the State column from the projection and presenting orphaned numbers that no reader can attribute to a state.
  3. Misspelling a column name ("Uniforms" instead of "Uniform") — pandas column selection is exact and case-sensitive, and fails with KeyError.

🎯 Viva Questions

  1. What is the difference between aid["Shoes"] and aid[["Shoes"]]? The first returns a Series; the second returns a one-column DataFrame.
  2. Does column selection change the number of rows? No — projection affects columns only; all 4 rows survive.
  3. What is the SQL equivalent of this operation? The SELECT column list (projection), as opposed to WHERE (row selection).
  4. How would you select Books and Uniform using loc? aid.loc[:, ["State", "Books", "Uniform"]]: keeps all rows.
  5. Can the column list reorder columns? Yes — the output follows the order of the list, not the original DataFrame order.
  6. How would you show only states where Shoes > 90? Combine filtering with projection: aid.loc[aid["Shoes"] > 90, ["State", "Shoes"]].