Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Practical 8: Pivot - Projects by Position and City

Lesson 8 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To create a DataFrame ndf with Name, Gender, Position, City, Age and Projects, and to summarise the total projects handled by each Position for each City using pd.pivot_table() with aggfunc="sum".

CO Mapping: CO1, CO2, CO5

Theory

A pivot table reshapes and aggregates in one move. The input ndf is in long (tidy) format — one row per employee. The question "projects per Position per City" wants a wide format answer: Positions down the side, Cities across the top, one aggregated number per cell. pivot_table() performs exactly this long-to-wide transformation.

Its four key parameters map directly onto the question:

  • values="Projects" — the quantity to aggregate;
  • index="Position" — the variable whose levels become rows;
  • columns="City" — the variable whose levels become columns;
  • aggfunc="sum" — how to combine multiple matching rows into one cell.

The aggfunc is what separates pivot_table from plain pivot: pivot demands a unique row per index-column pair and crashes on duplicates, while pivot_table expects duplicates and collapses them (two Delhi Analysts become one summed cell). Conceptually it is a groupby(["Position", "City"])["Projects"].sum() followed by an unstack() — the same numbers, arranged as a matrix instead of a list.

Not every Position exists in every City, so some cells have no underlying rows. By default those cells become NaN; fill_value=0 replaces them, which is semantically honest here because "no employees of that position in that city" really does mean zero projects handled there.

Dataset

The 8-employee table built as ndf:

NameGenderPositionCityAgeProjects
AmanMAnalystDelhi232
RiyaFAnalystDelhi243
KunalMManagerNoida315
SnehaFManagerNoida294
NikhilMAnalystPune252
PoojaFDeveloperPune273
RohitMDeveloperDelhi282
MeghaFManagerPune306

Procedure

  1. Import pandas and build ndf from a column dictionary; print it as the input record.
  2. Call pd.pivot_table(ndf, values="Projects", index="Position", columns="City", aggfunc="sum", fill_value=0) and store it as summary.
  3. Print summary — a 3 × 3 matrix (Positions: Analyst, Developer, Manager; Cities: Delhi, Noida, Pune).
  4. Hand-verify one cell against the raw data, e.g. Analyst–Delhi should be Aman's 2 plus Riya's 3.
  5. Note which cells were filled with 0 and confirm no employee combination exists for them in ndf.

Interpretation of Results

Tracing the data by hand, the pivot table is:

PositionDelhiNoidaPune
Analyst502
Developer203
Manager096

Analyst–Delhi = 5 comes from two rows (Aman 2 + Riya 3) — proof that aggfunc="sum" collapsed duplicates. The zeros are structural: no Analyst or Developer works in Noida and no Manager works in Delhi, so those cells had nothing to sum. Analytically, the matrix shows Managers carry the heaviest project load (15 of the 27 total projects) and that Noida's entire output (9) is manager-driven — insights that were scattered across eight rows in long format and become visible at a glance once pivoted.

Common Mistakes

  1. Using pivot() instead of pivot_table() on data with duplicate Position–City pairs — it raises an error instead of aggregating.
  2. Forgetting aggfunc="sum" — the default is mean, so Analyst–Delhi would silently show 2.5 instead of 5.
  3. Leaving out fill_value=0 and later doing arithmetic on the table — the NaN cells poison sums and comparisons.

🎯 Viva Questions

  1. pivot vs pivot_table? pivot only reshapes and fails on duplicate index-column pairs; pivot_table aggregates duplicates with an aggfunc.
  2. What is the default aggfunc of pivot_table? mean.
  3. Why do some cells show 0 here? fill_value=0 replaces NaN in combinations that have no rows, e.g. Manager–Delhi.
  4. What is the groupby equivalent of this pivot? ndf.groupby(["Position", "City"])["Projects"].sum().unstack().
  5. What are long and wide formats? Long: one row per observation (the input). Wide: one row per group with one column per category level (the pivot output).
  6. How would you also get row and column totals? Pass margins=True, which appends an All row and column.