Aim
To create a DataFrame ndf with Name, Gender, Position, City, Age and Projects, and to summarise the total projects handled by each Position for each City using pd.pivot_table() with aggfunc="sum".
CO Mapping: CO1, CO2, CO5
Theory
A pivot table reshapes and aggregates in one move. The input ndf is in long (tidy) format — one row per employee. The question "projects per Position per City" wants a wide format answer: Positions down the side, Cities across the top, one aggregated number per cell. pivot_table() performs exactly this long-to-wide transformation.
Its four key parameters map directly onto the question:
values="Projects"— the quantity to aggregate;index="Position"— the variable whose levels become rows;columns="City"— the variable whose levels become columns;aggfunc="sum"— how to combine multiple matching rows into one cell.
The aggfunc is what separates pivot_table from plain pivot: pivot demands a unique row per index-column pair and crashes on duplicates, while pivot_table expects duplicates and collapses them (two Delhi Analysts become one summed cell). Conceptually it is a groupby(["Position", "City"])["Projects"].sum() followed by an unstack() — the same numbers, arranged as a matrix instead of a list.
Not every Position exists in every City, so some cells have no underlying rows. By default those cells become NaN; fill_value=0 replaces them, which is semantically honest here because "no employees of that position in that city" really does mean zero projects handled there.
Dataset
The 8-employee table built as ndf:
| Name | Gender | Position | City | Age | Projects |
|---|---|---|---|---|---|
| Aman | M | Analyst | Delhi | 23 | 2 |
| Riya | F | Analyst | Delhi | 24 | 3 |
| Kunal | M | Manager | Noida | 31 | 5 |
| Sneha | F | Manager | Noida | 29 | 4 |
| Nikhil | M | Analyst | Pune | 25 | 2 |
| Pooja | F | Developer | Pune | 27 | 3 |
| Rohit | M | Developer | Delhi | 28 | 2 |
| Megha | F | Manager | Pune | 30 | 6 |
Procedure
- Import pandas and build
ndffrom a column dictionary; print it as the input record. - Call
pd.pivot_table(ndf, values="Projects", index="Position", columns="City", aggfunc="sum", fill_value=0)and store it assummary. - Print
summary— a 3 × 3 matrix (Positions: Analyst, Developer, Manager; Cities: Delhi, Noida, Pune). - Hand-verify one cell against the raw data, e.g. Analyst–Delhi should be Aman's 2 plus Riya's 3.
- Note which cells were filled with 0 and confirm no employee combination exists for them in
ndf.
Interpretation of Results
Tracing the data by hand, the pivot table is:
| Position | Delhi | Noida | Pune |
|---|---|---|---|
| Analyst | 5 | 0 | 2 |
| Developer | 2 | 0 | 3 |
| Manager | 0 | 9 | 6 |
Analyst–Delhi = 5 comes from two rows (Aman 2 + Riya 3) — proof that aggfunc="sum" collapsed duplicates. The zeros are structural: no Analyst or Developer works in Noida and no Manager works in Delhi, so those cells had nothing to sum. Analytically, the matrix shows Managers carry the heaviest project load (15 of the 27 total projects) and that Noida's entire output (9) is manager-driven — insights that were scattered across eight rows in long format and become visible at a glance once pivoted.
Common Mistakes
- Using
pivot()instead ofpivot_table()on data with duplicate Position–City pairs — it raises an error instead of aggregating. - Forgetting
aggfunc="sum"— the default ismean, so Analyst–Delhi would silently show 2.5 instead of 5. - Leaving out
fill_value=0and later doing arithmetic on the table — the NaN cells poison sums and comparisons.
🎯 Viva Questions
- pivot vs pivot_table?
pivotonly reshapes and fails on duplicate index-column pairs;pivot_tableaggregates duplicates with anaggfunc. - What is the default aggfunc of pivot_table?
mean. - Why do some cells show 0 here?
fill_value=0replaces NaN in combinations that have no rows, e.g. Manager–Delhi. - What is the groupby equivalent of this pivot?
ndf.groupby(["Position", "City"])["Projects"].sum().unstack(). - What are long and wide formats? Long: one row per observation (the input). Wide: one row per group with one column per category level (the pivot output).
- How would you also get row and column totals? Pass
margins=True, which appends anAllrow and column.