Aim
To create a pandas Series object s5 containing numbers, compute the square of every element into a second Series s6 using vectorised arithmetic, and display only the squared values greater than 15 using boolean masking.
CO Mapping: CO1, CO2, CO4
Theory
A Series is a one-dimensional labelled array — the building block of every DataFrame column. Two ideas from this practical power almost all pandas analytics:
- Vectorisation. `s5 ** 2` applies the power operation to every element at once. Internally the loop runs in optimised C inside NumPy, so no Python `for` loop is written. Vectorised code is both faster (often 10–100×) and closer to mathematical notation, which reduces bugs.
- Boolean masking. A comparison such as `s6 > 15` does not return a single True/False; it returns a Series of booleans, one per element, aligned by index. Passing that mask back into the Series (`s6[s6 > 15]`) keeps only the rows where the mask is True. This is the filtering primitive behind every "show me records where…" question in analytics.
A third, quieter idea is index preservation: filtering never renumbers the survivors. The labels of the kept elements travel with them, so you can always trace a filtered value back to its original observation.
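The three ideas above can be seen side by side in a minimal sketch. The Series `readings` and its string labels are hypothetical (they are not the practical's `s5`), chosen only so that the index is visibly different from positional numbering. The explicit loop is included purely for comparison.

```python
import pandas as pd

# Hypothetical data with string labels, used only for illustration.
readings = pd.Series([2, 5, 3, 7], index=["a", "b", "c", "d"])

# Vectorisation: one expression squares every element.
squared = readings ** 2

# The same values built with an explicit Python loop, for comparison only.
looped = pd.Series([x ** 2 for x in readings], index=readings.index)
print(squared.tolist() == looped.tolist())   # True

# Boolean masking: the comparison yields one flag per element.
mask = squared > 15
print(mask.tolist())                          # [False, True, False, True]

# Index preservation: the survivors keep their original labels.
kept = squared[mask]
print(list(kept.index), kept.tolist())        # ['b', 'd'] [25, 49]
```

The labels `b` and `d` survive the filter unchanged, which is what makes it possible to trace each kept value back to its original observation.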
Dataset
| Index | s5 (original) | s6 = s5 ** 2 | s6 > 15? |
|---|---|---|---|
| 0 | 1 | 1 | No |
| 1 | 2 | 4 | No |
| 2 | 3 | 9 | No |
| 3 | 4 | 16 | Yes |
| 4 | 5 | 25 | Yes |
| 5 | 6 | 36 | Yes |
Procedure
- Import pandas as `pd`.
- Create `s5 = pd.Series([1, 2, 3, 4, 5, 6])`; pandas assigns the index 0–5.
- Compute `s6 = s5 ** 2`. No loop is written; the operation broadcasts across all six elements.
- Print both Series with `.to_list()` for compact output.
- Build the mask `s6 > 15` and inspect it mentally: it is `[False, False, False, True, True, True]`.
- Apply the mask, `s6[s6 > 15]`, and print the surviving values.
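The steps above can be sketched as one short script. The variable names `mask` and `result` are assumptions introduced here for readability; the procedure itself names only `s5` and `s6`.

```python
import pandas as pd

# Create the original Series; pandas assigns the default index 0-5.
s5 = pd.Series([1, 2, 3, 4, 5, 6])

# Square every element with vectorised arithmetic (no explicit loop).
s6 = s5 ** 2

# Print both Series compactly.
print("s5:", s5.to_list())      # s5: [1, 2, 3, 4, 5, 6]
print("s6:", s6.to_list())      # s6: [1, 4, 9, 16, 25, 36]

# Build the boolean mask and look at it before using it.
mask = s6 > 15
print("mask:", mask.to_list())  # mask: [False, False, False, True, True, True]

# Apply the mask; the printed Series shows the values with their labels.
result = s6[mask]
print(result)
```

Printing `result` directly, rather than through `.to_list()`, keeps the index column visible in the output, so the labels 3, 4 and 5 can be checked against the Dataset table.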
Interpretation of Results
The filter returns 16, 25 and 36 and, crucially, they keep their original index labels 3, 4 and 5. That tells the analyst which original observations survived, not just their values. Reading further: squaring is a monotonic transform for positive numbers, so the same three elements would be selected by the equivalent condition `s5 > 3.87` (3.87 is approximately the square root of 15). Recognising such equivalences lets you filter on raw or derived columns interchangeably, whichever is cheaper or clearer.
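The equivalence between the two conditions can be checked directly. The sketch below is illustrative: the names `threshold`, `on_derived`, `on_raw` and the small Series `mixed` are assumptions added here, and `mixed` exists only to show why the text restricts the claim to positive numbers.

```python
import math
import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])
s6 = s5 ** 2

threshold = math.sqrt(15)         # roughly 3.873

on_derived = s6[s6 > 15]          # mask built from the squared values
on_raw = s6[s5 > threshold]       # mask built from the raw values

print(on_derived.equals(on_raw))  # True
print(on_raw.index.to_list())     # [3, 4, 5]

# With a negative value the two conditions no longer agree:
# (-5) ** 2 = 25 passes the squared test, but -5 fails the raw test.
mixed = pd.Series([-5, 2, 4])
print(((mixed ** 2) > 15).to_list())  # [True, False, True]
print((mixed > threshold).to_list())  # [False, False, True]
```

The second half is the reason the monotonicity argument needs the "positive numbers" qualifier: once negative values are present, the raw-column condition would have to test the absolute value instead.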
Common Mistakes
- Writing a Python loop (`for x in s5: ...`) instead of the vectorised `s5 ** 2`: it works but abandons the entire point (and the speed) of pandas.
- Using Python's `and`/`or` to combine masks: they raise `ValueError` on Series; use `&` and `|` with parentheses, e.g. `s6[(s6 > 15) & (s6 < 30)]`.
- Expecting the filtered result to be re-indexed from 0: it keeps labels 3, 4, 5 unless you call `.reset_index(drop=True)`.
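The last two mistakes are easy to reproduce. The following is a small sketch rather than a prescribed solution; the names `raised`, `both` and `renumbered` are assumptions introduced for the demonstration.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])
s6 = s5 ** 2

# Python's `and` asks a whole Series for one truth value, which pandas refuses.
try:
    s6[(s6 > 15) and (s6 < 30)]
    raised = False
except ValueError:
    raised = True
print(raised)                       # True

# The elementwise operator `&`, with each condition in parentheses, works.
both = s6[(s6 > 15) & (s6 < 30)]
print(both.to_list())               # [16, 25]

# Filtering keeps the original labels...
print(s6[s6 > 15].index.to_list())  # [3, 4, 5]

# ...until the index is explicitly reset.
renumbered = s6[s6 > 15].reset_index(drop=True)
print(renumbered.index.to_list())   # [0, 1, 2]
```

Resetting the index discards the link back to the original observations, so it is usually worth doing only when the labels are no longer needed.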
🎯 Viva Questions
- What is a Series? A one-dimensional labelled array; every DataFrame column is a Series.
- Why is `s5 ** 2` faster than a loop? The elementwise loop runs in compiled C inside NumPy instead of interpreted Python bytecode.
- What type does `s6 > 15` return? A boolean Series aligned to `s6`'s index.
- How do you combine two filter conditions? With `&` (and) / `|` (or), wrapping each condition in parentheses.
- What happens to the index after filtering? It is preserved: survivors keep their original labels.
- How would you renumber the filtered result from zero? `s6[s6 > 15].reset_index(drop=True)`.