Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Practical 2: Series Square and Filter

Lesson 2 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To create a pandas Series object s5 containing numbers, compute the square of every element into a second Series s6 using vectorised arithmetic, and display only the squared values greater than 15 using boolean masking.

CO Mapping: CO1, CO2, CO4

Theory

A Series is a one-dimensional labelled array — the building block of every DataFrame column. Two ideas from this practical power almost all pandas analytics:

  1. Vectorisation. s5 ** 2 applies the power operation to every element at once. Internally the loop runs in optimised C inside NumPy, so no Python for loop is written. Vectorised code is both faster (often 10–100×) and closer to mathematical notation, which reduces bugs.
  2. Boolean masking. A comparison such as s6 > 15 does not return one True/False — it returns a Series of booleans, one per element, aligned by index. Passing that mask back into the Series (s6[s6 > 15]) keeps only the rows where the mask is True. This is the filtering primitive behind every "show me records where…" question in analytics.
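
The two ideas above can be sketched in a few lines. This is a minimal illustration, assuming the six values 1 to 6 used later in this practical; the names looped and mask are introduced here for illustration only. The list-comprehension version is included purely to show that it yields the same result as the vectorised expression.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])

# Vectorised: a single expression squares every element.
s6 = s5 ** 2

# Explicit Python loop, shown only for comparison (hypothetical name: looped).
looped = pd.Series([x ** 2 for x in s5])
print(s6.equals(looped))   # True: same values, same index, same dtype

# A comparison returns one boolean per element, not a single True/False.
mask = s6 > 15
print(mask.dtype)          # bool
print(mask.to_list())      # [False, False, False, True, True, True]
```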

A third, quieter idea is index preservation: filtering never renumbers the survivors. The labels of the kept elements travel with them, so you can always trace a filtered value back to its original observation.
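
Index preservation can be checked directly. The sketch below assumes the same six values as the rest of the practical; survivors and originals are illustrative names, not part of the original exercise. It uses the preserved labels to look the filtered values up in the untransformed Series.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])
s6 = s5 ** 2

# Filtering keeps the original labels of the rows that pass the mask.
survivors = s6[s6 > 15]
print(survivors.index.to_list())   # [3, 4, 5]

# The preserved labels let us trace each value back to its source in s5.
originals = s5.loc[survivors.index]
print(originals.to_list())         # [4, 5, 6]
```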

Dataset

Index   s5 (original)   s6 = s5 ** 2   s6 > 15?
0       1               1              No
1       2               4              No
2       3               9              No
3       4               16             Yes
4       5               25             Yes
5       6               36             Yes

Procedure

  1. Import pandas as pd.
  2. Create s5 = pd.Series([1, 2, 3, 4, 5, 6]) — pandas assigns the index 0–5.
  3. Compute s6 = s5 ** 2. No loop is written; the operation broadcasts across all six elements.
  4. Print both Series with .to_list() for compact output.
  5. Build the mask s6 > 15 and inspect it mentally: it is [False, False, False, True, True, True].
  6. Apply the mask — s6[s6 > 15] — and print the surviving values.
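
One possible way to write the six steps as a single script is sketched below. The variable names mask and result and the print labels are choices made for this sketch, not prescribed by the procedure.

```python
import pandas as pd                      # step 1

s5 = pd.Series([1, 2, 3, 4, 5, 6])       # step 2: default index 0 to 5
s6 = s5 ** 2                             # step 3: vectorised square

# Step 4: compact output of both Series.
print("s5:", s5.to_list())
print("s6:", s6.to_list())

# Step 5: build the boolean mask and inspect it.
mask = s6 > 15
print("mask:", mask.to_list())

# Step 6: apply the mask and print the surviving values.
result = s6[mask]
print("s6 > 15:", result.to_list())
print(result)                            # full view, including index labels
```

Printing result without .to_list() shows the index labels alongside the values, which is useful when checking which observations passed the filter.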

Interpretation of Results

The filter returns 16, 25 and 36, and, crucially, they keep their original index labels 3, 4 and 5. That tells the analyst which original observations survived, not just their values. Reading further: squaring is a monotonic transform for positive numbers, so the same three elements would be selected by the equivalent condition s5 > 3.87 (3.87 being approximately the square root of 15). Recognising such equivalences lets you filter on raw or derived columns interchangeably, whichever is cheaper or clearer.
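
The equivalence between filtering on the squared values and filtering on the raw values can be verified with a short sketch. The names threshold, by_derived, by_raw and mixed are hypothetical, and the Series mixed with negative values is an invented counterexample showing why the equivalence only holds when the inputs are positive.

```python
import math

import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])
s6 = s5 ** 2
threshold = math.sqrt(15)            # roughly 3.87

# Mask built on the derived column versus mask built on the raw column.
by_derived = s6[s6 > 15]
by_raw = s6[s5 > threshold]
print(by_derived.equals(by_raw))     # True

# With negative inputs squaring is no longer monotonic, so the masks differ.
mixed = pd.Series([-5, -1, 2, 4])    # invented data for the counterexample
print((mixed ** 2 > 15).to_list())   # [True, False, False, True]
print((mixed > threshold).to_list()) # [False, False, False, True]
```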

Common Mistakes

  1. Writing a Python loop (for x in s5: ...) instead of the vectorised s5 ** 2 — it works but abandons the entire point (and the speed) of pandas.
  2. Using Python's and/or to combine masks — they raise ValueError on Series; use & and | with parentheses, e.g. s6[(s6 > 15) & (s6 < 30)].
  3. Expecting the filtered result to be re-indexed from 0 — it keeps labels 3, 4, 5 unless you call .reset_index(drop=True).
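
Mistakes 2 and 3 are easy to reproduce. The following sketch, which assumes the same s5 and s6 as above, catches the ValueError raised by the and keyword, then shows the working & form and the effect of .reset_index(drop=True). The names raised, both and renumbered are illustrative.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3, 4, 5, 6])
s6 = s5 ** 2

# Mistake 2: Python's `and` asks for a single truth value of a whole Series.
raised = False
try:
    s6[(s6 > 15) and (s6 < 30)]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Correct form: elementwise & with each condition in parentheses.
both = s6[(s6 > 15) & (s6 < 30)]
print(both.to_list())                  # [16, 25]

# Mistake 3: the filtered result keeps labels 3, 4, 5 unless reset.
filtered = s6[s6 > 15]
print(filtered.index.to_list())        # [3, 4, 5]
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.to_list())      # [0, 1, 2]
```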

🎯 Viva Questions

  1. What is a Series? A one-dimensional labelled array; every DataFrame column is a Series.
  2. Why is s5 ** 2 faster than a loop? The elementwise loop runs in compiled C inside NumPy instead of interpreted Python bytecode.
  3. What type does s6 > 15 return? A boolean Series aligned to s6's index.
  4. How do you combine two filter conditions? With & (and) / | (or), wrapping each condition in parentheses.
  5. What happens to the index after filtering? It is preserved — survivors keep their original labels.
  6. How would you renumber the filtered result from zero? s6[s6 > 15].reset_index(drop=True).