Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Unit 6: Intro to NLP

Lesson 33 of 49 in the free AI Fundamentals notes on Siksha Sarovar, written by Rohit Jangra.

Natural Language Processing (NLP) is the branch of AI that enables computers to understand, interpret, and generate human language.

The Challenge of Language

Human language is not like code. It is messy, ambiguous, and context-dependent.

  • Ambiguity: "I saw the man with the telescope." (Did I have the telescope, or did he?)
  • Sarcasm: "Oh, great!" (Could mean something terrible.)
  • Slang: "This song is fire." (Not actual combustion).
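
To see why these cases are hard for a program, here is a minimal sketch of a word-counting sentiment scorer. The LEXICON table and the naive_sentiment function are hypothetical, invented only for this illustration; they do not come from any real library.

```python
import re

# Assumption: a toy sentiment lexicon made up for this example.
# +1 marks a "positive" word, -1 a "negative" one.
LEXICON = {"great": 1, "good": 1, "terrible": -1, "fire": -1}


def naive_sentiment(text):
    """Sum the lexicon scores of the words in text, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# A sarcastic speaker may mean the opposite, but the score is positive.
print(naive_sentiment("Oh, great!"))

# The slang is praise, but "fire" is scored as a negative word.
print(naive_sentiment("This song is fire."))
```

Because the scorer looks at words in isolation, it reads the sarcastic sentence as positive and the slang sentence as negative. Handling such cases needs context, which is what makes language harder to process than code.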

NLP Pipeline

A typical sequence of steps for making text machine-readable:

Step                     | Action                             | Example
-------------------------|------------------------------------|--------------------------------------
Tokenization             | Breaking text into words/units.    | "AI is cool" -> ["AI", "is", "cool"]
Stop Word Removal        | Removing common filler words.      | ["AI", "is", "cool"] -> ["AI", "cool"]
Stemming/Lemmatization   | Reducing to root forms.            | "Running", "Ran" -> "Run"
Part of Speech Tagging   | Identifying nouns, verbs, etc.     | "Run"(Verb), "AI"(Noun)
Named Entity Recognition | Identifying People, Places, Dates. | "Google"(Org), "2024"(Date)
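
The five steps above can be sketched in plain Python. This is a toy illustration, not how a real NLP library works: the word lists (STOP_WORDS, LEMMAS, POS_TAGS, ENTITIES) and all function names are assumptions made up for this example, and real systems use much larger resources or trained models.

```python
import re

# Assumption: tiny hand-written lookup tables standing in for real resources.
STOP_WORDS = {"is", "a", "an", "the", "and", "of", "to", "in"}
LEMMAS = {"running": "run", "ran": "run", "runs": "run"}
POS_TAGS = {"run": "VERB", "ai": "NOUN", "cool": "ADJ"}
ENTITIES = {"Google": "ORG"}


def tokenize(text):
    """Step 1: split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)


def remove_stop_words(tokens):
    """Step 2: drop common filler words (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def lemmatize(tokens):
    """Step 3: map each token to its root form via the lookup table."""
    return [LEMMAS.get(t.lower(), t) for t in tokens]


def tag_parts_of_speech(tokens):
    """Step 4: attach a part-of-speech tag, or UNKNOWN if not listed."""
    return [(t, POS_TAGS.get(t.lower(), "UNKNOWN")) for t in tokens]


def find_entities(tokens):
    """Step 5: label known names, and treat 4-digit years as dates."""
    found = []
    for t in tokens:
        if t in ENTITIES:
            found.append((t, ENTITIES[t]))
        elif re.fullmatch(r"(19|20)\d{2}", t):
            found.append((t, "DATE"))
    return found


tokens = tokenize("AI is cool")
print(tokens)                                # ['AI', 'is', 'cool']
print(remove_stop_words(tokens))             # ['AI', 'cool']
print(lemmatize(["Running", "Ran"]))         # ['run', 'run']
print(tag_parts_of_speech(["run", "AI"]))    # [('run', 'VERB'), ('AI', 'NOUN')]
print(find_entities(tokenize("Google released it in 2024")))
# [('Google', 'ORG'), ('2024', 'DATE')]
```

The lookup table is used for step 3 because irregular forms such as "ran" cannot be reduced to "run" by simply stripping suffixes, which is how a basic stemmer works. Mapping them requires dictionary knowledge, which is the job of a lemmatizer.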