Aim
To write and test regular expressions in PHP that demonstrate pattern modifiers, operators (quantifiers) and metacharacters using the PCRE preg_* function family.
Theory
PHP's regex engine is PCRE (Perl Compatible Regular Expressions), exposed through preg_match(), preg_match_all(), preg_replace() and preg_split(). A pattern is written between delimiters (commonly /pattern/) and may be followed by modifiers that change engine behaviour:
- i: case-insensitive matching
- m: multiline; ^ and $ anchor at every line boundary
- s: dotall; . also matches newline
- x: extended; whitespace inside the pattern is ignored (self-documenting patterns)
- u: treat pattern and subject as UTF-8
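A minimal sketch that exercises each modifier on a tiny subject. The subject strings are arbitrary. The u example assumes the file is saved as UTF-8, so 'é' occupies two bytes. In x mode an unescaped # also starts a comment that runs to the end of the line.

```php
<?php
// Each pair compares a pattern with and without one modifier.
$text = "Line one\nline two";

// i: case-insensitive, so "LINE" matches "Line".
$i = preg_match('/^LINE/i', $text);            // 1

// Without m, ^ anchors only at the very start of the subject;
// with m, it also anchors right after each "\n".
$noM   = preg_match_all('/^line/i', $text);    // 1
$withM = preg_match_all('/^line/im', $text);   // 2

// Without s, . refuses to match the newline; with s it accepts it.
$noS   = preg_match('/one.line/', $text);      // 0
$withS = preg_match('/one.line/s', $text);     // 1

// x: whitespace and # comments inside the pattern are ignored.
$x = preg_match('/
    ^ \d{3}    # three digits
    - \d{4} $  # hyphen, then four digits
/x', '555-1234');                               // 1

// u: . counts UTF-8 characters instead of bytes.
$u     = preg_match('/^.{4}$/u', 'café');       // 1 (4 characters)
$bytes = preg_match('/^.{4}$/', 'café');        // 0 (5 bytes)

echo "i=$i noM=$noM withM=$withM noS=$noS withS=$withS x=$x u=$u bytes=$bytes\n";
```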
Metacharacters carry special meaning: ^ (start anchor), $ (end anchor), . (any character except newline, unless s is set), \d (digit), \w (word character), \s (whitespace), \b (zero-width word boundary), plus character classes like [a-z] and the negated form [^a-z]. Quantifiers control repetition: * (0 or more), + (1 or more), ? (0 or 1), {n} (exactly n) and {n,m} (between n and m); the | operator expresses alternation. Quantifiers are greedy by default; appending ? makes them lazy. Note the return contract: preg_match() returns 1 on a match, 0 on no match and false on a malformed pattern, so === 1 is the robust production idiom.
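The quantifiers, alternation and greedy/lazy behaviour described above can be checked with a short sketch. The labels and subject strings are illustrative choices, not part of the lab program. Every pattern is anchored with ^...$, so the whole subject must fit.

```php
<?php
// label => [pattern, subject]
$checks = [
    'optional u'    => ['/^colou?r$/', 'color'],    // ?: zero or one 'u'
    'bounded a'     => ['/^a{2,3}$/', 'aaaa'],      // {2,3}: four a's is too many
    'alternation'   => ['/^(cat|dog)$/', 'dog'],    // |: either branch
    'negated class' => ['/^[^0-9]+$/', 'abc'],      // +: one or more non-digits
    'trailing ws'   => ['/^\w+\s*$/', 'word   '],   // *: zero or more whitespace
];

$results = [];
foreach ($checks as $label => [$pattern, $subject]) {
    $results[$label] = preg_match($pattern, $subject) === 1;
    echo str_pad($label, 14), "'$subject' => ",
        $results[$label] ? 'MATCH' : 'NO MATCH', "\n";
}

// Greedy vs lazy: .* runs to the LAST '>', .*? stops at the FIRST '>'.
$html = '<b>bold</b> and <i>italic</i>';
preg_match('/<.*>/', $html, $greedy);
preg_match('/<.*?>/', $html, $lazy);
echo "greedy: {$greedy[0]}\n";   // the whole string
echo "lazy:   {$lazy[0]}\n";     // <b>
```

Only 'bounded a' reports NO MATCH. The two greedy/lazy lines show why lazy quantifiers are the usual choice when extracting delimited fragments such as tags.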
Requirements
- XAMPP/WAMP with PHP 8.x (or standalone PHP CLI)
- Code editor (VS Code)
- Browser (Chrome/Edge) or terminal
Procedure
- Start Apache from the XAMPP Control Panel.
- Create the folder C:\xampp\htdocs\wbplab and save the program as p01_regex.php inside it.
- Type the code from the snippet below and save.
- Run it at http://localhost/wbplab/p01_regex.php, or from the terminal with php p01_regex.php.
- Edit the $samples array, predict each MATCH/NO MATCH, then re-run to confirm.
Explanation of the Code
- $patterns is an associative array mapping a human-readable label to a PCRE pattern. The first pattern /^web[a-z]*\d{2}$/i anchors the whole string (^...$). It requires the literal web, then any run of letters ([a-z]*), then exactly two digits (\d{2}). The i modifier is why WebDev23 matches despite its capital letters.
- The second pattern /\d+/ uses the + operator: it needs at least one digit anywhere in the subject.
- The third pattern /\bphp\b/i uses the \b word-boundary metacharacter. As a result, php matches only as a whole word, never inside a longer token like phpMyAdmin.
- The nested foreach feeds every string in $samples to preg_match() and prints MATCH/NO MATCH through a ternary expression.
- Finally, preg_replace('/[^a-z\s]/i', '', $sentence) uses a negated character class to delete every character that is not a letter or whitespace. This strips the punctuation from $sentence.
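A minimal sketch of a program consistent with the explanation above. The pattern labels are hypothetical. The sample strings and the sentence are taken from the Expected Output section. The $nl switch is an added convenience so the output reads well in both a browser and a terminal.

```php
<?php
// p01_regex.php (sketch): labels are illustrative, patterns follow the explanation.
$patterns = [
    'Starts with web, ends with 2 digits' => '/^web[a-z]*\d{2}$/i',
    'Contains at least one digit'         => '/\d+/',
    'Whole word php'                      => '/\bphp\b/i',
];
$samples = ['WebDev23', 'course123', 'I love PHP and regex'];

// Line break that suits the runtime: plain newline on the CLI, <br> in a browser.
$nl = PHP_SAPI === 'cli' ? "\n" : "<br>\n";

$results = [];
foreach ($patterns as $label => $pattern) {
    echo "Pattern: $label ($pattern)$nl";
    foreach ($samples as $s) {
        $hit = preg_match($pattern, $s) === 1;   // strict check, see Theory
        $results[$label][$s] = $hit;
        echo "  $s => " . ($hit ? 'MATCH' : 'NO MATCH') . $nl;
    }
}

// Negated class: remove anything that is neither a letter nor whitespace.
$sentence = 'PHP, Python, and JavaScript!';
$clean = preg_replace('/[^a-z\s]/i', '', $sentence);
echo "Original: $sentence$nl";
echo "After preg_replace: $clean$nl";
```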
Expected Output
For each labelled pattern the script prints one line per sample: WebDev23 => MATCH (pattern 1), course123 => MATCH and WebDev23 => MATCH (pattern 2, both contain digits), and I love PHP and regex => MATCH for the word-boundary pattern (the other two samples say NO MATCH). The final lines show Original: PHP, Python, and JavaScript! and After preg_replace: PHP Python and JavaScript — commas and the exclamation mark removed, spaces preserved.
🎯 Viva Questions
- Why does a PCRE pattern need delimiters? They separate the pattern body from trailing modifiers such as i or m.
- Difference between preg_match() and preg_match_all()? The former stops at the first match; the latter collects every match into $matches.
- What does \b match? A zero-width position between a word character and a non-word character (or the start/end of the string). It consumes no text.
- Greedy vs lazy quantifier? Greedy (.*) takes the longest possible match; lazy (.*?) takes the shortest.
- What does [^a-z] mean? A negated class: any single character that is not a lowercase letter.
- Why compare preg_match() with === 1? It can also return false on a pattern error. A loose truthiness check would silently treat that error as "no match".
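The last two answers on preg_match_all() and the false return can be verified with a small sketch. The subject string and the deliberately broken pattern are arbitrary examples. The @ operator suppresses the compilation warning here for demonstration only.

```php
<?php
// preg_match() stops at the first match; preg_match_all() collects every match.
$subject = 'a1 b22 c333';
$first = preg_match('/\d+/', $subject, $m);        // 1, $m[0] is '1'
$count = preg_match_all('/\d+/', $subject, $all);  // 3, $all[0] is ['1', '22', '333']
echo "preg_match: $first match, first = {$m[0]}\n";
echo "preg_match_all: $count matches = ", implode(', ', $all[0]), "\n";

// An unclosed group fails to compile, so preg_match() returns false.
$bad = @preg_match('/(unclosed/', $subject);

// Loose check: false is falsy, so the error is indistinguishable from "no match".
echo $bad ? "match\n" : "no match (or an error?)\n";

// Strict checks separate all three outcomes.
if ($bad === false) {
    echo "Pattern error: preg_match() returned false\n";
} else {
    echo $bad === 1 ? "Match\n" : "No match\n";
}
```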
CO Mapping
CO1, CO2