Broad Network


First Occurrence in Perl Regex Matching

Advanced Perl Regular Expressions – Part 4

Foreword: When Perl evaluates a binding operation, the regex matches the first (leftmost) occurrence of the sub-string in the subject. That is what I talk about in this part of the series.

By: Chrysanthus Date Published: 2 Apr 2016

Introduction

This is part 4 of my series, Advanced Perl Regular Expressions. When Perl evaluates a binding operation, the regex matches the first (leftmost) occurrence of the sub-string in the subject; that is the topic of this part. You should read the tutorials in this series in the order given, because each part continues from the previous one.

Illustration
Read and try the following script:

use strict;

    my $subject = "I am a man. You are a man. He is a man.";
    if ($subject =~ /man/)
        {
            print "Matched\n";
        }

In the subject string, the word “man” occurs in three places. The regex is /man/. Only the first occurrence of the sub-string “man” in the subject is matched; the other occurrences are ignored. The first occurrence can also be called the leftmost occurrence. If you want all the occurrences of the sub-string to be matched, you have to use the global modifier, g. The following program is an illustration:

use strict;

    my $subject = "This is a cat. That is a rat. Here is a bat.";
    my @arr = $subject =~ /[brc]at/g;
    print $arr[0], "\n";
    print $arr[1], "\n";
    print $arr[2], "\n";

Even with the g modifier, “cat”, which occurs first in the subject here, is matched first. The leftmost occurrence in the subject is always matched first. Without the g modifier, the rest of the occurrences are ignored. Read and try the following code, which shows this:

use strict;

    my $subject = "This is a cat. That is a rat. Here is a bat.";
    $subject =~ /([brc]at)/;
    print $1, "\n";

In this code, the regex matches just “cat”, which is the first of the possible matches in the subject. The code does not check whether the other occurrences are matched; they are not.
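To check by code that only the first occurrence is matched, one option is to look at the match offsets that Perl records in the built-in arrays @- and @+. The following is a minimal sketch, not a definitive test; the variable names $matched, $start, $end and $rest are chosen only for illustration:

```perl
use strict;
use warnings;

my $subject = "This is a cat. That is a rat. Here is a bat.";

my ($matched, $start, $end, $rest);
if ($subject =~ /[brc]at/) {
    $start   = $-[0];                                   # offset where the match begins
    $end     = $+[0];                                   # offset just past the match
    $matched = substr($subject, $start, $end - $start); # the matched text itself
    $rest    = substr($subject, $end);                  # everything after the match
    print "Matched '$matched' at offsets $start to $end\n";
    print "Left alone: '$rest'\n";
}
```

This should report that “cat” was matched at offsets 10 to 13, and that the remainder of the subject, which still contains “rat” and “bat”, was left alone.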

Same thing with all Alternatives
A character class such as [brc] amounts to a set of alternatives in the regex. With any form of alternatives in the regex, even with the g modifier present, it is the first occurrence in the subject that is matched first. Read and try the following code, which uses the alternation operator, |, in the regex:

use strict;

    my $subject = "This is a child. That is a man. Here is a woman.";
    my @arr = $subject =~ /(woman|man|child)/;
    print @arr, "\n";

The output is just “child”, which is the leftmost (first occurring) sub-string in the subject that the regex can match, even though “child” is the last alternative listed in the regex.
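As a rough sketch of the same idea, the code below compares the match without g against the matches with g, and then reorders the alternatives. The variable names $first, @all and @reordered are assumptions made for this illustration only:

```perl
use strict;
use warnings;

my $subject = "This is a child. That is a man. Here is a woman.";

# Without g: a single match, the leftmost one in the subject.
my ($first) = $subject =~ /(woman|man|child)/;

# With g in list context: every match, in the order found in the subject.
my @all = $subject =~ /(woman|man|child)/g;

# The same alternatives written in a different order in the regex.
my @reordered = $subject =~ /(man|child|woman)/g;

print "First:     $first\n";
print "All:       @all\n";
print "Reordered: @reordered\n";
```

Both lists should come out as child, man, woman. For this subject, the order in which the alternatives are written in the regex does not change the result, because the engine tries each starting position in the subject from left to right.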

Nested Groups
With nested groups, it is still the first occurring sub-string in the subject that matches first; it does not matter what nests what in the regex. The match begins at the leftmost position in the subject where the whole regex can match, and the groups capture their parts of that match. Read and try the following code, which illustrates this with the g modifier:

use strict;

    "keepers, bookkeepers, bookkeeper and book go together." =~ /book(keeper(s|)|)/g;
    print $1, "\n";
    print $2, "\n";

The output is,

keepers
s

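$1 displays “keepers” and $2 displays “s”. The word “keepers” at the start of the subject is skipped, because it is not preceded by “book”. The leftmost place where the whole regex can match is “bookkeepers”. Within that single match, the groups are numbered by the order of their opening parentheses: the outer group, (keeper(s|)|), captures “keepers” into $1, and the inner group, (s|), captures “s” into $2. Because the match expression here is not in list context, the g modifier does not make it return the later matches; they are found only when the matching is repeated, for example in a loop, or when the result is assigned to an array. That is how matching progresses: first matching sub-string, then second, then third, and so on.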
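To see how matching progresses from one occurrence to the next, the same regex can be applied repeatedly with the g modifier in a while loop. This is only a sketch; the names $whole, $g1, $g2 and @report are assumptions introduced for the illustration:

```perl
use strict;
use warnings;

my $subject = "keepers, bookkeepers, bookkeeper and book go together.";

my @report;
# In scalar context, each evaluation of a /g match resumes where the last one ended.
while ($subject =~ /book(keeper(s|)|)/g) {
    my $whole = substr($subject, $-[0], $+[0] - $-[0]);   # full text of this match
    my $g1 = defined $1 ? "'$1'" : 'undef';               # outer group
    my $g2 = defined $2 ? "'$2'" : 'undef';               # inner group
    push @report, sprintf('%s: $1=%s $2=%s', $whole, $g1, $g2);
}
print "$_\n" for @report;
```

The loop should find three matches, in subject order: “bookkeepers”, then “bookkeeper”, then “book”. For the last one, the inner group takes no part in the match, so $2 is undefined.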

Note: if you are capturing the matches into an array, what is matched first goes first into the array.
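A small sketch of this note, assuming a regex with two groups (the regex and the array name @pairs are made up for the illustration): with the g modifier in list context, the captures of the first match go into the array first, followed by the captures of the second match, and so on.

```perl
use strict;
use warnings;

my $subject = "This is a cat. That is a rat. Here is a bat.";

# Each match contributes its two captures, in the order the matches occur.
my @pairs = $subject =~ /(\w+) is a ([brc]at)/g;

print join(", ", @pairs), "\n";
```

The array should hold This, cat, That, rat, Here, bat, in that order.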

That is it for this part of the series. We stop here and continue in the next part.

Chrys
