Broad Network


Greediness of Perl Quantifiers and Solution

Advanced Perl Regular Expressions – Part 8

Foreword: In this part of the series, I talk about the greedy nature of Perl quantifiers and how to limit that.

By: Chrysanthus Date Published: 2 Apr 2016

Introduction

This is part 8 of my series, Advanced Perl Regular Expressions. In this part of the series, I talk about the greedy nature of Perl quantifiers and how to limit it. Note that the greedy nature is also referred to as the maximal nature. You should have read the previous parts of the series before reaching here; this is a continuation.

Quantifiers
Quantifiers are:

x*       :  match 'x' 0 or more times, i.e., any number of times

x+       :  match 'x' 1 or more times, i.e., at least once

x?       :  match 'x' 0 or 1 times

x{n,}    :  match 'x' at least n times; note the comma

x{n}     :  match 'x' exactly n times

x{n,m}   :  match 'x' at least n times, but not more than m times
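The following is a minimal sketch that applies each quantifier from the list to a made-up subject, "aaaa", with 'x' replaced by the letter 'a'. The subject string, the labels and the %matched hash are illustrative choices, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical subject string chosen so each quantifier's effect is visible
my $subject = "aaaa";

# Each label is paired with a compiled pattern; 'x' from the list is 'a' here
my @patterns = (
    [ 'a*',     qr/(a*)/     ],
    [ 'a+',     qr/(a+)/     ],
    [ 'a?',     qr/(a?)/     ],
    [ 'a{2,}',  qr/(a{2,})/  ],
    [ 'a{2}',   qr/(a{2})/   ],
    [ 'a{1,3}', qr/(a{1,3})/ ],
);

my %matched;    # label => text captured by that pattern
for my $pair (@patterns) {
    my ($label, $re) = @$pair;
    if ($subject =~ $re) {
        $matched{$label} = $1;
        printf "%-7s matched '%s'\n", $label, $1;
    }
}
```

Notice that a*, a+ and a{2,} each take all four letters, and a{1,3} takes three: that is already greediness at work.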

The Greediness of x* or x+ with the Dot
Consider the following binding operation:

    "In a meeting, you have to greet people" =~ /(m.*t)/;

The regex says: match ‘m’, then any character, as many times as possible, and then ‘t’. In the subject string, both “meet” and “meeting, you have to greet” fit this pattern. In practice, the above statement matches “meeting, you have to greet”; that is greediness. The match still begins at the leftmost ‘m’, but .* does not stop at the first ‘t’ it reaches: it consumes the rest of the string and then backs off only as far as the last ‘t’.
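Perl records the start and end offsets of the last successful match in the special arrays @- and @+. The sketch below uses them to show where the greedy match lies: it starts at the first ‘m’ in the subject (offset 5) and ends just after the ‘t’ of “greet”, the last ‘t’ in the string. The variable names are illustrative.

```perl
use strict;
use warnings;

my $subject = "In a meeting, you have to greet people";

my ($text, $start, $end);
if ($subject =~ /(m.*t)/) {
    $text  = $1;
    $start = $-[0];    # offset where the whole match begins
    $end   = $+[0];    # offset just past the end of the whole match
    print "matched '$text' from offset $start to $end\n";
}
```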

Consider the following binding operation:

    "In a meeting, you have to greet people" =~ /(m.+t)/;

The regex says: match ‘m’, then one or more of any character, as many as possible, and then ‘t’. Again, both “meet” and “meeting, you have to greet” fit the pattern, and again the statement matches “meeting, you have to greet”; that is greediness. As before, the match begins at the leftmost ‘m’ but runs on to the last ‘t’.
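The patterns m.*t and m.+t give the same result on this subject. They differ only when nothing at all lies between ‘m’ and ‘t’: .* may match zero characters, while .+ needs at least one. A minimal sketch with a made-up two-character subject, "mt":

```perl
use strict;
use warnings;

# Hypothetical two-character subject: nothing at all between 'm' and 't'
my $subject = "mt";

my $star_matches = ($subject =~ /m.*t/) ? 1 : 0;    # .* may match zero characters
my $plus_matches = ($subject =~ /m.+t/) ? 1 : 0;    # .+ needs at least one

print "m.*t: ", ($star_matches ? "match" : "no match"), "\n";
print "m.+t: ", ($plus_matches ? "match" : "no match"), "\n";
```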

Solution to the Greediness of x* or x+ with the Dot
The solution, or limiting of greediness, is to make the quantifier match as few characters as possible, so that the match stops at the first ‘t’ it reaches rather than the last one. To achieve this, append ? to the quantifier symbol, that is, x*? or x+?. Read and try the following script:

    use strict;
    use warnings;

    my $subject = "In a meeting, you have to greet people";

    # Greedy: .* takes as much as it can, then backs off to the last 't'
    print $1, "\n" if $subject =~ /(m.*t)/;
    # Non-greedy: .*? takes as little as it can, stopping at the first 't'
    print $1, "\n" if $subject =~ /(m.*?t)/;

    # The same pair with + in place of *
    print $1, "\n" if $subject =~ /(m.+t)/;
    print $1, "\n" if $subject =~ /(m.+?t)/;

The greedy patterns print “meeting, you have to greet”; the patterns with ? appended print “meet”.
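The non-greedy form is especially handy when the closing character occurs several times in the subject. The sketch below uses a made-up sentence with two quoted words: the greedy pattern swallows everything from the first double quote to the last one, while the non-greedy pattern, combined with the /g modifier in list context, returns each quoted word separately.

```perl
use strict;
use warnings;

# Hypothetical subject containing two quoted words
my $subject = 'He said "hi" and then "bye"';

# Greedy: .* runs on to the last double quote in the string
my ($greedy) = $subject =~ /(".*")/;

# Non-greedy with /g: each match stops at the nearest closing quote
my @quoted = $subject =~ /(".*?")/g;

print "$greedy\n";    # "hi" and then "bye"
print "@quoted\n";    # "hi" "bye"
```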

The x?, x{n,} and x{n,m} Quantifiers
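The x?, x{n,} and x{n,m} quantifiers are greedy too: each matches as many times as it is allowed, as long as the rest of the pattern can still match. Whether that counts as a problem depends on what you want to match. Either way, the solution is to append ? to the quantifier symbol. Let us consider them one by one.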

The x? Quantifier
Consider the following statement:

    "The book is nice" =~ /(b.?)/;

The regex says: match ‘b’ followed by any one character, zero or one time. So it can match “b” or “bo”. In practice, this statement matches “bo”; that can be considered greediness. The solution is to append ? to the quantifier symbol, giving b.??, which matches ‘b’ alone.
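Appending ? makes the quantifier prefer fewer repetitions, but it does not forbid more: if the rest of the pattern can only match when the optional item is present, the regex engine still takes it. A minimal sketch with two made-up words, where u?? prefers to skip the ‘u’ but takes it when the following ‘r’ demands it:

```perl
use strict;
use warnings;

# Hypothetical subjects: one with the optional 'u', one without
my ($with_u)    = "colour" =~ /(colou??r)/;
my ($without_u) = "color"  =~ /(colou??r)/;

print "$with_u\n";       # colour
print "$without_u\n";    # color
```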

The Quantifier x{n,}
Consider the following statement:

    "In a meeting, you have to greet people" =~ /(m.{2,}t)/;

In practice, you will have “meeting, you have to greet” matched, not “meet”; that can be interpreted as greediness. To have “meet”, use the syntax x{n,}?, or an exact count x{n}; in this case, m.{2,}?t or m.{2}t.
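A non-greedy x{n,}? still honours its minimum n. In the made-up subject "mt meet" below, the first ‘t’ comes right after ‘m’, so m.*?t stops there, while m.{2,}?t must first take two characters and therefore runs on to the next ‘t’ that follows them.

```perl
use strict;
use warnings;

# Hypothetical subject: the first 't' comes right after 'm'
my $subject = "mt meet";

my ($lazy_star) = $subject =~ /(m.*?t)/;       # zero characters allowed between
my ($lazy_min2) = $subject =~ /(m.{2,}?t)/;    # at least two characters required

print "$lazy_star\n";    # mt
print "$lazy_min2\n";    # mt meet
```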

The Quantifier x{n,m}
Consider the following statement:

    "In a meeting, you have to greet people" =~ /(m.{2,24}t)/;

In practice, you will have “meeting, you have to greet” matched, not “meet”; that can be interpreted as greediness. To have “meet”, use m.{2,24}?t.
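The upper bound m also caps how far a greedy x{n,m} can reach. With 24, exactly enough characters fit between ‘m’ and the ‘t’ of “greet”. In the sketch below, lowering the bound to 20 (a value chosen only for illustration) makes the still-greedy match stop at the ‘t’ of “to” instead.

```perl
use strict;
use warnings;

my $subject = "In a meeting, you have to greet people";

# Greedy in both cases; only the upper bound differs
my ($up_to_24) = $subject =~ /(m.{2,24}t)/;
my ($up_to_20) = $subject =~ /(m.{2,20}t)/;

print "$up_to_24\n";    # meeting, you have to greet
print "$up_to_20\n";    # meeting, you have t
```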

Read and try the following code:

    use strict;
    use warnings;

    my $book    = "The book is nice";
    my $meeting = "In a meeting, you have to greet people";

    # x? (greedy) versus x?? (non-greedy)
    print $1, "\n"   if $book =~ /(b.?)/;              # bo
    print $1, "\n\n" if $book =~ /(b.??)/;             # b

    # x{n,} versus x{n,}?, and the exact count x{n}
    print $1, "\n"   if $meeting =~ /(m.{2,}t)/;       # meeting, you have to greet
    print $1, "\n"   if $meeting =~ /(m.{2,}?t)/;      # meet
    print $1, "\n\n" if $meeting =~ /(m.{2}t)/;        # meet

    # x{n,m} versus x{n,m}?
    print $1, "\n"   if $meeting =~ /(m.{2,24}t)/;     # meeting, you have to greet
    print $1, "\n\n" if $meeting =~ /(m.{2,24}?t)/;    # meet

Note: A quantifier with ? appended is said to be non-greedy, lazy, or minimal.

That is it for this part of the series. We stop here and continue in the next part.

Chrys
