Broad Network


Greediness of PHP Quantifiers and Solution

Advanced PHP Regular Expressions - Part 7

Foreword: In this part of the series, I talk about the greedy nature of PHP quantifiers and how to limit that.

By: Chrysanthus Date Published: 11 Jul 2019

Introduction

This is part 7 of my series, Advanced PHP Regular Expressions. The word “subject” in this series means the string in which the regular expression looks for a match. Note: the abbreviation for regular expression in this series is regex. You should have read the previous part of the series before reaching here, as this is a continuation.

Quantifiers
Quantifiers are:

x*      :  match 'x' 0 or more times, i.e., any number of times

x+      :  match 'x' 1 or more times, i.e., at least once

x?      :  match 'x' 0 or 1 times

x{n,}   :  match 'x' at least n times; note the comma

x{n}    :  match 'x' exactly n times

x{n,m}  :  match 'x' at least n times, but not more than m times
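
As a quick illustration of the list above, the following sketch tries each quantifier form on a short string. The regexes and subjects here are hypothetical samples chosen only for this illustration; they are not from the rest of the series.

```php
<?php
// Hypothetical sample regexes and subjects, one per quantifier form.
$tests = [
    ['/ab*c/',     'ac'],     // b*     : zero b's is acceptable, so this matches
    ['/ab+c/',     'ac'],     // b+     : needs at least one b, so no match
    ['/ab?c/',     'abbc'],   // b?     : allows at most one b, so no match
    ['/ab{2,}c/',  'abbbc'],  // b{2,}  : two or more b's, so this matches
    ['/ab{2}c/',   'abbbc'],  // b{2}   : exactly two b's, so no match
    ['/ab{2,3}c/', 'abbbc'],  // b{2,3} : two or three b's, so this matches
];

$results = [];
foreach ($tests as $test) {
    // preg_match() returns 1 when the regex matches and 0 when it does not.
    $matched = preg_match($test[0], $test[1]) === 1;
    $results[] = $matched;
    echo $test[0], ' on "', $test[1], '": ', $matched ? 'match' : 'no match', PHP_EOL;
}
```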

The Greediness of x* or x+ with the Dot
Consider the following code segment that produces a match:

        $subject = "In a meeting, you have to greet people";
        $regex = '/m.*t/';
        preg_match($regex, $subject, $matches);

The regex says: match ‘m’, then any character zero or more times, then ‘t’. In the subject string, the possible matches include “meet” and “meeting, you have to greet”. In practice, the matching statement above will match the longest, “meeting, you have to greet”, because .* takes as many characters as it can while still leaving a ‘t’ to end the match; that is greediness.

Consider this time, the following code segment that produces a match:

        $subject = "In a meeting, you have to greet people";
        $regex = '/m.+t/';
        preg_match($regex, $subject, $matches);

The regex says: match ‘m’, then any character one or more times, then ‘t’. In the subject string, the possible matches again include “meet” and “meeting, you have to greet”. In practice, the matching statement above will match, “meeting, you have to greet”; that is greediness.
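
With the subject above, .* and .+ give the same result. The difference between them only shows when nothing lies between ‘m’ and ‘t’. Here is a minimal sketch of that case, assuming a hypothetical subject "mt" that is not part of the original example:

```php
<?php
// Hypothetical subject in which 'm' is immediately followed by 't'.
$subject = "mt";

// .* may match zero characters, so 'm' followed directly by 't' is accepted.
$star = preg_match('/m.*t/', $subject, $matchesStar);

// .+ needs at least one character between 'm' and 't', so there is no match.
$plus = preg_match('/m.+t/', $subject, $matchesPlus);

echo $star, PHP_EOL;
echo $plus, PHP_EOL;
```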

Solution to the Greediness of x* or x+ with the Dot
The solution, that is, the way to limit greediness, is to make the quantifier match as few characters as possible, so that the match ends at the first ‘t’ that can complete it. To achieve this, just append ? to the quantifier symbol, that is, x*? or x+? . Read and test the following code:

<?php

        $subject = "In a meeting, you have to greet people";

        // Each greedy regex is followed by its non-greedy counterpart.
        $regexes = ['/m.*t/', '/m.*?t/', '/m.+t/', '/m.+?t/'];

        foreach ($regexes as $regex)
            {
                preg_match($regex, $subject, $matches);
                // $matches[0] is the whole match; these regexes have no groups.
                foreach ($matches as $match)
                    {
                        echo $match, '<br>';
                    }
            }

?>

The output is:

meeting, you have to greet
meet
meeting, you have to greet
meet

In this code, wherever ? was appended to the quantifier, “meet” is the matched substring.
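
The effect is easier to see when a subject holds more than one candidate. The sketch below uses preg_match_all() on a hypothetical subject with two quoted words; the subject and variable names are assumptions made for this illustration only.

```php
<?php
// Hypothetical subject containing two quoted words.
$subject = 'She said "yes" and then "no" to him';

// Greedy: .* runs on to the last double quote in the subject.
preg_match_all('/".*"/', $subject, $greedy);

// Non-greedy: .*? stops at the nearest closing double quote,
// so each quoted word is found separately.
preg_match_all('/".*?"/', $subject, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

The greedy regex gives one long match that spans both quoted words, while the non-greedy regex gives the two quoted words as two matches.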

The x?, x{n,} and x{n,m} Quantifiers
The x?, x{n,} and x{n,m} quantifiers are greedy in the same way: each takes as many repetitions as it is allowed, as long as the rest of the regex can still match. The solution is the same too: append ? to the quantifier symbol. Let us consider them one-by-one.

The x? Quantifier
Consider the following statement:

        $subject = "The book is nice";
        $regex = '/(b.?)/';

Here the subject is "The book is nice" and the regex is /(b.?)/.

The regex says: match ‘b’, followed by any character zero or one time. So, it can match “b” or “bo”. In practice, it will match “bo”; that is greediness. The solution is to type ? after the quantifier symbol, giving b.?? , which matches ‘b’ alone.
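
Note that a non-greedy quantifier still takes a character when the rest of the regex needs it. A minimal sketch with the same subject follows; the second regex, /i.??e/, is a hypothetical addition for this illustration.

```php
<?php
$subject = "The book is nice";

// Nothing follows the quantifier, so the non-greedy form takes zero characters.
preg_match('/b.??/', $subject, $alone);

// Here 'e' must follow. In "nice", 'i' is not directly followed by 'e',
// so .?? has to take one character ('c') for the match to succeed.
preg_match('/i.??e/', $subject, $forced);

echo $alone[0], PHP_EOL;
echo $forced[0], PHP_EOL;
```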

The x{n,} Quantifier
Consider the following statement:

        $subject = "In a meeting, you have to greet people.";
        $regex = '/(m.{2,}t)/';

In practice, you will have “meeting, you have to greet” and not “meet” matched; that is greediness. To have “meet”, use the syntax x{n,}? , here m.{2,}?t . Because “meet” has exactly two characters between ‘m’ and ‘t’, the exact form x{n}, here m.{2}t , gives the same result for this subject.
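
The two forms are not interchangeable in general. The sketch below assumes a hypothetical subject, "Go to the market", in which four characters lie between ‘m’ and ‘t’:

```php
<?php
// Hypothetical subject: "market" has four characters between 'm' and 't'.
$subject = "Go to the market";

// Non-greedy {2,}? starts with two characters and takes more only as needed.
$lazy = preg_match('/m.{2,}?t/', $subject, $lazyMatches);

// {2} allows exactly two characters, so "market" cannot match.
$exact = preg_match('/m.{2}t/', $subject, $exactMatches);

echo $lazy, PHP_EOL;
echo $exact, PHP_EOL;
```

Here m.{2,}?t matches “market”, while m.{2}t finds no match at all.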

The x{n,m} Quantifier
Consider the following statement:

        $subject = "In a meeting, you have to greet people";
        $regex = '/(m.{2,24}t)/';

In practice, you will have “meeting, you have to greet” and not “meet” matched, since exactly 24 characters lie between the ‘m’ and the last ‘t’; that is greediness. To have “meet”, use m.{2,24}?t .
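
The upper limit m also caps how far a greedy quantifier can reach. As a rough sketch, lowering the limit from 24 to a hypothetical value of 10 puts the last ‘t’ out of reach, so even the greedy form ends at the first ‘t’:

```php
<?php
$subject = "In a meeting, you have to greet people";

// With a limit of 24, the greedy quantifier can reach the last 't'.
preg_match('/m.{2,24}t/', $subject, $wide);

// With a limit of 10 (a value assumed for this illustration), the only 't'
// within reach is the one that ends "meet".
preg_match('/m.{2,10}t/', $subject, $narrow);

echo $wide[0], PHP_EOL;
echo $narrow[0], PHP_EOL;
```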

Read and test the following code that demonstrates the above:

<?php

        $book = "The book is nice";
        $meeting = "In a meeting, you have to greet people";

        // Each entry pairs a regex with the subject it is tested against.
        $tests = [
            ['/b.?/', $book],
            ['/b.??/', $book],
            ['/m.{2,}t/', $meeting],
            ['/m.{2,}?t/', $meeting],
            ['/m.{2}t/', $meeting],
            ['/m.{2,24}t/', $meeting],
            ['/m.{2,24}?t/', $meeting],
        ];

        foreach ($tests as $test)
            {
                preg_match($test[0], $test[1], $matches);
                // $matches[0] is the whole match; these regexes have no groups.
                foreach ($matches as $match)
                    {
                        echo $match, '<br>';
                    }
            }

?>

The output is:

    bo
    b
    meeting, you have to greet
    meet
    meet
    meeting, you have to greet
    meet

Note: A quantifier with ? appended to it is said to be non-greedy (or lazy).

That is it for this part of the series. We take a break here and continue in the next part.

Chrys

