Broad Network


Modifiers in PHP Regular Expressions

PHP Regular Expressions with Security Considerations - Part 5

Foreword: In this part of the series, I talk about modifiers in PHP Regular Expressions.

By: Chrysanthus Date Published: 18 Jan 2019

Introduction

This is part 5 of my series, PHP Regular Expressions with Security Considerations. By default, matching is case sensitive, yet you may not know whether what you are looking for is in lowercase, uppercase or mixed case. You can make a match case insensitive, and you need what is called a modifier for this. There are a good number of modifiers, and each has its own purpose. In this part of the series, I talk about modifiers in PHP Regular Expressions. You should have read the previous parts of the series before coming here, as this is the continuation.

The i modifier
By default, matching is case sensitive. To make it case insensitive, you have to use what is called the i modifier.

So if we have the regex,

          /send/

and then we also have

    $subject = "Click the Send button.";

the following code will not produce a match:


<?php

    $subject = "Click the Send button.";

    if (preg_match("/send/", $subject) === 1)
        echo 'Matched';
    else
        echo 'Not Matched';

?>

The regex did not match the subject string because the regex has “send”, where s is in lowercase, but the subject string has “Send”, where S is in uppercase. If you want this matching to be case insensitive, then your regex will have to be

         /send/i

Note the i just after the second forward slash. It is the i modifier. The following code will produce a match.

<?php

    $subject = "Click the Send button.";

    if (preg_match("/send/i", $subject) === 1)
        echo 'Matched';
    else
        echo 'Not Matched';

?>

Matching has occurred because the regex has been made case insensitive, with the i modifier.
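The i modifier only changes how the pattern is compared with the subject; the matched text keeps the case it has in the subject. As a small sketch, the code below reuses the subject above and passes the optional third argument of preg_match() (here named $matches) to look at what was actually matched:

```php
<?php

    $subject = "Click the Send button.";

    // The i modifier lets "send" in the pattern match "Send" in the subject.
    $result = preg_match("/send/i", $subject, $matches);

    // $matches[0] holds the matched text exactly as it appears in the subject.
    echo $result, '<br>';   // 1
    echo $matches[0];       // Send
```

So a case-insensitive match tells you that the word is there, and $matches[0] tells you how it is actually written.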

Global Matching
The subject string may contain more than one substring that matches the regex. The preg_match() function matches only the first of them. To match all of them, you have to use a different function, preg_match_all(), whose simplified syntax is:

    int preg_match_all ( string $pattern , string $subject [, array &$matches])

Here, $matches is a two-dimensional array. By default, it orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized (group) subpattern, and so on.

The function returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
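To see this two-dimensional layout, here is a sketch that uses the cat/rat/bat subject from the example that follows, with a parenthesized group added around the first letter (the group is an addition for illustration only). $matches[0] then holds the full matches and $matches[1] holds what the group captured in each match:

```php
<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    // ([cbr]) is a capture group around the first letter of each word.
    $count = preg_match_all("/([cbr])at/", $subject, $matches);

    echo $count, '<br>';                      // 3: number of full matches
    echo implode(', ', $matches[0]), '<br>';  // cat, rat, bat (full matches)
    echo implode(', ', $matches[1]);          // c, r, b (group 1)
```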

Consider the following subject string:

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

In the above subject, you have the substrings cat, rat and bat, in that order. Each of these substrings matches the following regex:

                   /[cbr]at/

With the preg_match() function, this pattern will match only the first sub string, “cat”. If you want “cat” and “rat” and “bat” to be matched, you have to use the preg_match_all() function. The following code illustrates this:

<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    if (preg_match_all("/[cbr]at/", $subject))
        echo 'Matched';
    else
        echo 'Not Matched';

?>

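The echo construct displays Matched. Here preg_match_all() returns 3, the number of matches, and any non-zero number is treated as true by the if condition. Note that the $matches array has not been used in this code.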

You can also capture the different matched substrings. The following code illustrates this:

<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    preg_match_all("/[cbr]at/", $subject, $matches);
    echo $matches[0][0], '<br>';
    echo $matches[0][1], '<br>';
    echo $matches[0][2];

?>

The first, second and third elements of the first row of the $matches array, $matches[0], are “cat”, “rat” and “bat”. So the output of this code is:

cat
rat
bat

This is global matching.
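Unlike some other languages, PHP has no g modifier for global matching (PHP rejects /g with an "Unknown modifier" warning); the choice of function does that job. The sketch below compares the return values of the two functions on the same subject: preg_match() stops at the first match, so it returns 1, while preg_match_all() returns the number of matches it found.

```php
<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    // preg_match() stops after the first match and returns 1 (or 0 if none).
    $first = preg_match("/[cbr]at/", $subject, $one);

    // preg_match_all() keeps searching and returns how many matches it found.
    $all = preg_match_all("/[cbr]at/", $subject, $many);

    echo $first, ' ', $one[0], '<br>';        // 1 cat
    echo $all, ' ', implode(' ', $many[0]);   // 3 cat rat bat
```

Note that $one is a one-dimensional array, while $many is two-dimensional, as described above.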

The m and s modifiers
The s modifier refers to a single line and the m modifier refers to multiple lines in a string. Usually, without these modifiers, we get what we want. Sometimes, however, we want to keep track of \n characters. A file on the hard disk might be made up of many lines of text, each ending with the \n character. By default, the ^ and $ characters anchor at the beginning and at the end of the subject string, respectively. With the m modifier, we can make them anchor at the beginning and end of lines. The m modifier affects the interpretation of ^ and $ only; it is the s modifier that changes the dot metacharacter, so that it also matches \n. Here is the full description of the default behavior and of the m modifier:

- no modifiers: Here we look at the case where there is no modifier just after the second forward slash. Under this condition, '.' matches any character except "\n", ^ matches only at the start of the string, and $ matches only at the end of the subject string or just before a \n that ends it. This is the default behavior.

- m modifier: This makes the subject string behave like a set of multiple lines, where consecutive lines are separated by the \n character. The dot is not affected: '.' still matches any character except "\n". What changes is that ^ and $ are able to match at the start or end of any line within the subject string. Here, ^ matches at the beginning of the string or just after a \n character, while $ matches just before a \n character or at the end of the string.
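The difference between the two points above can be seen with a short three-line subject (made up for this sketch). Without m, ^ and $ only anchor at the ends of the whole string, so a pattern that must span one complete line finds nothing; with m, each line qualifies:

```php
<?php

    $subject = "cat\nrat\nbat";

    // No modifier: ^ only matches at position 0 and $ only at the very end
    // (or before a final \n), so no single line can satisfy both anchors.
    $without = preg_match_all("/^[cbr]at$/", $subject, $m1);

    // m modifier: ^ and $ also match just after and just before each \n.
    $with = preg_match_all("/^[cbr]at$/m", $subject, $m2);

    echo $without, '<br>';                   // 0
    echo $with, ' ', implode(' ', $m2[0]);   // 3 cat rat bat
```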

I shall use examples to illustrate the above two conditions. I start by looking at the first condition. I will use the preg_match() function and not the preg_match_all() function.

No modifiers
Read the first point above again. Consider the following multiline subject string:

      $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

The subject has three lines. The following conditional produces a match.

            if (preg_match("/second/", $subject) === 1)

The substring “second”, in the second line (sentence), is matched. Consider the following pattern:

            /^.*$/

Under normal circumstances, this pattern (regex) is expected to match the whole subject string. Let us see if it does so with the above multi-line subject string. Consider the following code:

<?php

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

    if (preg_match("/^.*$/", $subject) === 1)
        echo 'Matched';
    else
        echo 'Not Matched';

?>

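If you run this code, no matching will occur. This is because of the \n characters in the subject. By default, the dot metacharacter does not match \n, so .* stops at the end of the first line. There, $ cannot match either, because without a modifier it matches only at the very end of the subject string (or just before a final \n).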

I hope you now appreciate what the first point above is talking about.
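The s modifier mentioned at the start of this section is the one that changes the dot: with s, '.' also matches \n. As a sketch, adding it to the same pattern lets .* run across all the lines, so /^.*$/s matches the whole multi-line subject, including its final \n:

```php
<?php

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

    // No modifier: the dot stops at the first \n, and $ cannot match there.
    $plain = preg_match("/^.*$/", $subject);

    // s modifier: the dot also matches \n, so .* can run to the end of the string.
    $dotall = preg_match("/^.*$/s", $subject, $matches);

    echo $plain, '<br>';    // 0
    echo $dotall, '<br>';   // 1
    echo $matches[0] === $subject ? 'Whole string matched' : 'Partial match';
```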

The m modifier
Read the second point above again. Here we look at the effect of the m modifier. Consider the following subject string:

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

The subject has three lines. The following conditional produces a match.

        if (preg_match("/second/m", $subject) === 1)

Note that the m modifier has been used. The substring “second”, in the second line, is matched. Consider the following pattern:

         /^.*$/m

With the m modifier, this pattern (regex) should match only one line, because preg_match() stops at the first match. Let us see if it does so with the above multi-line subject string. Consider the following code:

<?php

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

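    preg_match("/^.*$/m", $subject, $matches);
    // There is no capture group, so $matches has only index 0.
    // The ?? operator prints 'null' instead of raising an
    // undefined-index warning for the missing indexes 1 and 2.
    echo $matches[0], '<br>';
    echo $matches[1] ?? 'null', '<br>';
    echo $matches[2] ?? 'null';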

?>

The output is:

The first sentence.
null
null

As you can see, only the first line is matched. If you want all the lines to be matched, you have to use the preg_match_all() function instead. The following code illustrates this:

<?php

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

    preg_match_all("/^.*$/m", $subject, $matches);
    echo $matches[0][0], '<br>';
    echo $matches[0][1], '<br>';
    echo $matches[0][2];

?>

The output is:

The first sentence.
The second sentence.
The third sentence.
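As a sketch of what preg_match_all() returns here, the code below prints the count and each captured line between brackets, which makes the leading space on the second and third lines visible (the browser hides it in the output above). With m, ^ does not match after the \n that ends the string, so there is no empty fourth match:

```php
<?php

    $subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

    $count = preg_match_all("/^.*$/m", $subject, $matches);

    echo $count, '<br>';   // 3

    // Each element of $matches[0] is one line, without its \n.
    foreach ($matches[0] as $i => $line) {
        echo $i, ': [', $line, ']<br>';
    }
```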

Using more than one modifier

You can also have more than one modifier in a regex, as in:

         /send/im
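Here, i makes the match case insensitive and m lets ^ and $ work line by line. As a sketch, the pattern /^send/im (a made-up variation with an anchor) is applied below to a two-line subject, also made up for this example, and compared with each modifier on its own:

```php
<?php

    $subject = "Send the report.\nsend the invoice.";

    $i_only = preg_match_all("/^send/i", $subject);             // ^ only at the start
    $m_only = preg_match_all("/^send/m", $subject);             // case must match
    $both   = preg_match_all("/^send/im", $subject, $matches);  // both lines

    echo $i_only, $m_only, $both, '<br>';   // 112
    echo implode(' ', $matches[0]);         // Send send
```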

Well, it is time for a break. See you in the next part of the series.

Chrys

