Broad Network


PHP Regular Expression Functions

Advanced PHP Regular Expressions - Part 1

Foreword: In this part of the series, I talk about the PHP regular expression functions, which include match, match_all, replace and split.

By: Chrysanthus Date Published: 11 Jul 2019

Introduction

This is part 1 of my series, Advanced PHP Regular Expressions. In this part of the series, I talk about the PHP regular expression functions: preg_match(), preg_match_all(), preg_replace() and preg_split().

Pre-Knowledge
Learning a computer language is cumulative: you learn something in the language today, and then use it to learn something at a higher level in the same language tomorrow. This series is part of my volume, PHP Course. At the bottom of this page, you have links to the different series you should have read before reaching here.

The preg_match() Function
In simplified form, the syntax of this PHP function is:

    int preg_match ( string $regex , string $subject);

where $regex, in general, is,

    "/pattern/"

For example, to search for the word “man”, the regex would be,

    "/man/"

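The quotes may be single or double. If single, the pattern string behaves like a nowdoc (variables are not interpolated, and almost no escape sequences are interpreted); if double, it behaves like a heredoc (variables and escape sequences are interpreted). $subject is the string where the search is to take place.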
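Because a double-quoted pattern is processed by PHP before the regex engine sees it, the two kinds of quotes can give different results. The following sketch searches for a literal dollar sign; the price subject is just an illustrative assumption. In single quotes, \$ reaches the regex engine intact and means a literal $. In double quotes, PHP turns \$ into a plain $, which the regex engine then reads as the end-of-string anchor, so the pattern can never match.

```php
<?php

    $subject = 'Price: $100';

    // Single quotes: the regex engine receives /\$100/, where \$ is a literal dollar sign.
    $single = preg_match('/\$100/', $subject);

    // Double quotes: PHP turns \$ into $, so the regex engine receives /$100/,
    // where $ is an end-of-string anchor and the pattern cannot match.
    $double = preg_match("/\$100/", $subject);

    echo $single, '<br>';   // 1
    echo $double;           // 0
```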

Simple Word Matching
Consider the following code:

<?php

        $ret = preg_match("/World/", "Hello World");
        echo $ret;

?>

If you run the above code, the output is 1.

The first statement uses the preg_match() function. The first argument of the preg_match() function is "/World/". The second argument is "Hello World"; this is a string literal, and it is the subject string in which the search is made.

The regex is

     "/World/"    

Here, the regex is made up of the word “World”, preceded by a forward slash and followed by another forward slash, all in quotes. The two forward slashes are called delimiters; they mark the beginning and the end of the pattern.

The subject string is:

            "Hello World"

Now, if “World” is found in the subject string, the preg_match() function returns 1. If there is no match, that is, if no matching substring is found in the subject, the preg_match() function returns 0.

In many cases, you would just want to know if matching occurs or not. For that, you can use the following code:

<?php

        if (preg_match("/World/", "Hello World") == 1)
            {
                echo 'Matched';
            }
        else
            {
                echo 'Not Matched';
            }

?>

Or

<?php

        if (preg_match("/World/", "Hello World"))
            {
                echo 'Matched';
            }
        else
            {
                echo 'Not Matched';
            }

?>

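These two code samples are equivalent. In the first, the == operator compares the return value with 1, giving true when there is a match. In the second, the return value is used directly as the condition: PHP treats 1 as true and 0 as false.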
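There is a third possibility that the samples above do not show: preg_match() returns false (and PHP emits a warning) if the pattern itself is invalid. Because false, like 0, counts as false in an if-condition, the strict === operator is needed to tell the two apart. The following is a minimal sketch; describeMatch() is a hypothetical helper name, and the pattern "/[a-z/" is deliberately broken (its character class is never closed).

```php
<?php

    // Hypothetical helper that reports all three possible results of preg_match().
    function describeMatch(string $regex, string $subject): string
    {
        // @ suppresses the warning that PHP emits for an invalid pattern.
        $ret = @preg_match($regex, $subject);

        if ($ret === false) {
            return 'Error';          // the pattern could not be compiled
        }

        return ($ret === 1) ? 'Matched' : 'Not Matched';
    }

    echo describeMatch("/World/", "Hello World"), '<br>';   // Matched
    echo describeMatch("/Earth/", "Hello World"), '<br>';   // Not Matched
    echo describeMatch("/[a-z/", "Hello World");            // Error
```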

Note: Matching is case sensitive. So if “World” in the regex were written as “world”, with the W in lower case, the if-condition would not hold, and the code would display “Not Matched”.
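The following sketch confirms this. It also shows the i modifier, written after the closing slash, which PCRE (the regex library behind PHP's preg_ functions) uses to make matching case insensitive; pattern modifiers are not otherwise covered in this part.

```php
<?php

    $subject = "Hello World";

    $exact = preg_match("/world/", $subject);    // lower-case w: no match
    $loose = preg_match("/world/i", $subject);   // i modifier: case insensitive

    echo $exact, '<br>';   // 0
    echo $loose;           // 1
```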

You can have the regex and the subject as string variables. The following code illustrates this:

<?php

        $re = "/World/";
        $subject = "Hello World!";

        if (preg_match($re, $subject))
            {
                echo 'Matched';
            }
        else
            {
                echo 'Not Matched';
            }

?>

In this code, you have the variables,

   $re = "/World/";
   $subject = "Hello World!";

The if-condition is now:

        if (preg_match($re, $subject))

The first argument for the preg_match() function is, $re, and the second argument is, $subject.
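When the regex is built from a variable, and especially from user input, characters such as . or / inside the variable keep their special regex meaning. PHP's preg_quote() function escapes them; its second argument names the delimiter, so that it is escaped too. In this sketch, the search text "3.5" and the subject are illustrative assumptions: an unescaped dot matches any character, so "3.5" wrongly matches "3x5".

```php
<?php

    $word = "3.5";                  // text to search for literally
    $subject = "Version 3x5";

    // Unescaped: the dot matches any character, so "3x5" counts as a match.
    $raw = preg_match("/" . $word . "/", $subject);

    // Escaped: preg_quote() turns "3.5" into "3\.5", where \. is a literal dot.
    $re = "/" . preg_quote($word, "/") . "/";
    $safe = preg_match($re, $subject);

    echo $raw, '<br>';    // 1
    echo $safe;           // 0
```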

Global Matching
It is possible for you to have more than one substring in the subject string that would match the regex. The preg_match() function stops at the first match, so only the first such substring is matched. To match all the substrings in the subject, you have to use a different function, whose simplified syntax is:

    int preg_match_all ( string $pattern , string $subject [, array &$matches])

Here, $matches is a two-dimensional array. By default, it orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern (group), and so on.

The function returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.

Consider the following subject string:

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

In the above subject, you have the substrings cat, rat and bat, in that order. Each of these substrings matches the following regex:

                 /[cbr]at/

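Here, [cbr] is a character class that matches any one of the characters c, b or r, so the regex matches “cat”, “bat” and “rat”. With the preg_match() function, this pattern matches only the first substring, “cat”. If you want “cat”, “rat” and “bat” all to be matched, you have to use the preg_match_all() function. The following code illustrates this: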

<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    if (preg_match_all("/[cbr]at/", $subject))
        echo 'Matched';
    else
        echo 'Not Matched';

?>

The echo construct displays Matched: preg_match_all() returns 3, the number of matches, and PHP treats any non-zero integer as true. Note that the $matches array has not been used in this code.

You can capture the different matched substrings. The following code illustrates this:

<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    preg_match_all("/[cbr]at/", $subject, $matches);
    echo $matches[0][0], '<br>';
    echo $matches[0][1], '<br>';
    echo $matches[0][2];

?>

The first, second and third elements of the first row, $matches[0], of the $matches array are “cat”, “rat” and “bat”. So the output of this code is:

cat
rat
bat

This is global matching.
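To see what $matches[1] holds, the pattern can be given a parenthesized group. In this sketch, the group (a modification of the regex above, assumed for illustration) captures just the first letter of each match, so $matches[0] holds the full matches and $matches[1] holds the captured letters. The return value is the number of full matches.

```php
<?php

    $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

    // The parentheses form group 1, which captures the first letter.
    $count = preg_match_all("/([cbr])at/", $subject, $matches);

    echo $count, '<br>';                        // 3
    echo implode(', ', $matches[0]), '<br>';    // cat, rat, bat
    echo implode(', ', $matches[1]);            // c, r, b
```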

Search and Replace
You can search for a match in the subject and have the matched (found) substrings replaced. Consider the following subject string:

             "I am a man. You are a man."

The substring “man” occurs in this subject in two places. You can have each occurrence of “man” replaced by “woman”. You do this using the preg_replace() function, whose simplified syntax is:

        mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject)

If matches are found, a new copy of the subject, with the replacements made, is returned; otherwise, an unchanged copy of the subject is returned. NULL is returned if an error occurred. Either way, the original subject is not modified.

The following code illustrates this:

<?php

    $subject = "I am a man. You are a man.";

    $str = preg_replace("/man/", 'woman', $subject);

    echo $subject, '<br>';
    echo $str;

?>

The output is:

             I am a man. You are a man.
             I am a woman. You are a woman.

There are four statements in the code. The first statement declares and assigns the subject string. The second statement does the replacement: the first argument of the preg_replace() function is the regex, the second argument is the replacement string, and the third argument is the subject.

The first echo construct displays the subject. The second echo construct displays the string returned by the preg_replace() function.

From the output, we see that the subject remains unchanged. The returned string is a copy of the subject in which all occurrences of the substring “man” have been replaced with “woman”.
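One caveat: /man/ matches the three letters m, a, n wherever they occur, including inside longer words such as “woman”. A common remedy, sketched below, is the \b word-boundary escape, which matches only at the edge of a word. The subject sentence here is an illustrative assumption.

```php
<?php

    $subject = "A woman met a man.";

    // /man/ also matches the "man" inside "woman".
    $naive = preg_replace('/man/', 'woman', $subject);

    // \b requires a word boundary on each side, so only the whole word "man" matches.
    $whole = preg_replace('/\bman\b/', 'woman', $subject);

    echo $naive, '<br>';   // A wowoman met a woman.
    echo $whole;           // A woman met a woman.
```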

If you want to replace only the first few occurrences, you have to use a fourth argument, called the limit argument. The following code illustrates this:

<?php

    $subject = "I am a man. You are a man.";

    $str = preg_replace("/man/", 'woman', $subject, 1);

    echo $subject, '<br>';
    echo $str;

?>

The output is:

             I am a man. You are a man.
             I am a woman. You are a man.

The value of the limit argument is the number 1 (not in quotes), so only the first occurrence has been replaced. The limit argument is the maximum number of occurrences that can be replaced, counting from the left. If there is no occurrence, nothing is replaced. The default limit, -1, means that there is no limit.
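preg_replace() also accepts an optional fifth argument, passed by reference, into which PHP writes the number of replacements actually made. The sketch below passes -1 as the limit (no limit) only in order to reach that fifth argument; the variable name $count is just a choice.

```php
<?php

    $subject = "I am a man. You are a man.";

    // -1 means "no limit"; $count receives the number of replacements made.
    $str = preg_replace("/man/", 'woman', $subject, -1, $count);

    echo $str, '<br>';   // I am a woman. You are a woman.
    echo $count;         // 2
```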

The Split Operation
PHP has a function called preg_split(). This function splits a string (the subject) into an array of substrings. The simplified syntax is:

    array preg_split ( string $separator , string $subject [, int $limit = -1])

The subject is the string to be split; it is not changed by the split. The separator is a regex. The returned array contains the separated substrings. The limit is an integer. Sometimes you want only the first few substrings, with the end of the subject kept in one piece. If you give a positive number as the limit, at most that many substrings are returned, and the rest of the subject, unsplit, goes into the array as the last substring. The limit argument is optional; if it is absent (or is -1 or 0), the splitting goes across the whole subject.

Consider the following subject string:

    $subject = "one two three";

If we know the regex (pattern) that identifies the space between words, then we can split this string into an array made up of the words “one”, “two” and “three”. \  (a backslash followed by a space) is an escaped space, which matches a literal space character. + matches the preceding item one or more times, so together they match a run of one or more spaces. The regex to separate the above words is

               \ +

The + is needed because the gap between two words might be created by hitting the spacebar more than once. The following code illustrates the use of the split function, with a longer subject and a limit of 3:

<?php

    $subject = "one two three four five";

    $arr = preg_split("/\ +/", $subject, 3);

    echo $arr[0], '<br>';
    echo $arr[1], '<br>';
    echo $arr[2], '<br>';

?>

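In the subject string, the words are separated by spaces. Because the limit argument is 3, the returned array has at most three elements, and the last element holds the rest of the subject. The output of the above code is: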

one
two
three four five

The preg_split() function has split the subject string at the spaces between the words and put the pieces as elements in the returned array. The word “split” is not really accurate here, since the subject string remains unchanged; however, that is the vocabulary PHP uses.
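The escaped space matches only the space character. If the words may also be separated by tabs or newlines, the \s escape, which matches any whitespace character, can be used instead. A minimal sketch, assuming a subject with mixed whitespace: the subject is double-quoted so that \t and \n become a real tab and newline, and no limit is given, so the whole subject is split.

```php
<?php

    // Double quotes turn \t and \n into a real tab and newline.
    $subject = "one\ttwo   three\nfour";

    // \s+ matches a run of one or more whitespace characters of any kind.
    $arr = preg_split('/\s+/', $subject);

    echo count($arr), '<br>';      // 4
    echo implode(' | ', $arr);     // one | two | three | four
```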

It is possible to have words in a string separated by a comma and a space, like

    $subject = "one, two, three";

The regex to separate these words is a comma followed by one or more spaces:

          /, +/

The following code illustrates this:

<?php

    $subject = "one, two, three";

    $arr = preg_split("/, +/", $subject);

    echo $arr[0], '<br>';
    echo $arr[1], '<br>';
    echo $arr[2], '<br>';

?>

The output of the above code is:

one
two
three

It is possible to split and have empty strings as array elements. In the following code, the regex is just a comma, and there are two consecutive commas in the subject:

<?php

    $subject = "one, two,,three";

    $arr = preg_split("/,/", $subject);

    echo $arr[0], '<br>';
    echo $arr[1], '<br>';
    echo $arr[2], '<br>';
    echo $arr[3], '<br>';

?>

The output is:

one
two

three

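where the third element is an empty string, not null and not a space. Note also that, since the regex is just a comma, the second element is actually “ two” with a leading space; a browser does not display that leading space.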
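If the empty strings are unwanted, preg_split() accepts a fourth, flags argument, and the PREG_SPLIT_NO_EMPTY flag drops empty pieces. The sketch below also widens the regex to ,\s* (a comma followed by optional whitespace) so that any space after a comma is removed as well. Passing -1 as the limit (no limit) is needed only to reach the flags position.

```php
<?php

    $subject = "one, two,,three";

    // Without the flag, the two consecutive commas produce an empty string.
    $withEmpty = preg_split('/,\s*/', $subject);

    // PREG_SPLIT_NO_EMPTY removes empty strings from the result.
    $noEmpty = preg_split('/,\s*/', $subject, -1, PREG_SPLIT_NO_EMPTY);

    echo count($withEmpty), '<br>';   // 4
    echo count($noEmpty), '<br>';     // 3
    echo implode(' | ', $noEmpty);    // one | two | three
```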

That is it for this part of the series. We stop here and continue in the next part.

Chrys


Related Links

Basics of PHP with Security Considerations
White Space in PHP
PHP Data Types with Security Considerations
PHP Variables with Security Considerations
PHP Operators with Security Considerations
PHP Control Structures with Security Considerations
PHP String with Security Considerations
PHP Arrays with Security Considerations
PHP Functions with Security Considerations
PHP Return Statement
Exception Handling in PHP
Variable Scope in PHP
Constant in PHP
PHP Classes and Objects
Reference in PHP
PHP Regular Expressions with Security Considerations
Date and Time in PHP with Security Considerations
Files and Directories with Security Considerations in PHP
Writing a PHP Command Line Tool
PHP Core Number Basics and Testing
Validating Input in PHP
PHP Eval Function and Security Risks
PHP Multi-Dimensional Array with Security Consideration
Mathematics Functions for Everybody in PHP
PHP Cheat Sheet and Prevention Explained