Broad Network


Backreferences in PHP Regular Expressions

Advanced PHP Regular Expressions - Part 6

Foreword: In this part of the series, I explain how a group (subpattern) in a regex can be referred to by a number, further on in the same regex.

By: Chrysanthus Date Published: 11 Jul 2019

Introduction

This is part 6 of my series, Advanced PHP Regular Expressions. In this part of the series, I explain how a group (subpattern) in a regex can be referred to by a number, further on in the same regex. The possible numbers are 1, 2, 3, etc., written in the regex as \g{-1}, \g{-2}, \g{-3}, etc. Remember, a group in a regex is a subpattern in parentheses; group is another name for subpattern. The word “subject” in this series means the string in which the regular expression looks for a match. Note: the abbreviation for regular expression in this series is regex. You should have read the previous parts of the series before reaching here, as this is a continuation.

Backreference
Normally, when a writer types two consecutive words that are the same, it is a mistake. You may want to identify such a sequence in a subject string. Consider the following subject:

    $subject = "He has one  one of the books";

Here, the accidentally typed substring “one  one” begins with “one”, followed by one or more whitespace characters, and then “one” again. You may want to identify this substring. The pattern for the first word of interest is, \b\w\w\w\b . The pattern for one or more whitespace characters is, \s+ . The pattern for the next word of interest is also \b\w\w\w\b. Note that the two words of interest have the same pattern (subpattern). To match the substring with the repeated word, you do not have to type the pattern for the word twice. A more compact regex to use is,

    /(\b\w\w\w\b)\s+\g{-1}/

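In this expression, \g{-1} is a backreference to the nearest group on its left, (\b\w\w\w\b). It matches the same text that the group has just matched, not merely the same pattern. So,

    /(\b\w\w\w\b)\s+\g{-1}/

is not equivalent to,

    /(\b\w\w\w\b)\s+(\b\w\w\w\b)/

The second regex matches any two consecutive three-character words, such as “the cat”, while the first matches only when the same word occurs twice, as in the phrase, “one  one”. So the regex with the backreference matches any three-character word that repeats, e.g. “the the”, “him him”, “man  man”, etc.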
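The difference between a backreference and a repeated subpattern can be checked with a short script. This is a minimal sketch: the subject strings and variable names are made up for illustration, and the last regex is a variation (not from the text above) that uses \w+ so that repeated words of any length are found.

```php
<?php

// Backreference: the second word must repeat the text captured by the group.
$withBackreference = '/(\b\w\w\w\b)\s+\g{-1}/';

// Repeated subpattern: the second word may be any three-character word.
$withRepeatedGroup = '/(\b\w\w\w\b)\s+(\b\w\w\w\b)/';

// Illustrative subject with no repeated word.
$subject = "the cat sat";

$backrefResult  = preg_match($withBackreference, $subject);
$repeatedResult = preg_match($withRepeatedGroup, $subject, $matches);

echo $backrefResult, '<br>';    // no word repeats, so there is no match
echo $repeatedResult, '<br>';   // two three-character words in a row match
echo $matches[0], '<br>';

echo '<br>';

// Variation (an assumption, not from the article): \w+ accepts a word of
// any length, and the trailing \b stops "the then" counting as a repeat.
$anyLength = '/\b(\w+)\s+\g{-1}\b/';
$subject   = "He has one  one of the the books that that man wrote";

$count = preg_match_all($anyLength, $subject, $allMatches);

// $allMatches[1] holds the text captured by the group for each match found.
$repeatedWords = $allMatches[1];

echo $count, '<br>';
echo implode(', ', $repeatedWords), '<br>';
```

With these subjects, the backreference regex should report no match for “the cat sat”, while the any-length variation should list one, the and that.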

You can use this same scheme to match a word made up of the same sequence of letters typed twice. So the following code segment produces a match for “beriberi”:

<?php

        $subject = "What does beriberi mean?";
        $regex = '/(beri)\g{-1}/';

        preg_match($regex, $subject, $matches);

        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

?>

The output is:

    beriberi
    beri

The first line is for the complete regex. The second one is for the group.
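The literal group (beri) can also be replaced by a general subpattern, so that the group captures whatever the first half of the word is. The following is only a sketch under stated assumptions: the word list is hypothetical, and the anchors ^ and $ are added so that the whole word must consist of the doubled part.

```php
<?php

// Hypothetical word list, chosen only for illustration.
$words = ['beriberi', 'bonbon', 'banana'];

// ^ and $ anchor the regex to the whole word; (\w{2,}) captures two or
// more word characters and \g{-1} requires that same text once more.
$regex = '/^(\w{2,})\g{-1}$/';

$doubled = [];
foreach ($words as $word) {
    if (preg_match($regex, $word, $matches) === 1) {
        // $matches[1] is the text captured by the group, as explained above.
        $doubled[$word] = $matches[1];
    }
}

foreach ($doubled as $word => $half) {
    echo $word, ' = ', $half, ' + ', $half, '<br>';
}
```

Here “beriberi” and “bonbon” should be reported, while “banana” should not, since no leading part of it is immediately repeated to the end of the word.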

What about the situation where you have two or more previous subpatterns (groups) spread out in the regex, and you want to repeat them further on in the same regex? This is where you need \g{-1} for the nearest previous group on the left, \g{-2} for the group before that, \g{-3} for the group further on the left still, and so on. Consider the following code that produces a match:

        $subject = "Listen: A boy and a girl! Which boy and which girl?";
        $regex = '/(boy).+(girl).+\g{-2}.+\g{-1}/';

The phrase matched is, “boy and a girl! Which boy and which girl”, where in the regex, (boy) is for “boy”, (girl) is for “girl”, then \g{-2} is for (boy) and \g{-1} is for (girl).
Read and test the following code that uses the above expressions:

<?php

        $subject = "He has one  one of the books";
        $regex = '/(\b\w\w\w\b)\s+\g{-1}/';
        preg_match($regex, $subject, $matches);
        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

        echo '<br>';

        $subject = "What does beriberi mean?";
        $regex = '/(beri)\g{-1}/';
        preg_match($regex, $subject, $matches);
        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

        echo '<br>';

        $subject = "Listen: A boy and a girl! Which boy and which girl?";
        $regex = '/(boy).+(girl).+\g{-2}.+\g{-1}/';
        preg_match($regex, $subject, $matches);
        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

?>

The output is:

one one
one

beriberi
beri

boy and a girl! Which boy and which girl
boy
girl
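The relative references \g{-1} and \g{-2} count groups leftwards from the reference itself. PCRE also accepts absolute references, where groups are numbered by their opening parentheses from the left, written as \g{1}, \g{2}, or in the short form \1, \2. The sketch below, with variable names chosen for illustration, compares the three spellings on the last subject above.

```php
<?php

$subject = "Listen: A boy and a girl! Which boy and which girl?";

// Relative references, counted leftwards from the reference itself.
$relative = '/(boy).+(girl).+\g{-2}.+\g{-1}/';

// Absolute references, counted by opening parenthesis from the left.
$absolute = '/(boy).+(girl).+\g{1}.+\g{2}/';

// Short form of the absolute references.
$short = '/(boy).+(girl).+\1.+\2/';

preg_match($relative, $subject, $relativeMatches);
preg_match($absolute, $subject, $absoluteMatches);
preg_match($short, $subject, $shortMatches);

echo $relativeMatches[0], '<br>';

// Both comparisons are expected to be true: the three regexes capture the
// same overall match and the same two groups.
var_dump($relativeMatches === $absoluteMatches);
var_dump($absoluteMatches === $shortMatches);
```

A relative reference keeps pointing at the same group when another group is later added to the left of both, which is one reason to prefer it in a regex that may grow.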

That is it for this part of the series. We stop here and continue in the next part.

Chrys

