Broad Network


Capturing Matches in PHP Regular Expressions

Advanced PHP Regular Expressions - Part 2

Foreword: In this part of the series I explain how to capture matches in PHP regular expression operations; the word 'capture' here means holding the substring matched in the subject.

By: Chrysanthus Date Published: 11 Jul 2019

Introduction

This is part 2 of my series, Advanced PHP Regular Expressions. In this part of the series I explain how to capture matches in PHP regular expression operations; the word “capture” here means holding the substring matched in the subject. You should have read the previous part of the series before reaching here, as this is a continuation. Remember, the dot metacharacter matches any single character in the subject at its position (except a newline, by default). The word “subject” in this series means the string in which the regular expression looks for a match. Note: the abbreviation for regular expression in this series is regex.

Grouping
When you look at the subject, you may be interested in a particular part of the overall substring to be matched. You target that part by placing parentheses around the corresponding subpattern in the regex. The subpattern within parentheses is called a group. After the match, the part of the overall substring matched by the group is available separately. Read and test the following code, which illustrates this:

<?php

        $subject = "one two three four five";
        $regex = '/tw. (thre.) fou./';
        preg_match($regex, $subject, $matches);
        echo $matches[0], '<br>';
        echo $matches[1];

?>

The variable $matches is an array that receives the overall matched substring and any captured substrings of it.

The overall matched substring is “two three four”. The captured substring is “three”, which comes from the group (thre.). The first element of the $matches array is the overall matched substring; the next element is the substring matched by the group. The output of the code is:

    two three four
    three
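The code above reads $matches without first confirming that a match occurred. As a minimal sketch, assuming the same subject and regex, the return value of preg_match() can be checked before the array is used; preg_match() returns 1 when the pattern matches, 0 when it does not, and false on error. The variable name $found is only for illustration.

```php
<?php

        $subject = "one two three four five";
        $regex = '/tw. (thre.) fou./';

        // preg_match() returns 1 on a match, 0 on no match, false on error
        $found = preg_match($regex, $subject, $matches);

        if ($found === 1)
            {
                echo $matches[0], '<br>';   // overall matched substring
                echo $matches[1];           // substring captured by the group
            }
        else
            {
                echo 'No match';
            }

?>
```

With this subject the match succeeds, so the two echo statements in the first branch run and the output is the same as before.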

It is possible to have more than one group in the overall pattern. In this case, the first element of the array has the overall matched substring, the next element has the substring matched by the first group, the element after that has the substring matched by the second group, the following element has the substring matched by the third group, and so on. Groups are numbered from left to right in the regex. Read and try the following code, which illustrates this:

<?php

        $subject = "The numbers are: one, two, three, and so on.";
        $regex = '/(on.), (tw.), (thre.), and/';

        preg_match($regex, $subject, $matches);

        echo $matches[0], '<br>';
        echo $matches[1], '<br>';
        echo $matches[2], '<br>';
        echo $matches[3], '<br>';

?>

The output is:

    one, two, three, and
    one
    two
    three
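Since the elements of $matches follow the order of the groups, they can also be copied into separate variables in one statement. The following is a minimal sketch, assuming PHP 7.1 or later for the short array destructuring syntax; the variable names $whole, $first, $second and $third are hypothetical and chosen only for illustration.

```php
<?php

        $subject = "The numbers are: one, two, three, and so on.";
        $regex = '/(on.), (tw.), (thre.), and/';

        if (preg_match($regex, $subject, $matches) === 1)
            {
                // element 0 is the overall match; elements 1 to 3 are the groups
                [$whole, $first, $second, $third] = $matches;

                echo $first, ' | ', $second, ' | ', $third;
            }

?>
```

This prints the three captured substrings, one | two | three, on a single line.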

Alternative Capture within a Group
Here, alternative means Or. Consider the US-format date 8:5:19. The month can be written as 8 or 08; the day of the month can be written as 5 or 05; the year can be written as 19 or 2019. There are several ways in which this date can be written because of the different alternatives for each of the figures. A subject for the date may be "8:05:2019"; another subject may instead be "08:5:19", which is the same date written in a different way. A regex to match the whole date and capture the different possible figures is,

    /(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/

where \d represents a digit and | means Or. In PHP the regex is written as a quoted string, so we would have a statement like,

    $regex = '/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/';

For the array filled by the preg_match() function, the first element will have the overall substring matched by the whole regex; the second element will have the match for the first (\d|\d\d); the third element will have the match for the second (\d|\d\d); and the fourth element will have the match for (\d\d|\d\d\d\d). Try the following code:

<?php

        $subject = '08:5:2019';
        $regex = '/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/';

        preg_match($regex, $subject, $matches);

        for($i=0; $i<count($matches); ++$i)
            {
                echo $matches[$i], '<br>';
            }

?>

The output is:

    08:5:20
    08
    5
    20

Now, for the year, maybe you were expecting 2019, but only the first two digits, 20, have been captured. This is because \d\d was typed before \d\d\d\d in the alternative: the regex engine tries the alternatives from left to right and keeps the first one that lets the overall match succeed. If you want 2019 to be returned instead of 20, then type (\d\d\d\d|\d\d) for the year in the regex.
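The reordered alternative can be tried with a short sketch like the following, which runs the same regex against the two ways of writing the date mentioned earlier. The array $years is a hypothetical helper used only to collect the captured year for each subject.

```php
<?php

        // four-digit alternative first, so a full year is preferred
        $regex = '/(\d|\d\d):(\d|\d\d):(\d\d\d\d|\d\d)/';

        $years = [];

        foreach (['08:5:2019', '8:05:19'] as $subject)
            {
                if (preg_match($regex, $subject, $matches) === 1)
                    {
                        // element 3 holds the substring captured by the year group
                        $years[$subject] = $matches[3];
                        echo $matches[0], ' => year ', $matches[3], '<br>';
                    }
            }

?>
```

For the first subject the year group now captures 2019; for the second subject, \d\d\d\d cannot match, so the engine falls back to \d\d and captures 19.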

That is it for this part of the series. We stop here and continue in the next part.

Chrys

