Broad Network


Embedded comments and Modifiers in PHP

Advanced PHP Regular Expressions - Part 5

Foreword: In this part of the series I talk about embedding comments and modifiers in a regex. I also talk about non-capturing groups.

By: Chrysanthus Date Published: 11 Jul 2019

Introduction

This is part 5 of my series, Advanced PHP Regular Expressions. In this part of the series I talk about embedding comments and modifiers in a regex. I also talk about non-capturing groups. Here, "embedding" simply means placing some special data between the two delimiters (usually forward slashes) of a regex. You should have read the previous parts of the series before reading this one, as it is a continuation.

The syntax to embed anything in a regex is

    (?char)

where char is a character that indicates what is embedded. After char, there can optionally be some data, such as the text of a comment.

Comments
Just as you can comment when writing ordinary code, you can comment within a regex. A regex can have more than one comment. There are two ways of embedding a comment into a regex: you can use the above syntax, or you can use the x modifier. Either way, you can type a comment next to a sub-pattern. A regex can consist of several sub-patterns.

Comments Using the Embedding Syntax
Using the above syntax, in this case you have,

    (?#text)

where ? means embedding, # means comment and text is the actual comment. Note that the embedding structure begins with an opening parenthesis and ends with a closing parenthesis. So there should be no closing parenthesis within the text, as that would end the comment early. You can type the comment before a sub-pattern, as in,

    /(?# on the head)[hc]a[tp]/

You can type the comment after a sub-pattern as in,

    /[hc]a[tp](?# on the head)/

The comment group does not match anything in the subject string, so the comment group can be broken down into more than one line by pressing the Enter key as in,

    /[hc]a[tp](?# on
the head)/

Note: with this syntax you cannot break the matching part of the pattern into lines by pressing the Enter key, because a line break there is taken as a literal newline character to be matched.
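
The difference between a line break inside a comment and a line break inside the matching part can be checked with a short sketch. The variable names and the second pattern are my own, chosen only for illustration:

```php
<?php

$subject = "A hat and a cap";

// A line break inside the (?# ...) comment is harmless.
$commentBroken = "/[hc]a[tp](?# on
the head)/";

// A line break inside the matching part is a literal newline character,
// which the subject would have to contain at that position.
$patternBroken = "/[hc]a
[tp]/";

$r1 = preg_match($commentBroken, $subject, $m1);
$r2 = preg_match($patternBroken, $subject, $m2);

echo $r1, ' ', $m1[0], '<br>';   // 1 hat
echo $r2, '<br>';                // 0
```

The first regex still finds "hat"; the second finds nothing, since the subject has no newline after the letter a.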

Read and test the following code:

<?php

        $subject = "A hat and a cap";
        $regex = '/(?# on the head)[hc]a[tp]/';

        preg_match($regex, $subject, $matches);

        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

?>

The output is:

    hat

This style of commenting has been largely superseded by the raw, freeform commenting that is allowed with the x modifier.

Comments Using the x Modifier
With the x modifier you still have the comments embedded, but not with the embedding syntax. The x modifier is typed at the end of the complete regex, after the closing delimiter. It makes the regex engine ignore white space in the pattern, so you can optionally type the pattern as sub-patterns on different lines by pressing the Enter key. To the right of each sub-pattern you can type a comment beginning with #. Such a comment runs to the end of its line, so it has to be on one line in order not to conflict with a sub-pattern.

Try this:

<?php

        $subject = "A hat and a cap";
        $regex = '/#Talking about the head!
                             # Yes talking about it (head).
                             [hc] # A sub pattern
                             a
                             [tp] #comment on the right
                             /x';

        preg_match($regex, $subject, $matches);

        for ($i=0;$i<count($matches);++$i)
            {
                echo $matches[$i], '<br>';
            }

?>

I prefer to comment using the /x modifier.
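
One consequence of the x modifier is worth a quick test: since white space in the pattern is ignored, a space that has to be matched must be escaped or placed in a character class. The following is a minimal sketch; the patterns are made up for illustration:

```php
<?php

$subject = "A hat and a cap";

// Under x, the blank between "hat" and "and" is ignored, so this
// pattern behaves like /hatand/ and does not match.
$ignored = preg_match('/hat and/x', $subject);

// An escaped space, or a space in a character class, counts again.
$escaped = preg_match('/hat\ and   # escaped space
                       [ ]a        # space inside a character class
                      /x', $subject, $matches);

echo $ignored, '<br>';      // 0
echo $escaped, '<br>';      // 1
echo $matches[0], '<br>';   // hat and a
```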

Embedding Modifiers
Examples of modifiers are i, m, s, and x. These particular modifiers can be embedded in a regex using the embedding syntax, with no optional data after the character. To illustrate the embedding of modifiers, I use the i modifier, which makes matching independent of casing. The syntax for embedding the i modifier is:

    (?i)

If you place this modifier at the beginning of a regex (just after the first forward slash), it will be the same as placing it at the end, and the whole regex becomes case insensitive. So,

    /(?i)Augustine/

is the same as,

    /Augustine/i

There is no need to use both the embedded modifier and the same modifier at the end of the regex.

Now, if you embed the modifier within the regex, it acts from the point of embedding to the end of the regex (or to the end of the enclosing group, if you embed it inside parentheses). So,

    /Augus(?i)tine/

will match the subject string, "AugusTINE".
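
As a quick check of the two placements, here is a small sketch. The subject strings are my own and are assumed only for this illustration:

```php
<?php

// (?i) at the very start behaves like the i modifier at the end.
$leading  = preg_match('/(?i)Augustine/', 'AUGUSTINE');
$trailing = preg_match('/Augustine/i', 'AUGUSTINE');

// (?i) in the middle: "Augus" stays case sensitive, "tine" does not.
$midOk  = preg_match('/Augus(?i)tine/', 'AugusTINE');
$midBad = preg_match('/Augus(?i)tine/', 'AUGUStine');

echo $leading, $trailing, $midOk, $midBad, '<br>';   // 1110
```

The last result is 0 because the part before (?i) must still be typed exactly as "Augus".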

Each embedded modifier has a corresponding turn-off embedded modifier. You type a turn-off embedded modifier in the same way that you type the embedded modifier but you precede the letter with -. So the turn-off embedded i modifier is, (?-i). Wherever you place the embedded turn-off modifier in the regex, it has its effect from the point of insertion to the end of the regex. It neutralizes the effect of the end-of-regex modifier or the previously embedded modifier from the point of insertion to the end of the regex. So,

    /Au(?i)gus(?-i)tine/

will match "AuGUStine" but will not match "AuGUSTINE".

You can have a composite embedded modifier, by just having more than one modifier letter within the parentheses of the embedded modifier, as in

    (?si)

Try the following script:

<?php

        $subject = "I am Augustine, You are AuGUStine. He is not AuGUSTINE";
        $regex = '/Au(?i)gus(?-i)tine/';

        preg_match_all($regex, $subject, $matches);

        // $matches[0] holds all the full matches, so count that sub-array
        for ($i=0;$i<count($matches[0]);++$i)
            {
                echo $matches[0][$i], '<br>';
            }

?>

The output should be:

    Augustine
    AuGUStine

The third name in the subject, "AuGUSTINE", did not match: (?-i) switches case sensitivity back on before "tine", and "TINE" is in uppercase.
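
The composite form, (?si), can be tried in the same way. This is only a sketch; the two-line subject is invented, and it relies on the s modifier letting the dot match a newline:

```php
<?php

$subject = "HELLO\nworld";

// Without s the dot does not match the newline, and without i
// "hello" does not match "HELLO".
$plain = preg_match('/hello.world/', $subject);

// (?si) switches on both modifiers at once.
$both = preg_match('/(?si)hello.world/', $subject, $matches);

// (?-i) turns only i off again; s stays on, but "WORLD" must now
// match exactly, and the subject has "world".
$partial = preg_match('/(?si)hello.(?-i)WORLD/', $subject);

echo $plain, $both, $partial, '<br>';   // 010
```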

Non Capturing Group
In this section, I explain how to code a non-capturing group in PHP. A non-capturing group and a capturing group can co-exist in the same regex. A non-capturing group is a group, where the match in the subject string does not go into the $matches array.

Reason for Non-Capturing Group
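Must every group (with parentheses) in a PHP regex be captured (go into the $matches array)? No. As you type a regex you might want a group just for convenience, for example to apply an alternation or a quantifier to several characters. Such a group does not have to store its match, so it typically costs slightly less than a capturing group. The syntax for a non-capturing group is: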

    (?:pattern)

The ? means you have some embedded information. The : must be typed immediately after the ?. The pattern is of your choice.

Example of Non-Capturing Group
In the following script, the group (one) is not captured while the group (two) is captured. Read and try it:

<?php

        $subject = "This is one and that is two.";
        $regex = '/(?:one).*(two)/';

        preg_match($regex, $subject, $matches);
        echo $matches[0], '<br>';
        echo $matches[1], '<br>';
        // There is no $matches[2], because (?:one) is not captured

?>

The output is:

    one and that is two
    two

The first line is the sub-string for the whole regex. The second line is "two" and not "one" because "one" typed as (?:one) in the regex, is not captured.
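
The effect on the indexes of the $matches array can also be seen by comparing a capturing group with a non-capturing one. The patterns below are a hypothetical illustration, not taken from the earlier parts of the series:

```php
<?php

$subject = "A hat and a cap";

// With a capturing group, the first alternation takes index 1.
preg_match('/(h|c)a(t|p)/', $subject, $capturing);

// With (?:...), it is skipped and the next group moves up to index 1.
preg_match('/(?:h|c)a(t|p)/', $subject, $nonCapturing);

echo count($capturing), '<br>';      // 3
echo count($nonCapturing), '<br>';   // 2
echo $nonCapturing[1], '<br>';       // t
```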

That is it for this part of the series. We stop here and continue in the next part.

Chrys

