Broad Network


Embedded comments and Modifiers in Perl

Advanced Perl Regular Expressions – Part 9

Foreword: In this part of the series I talk about embedding comments and modifiers in a regex.

By: Chrysanthus Date Published: 2 Apr 2016

Introduction

This is part 9 of my series, Advanced Perl Regular Expressions. In this part of the series I talk about embedding comments and modifiers in a regex. "Embedding" here simply means placing a special construct between the delimiters of a regex (normally the two forward slashes). You should have read the previous parts of the series, because this is a continuation.

The syntax to embed anything in a regex is

    (?char)

where char is a character that indicates what is embedded. After char, you can optionally have some datum.

Comments
Just as you can comment when writing ordinary code, you can comment within a regex. A regex can have more than one comment. There are two ways of embedding a comment into a regex: you can use the above syntax or you can use the x modifier. Either way, you can type a comment next to a sub-pattern. A regex can consist of several sub-patterns.

Comments Using the Embedding Syntax
With the above syntax, a comment is written as,

    (?#text)

where ? means embedding, # means comment and text is the actual comment. Note that the embedded structure begins with an opening parenthesis and ends with a closing parenthesis. So there must be no closing parenthesis within the text, because the first closing parenthesis ends the comment. You can type the comment before a sub-pattern, as in,

    /(?# on the head)[hc]a[tp]/

You can type the comment after a sub-pattern as in,

    /[hc]a[tp](?# on the head)/

The comment group does not match anything in the subject string, so it can be spread over more than one line by pressing the Enter key, as in,

    /[hc]a[tp](?# on
the head)/

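Note: with this syntax you cannot break the matching part of the pattern into lines by pressing the Enter key. Without the x modifier, a line break in the matching part is a literal newline character that the subject string would have to contain.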

Read and try the following code:

    use strict;

    if ("A hat and a cap" =~ /(?# on the head)[hc]a[tp]/)
        {
            print "Matched";
        }
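The difference between splitting the comment and splitting the matching part can be checked with a small sketch. The variable names here are illustrative only; the subject string is the one used in the example above.

```perl
use strict;
use warnings;

my $subject = "A hat and a cap";

# The (?#...) group may span two lines; it adds nothing to what is matched.
my $with_comment = ($subject =~ /[hc]a[tp](?# on
the head)/) ? 1 : 0;

# Without the x modifier, the line break below is a literal newline,
# so the pattern needs "ha" or "ca", then a newline, then "t" or "p".
my $broken_pattern = ($subject =~ /[hc]a
[tp]/) ? 1 : 0;

print "comment split across lines: $with_comment\n";
print "pattern split across lines: $broken_pattern\n";
```

The first test should succeed and the second should fail, since the subject string contains no newline.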

This style of commenting has been largely superseded by the raw, freeform commenting that is allowed with the //x modifier.

Comments Using the x Modifier
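With the x modifier you still have the comments embedded, but not with the embedding syntax. The x modifier is typed at the end of the complete regex, after the closing delimiter. It makes Perl ignore whitespace in the pattern (outside character classes) and treat # as the start of a comment that runs to the end of the line. So you can type the complete pattern as sub-patterns on different lines by pressing the Enter key, and to the right of each sub-pattern you can type a comment beginning with #. Each such comment has to stay on one line, so that it does not swallow the next sub-pattern, and it must not contain the regex delimiter (the forward slash here). To match a literal space or a literal # under the x modifier, escape it with a backslash or place it in a character class. Read and try the following code: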

    use strict;

    if ("A hat and a cap" =~ /#Talking about the head!
                             # Yes talking about it (head).
                             [hc] # A sub pattern
                             a
                             [tp] #comment on the right
                             /x
       )
        {
            print "Matched";
        }

I prefer to comment using the //x modifier.
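Because the x modifier ignores whitespace in the pattern, a space that must really be matched has to be written explicitly. The following is a minimal sketch of that point; the variable name $re and the second subject string are my own, chosen only for illustration.

```perl
use strict;
use warnings;

my $re = qr/
    [hc] a [tp]   # one of: hat, hap, cat, cap
    [ ]           # a literal space, kept inside a character class
    and           # the literal word and
/x;

foreach my $subject ("A hat and a cap", "A hatand a cap") {
    my $outcome = ($subject =~ $re) ? "matched" : "no match";
    print "$subject: $outcome\n";
}
```

The spaces typed between [hc], a and [tp] are ignored, so the pattern still matches "hat"; only the space inside the character class has to be present in the subject.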

Embedding Modifiers
The modifiers i, m, s and x are also written as //i, //m, //s and //x. These particular modifiers can be embedded in a regex using the embedding syntax; in this form nothing follows the modifier letter inside the parentheses. I use //i, which makes matching case-insensitive, to illustrate the embedding of modifiers. The syntax for embedding the //i modifier is:

    (?i)

If you place this modifier at the beginning of a regex (just after the first forward slash), it is the same as placing i at the end: the whole regex becomes case-insensitive. So,

    /(?i)Augustine/

is the same as,

    /Augustine/i

There is no need to use an embedded modifier together with the same modifier at the end of the regex; doing so is redundant.
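A quick way to convince yourself that the two forms behave alike is to run both against the same subjects. This is only a sketch; the list of subject strings and the variable names are my own.

```perl
use strict;
use warnings;

my $differences = 0;
foreach my $subject ("Augustine", "AUGUSTINE", "augustine", "Austin") {
    my $embedded = ($subject =~ /(?i)Augustine/) ? 1 : 0;   # modifier embedded at the start
    my $trailing = ($subject =~ /Augustine/i)    ? 1 : 0;   # modifier at the end
    $differences++ if $embedded != $trailing;
    print "$subject: embedded=$embedded trailing=$trailing\n";
}
print "differences: $differences\n";
```

For every subject the two results should agree, so the count of differences stays at zero.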

Now, if you embed the modifier within the regex, it acts from the point of embedding to the end of the regex (or, if it is placed inside a parenthesized group, to the end of that group). So,

    /Augus(?i)tine/

will match the subject string, "AugusTINE".
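The following sketch checks where the case-insensitivity starts. The three subject strings and the names $re and %matched are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $re = qr/Augus(?i)tine/;   # only "tine" is case-insensitive

my %matched;
foreach my $subject ("Augustine", "AugusTINE", "AUGUStine") {
    $matched{$subject} = ($subject =~ $re) ? 1 : 0;
    print "$subject: $matched{$subject}\n";
}
```

"AUGUStine" should not match, because "Augus" comes before the embedded modifier and is still case-sensitive.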

Each embedded modifier has a corresponding turn-off embedded modifier. You type a turn-off embedded modifier in the same way that you type the embedded modifier, but you precede the letter with -. So the turn-off embedded i modifier is (?-i). Wherever you place the turn-off modifier in the regex, it has its effect from the point of insertion to the end of the regex (or to the end of the enclosing group, if it is inside one). Over that stretch it cancels the effect of the end-of-regex modifier or of a previously embedded modifier. So,

    /Au(?i)gus(?-i)tine/

will match "AuGUStine" but will not match "AuGUSTINE".
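When an embedded modifier sits inside a parenthesized group, its effect stops at the closing parenthesis of that group, so a group can sometimes stand in for an explicit turn-off modifier. The sketch below compares the two situations; the patterns $grouped and $ungrouped are my own illustrations and do not appear elsewhere in this series.

```perl
use strict;
use warnings;

my $grouped   = qr/(Au(?i)gus)tine/;   # (?i) ends where the group ends
my $ungrouped = qr/Au(?i)gustine/;     # (?i) runs to the end of the regex

my %result;
foreach my $subject ("AuGUStine", "AuGUSTINE") {
    my $in_group  = ($subject =~ $grouped)   ? 1 : 0;
    my $no_group  = ($subject =~ $ungrouped) ? 1 : 0;
    $result{$subject} = [$in_group, $no_group];
    print "$subject: grouped=$in_group ungrouped=$no_group\n";
}
```

"AuGUSTINE" should match only the ungrouped pattern, because in the grouped pattern "tine" lies outside the group and is case-sensitive.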

You can have a composite embedded modifier, by just having more than one modifier letter inside the parentheses, as in

    (?si)
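To see both modifiers of a composite at work, the sketch below uses a subject with a line break and different casing. The s modifier lets the dot match a newline and the i modifier ignores case; the subject string and variable names are assumptions for the illustration.

```perl
use strict;
use warnings;

my $subject = "Hello\nWORLD";

my $both   = ($subject =~ /(?si)hello.world/) ? 1 : 0;   # dot matches newline, case ignored
my $only_i = ($subject =~ /(?i)hello.world/)  ? 1 : 0;   # dot cannot match the newline
my $only_s = ($subject =~ /(?s)hello.world/)  ? 1 : 0;   # case still matters

print "(?si): $both\n";
print "(?i):  $only_i\n";
print "(?s):  $only_s\n";
```

Only the composite form should match, since the subject needs both behaviours at once.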

Read and try the following script:

    use strict;

    my @arr = "I am Augustine, You are AuGUStine. He is not AuGUSTINE" =~ /Au(?i)gus(?-i)tine/g;
    foreach my $var (@arr)
        {
            print $var, "\n";
        }

The output is:

Augustine
AuGUStine

The third name in the subject, "AuGUSTINE", did not match. That is expected: its "TINE" comes after (?-i), where matching is case-sensitive again.

That is it for this part of the series. We stop here and continue in the next part.

Chrys
