Broad Network


Regex Modifiers in Perl

Regular Expressions in Perl for the Novice – Part 5

Foreword: It is possible for you to make a case insensitive match. You need what is called a modifier for this. There are modifiers to do other things. We shall learn some of them in this part of the series.

By: Chrysanthus Date Published: 13 Aug 2012

Introduction

This is the fifth part of my series, Regular Expressions in Perl for the Novice. By default, matching is case sensitive, but you may not know whether what you are looking for is in lower case, upper case or mixed case. A modifier lets you make the match case insensitive, and there are modifiers that do other things. We shall learn some of them in this part of the series.

The i Modifier
By default, matching is case sensitive. To make it case insensitive, you have to use what is called the i modifier.

So if we have the regex,

          /send/

and then we also have

    my  $availableString = "Click the Send button.";

Then the following code will not produce a match:

use strict;

my  $availableString = "Click the Send button.";

if ($availableString =~ /send/)
  {
    print "Matched\n";
  }
else
  {
    print "Not Matched\n";
  }

The regex did not match the available string because the regex has “send” where s is in lower case, but the available string has “Send” where S is in upper case. If you want this matching to be case insensitive, then your regex will have to be

         /send/i

Note the i just next to the second forward slash. The following code will produce a match.

use strict;

my  $availableString = "Click the Send button.";

if ($availableString =~ /send/i)
  {
    print "Matched\n";
  }
else
  {
    print "Not Matched\n";
  }

Matching has occurred because we have made the regex case insensitive, with the i modifier.
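
To see the difference side by side, here is a minimal sketch that tries the same regex with and without the i modifier. The three sample strings and the two counter variables are my own illustrations; they are not from the examples above.

```perl
use strict;
use warnings;

# Hypothetical sample strings; only the letter case of "send" differs.
my @samples = ("Click the Send button.", "SEND it now.", "Please send it.");

my $sensitiveCount   = 0;
my $insensitiveCount = 0;

foreach my $availableString (@samples)
  {
    # Case sensitive: only the all-lower-case "send" matches.
    $sensitiveCount++   if $availableString =~ /send/;
    # Case insensitive: "Send", "SEND" and "send" all match.
    $insensitiveCount++ if $availableString =~ /send/i;
  }

print "Without i: $sensitiveCount matched\n";
print "With i: $insensitiveCount matched\n";
```

This should print "Without i: 1 matched" and then "With i: 3 matched".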

The g Modifier
It is possible for you to have more than one sub string in the available string that would match the regex. So far, we have been assuming that there is only one sub string in the available string that produces a match. If you want more than one sub string to be matched, you have to use the g modifier. You use it just as you use the i modifier. One of the situations in which you would need the g modifier is when you want to collect all the sub strings in the available string that match.

Consider the following available string:

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

In the above available string, you have the sub strings: cat, rat and bat. You have “cat” first, then “rat” and then “bat”. Each of these sub strings matches the following regex:

                   /[cbr]at/

Normally, this pattern will match only the first sub string, “cat”. If you want “cat”, “rat” and “bat” all to be matched, you have to use the g modifier as follows:

                  /[cbr]at/g

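Just putting the g modifier at the end of the regex is not useful by itself. You would want to collect all the sub strings that matched. When the match is evaluated in list context, as in the following statement, it returns all the matched sub strings, and these can be collected in an array.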

     my @arr = ($availableString =~ /[cbr]at/g);

The array is @arr. The first item in the array is “cat”; the second is “rat” and the third is “bat”. The following code illustrates this:

use strict;

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

my @arr = ($availableString =~ /[cbr]at/g);

print "First Item is: ", $arr[0], "\n";

print "Second Item is: ", $arr[1], "\n";

print "Third Item is: ", $arr[2], "\n";

This is the output of the code:

First Item is: cat
Second Item is: rat
Third Item is: bat

Note: g as a modifier means global.
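
The following sketch contrasts the same match with and without the g modifier, and also counts the matches. The variable names @withoutG, @withG and $count are my own. Assigning the match to an empty list, as done for $count, is a common Perl idiom for counting; it is shown here only as one possible approach.

```perl
use strict;
use warnings;

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

# Without g (and without parentheses in the regex), a successful match
# in list context returns just the value 1.
my @withoutG = ($availableString =~ /[cbr]at/);

# With g, list context returns every matched sub string.
my @withG = ($availableString =~ /[cbr]at/g);

# Assigning to the empty list () and then to a scalar gives the number of matches.
my $count = () = $availableString =~ /[cbr]at/g;

print "Without g: @withoutG\n";
print "With g: @withG\n";
print "Number of matches: $count\n";
```

This should print "Without g: 1", then "With g: cat rat bat", then "Number of matches: 3".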

The pos() function
After a match, the pos() function can be used to return the position in the available string at which the search for the next match will begin. This works with the global modifier.

In the above case, after the first match of “cat”, pos() would return 5. Position counting in a string begins from zero. A good way to use the pos() function is in a while loop, where each test of the condition finds the next match. The following code illustrates this:

use strict;

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

while($availableString =~ /[cbr]at/g)
  {
    print "Next search starts at position: ", pos($availableString), "\n";
  }

This is the output of the code above:

Next search starts at position: 5
Next search starts at position: 25
Next search starts at position: 45

The pos() function takes as argument the variable of the available string. The pos() function can also be used to set the position where search will continue, in the available string. However, I will not address that in this series.
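
Since pos() gives the position just after a match, the position where the match began can be worked out by subtracting the length of the matched text. The sketch below assumes the regex is wrapped in parentheses so that the matched text is available in $1; the @report array and the $start and $end variables are my own names.

```perl
use strict;
use warnings;

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

my @report;
while ($availableString =~ /([cbr]at)/g)
  {
    my $end   = pos($availableString);   # where the next search starts
    my $start = $end - length($1);       # where this match began
    push @report, "$1 starts at $start, next search starts at $end";
  }

print "$_\n" foreach @report;
```

This should print:

cat starts at 2, next search starts at 5
rat starts at 22, next search starts at 25
bat starts at 42, next search starts at 45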

The s and m modifiers
The s modifier refers to a single line and the m modifier refers to multiple lines in a string. Usually, without these modifiers, we get what we want. Sometimes, however, we want to keep track of newline characters. A file on the hard disk might be made up of many lines of text, each ending with the \n character. By default, the ^ and $ characters anchor at the beginning and at the end of the available string. We can make them anchor at the beginning and end of lines. The s modifier changes the interpretation of the dot metacharacter, while the m modifier changes the interpretation of ^ and $. Here is the full description of the s and m modifiers:

- no modifiers: Here we look at the case where there is no modifier just after the second forward slash. Under this condition, '.' matches any character except "\n". ^ matches only at the start of the string, and $ matches only at the end of the available string or before a \n at the end. This is the default behavior.

- s modifier: This makes the available string behave like one long line, independent of any newline character that may be there. So '.' matches any character, even "\n". ^ matches only at the start of the string, and $ matches only at the end of the available string or before a \n at the end.

- m modifier: This makes the available string behave like a set of multiple lines. In the available string, consecutive lines are separated by the \n character. Here, '.' still matches any character except "\n". However, ^ and $ are able to match at the start or end of any line within the available string: ^ matches at the beginning of the string or just after a \n character, while $ matches at the end of the string or just before a \n character.

We shall use examples to illustrate the above three conditions. We start by looking at the first condition.

No Modifiers
Read the first condition above again. Consider the following available string:

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

The available string has three lines. The following expression produces a match.

         $availableString =~ /second/

The sub string “second”, in the second line is matched. Consider the following pattern:

          /(^.*$)/

Under normal circumstances, this pattern (regex) is expected to match the whole string. Let us see if it does so with the above multi-line available string. Consider the following code:

use strict;

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.";

(my $matchedString) = ($availableString =~ /(^.*$)/);

print $matchedString;

If you run this code, no matching will occur. This is because of the presence of the \n character in the available string. So the following expression, got from the code above, does not produce a match:

"The first sentence.\n The second sentence.\n The third sentence." =~ /(^.*$)/

By default, ‘.’ does not match the \n character. In the expression, ^ would match the start of the available string and $ would match the end of the available string. However, ‘.’ does not match the \n, and so the matching fails. I hope you now appreciate what the first condition is talking about.
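
The first condition can be checked with a small sketch. The three test strings and the variable names below are my own; the third string shows the case where $ matches just before a \n that ends the string.

```perl
use strict;
use warnings;

my $multiLine   = "The first sentence.\n The second sentence.\n The third sentence.";
my $singleLine  = "The first sentence.";
my $withNewline = "The first sentence.\n";

# In scalar context, a match gives true or false.
my $multiResult  = ($multiLine  =~ /(^.*$)/) ? "Matched" : "Not Matched";
my $singleResult = ($singleLine =~ /(^.*$)/) ? "Matched" : "Not Matched";

# $ may match just before a \n that ends the string, so the \n is not captured.
my ($captured) = ($withNewline =~ /(^.*$)/);

print "Three lines: $multiResult\n";
print "One line: $singleResult\n";
print "Captured length: ", length($captured), "\n";
```

This should print "Three lines: Not Matched", then "One line: Matched", then "Captured length: 19". The captured text has 19 characters because the \n at the end is left out.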

The s Modifier
Read the second condition above again. We shall do something similar to what we did above. Consider the following available string:

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

The available string has three lines. The following expression produces a match.

         $availableString =~ /second/s

Note that the s modifier has been used. The sub string “second”, in the second line is matched. Consider the following pattern:

          /(^.*$)/s

This pattern (regex) is supposed to match the whole string. Let us see if it does so with the above multi-line available string, now that there is the s modifier. Consider the following code:

use strict;

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

(my $matchedString) = ($availableString =~ /(^.*$)/s);

print $matchedString;

This is the output:

The first sentence.
 The second sentence.
 The third sentence.

In the output, the whole string has been displayed. The new-line characters themselves are not visible; each \n causes the text that follows it to be displayed on the next line. The second and third lines begin with a space because, in the available string, each \n is followed by a space.

Here, because of the s modifier, ‘.’ matches the \n characters as well, and so /(^.*$)/s matches the whole available string. I hope you now appreciate what the second condition is talking about.
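
As a quick check, the sketch below runs the same pattern on the same string without and with the s modifier, and compares the captured text with the whole available string. The names $withoutS and $withS are my own.

```perl
use strict;
use warnings;

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

# A failed match in list context returns an empty list, so $withoutS stays undefined.
my ($withoutS) = ($availableString =~ /(^.*$)/);
my ($withS)    = ($availableString =~ /(^.*$)/s);

print "Without s: ", (defined($withoutS) ? "Matched" : "Not Matched"), "\n";
print "With s, whole string captured: ",
      ($withS eq $availableString ? "yes" : "no"), "\n";
```

This should print "Without s: Not Matched" and then "With s, whole string captured: yes".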

The m Modifier
Read the third condition above again. Here we look at the effect of the m modifier. Consider the following available string:

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

The available string has three lines. The following expression produces a match.

         $availableString =~ /second/m

Note that the m modifier has been used. The sub string “second”, in the second line is matched. Consider the following pattern:

          /(^.*$)/m

With the m modifier, this pattern (regex) should match a line. Let us see if it does so with the above multi-line available string. Consider the following code:


use strict;

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

(my $matchedString) = ($availableString =~ /(^.*$)/m);

print $matchedString;

The output of the code is:

The first sentence.

So it matched the first line only: ^ matched at the start of the string and $ matched just before the first \n. This is all right. The use of the m modifier can be complicated. So I will not give any more explanation about it.
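
For the curious, here is a minimal sketch that combines the m modifier with the g modifier seen earlier, in order to collect every line instead of only the first. Note that I have used .+ in place of .* so that only non-empty lines are collected; that change, and the @lines array, are my own choices for this illustration.

```perl
use strict;
use warnings;

my $availableString = "The first sentence.\n The second sentence.\n The third sentence.\n";

# m lets ^ and $ anchor at each line; g collects all the matches.
my @lines = ($availableString =~ /^.+$/mg);

print "Number of lines matched: ", scalar(@lines), "\n";

# Square brackets make the leading space of the last two lines visible.
print "[$_]\n" foreach @lines;
```

This should print:

Number of lines matched: 3
[The first sentence.]
[ The second sentence.]
[ The third sentence.]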

The x Modifier
If you want to include comments in your regex, you can use the x modifier. With the x modifier, most whitespace in the regex is ignored, and a # begins a comment that runs to the end of the line. We shall see an example later.
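
Ahead of that fuller example, here is a small sketch of the idea, reusing the cat, rat and bat string from the g modifier section. The layout of the regex over several lines and the wording of the comments are my own.

```perl
use strict;
use warnings;

my $availableString = "A cat is an animal. A rat is an animal. A bat is a creature.";

# With x, the spaces, newlines and comments inside the regex are not part of the pattern.
my @arr = ($availableString =~ /
            [cbr]    # one of the letters c, b or r
            at       # followed by the letters a and t
          /xg);

print "@arr\n";
```

This should print "cat rat bat", the same result as /[cbr]at/g. Take care not to type a forward slash inside such a comment, since the forward slash would end the regex.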

Using more than one Modifier
We shall soon take a break. Before we have a break, know that you can have more than one modifier in a regex, like in:

          /send/ig
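
A minimal sketch of the above regex in use follows. The sample string is hypothetical; it contains “send” three times, each time in a different case.

```perl
use strict;
use warnings;

# Hypothetical sample string for illustration.
my $availableString = "Click Send. To resend, click SEND again.";

# i makes the match case insensitive; g collects every match.
my @arr = ($availableString =~ /send/ig);

print "Number of matches: ", scalar(@arr), "\n";
print "@arr\n";
```

This should print "Number of matches: 3" and then "Send send SEND". The order in which you type the modifiers does not matter, so /send/gi behaves the same way.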

It is time. Let us take a break.

Chrys
