Functions Related to Regular Expressions in Perl

Advanced Perl Regular Expressions – Part 10

Foreword: In this part of the series, I talk about functions that are related to regular expressions in Perl.

By: Chrysanthus Date Published: 2 Apr 2016

Introduction

This is part 10 of my series, Advanced Perl Regular Expressions. In this part of the series, I talk about the functions and operators that are related to regular expressions in Perl. For these features, the terms operator and function mean more or less the same thing. I present only the basics of each one. You should have read the previous parts of the series, because this is a continuation.

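The m// Operator
A regex does not always have to be written between two forward slashes. You can use another delimiter, such as a pair of ' ' or "" or () or {}. However, if you use such a pair, you have to precede it with m. Note that with single quotes as the delimiter, variables in the pattern are not interpolated. An example with braces is: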

    =~ m{pattern}

for

    =~ m/pattern/
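
As a minimal sketch of this point, the following code matches the same pattern with three different delimiter pairs. The variable names and the sample string are made up for illustration. Braces or parentheses are convenient when the pattern itself contains forward slashes.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";   # sample string, chosen for illustration

# With slashes as delimiters, each slash in the pattern must be escaped.
my $with_slashes = ($path =~ /\/local\//) ? 1 : 0;

# With braces or parentheses, the m is required and no escaping is needed.
my $with_braces = ($path =~ m{/local/}) ? 1 : 0;
my $with_parens = ($path =~ m(/local/)) ? 1 : 0;

print "$with_slashes $with_braces $with_parens\n";   # 1 1 1
```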

The s/// Operator
This is the Search and Replace operator. You can search for a match in the subject string and have the sub-string matched replaced. The syntax is:

$subject =~ s/regex/replacement/modifiers

You already know what regex means; replacement is the text that will replace the sub-string found. We have seen modifiers; an example is the g modifier, which replaces every match instead of only the first. Modifiers are optional in the statement.

The following code illustrates this.

use strict;

    my $subject = "I am a man.";

    $subject =~ s/man/woman/;

    print $subject;

The output is:

I am a woman.

The subject string content is “I am a man.”. The Search and Replace statement is “$subject =~ s/man/woman/;”. The subject string, after Search and Replace, is “I am a woman.”. So the word “man” in the subject string has been matched and replaced by “woman”. The pattern for matching is /man/, and “woman” is the replacement sub-string.
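
The effect of the g modifier can be sketched as follows. The sample string and variable names are made up for illustration. The sketch also relies on the fact that s/// returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $subject = "a man met a man";   # sample string, chosen for illustration

# Without g, only the first match is replaced.
(my $first_only = $subject) =~ s/man/woman/;

# With g, every match is replaced; s/// returns the number of substitutions.
my $all   = $subject;
my $count = ($all =~ s/man/woman/g);

print "$first_only\n";   # a woman met a man
print "$all\n";          # a woman met a woman
print "$count\n";        # 2
```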

The split Function
There is a function called the split function. The syntax is:

split /pattern/, string

The split operator (function) splits a string into a list of sub-strings and returns the list. The pattern is the separator, e.g. a comma. The separator is not part of the returned list, unless the pattern contains a capturing group. You can place parentheses around the arguments. Consider the following subject:

    my $subject = "one, two, three";

You may want to separate this into the sub-strings, “one”, “two” and “three”. The separator is /, /, that is, a comma and a space. The following code does the split, but the result has some redundancy:

use strict;

    my $subject = "one, two, three";
    my @words = split(/(, )/, $subject);

    for (my $i=0;$i<@words;++$i)
        {
            print $words[$i], "\n";
        }

Note that the regex is made up of the capturing group, (, ). The output is:

one
,
two
,
three

The problem here is that, because of the capturing group, the separator has also been returned as sub-strings. In the absence of the parentheses, the separator is not returned. However, you cannot always avoid grouping in your separator of interest, for example when it contains alternatives. In that case, use a non-capturing group, (?:...), which groups without capturing, so the separator is not returned.
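
A minimal sketch of a split whose separator needs grouping but should not appear in the result is given below. The sample string, with two kinds of separator, is made up for illustration.

```perl
use strict;
use warnings;

my $subject = "one, two; three";   # sample string, chosen for illustration

# (?:...) groups the alternatives without capturing them,
# so the separators are left out of the returned list.
my @words = split(/(?:, |; )/, $subject);

print join("|", @words), "\n";   # one|two|three
```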

The tr/// Operator
This is the transliteration operator. Imagine that Perl’s special variable, $_, holds the string:

    "oneA twoB threeC fourD, fiveE"

In this string, if you want to change the character, ‘A’ to the character, ‘1’, then the character, ‘B’ to ‘2’, then the character ‘C’ to ‘3’, then the character ‘D’ to ‘4’ and then ‘E’ to ’5’, all in one sweep, you would use the transliteration operator as follows:

    tr/ABCDE/12345/

Between the first and second forward slashes, you have the letters A, B, C, D and E. Between the second and third forward slashes, you have the characters for replacement. The characters for replacement are typed in positions corresponding to the characters between the first and second forward slashes. That is, 1 will replace only A, 2 will replace only B, 3 will replace only C, etc. Remember, a digit in single or double quotes is a character. The syntax for the operator is:

    tr/SEARCHLIST/REPLACEMENTLIST/cdsr

Note what are called the search list and the replacement list in the syntax. c, d, s and r are optional modifiers. Without any modifier, the operator returns the number of characters replaced in the $_ string. With the d modifier, characters in the search list that have no corresponding character in the replacement list are deleted, and they are included in the returned count. With the r modifier, the string is left unchanged and the transliterated copy is returned instead of a count. The transliteration operator replaces all occurrences of each character in the search list with the corresponding character in the replacement list.

Read and try the following code:

use strict;

    my $str = "oneA twoB threeC fourD fiveE threeC";
    $_ = $str;
    my $no = tr/ABCDE/12345/;
    print $no . "\n";
    print $_;

The output is:

6
one1 two2 three3 four4 five5 three3

In this code, there are 5 corresponding character pairs in the operator, but 6 replacements are made in the $_ string. That is why the output shows 6 for the number of replacements. Note that the sixth replacement comes from the second “threeC”, which also becomes “three3”. The value (content) of $_ is modified, but that of $str is not, because $_ holds a copy of the string.
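
The d and r modifiers can be sketched as follows, applied to a named variable with =~ instead of $_. The sample string and variable names are made up for illustration, and the r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $str = "oneA twoB threeC";   # sample string, chosen for illustration

# r modifier: $str is left alone and the changed copy is returned.
my $copy = ($str =~ tr/ABC/123/r);

# d modifier with an empty replacement list: the listed characters are
# deleted and the number of deleted characters is returned.
my $stripped = $str;
my $deleted  = ($stripped =~ tr/ABC//d);

# Empty replacement list without d: nothing changes, so tr just counts.
my $count_e = ($str =~ tr/e//);

print "$copy\n";       # one1 two2 three3
print "$stripped\n";   # one two three
print "$deleted\n";    # 3
print "$count_e\n";    # 3
print "$str\n";        # oneA twoB threeC
```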

The grep Function
The grep function evaluates an expression, typically a regular expression, for each element of a list (array) and looks for the elements for which it is true. In scalar context, it returns the number of elements in the list that matched. In list context, it returns the elements that matched, as a list. The syntax is:

    grep EXPR,LIST

where EXPR is typically the regular expression pattern within forward slashes. EXPR and LIST together can optionally be enclosed in parentheses. With the parentheses, the grep function (operator) and its arguments have the highest precedence in the statement, just like a normal function call. LIST, as usual, can be a literal list or an array. Read and try the following code:

use strict;

    my $sca = grep (/[brc]at/, ("bat", "cat", "rat", "dog", "hen", "elephant"));
    print $sca, "\n";

    my @arr = grep (/[brc]at/, ("bat", "cat", "rat", "dog", "hen", "elephant"));
    print @arr, "\n";
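
When run, the first print statement gives 3 and the second gives batcatrat, because print puts nothing between the list elements. As a small sketch, the same search can also be written with the block form of grep, in which $_ is each element in turn. The array name @animals and the other variable names are made up for illustration.

```perl
use strict;
use warnings;

my @animals = ("bat", "cat", "rat", "dog", "hen", "elephant");

# Block form: the block is evaluated with $_ set to each element in turn.
my @matched = grep { /[brc]at/ } @animals;

# Assigning to a scalar puts grep in scalar context, which gives the count.
my $count = grep { /[brc]at/ } @animals;

# The test need not be a regex: here, elements longer than three characters.
my @long = grep { length($_) > 3 } @animals;

print "@matched\n";   # bat cat rat
print "$count\n";     # 3
print "@long\n";      # elephant
```

Interpolating the array in double quotes, as in "@matched", separates the elements with spaces, which is easier to read than the joined output of print @arr.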

End of Tutorial and end of Series
This is the end of the tutorial and the end of the series. I hope you appreciated the series.

Chrys
