Search and Replace with Perl Regular Expressions

Perl Regular Expressions – Part 7

Perl Course

Foreword: In this part of the series, I explain how to search for a substring in a subject and then replace it.

By: Chrysanthus Date Published: 6 Oct 2015

Introduction

This is part 7 of my series, Perl Regular Expressions. In this part, I explain how to search for a substring in a subject and then replace it. You should have read the previous parts of the series before coming here, as this is a continuation.

The Substitution Operator
The substitution operator is the search and replace operator. It is

    s///

The syntax to use it is:

    s/search/replacement/

where search is a pattern and replacement is the replacement substring. The following code illustrates its use:

use strict;

my $subject = "I am a man.";

my $ret = $subject =~ s/man/woman/;

print $subject;

The output is:

    I am a woman.

From the output, we see that the subject is changed to contain the replacement. If the search pattern is not found, no replacement takes place. Note that s/// is used with the binding operator, =~. The subject cannot be a string literal, because s/// modifies its subject in place. So the following statement will not work:

    my $ret = "I am a man." =~ s/man/woman/;

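In scalar context, the return value of the substitution expression is the number of substitutions made, which is 1 when the search is found and the g modifier (see below) is not used. If the search is not found, the return value is false (the empty string, not undef). Try the following code and you will have 1 for the return value.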

use strict;

my $subject = "I am a man.";

my $ret = $subject =~ s/man/woman/;

print $ret;

Effect of the g Modifier
With the g modifier, all occurrences of the regex are replaced. Read and try the following code:

use strict;

my $subject = "I am a man. You are a man. He is a man.";

$subject =~ s/man/woman/g;

print $subject;

The output is:

    I am a woman. You are a woman. He is a woman.

Note where the g modifier has been used. If the g modifier had not been used, only the first match in the subject (from the left) would have been replaced. Also note that the return value of the substitution expression does not have to be assigned to a variable, as in the above code.
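The return value can be more than a true or false flag. The following short sketch (the variable names $count and $none are only for illustration) shows that with the g modifier the substitution expression returns the number of replacements made, and a false value when nothing is found:

```perl
use strict;
use warnings;

my $subject = "I am a man. You are a man. He is a man.";

# With g, the return value is the number of replacements made.
my $count = ($subject =~ s/man/woman/g);
print "$count\n";

# When nothing matches, the return value is false (an empty string).
my $none = ($subject =~ s/child/adult/g);
print "nothing replaced\n" unless $none;
```

This makes it possible to test whether anything was replaced, or to report how many replacements took place.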

You can use other modifiers where they are relevant. With the i modifier, the search becomes case insensitive. Try the following code:

use strict;

my $subject = "I am a Man. You are a Man. He is a Man.";

$subject =~ s/man/woman/ig;

print $subject;

The output is:

    I am a woman. You are a woman. He is a woman.

The g and i modifiers apply to the search component, not the replacement component.

The r Modifier
The r modifier is called the non-destructive modifier; it is available from Perl 5.14. Notice above that when the replacement is done, the subject string is changed. If you do not want this to happen, use the r modifier. In this case the return value is a new string with the replacement, while the original subject string is unchanged. Assign the returned string to a new variable. Read and try the following code:

use strict;

my $subject = "I am a man.";

my $ret = $subject =~ s/man/woman/r;

print $subject, "\n";
print $ret, "\n";

The output is:

    I am a man.
    I am a woman.
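Because s///r returns a string instead of changing its subject, it can be used in places where the in-place form is awkward. The following is a minimal sketch, assuming Perl 5.14 or later; the names @subjects, @changed and $chained are only for illustration:

```perl
use strict;
use warnings;

my @subjects = ("I am a man.", "He is a man.");

# s///r returns the changed copy, so map can build a new list
# without touching the original elements.
my @changed = map { s/man/woman/r } @subjects;

print "$_\n" for @subjects;
print "$_\n" for @changed;

# The result is an ordinary string, so substitutions can be chained,
# and with r the subject may even be a string literal.
my $chained = "I am a man." =~ s/man/woman/r =~ s/\bI am\b/She is/r;
print "$chained\n";
```

The original list is printed unchanged, followed by the changed copies and then the chained result.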

Transliteration
In Perl, transliteration means that you go through a subject string, look for particular characters, and replace each one with the corresponding character from a second list. So you replace one list of characters with another list of characters. Read and try the following code:

use strict;

    my $subject = "ABCD EFGH AB";

    my $ret = $subject =~ tr/ABCD/0123/;
    print $subject;

The output is:

    0123 EFGH 01

In this code, the old-list is ABCD and the new-list is 0123. Every occurrence of an old-list character in the subject is replaced, so A and B are each replaced twice here. This code is not very useful in practice. Try the following code, which attempts to change a phrase into title case; that is, to change the first character of each important word in the phrase from lower to upper case:

use strict;

    my $subject = "the last process in the literate school";

    my $ret = $subject =~ tr/tlps/TLPS/;
    print $subject;

The output is:

    The LaST ProceSS in The LiTeraTe SchooL

This code is more useful than the previous one, but the result is not perfect: every t, l, p and s is changed to upper case, not just the ones that begin a word. The old-list is tlps and the new-list is TLPS. Transliteration is case sensitive.
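Transliteration works on single characters and knows nothing about word boundaries, so title case is better attempted with the substitution operator. The following is one possible sketch, not the only way to do it; it capitalizes every word, including short words such as "in" and "the", and the variable name $title is only for illustration:

```perl
use strict;
use warnings;

my $subject = "the last process in the literate school";

# \b(\w) captures the first word character of each word;
# \u in the replacement upper-cases the character that follows it.
# Copying into $title first leaves $subject unchanged.
(my $title = $subject) =~ s/\b(\w)/\u$1/g;

print "$title\n";   # The Last Process In The Literate School
```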

The transliteration operator, tr///, works with the binding operator, =~. The return value of the transliteration expression is the number of characters replaced or deleted. If no character is replaced or deleted, 0 is returned. Try the following code:

use strict;

    my $subject = "the last process in the literate school";

    my $ret = $subject =~ tr/tlps/TLPS/;
    print $ret;

The output is 13, meaning 13 characters were changed in the subject. A transliterated character that occurs more than once in the subject is counted once for each occurrence.

You do not always have to assign the return value to a variable. In the following code, the return value of the transliteration expression has not been assigned.

use strict;

    my $subject = "ABCD EFGH AB";

    $subject =~ tr/ABCD/0123/;

    print $subject;

The output is 0123 EFGH 01 and the code works well.

To delete a character in the subject (and all its occurrences), type the character at the end of the old-list; and in the new-list, do not type any replacement for the character; then use the d modifier. The following code deletes the characters, A, B and space:

use strict;

    my $subject = "ABCD EFGH AB";

    my $ret = $subject =~ tr/CDAB /23/d;

    print $subject, "\n";
    print $ret;

The output is:

    23EFGH
    8

indicating that the characters A, B and space in the subject were deleted. A total of 8 characters (including repeated characters and spaces) were replaced or deleted: 2 characters were replaced and 6 were deleted.

Notice that in the above code samples, the subject string is changed. If you do not want the subject string to be changed, use the r modifier. With the r modifier, the return value becomes a new string with the changes, while the original subject string remains unchanged. Read and try the following code:

use strict;

    my $subject = "ABCD EFGH AB";

    my $ret = $subject =~ tr/CDAB /23/dr;

    print $subject, "\n";
    print $ret;

The output is:

    ABCD EFGH AB
    23EFGH

The syntax for the transliteration operator is:

    tr/SEARCHLIST/REPLACEMENTLIST/cdsr

where SEARCHLIST is the old-list and REPLACEMENTLIST is the new-list. Remember that this operator works with the binding operator. You already know the meaning of the modifiers d and r.

To understand the use of the c modifier, we shall have to use two code samples. You can replace all the characters in the SEARCHLIST, with only one character, as the following code illustrates (try it):

use strict;

    my $subject = "ABCD EFGH AB";

    $subject =~ tr/ABCD/*/;
    print $subject;

The output is:

    **** EFGH **

So, each occurrence of A or B or C or D in the subject has been replaced by * . Now, in order to replace all the characters in the subject that are not in the SEARCHLIST, with a single character, you have to use the complement modifier, c. Read and try the following code:

use strict;

    my $subject = "ABCD EFGH AB";

    $subject =~ tr/ABCD/*/c;
    print $subject;

The output is:

    ABCD******AB

Note that all the characters (including space) that were not in the SEARCHLIST have been replaced by * .

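The s modifier squeezes (squashes) repeated characters: any run of characters in the subject that were all transliterated to the same character is reduced to a single instance of that character. It does not replace one sequence of characters with another sequence; that is the job of s///. Characters that are not in the SEARCHLIST are left as they are. Read and try the following code:

use strict;

    my $subject = "AABBCD EFGH AB";

    $subject =~ tr/AB/XY/s;
    print $subject;

The output is:

    XYCD EFGH XY

The run AA becomes XX, which is squeezed to a single X; likewise BB becomes a single Y. C and D are not in the SEARCHLIST, so they are unchanged.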
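A common use of the s modifier of tr/// is to collapse runs of the same character. The following is a small sketch of that idea (the variable names are only for illustration). With an empty REPLACEMENTLIST, each character is replaced by itself, so the only visible effect is the squeezing of repeated characters:

```perl
use strict;
use warnings;

my $subject = "too    many     spaces";

# Each space is "replaced" by a space; s squeezes each run to one.
(my $squeezed = $subject) =~ tr/ //s;
print "$squeezed\n";   # too many spaces

my $word = "bookkeeper";

# Squeeze any repeated lowercase letter to a single letter.
(my $short = $word) =~ tr/a-z//s;
print "$short\n";      # bokeper
```

Note that in the first case the oo of "too" is untouched, because only the space is in the SEARCHLIST.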

Do not confuse the s modifier here with the s modifier of the match and substitution operators, which makes the dot metacharacter match the \n character as well, so that a string with several lines is treated as a single line.

Before we continue, know that the left operand (argument) of the transliteration operator (with =~) cannot be a string literal when the subject is to be modified; it has to be a variable. So the following statement will not work:

    "ABCD EFGH AB" =~ tr/ABCD/0123/;

while the following code will work:

    my $subject = "ABCD EFGH AB";
    $subject =~ tr/ABCD/0123/;

Summary of the use of the Modifiers with tr///
c: Complement the SEARCHLIST.
d: Delete found but unreplaced characters.
s: Squash duplicate replaced characters. Can change ee to e, oo to o, etc.
r: Return the modified string and leave the original string untouched.

Some Uses of Transliteration
The above code samples just look like transliteration exercises. For the rest of this tutorial I talk about possible uses of transliteration.

Changing to Lower Case and Vice Versa
The following code uses the simple ranges A-Z and a-z to change all the ASCII letters in a string to lowercase:

use strict;

    my $subject = "Html, BODY, TABLE";

    $subject =~ tr/A-Z/a-z/;

    print $subject;

The output is:

    html, body, table

Any character that was already in lowercase remains in lowercase (it is not replaced). To change all letters to uppercase, swap the positions of A-Z and a-z in tr///.
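For comparison, Perl also has the built-in functions lc and uc, which change the case of a whole string and return the result without modifying the original. A brief sketch (the variable names $lower and $upper are only for illustration):

```perl
use strict;
use warnings;

my $subject = "Html, BODY, TABLE";

# lc and uc return a changed copy; $subject itself is untouched.
my $lower = lc $subject;
my $upper = uc $subject;

print "$lower\n";   # html, body, table
print "$upper\n";   # HTML, BODY, TABLE
```

The tr/// form remains handy when only some letters, or a custom mapping, should be changed.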

Changing \ to /
For the Windows Operating System, a path is typed as follows:

    c:\dir1\dir2\file.ext

With other operating systems a path is typed as follows:

    c:/dir1/dir2/file.ext

The following code changes \ to / in the subject:

use strict;

    my $subject = 'c:\dir1\dir2\file.ext';

    $subject =~ tr/\\/\//;
    print $subject;

The original subject string is within single quotes, not double quotes, to prevent \d and \f within it from expanding into different values (characters). Note that in the tr/// operation, \ and / have been escaped. The output is:

    c:/dir1/dir2/file.ext
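The escaped slashes can be hard to read. Like s///, the tr/// operator accepts other delimiters, so one possible alternative is to use braces, in which case the forward slash no longer needs escaping (the backslash still does). A minimal sketch, with $converted as an illustrative variable name:

```perl
use strict;
use warnings;

my $subject = 'c:\dir1\dir2\file.ext';

# With brace delimiters, / is an ordinary character;
# \\ still stands for one literal backslash.
(my $converted = $subject) =~ tr{\\}{/};

print "$converted\n";   # c:/dir1/dir2/file.ext
```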

Counting the Number of a Particular Character in a String
To achieve this, you let the SEARCHLIST be the particular character in the string and you let the REPLACEMENTLIST be the same character. Read and try the following code:

use strict;

    my $subject = "I have 15. You have 45. She has nothing";

    my $ret = $subject =~ tr/e/e/;
    print $ret;

The output is 3 for three e’s. Each e in the subject is replaced by e, so the subject is unchanged and the return value simply counts the e’s.

Counting the Number of Digits in a String
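To achieve this, let the SEARCHLIST be the simple range 0-9, and let the REPLACEMENTLIST be empty (not a space). When the REPLACEMENTLIST is empty and the d modifier is not used, the SEARCHLIST is used as the replacement, so the subject is left unchanged. Read and try the following code: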

use strict;

    my $subject = "I have 15. You have 45. She has nothing";

    my $ret = $subject =~ tr/0-9//;
    print $ret;

The output is 4, for the four digits.

There are other uses of transliteration that are more technical and involve the modifiers.
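As one example of such a use, the c and d modifiers can be combined to keep only the characters in the SEARCHLIST and delete everything else. The sketch below keeps only the digits of a string; the variable name $digits is only for illustration:

```perl
use strict;
use warnings;

my $subject = "I have 15. You have 45. She has nothing";

# c complements the SEARCHLIST (everything that is not a digit);
# d deletes those characters, since they have no replacement.
(my $digits = $subject) =~ tr/0-9//cd;

print "$digits\n";   # 1545
```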

Time to take a break. We stop here and continue in the next part of the series.

Chrys

Broad Network
