Broad Network


Characters and Substrings in Perl

Handling Strings in Perl – Part 1

Perl Course

Foreword: In this part of the series I talk about the handling of characters and sub-strings. Sections covered are: Quoting a Character, ASCII Character, The chr() Function, The ord() Function, Position of a Substring, Position of First Occurrence of Substring, Position of Different Occurrences of Substring, Position of Last Occurrence of Substring, The index Function, The rindex Function, The substr Function, Search and Replace.

By: Chrysanthus Date Published: 23 Oct 2015

Introduction

This is part 1 of my series, Handling Strings in Perl.

Remember, in a string, a space is a character.

Pre-Knowledge
This series is part of the volume, Perl Course. At the bottom of this page, you have links to the different series you should have read before coming here, as this series is a continuation.

Quoting a Character
A character can be quoted in single or double quotes, e.g. 'A' or "A". I prefer the single quotes.

ASCII Character
The ASCII character set is a set of characters with corresponding decimal numbers. These decimal numbers are ASCII codes. The following are some characters and their codes:

A - 65
B - 66
a - 97
b - 98
@ - 64
/ - 47
* - 42

The chr() Function
This function takes an ASCII code as its argument and returns the corresponding character. It also accepts a Unicode code point and returns the corresponding character. The ASCII code for A is 65. The Unicode code point for a smiley face is 0x263a (your system may not display Unicode well). The syntaxes are:

    chr NUMBER
    chr

If NUMBER is omitted, the value of $_ is used.

Try the following code:

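    use strict;

    # Encode output as UTF-8 so the smiley prints without a "Wide character" warning.
    binmode(STDOUT, ':encoding(UTF-8)');

    my $char1 = chr(65);        # A
    my $char2 = chr(0x263a);    # smiley face

    print $char1, "\n";
    print $char2, "\n";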

The following code prints all ASCII characters from 32 to 126:

    use strict;

    print chr($_), " " foreach (32..126);
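
The second syntax (chr with no argument) can be sketched as follows. This is only an illustration: the sub name codes_to_string is made up for this example and is not a Perl built-in. It relies on foreach placing each code in $_ .

```perl
use strict;
use warnings;

# Hypothetical helper: build a string from a list of ASCII codes.
sub codes_to_string {
    my $result = '';
    foreach (@_) {
        # Each code is in $_, so chr with no argument converts it.
        $result .= chr;
    }
    return $result;
}

print codes_to_string(80, 101, 114, 108), "\n";    # prints Perl
```

The codes 80, 101, 114 and 108 are the ASCII codes for P, e, r and l.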

The ord() Function
This function does the reverse of chr(). The syntaxes are:

    ord EXPR
    ord

It returns the numeric value (code) of the first character of EXPR. If EXPR is an empty string, it returns 0. If EXPR is omitted, it uses $_ .

Try the following code:

    use strict;

    my $num = ord("Booking");
    print $num;

The output is:

    66

A string as argument should be in quotes.
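
As a quick check of the rules above, here is a small sketch showing that ord() looks only at the first character, returns 0 for the empty string, and is undone by chr(). The test strings are my own choice.

```perl
use strict;
use warnings;

# ord looks only at the first character of its argument.
print ord('Booking'), "\n";    # 66, the code for B
print ord('B'), "\n";          # 66 again
print ord(''), "\n";           # 0 for the empty string

# chr and ord undo each other for a single character.
print chr(ord('a')), "\n";     # a
print ord(chr(97)), "\n";      # 97
```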

Index
Index counting of the position of the characters in a string begins from zero.

Position of a Substring
A substring is a part of the string (main string). The position of an occurrence of a substring is the index of its first character in the main string. The first occurrence of a substring is the first one found, looking from left to right. The last occurrence is the last one found, looking from left to right. The sub-string can consist of just one character or more than one character.

Position of First Occurrence of Substring
The predefined pos() function returns the offset (index) at which the most recent match of a regex with the g modifier ended, which is where the next match would begin. So match with the binding operator, call pos(), and then subtract the length of the substring to get the index where the match started. Try the following code, where the phrase, “a multipurpose” is searched:

    use strict;

    my $subStr = 'a multipurpose';
    my $str = "A Star Wars Lightsaber looks like a multipurpose weapon for the future.";
    $str =~ /$subStr/g;
    my $firstPos = pos($str) - length($subStr);
    print $firstPos;

The argument to the pos function is the main string.

The output is 34.
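
The same search can be sketched in a slightly more defensive way. This is an alternative under my own assumptions, not the technique above: it reads the start of the match from Perl's built-in @- array ($-[0]) instead of using pos(), it wraps the substring in \Q...\E so that regex metacharacters are matched literally, and it returns -1 when nothing is found. The sub name first_pos is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first match of $subStr in $str,
# or -1 when there is no match.
sub first_pos {
    my ($str, $subStr) = @_;
    # \Q...\E makes regex metacharacters in $subStr match literally.
    if ($str =~ /\Q$subStr\E/) {
        return $-[0];    # offset where the last successful match started
    }
    return -1;
}

my $str = "A Star Wars Lightsaber looks like a multipurpose weapon for the future.";
print first_pos($str, 'a multipurpose'), "\n";    # 34
print first_pos($str, 'spaceship'), "\n";         # -1
```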

Position of Different Occurrences of Substring
You use the same technique as above, but with a while-loop, subtracting the length of the substring in each iteration. Try the following code:

    use strict;

    my $subStr = 'cat';
    my $str = "A cat is an animal. A cat is a creature.";

    print "Next Position is: ", pos($str) - length($subStr), "\n" while ($str =~ /cat/g);

The output is:

    Next Position is: 2
    Next Position is: 22
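
If you need the positions for later use rather than just printing them, the loop can collect them into an array. The following is a minimal sketch; the sub name all_positions is made up for this example, and \Q...\E is added so that the substring is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: start index of every occurrence of $subStr in $str.
sub all_positions {
    my ($str, $subStr) = @_;
    my @positions;
    # Each successful /g match moves pos($str) to the end of that match.
    while ($str =~ /\Q$subStr\E/g) {
        push @positions, pos($str) - length($subStr);
    }
    return @positions;
}

my @found = all_positions("A cat is an animal. A cat is a creature.", 'cat');
print "@found\n";    # 2 22
```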

Position of Last Occurrence of Substring
The technique is similar to the above, but you keep assigning the position to a variable inside the while loop and print it (or use it) only after the loop ends. Try the following code:

    use strict;

    my $subStr = 'cat';
    my $str = "A cat is an animal. A cat is a creature.";

    my $lastPos;
    $lastPos = pos($str) - length($subStr) while ($str =~ /cat/g);

    print $lastPos;

You can make the above searches case insensitive. Just change the regexes to /$subStr/ig and /cat/ig .
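
Here is a short sketch of the last-occurrence search with the i modifier added. The sub name last_pos_ci and the mixed-case test string are my own, for illustration only. The sub returns -1 when there is no match.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the last occurrence of $subStr in $str,
# ignoring case, or -1 when there is no match.
sub last_pos_ci {
    my ($str, $subStr) = @_;
    my $lastPos = -1;
    $lastPos = pos($str) - length($subStr) while ($str =~ /\Q$subStr\E/ig);
    return $lastPos;
}

print last_pos_ci("A Cat is an animal. A CAT is a creature.", 'cat'), "\n";    # 22
```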

The index Function
Before I continue, remember, index counting begins from 0 and not 1. So, for the string:

    "abcdefghijklmopq"

a is at index 0, b is at index 1, c is at index 2 and so on.

Assume that you want to know the index at which the sub-string, "ghijk" starts (is found) in the above string, searching from index 2 (the position of c) onwards. You would type:

    index($str, $substr, 2)

where $str has the main string, $substr has the sub-string and 2 is the index position at which to start the search. Try the following code:

    use strict;

    my $str = "abcdefghijklmopq";
    my $substr = "ghijk";
    my $indx = index($str, $substr, 2);
    print $indx;

The output is 6.

Note: if the sub-string is not found in the main string, the index function returns -1.

If you want the search to begin from the beginning of the string, you just omit the last argument (2) in the index function. In this case, the index function becomes:

    index($str, $substr);

You can also use this function to find the index of a character. Read and try the following code:

    use strict;

    my $str = "abcdefghijklmopq";
    my $substr = 'k';
    my $indx = index($str, $substr);
    print $indx;

The output is 10.

If there are multiple occurrences of the character (or sub-string), then the first occurrence is the one found. Try the following code for the space character:

    use strict;

    my $str = "a b c d e f g h i j k l m o p q";
    my $substr = ' ';
    my $indx = index($str, $substr);
    print $indx;

The output is 1, because the first occurrence of the space character is at index 1 (index counting begins from 0).

Now, here are the syntaxes for the index function from the Perl documentation:

    index STR,SUBSTR,POSITION

    index STR,SUBSTR
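
The POSITION argument also lets you walk through every occurrence: search once, then search again starting one character past each hit until index returns -1. The following is a minimal sketch; the sub name index_all is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: index of every occurrence of $substr in $str.
sub index_all {
    my ($str, $substr) = @_;
    my @found;
    return @found if $substr eq '';    # an empty sub-string would never give -1
    my $pos = index($str, $substr);
    while ($pos != -1) {
        push @found, $pos;
        # Continue the search one character after the last hit.
        $pos = index($str, $substr, $pos + 1);
    }
    return @found;
}

my @spaces = index_all("a b c d", ' ');
print "@spaces\n";    # 1 3 5
```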

The rindex Function
The rindex function works just like the index function, except that it returns the position of the last occurrence of SUBSTR in STR, while index returns the position of the first occurrence. If POSITION is specified, rindex returns the last occurrence beginning at or before that position. The syntaxes for the rindex function are:

    rindex STR,SUBSTR,POSITION

    rindex STR,SUBSTR

Read and try the following code for the rindex function for the space character:

    use strict;

    my $str = "a b c d e f g h i j k l m o p q";
    my $substr = ' ';
    my $indx = rindex($str, $substr);
    print $indx;

The output is 29, because the last space is at index position 29.
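
To see the effect of the POSITION argument, here is a short sketch using the same string. The position 10 and the letter z are my own choices for illustration.

```perl
use strict;
use warnings;

my $str = "a b c d e f g h i j k l m o p q";

print rindex($str, ' '), "\n";        # 29, the last space in the string
print rindex($str, ' ', 10), "\n";    # 9, the last space at or before index 10
print rindex($str, 'z'), "\n";        # -1, because z is not in the string
```

The spaces are at the odd indices (1, 3, 5 and so on), so the last one at or before index 10 is at index 9.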

The substr Function
This function is used to extract a portion of a string. The extracted portion can be replaced. The function returns the extracted portion. If the extracted portion is not replaced, the original string remains unchanged. This function has three syntaxes. One of them is:

    substr EXPR,OFFSET

where EXPR is the main string to extract a portion from and OFFSET is the index in the string where the extraction starts. Remember, index counting in a string begins from zero. Read and try the following code:

    use strict;

    my $str = "one two three four five";
    my $ret = substr($str, 14);
    print $ret, "\n";
    print $str, "\n";

With the above syntax, all the characters from the offset index to the end of the string are extracted.

It is possible to extract from the offset point to a point before the end of the string. You achieve this by indicating the length of sub-string to be extracted, in number of characters, as in the following syntax:

    substr EXPR,OFFSET,LENGTH

Read and try the following code:

    use strict;

    my $str = "one two three four five";
    my $ret = substr($str, 14, 5);
    print $ret, "\n";
    print $str, "\n";

Remember, a space is a character, so the five characters extracted above are ‘f’, ‘o’, ‘u’, ‘r’ and ‘ ’.

You can also replace the extracted sub-string using the following syntax:

    substr EXPR,OFFSET,LENGTH,REPLACEMENT

where REPLACEMENT is the new sub-string to replace the extracted sub-string. Try the following code:

    use strict;

    my $str = "one two three four five";
    my $ret = substr($str, 14, 4, "ffff");
    print $ret, "\n";
    print $str, "\n";
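
In practice you often do not know the offset in advance. The following sketch combines index with the three substr syntaxes, so that the offset and length are computed instead of typed in. The variable names and the replacement "FOUR" are my own, for illustration.

```perl
use strict;
use warnings;

my $str  = "one two three four five";
my $word = 'four';

my $at   = index($str, $word);                 # 14
my $tail = substr($str, $at);                  # "four five"
my $part = substr($str, $at, length($word));   # "four"

# With a replacement, substr returns the old portion and changes $str.
my $old  = substr($str, $at, length($word), "FOUR");

print "$tail\n";    # four five
print "$part\n";    # four
print "$old\n";     # four
print "$str\n";     # one two three FOUR five
```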

Search and Replace Known Substring
To search for and/or replace a known substring, use the substitution operator, s///. To make the search case insensitive, use the i modifier. To replace every occurrence of the substring, use the g modifier. To replace one particular occurrence (for example, the second), use the pos() function in a while-loop to locate that occurrence, and then replace it.
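
The options above can be sketched as follows. This is one possible way, under my own assumptions: the sub name replace_nth is made up, and it uses the four-argument substr to do the replacement once the pos() loop has counted up to the wanted occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th occurrence of $subStr.
sub replace_nth {
    my ($str, $subStr, $n, $new) = @_;
    my $count = 0;
    while ($str =~ /\Q$subStr\E/g) {
        next unless ++$count == $n;
        my $start = pos($str) - length($subStr);
        substr($str, $start, length($subStr), $new);
        last;    # stop once the wanted occurrence is replaced
    }
    return $str;
}

my $text = "A cat is an animal. A Cat is a creature.";

(my $first = $text) =~ s/cat/dog/;      # first match only, case sensitive
(my $all   = $text) =~ s/cat/dog/ig;    # every match, ignoring case

print "$first\n";    # A dog is an animal. A Cat is a creature.
print "$all\n";      # A dog is an animal. A dog is a creature.
print replace_nth($text, 'is', 2, 'was'), "\n";    # A cat is an animal. A Cat was a creature.
```

If the string has fewer than $n occurrences, replace_nth returns the string unchanged.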

That is it for this part of the series. We stop here and continue in the next part.

Chrys

Related Links

Perl Basics
Perl Data Types
Perl Syntax
Perl References Optimized
Handling Files and Directories in Perl
Perl Function
Perl Package
Perl Object Oriented Programming
Perl Regular Expressions
Perl Operators
Perl Core Number Basics and Testing
Commonly Used Perl Predefined Functions
Line Oriented Operator and Here-doc
Handling Strings in Perl
Using Perl Arrays
Using Perl Hashes
Perl Multi-Dimensional Array
Date and Time in Perl
Perl Scoping
Namespace in Perl
Perl Eval Function
Writing a Perl Command Line Tool
Perl Insecurities and Prevention
Sending Email with Perl
Advanced Course
Miscellaneous Features in Perl
Perl Two-Dimensional Structures
Advanced Perl Regular Expressions
Designing and Using a Perl Module
More Related Links
Perl Mailsend
PurePerl MySQL API
Perl Course - Professional and Advanced
Major in Website Design
Web Development Course
Producing a Pure Perl Library
MySQL Course
