Broad Network


Backreferences in Perl Regular Expression

Advanced Perl Regular Expressions – Part 7

Foreword: In this part of the series, I explain how a group in a regex can be referred to later in the same regex by a short notation.

By: Chrysanthus
Date Published: 2 Apr 2016

Introduction

This is part 7 of my series, Advanced Perl Regular Expressions. In this part of the series, I explain how a group in a regex can be referred to later in the same regex. The notations are \g{-1}, \g{-2}, \g{-3}, etc. They are known as relative backreferences, because each one counts groups backwards from its own position in the regex. Remember, a group in a regex is a sub-pattern in parentheses. You should have read the previous parts of the series, because this is a continuation.

Back Reference

Normally, when a writer types the same word twice in a row, it is a mistake. You may want to identify such a sequence in a subject string. Consider the following subject:

    my $subject = "He has one  one of the books";

Here, the sub-string “one  one”, typed by accident, begins with “one”, followed by one or more spaces and then “one” again. You may want to identify this sub-string. The pattern for the first word of interest is \b\w\w\w\b. The pattern for one or more spaces is \s+. The second word has the same form as the first. To match the sub-string with the repeated word, you do not type the word pattern twice; you refer back to the group for the first word. A better regex to use is,

    /(\b\w\w\w\b)\s+\g{-1}/

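In this expression, \g{-1} refers to the group immediately before it, (\b\w\w\w\b), and matches the same text that the group matched. It does not re-apply the group's sub-pattern. So the regex is not equivalent to,

    /(\b\w\w\w\b)\s+(\b\w\w\w\b)/

which matches any two three-letter words separated by spaces, such as “has one”. By contrast,

    /(\b\w\w\w\b)\s+\g{-1}/

matches “one  one” only because the second word repeats the first. As indicated above, \g{-1} refers to the previous group in the regex. The regex above matches any repeated three-letter word, e.g. “the the”, “him him”, “man  man”, etc. Because it has no \b after \g{-1}, it also matches when the second word merely begins with the first, as in “one  oneself”. You can use the same scheme to match a word made of two identical halves. So the following binding operation will produce a match: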

    "What does beriberi mean?" =~ /(beri)\g{-1}/
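A point worth checking here is that a backreference matches the text the group captured, so it is stricter than typing the sub-pattern twice. The following is a minimal sketch of that check; the subjects and variable names are illustrative only, and the trailing \b in the last regex is an optional refinement, not part of the regexes above.

```perl
use strict;
use warnings;

# Illustrative subjects: a repeated word, two different words, and a prefix case.
my $repeated  = "He has one  one of the books";
my $different = "He has one  two of the books";
my $prefix    = "one  oneself";

# The backreference must match the exact text captured by the group.
my $backref_on_repeated  = $repeated  =~ /(\b\w\w\w\b)\s+\g{-1}/ ? 1 : 0;
my $backref_on_different = $different =~ /(\b\w\w\w\b)\s+\g{-1}/ ? 1 : 0;

# Typing the sub-pattern twice accepts any two three-letter words.
my $twice_on_different = $different =~ /(\b\w\w\w\b)\s+(\b\w\w\w\b)/ ? 1 : 0;

# Without a trailing \b the backreference may match the start of a longer word.
my $loose_on_prefix = $prefix =~ /(\b\w\w\w\b)\s+\g{-1}/   ? 1 : 0;
my $tight_on_prefix = $prefix =~ /(\b\w\w\w\b)\s+\g{-1}\b/ ? 1 : 0;

print "backreference, repeated word:  $backref_on_repeated\n";
print "backreference, different word: $backref_on_different\n";
print "pattern twice, different word: $twice_on_different\n";
print "no trailing \\b, prefix case:   $loose_on_prefix\n";
print "trailing \\b, prefix case:      $tight_on_prefix\n";
```

If you only want whole repeated words, adding \b after the backreference is one way to rule out the prefix case.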

What about the situation where you have two or more earlier groups spread out in the regex and you want to repeat them later in the same regex? This is where you need \g{-1} for the nearest group on the left, \g{-2} for the group before that, \g{-3} for the group further left still, and so on. Consider the following binding operation that produces a match:

    "Listen: A boy and a girl! Which boy and which girl?" =~ /((boy).+(girl).+\g{-2}.+\g{-1})/;

The phrase matched is “boy and a girl! Which boy and which girl”. In the regex, (boy) matches “boy” and (girl) matches “girl”; then \g{-2} refers back to (boy) and \g{-1} refers back to (girl).
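To see how the relative numbers line up with ordinary group numbers, the sketch below runs the same subject through a relative form and an absolute form of the regex. It assumes the outer group is group 1, so (boy) is group 2 and (girl) is group 3; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $subject = "Listen: A boy and a girl! Which boy and which girl?";

# Relative form: \g{-2} and \g{-1} count groups backwards from where they appear.
# In list context the match returns the captured groups ($1, $2, $3).
my ($rel_all, $rel_first, $rel_second) =
    $subject =~ /((boy).+(girl).+\g{-2}.+\g{-1})/;

# Absolute form: in this regex the same groups are numbers 2 and 3.
my ($abs_all, $abs_first, $abs_second) =
    $subject =~ /((boy).+(girl).+\g{2}.+\g{3})/;

print "relative: $rel_all\n";
print "absolute: $abs_all\n";
print "inner groups: $rel_first, $rel_second\n";
```

Both forms capture the same text here. The relative form keeps working if another group is later added in front of the regex, whereas the absolute numbers would then have to be changed.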

Read and try the following code that uses the above expressions:

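    use strict;
    use warnings;

    # A repeated three-letter word; the outer group captures the whole match in $1.
    my $subject = "He has one  one of the books";
    print $1, "\n" if $subject =~ /((\b\w\w\w\b)\s+\g{-1})/;

    # A word made of two identical halves.
    print $1, "\n" if "What does beriberi mean?" =~ /((beri)\g{-1})/;

    # Two inner groups, repeated in order with \g{-2} and \g{-1}.
    if ("Listen: A boy and a girl! Which boy and which girl?" =~ /((boy).+(girl).+\g{-2}.+\g{-1})/) {
        print $1, "\n";
        print $2, ', ', $3, "\n";    # only three groups exist, so there is no $4
    }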

The overall pattern in each case has been placed in a larger group, so that the whole match is captured in $1 and the inner groups in $2, $3, etc. Remember, after each successful match, the variables $1, $2, $3, etc. are set afresh. If a match fails (returns false), these variables are not reset; they keep the values from the last successful match.
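Because a failed match leaves $1 holding an earlier value, it is safer to test the match before reading the capture variables. The following is a minimal sketch of that behaviour; the subjects and variable names are illustrative only.

```perl
use strict;
use warnings;

# First match succeeds, so $1 is set to the repeated word.
my $first_ok    = "one  one" =~ /(\w+)\s+\g{-1}/;
my $after_first = $1;

# Second match fails, so $1 still holds the value from the first match.
my $second_ok = "one  two" =~ /(\w+)\s+\g{-1}/;
my $stale     = $1;

print "first match succeeded:  ", ($first_ok  ? "yes" : "no"), "\n";
print "second match succeeded: ", ($second_ok ? "yes" : "no"), "\n";
print "\$1 after the failed match: $stale\n";

# Safer habit: read $1 only inside a test of the match itself.
if ("one  two" =~ /(\w+)\s+\g{-1}/) {
    print "repeated word: $1\n";
}
else {
    print "no repeated word found\n";
}
```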

That is it for this part of the series. We stop here and continue in the next part.

Chrys
