Broad Network


Metacharacters when Searching within a Site using Perl and MySQL

Search Within a Site using Perl and MySQL – Part 3

Web Development with Perl and MySQL

Foreword: In this part of the series, I talk about metacharacters when searching within a site with Perl and MySQL. You should have read the previous part of the series before reaching here, as this is a continuation.

By: Chrysanthus Date Published: 10 Sep 2016

Introduction

This is part 3 of my series, Search Within a Site using Perl and MySQL.

The previous part of the series used both Perl regular expressions and MySQL regular expressions. Sometimes keywords contain metacharacters. A good example is the name of the computer language, C++: its two plus signs are metacharacters. A keyword is a normal word that is important in a passage, so in some articles “C++” is a keyword. The question here is, how do you type a metacharacter in a regular expression (regex)? Also, how do you find and replace a metacharacter that you do not know, in a string?

In this tutorial I use the “C++” keyword for illustration. After that I talk about the problem of finding and replacing a metacharacter that you do not know, in a string.

Metacharacters
The metacharacters in Perl regex that I know are { } [ ] ( ) ^ $ . | * + ? \ . The metacharacters in MySQL regex that I know are ^ $ . * + ? | ( ) { } [ ] ; the characters : = < > are special only inside bracket constructs such as [[:alpha:]]. Do not confuse white space characters with metacharacters. In Perl, the white space characters are the space, ‘\t’, ‘\r’, ‘\n’, and ‘\f’. In MySQL string literals, the white space escape sequences are '\t', '\n', and '\r'; '\b' is the backspace character.

How do you type a metacharacter in a regex? Answer: in Perl, you precede the metacharacter with a backslash, as in /C\+\+/. In MySQL, you precede the metacharacter with two backslashes, as in "C\\+\\+", because the MySQL string parser consumes one backslash before the regex is evaluated.
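
The Perl half of this answer can be tried out with a short sketch. The sample text and variable names are made up for illustration. The built-in quotemeta() function and the \Q...\E construct are alternatives to typing the backslashes by hand; they are not used in the script of the previous part.

```perl
use strict;
use warnings;

my $text = "Notes on C++ and Perl";

# Hand-escaped: each + is preceded by a backslash.
my $found = ($text =~ /C\+\+/) ? 1 : 0;

# quotemeta() puts a backslash before every non-word character.
my $keyword = "C++";
my $escaped = quotemeta($keyword);
my $foundQuoted = ($text =~ /$escaped/) ? 1 : 0;

# \Q...\E applies the same escaping inside the regex itself.
my $foundInline = ($text =~ /\Q$keyword\E/) ? 1 : 0;

print "$escaped $found $foundQuoted $foundInline\n";   # C\+\+ 1 1 1
```

quotemeta() is convenient when the keyword comes from the user and you cannot escape it by hand.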

Metacharacters and the Perl Search Engine Script
Here I talk about the Perl file of the previous part of the series. The Perl file uses the regular expression technique in three places. In two of these places, Perl regular expression operations are used. In the first place, an operation removes the non-keywords from the search phrase. In the second place, the keywords of the new search phrase are placed in an array. In the third place, Perl “wraps” a MySQL regular expression; this happens where the MySQL SELECT query is formed.

In the first place, the Perl operation is:

            $searchStr =~ s/\b$nonKeywords[$i]\b//g;

In the second place, the Perl operation is:

            my @searchStrArr = $searchStr =~ /\b\w+\b/g;

In the third place, you have the code segment:

            #form the WHERE clause of SQL select statement
            my $numberWords = @searchStrArr; #no. of keywords
            my $firstKeyword = $searchStrArr[0];
            my $whereStr = " WHERE (series.keywords rLike \"$firstKeyword\")";
            my $temp;
            if ($numberWords > 1)
                {
                    for(my $j=1; $j<$numberWords; ++$j)
                        {
                            $temp = $searchStrArr[$j];
                            $whereStr .= " AND (series.keywords rLike \"$temp\")";
                        }
                }

Here, “rLike” is a MySQL regular expression operator, and the MySQL regex is held in the Perl variables, $firstKeyword and $temp.
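
If a keyword itself contains metacharacters, it has to be escaped before it is placed in the rLike string. The following is a minimal sketch and not part of the original script: mysqlRegexEscape is a hypothetical helper, the sample keywords are made up, and the sketch assumes that keywords contain no quote characters.

```perl
use strict;
use warnings;

# Hypothetical helper: make a keyword literal for a MySQL regex
# that is written inside a quoted SQL string.
sub mysqlRegexEscape {
    my ($word) = @_;
    # Regex level: put one backslash before each metacharacter.
    $word =~ s/([\\^\$.|?*+(){}\[\]])/\\$1/g;
    # SQL string level: double every backslash, because the MySQL
    # string parser consumes one of each pair.
    $word =~ s/\\/\\\\/g;
    return $word;
}

my @searchStrArr = ("C++", "tutorial");

# One rLike condition per keyword, joined with AND.
my $whereStr = " WHERE " . join(" AND ",
    map { "(series.keywords rLike \"" . mysqlRegexEscape($_) . "\")" }
    @searchStrArr);

print "$whereStr\n";
# WHERE (series.keywords rLike "C\\+\\+") AND (series.keywords rLike "tutorial")
```

The join/map form produces the same clause as the for-loop above; it is used here only to keep the sketch short.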

Search Phrase Having C++
If the search phrase has “C++”, then with the above code as given, the “C++” word will not be selected whole into the array, @searchStrArr; only “C” is selected. This is because + is a regex metacharacter and not a word character, so \w+ does not match it. To solve the problem, the operation in the second place above has to be rewritten as:

            my @searchStrArr = $searchStr =~ /c\+\+|\b\w+\b/ig;

The regex is now /c\+\+|\b\w+\b/ instead of just /\b\w+\b/. Note that the + signs have been escaped. So the operation now searches for C++ (c\+\+) or an ordinary word (\b\w+\b) and places each match in the array. The i flag makes the search case insensitive, so c and C mean the same thing.
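
The difference between the two operations can be seen with a short sketch. The search phrase and the array names @plain and @fixed are made up for illustration.

```perl
use strict;
use warnings;

my $searchStr = "C++ regex tutorial";

# Original operation: + is not a word character, so only "C" is captured.
my @plain = $searchStr =~ /\b\w+\b/g;

# Rewritten operation: the escaped alternative c\+\+ is tried first.
my @fixed = $searchStr =~ /c\+\+|\b\w+\b/ig;

print join(", ", @plain), "\n";   # C, regex, tutorial
print join(", ", @fixed), "\n";   # C++, regex, tutorial
```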

Unknown Metacharacter in a Keyword
Above, we know that the peculiar word is “C++”, and the metacharacter is +. In theory, you can have situations where you are sure of neither the peculiar word nor the metacharacter. In this case, the Perl file has to be modified again.

One way to do this is to know the different possible peculiar words, and then adjust the file accordingly. Assuming that there are three possible peculiar words, “C++”, “C**”, and “W^^H”, the Perl operation in the second place above would be:

            my @searchStrArr = $searchStr =~ /c\+\+|c\*\*|w\^\^h|\b\w+\b/ig;
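
When the list of peculiar words grows, typing each escaped alternative by hand becomes error prone. The following is a minimal sketch of one alternative, not the method of the original script: it assumes the peculiar words are kept in an array (@peculiarWords, a made-up name) and lets quotemeta() do the escaping.

```perl
use strict;
use warnings;

# Assumed list of peculiar words; extend it as new ones turn up.
my @peculiarWords = ("C++", "C**", "W^^H");

# quotemeta() escapes the metacharacters in each word.
# Longer words go first so that a short word cannot shadow a longer one.
my $alternatives = join("|",
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @peculiarWords);

# A peculiar word or an ordinary word, case insensitive.
my $wordRegex = qr/$alternatives|\b\w+\b/i;

my $searchStr = "c++ W^^H notes";
my @searchStrArr = $searchStr =~ /$wordRegex/g;

print join(", ", @searchStrArr), "\n";   # c++, W^^H, notes
```

This still depends on knowing the peculiar words in advance; it only removes the need to work out which of their characters are metacharacters.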

Wow, I find the production of this series exciting; I hope you find it interesting. We have come to the end of this part of the series. We take a break here and continue in the next part.

Chrys
