Code for Search within a Site using Perl and MySQL

Search Within a Site using Perl and MySQL – Part 2

Web Development with Perl and MySQL

Foreword: In this part of the series I explain the code of the Perl file used in searching a site with MySQL.

By: Chrysanthus Date Published: 10 Sep 2016

Introduction

This is part 2 of my series, Search Within a Site using Perl and MySQL. In this part of the series I explain the code of the Perl file used in searching a site with MySQL. The code is my design. I gave you an overview of this topic in the previous part of the series. You should be reading the parts of this series in the order given; this is a continuation of the previous part. The interface code and the database query are in one Perl file. I gave you the client web page Form code in the previous part of the series and that will not be repeated here.

The Perl File Segments
There are three major code segments in the file. The first segment obtains the search phrase sent from the client. Remember, the Perl file is at the server. The next major segment, removes the non-keywords from the search phrase, places the keywords in a Perl array and then prepares the SQL query in string form. The third major segment connects to the database, sends the SQL query to the database, receives the result from the database and then prepares the HTML code for the result and send to the client’s browser. The result is the feedback seen by the client. The second and third code segments are further divided into smaller segments.

The First Code Segment
The first major code segment is quite short. It is:

use Mysql;
use CGI;
use strict;

print "Content-Type: text/html\n\n";

    my $obj = CGI->new();
    my $searchStr  = $obj->param('searchStr');

It gets the search phrase from the HTTP POST variable and assigns to the ordinary Perl variable, $searchStr. Also note that we are using the PurePerl MySQL API that is easy to use and easy to install.

The Second Code Segment
The skeleton of the second major code segment is:

    if ($searchStr eq "")
        {
            print "<strong>Search string is empty!</strong>";
        }
    else
        {
            - - - - - -
        }

The if-condition checks if the search string was empty. If it was, the if-block informs the user so. Under this condition, the SQL Select query is not effectively form and no further information is developed and sent to the user. In your commercial project, you should use a better algorithm that will prevent the query from being sent to the database, when the search string is empty. The algorithm here does not do that.

If the search string is not empty, the else part of the if-construct is executed. This else part consists of smaller code segments. The first small code segment here removes the non-keywords from the search string. So, the ultimate search phrase is void of non-keywords. The second small segment here places the keywords into an array. The third small segment here, forms the WHERE clause for the database SQL query. The fourth and last small code segment here prepares the complete SQL query string in Perl terms.
The small segment to remove the keywords from the search phrase is:

            #remove the non-keywords using regex
            my @nonKeywords = ("about", "along", "among", "before", "after", "by", "for", "in", "from", "on", "of", "since", "to", "until", "till", "up", "with", "between", "the", "a", "an", "while", "whereas", "since", "as", "for", "therefore", "but", "and", "or", "I", "you", "he", "she", "we", "they", "me", "him", "her", "us", "them", "my", "your", "his", "her", "our", "their", "mine", "yours", "hers", "ours", "theirs", "some", "it", "its", "few", "many", "much", "little");

            my $arrLength = @nonKeywords; # no. of elements in the array

            for (my $i=0; $i<$arrLength; ++$i)
             {
             $searchStr =~ s/\b$nonKeywords[$i]\b//g;
             }

It begins with an array of all non-keywords. I analyzed these non-keywords in the previous part of the series. Then the length of the array is determined and assigned to the variable, $arrLength. The for-loop removes the non-keywords from the search phrase using the s/// regex (regular expression) operator. If you have covered the prerequisite for this series, then understanding the code of this series should not be a problem.

The second small code segment is:

            #place each word of search string into an array
            my @searchStrArr = $searchStr =~ /\b\w+\b/g;

At this point the search phrase no longer has the non-keywords. This segment has just one statement. The statement uses the Perl global regex to copy the keywords from the new search phrase into an array. The regex here is, “/\b\w+\b/g” meaning all words.

The third small code segment is:

            #form the WHERE clause of SQL select statement
            my $numberWords = @searchStrArr; #no. of keywords
            my $firstKeyword = $searchStrArr[0];
            my $whereStr = " WHERE (series.keywords rLike \"$firstKeyword\")";
            my $temp;
            if ($numberWords > 1)
             {
             for(my $j=1; $j<$numberWords; ++$j)
                        {
                         $temp = $searchStrArr[$j];
                         $whereStr .= " AND (series.keywords rLike \"$temp\")";
                        }
             }

It begins by counting the number of elements in the new search phrase array and assigning to the variable, $numberWords. The first keyword of the array is then assigned to the variable, $firstKeyword. The first part of the WHERE clause is determined and assigned to the variable, $whereStr.

If the new search phrase in the array consisted of more than one keyword, then the if-construct builds up the rest of the WHERE clause, otherwise the WHERE clause would have only one keyword. The for-loop is what actually builds the rest of the WHERE clause and string-append to the variable, $whereStr. In the last statement of the for-loop, the variable $temp is used.

The last small code segment is:

            #complete SQL select statement
            my $selectStr = "select series.directoryName AS directoryName, series.htmlfilename AS filename, series.title1 AS title, series.forward AS foreword, writers.firstname AS firstname FROM series inner join articles inner join writers on series.articleID=articles.articleID AND articles.writerID=writers.writerID" . $whereStr;

This statement assigns the complete database SQL query in Perl terms, to the variable, $selectStr. It string-appends the complete WHERE clause to the first part of the database SQL query.

The Database SQL Query
An example of the MySQL Select Query without Perl, for this project is:

select series.directoryName AS directoryName, series.htmlfilename AS filename, series.title1 AS title, series.forward AS forward, writers.firstname AS firstname FROM series inner join articles inner join writers on series.articleID=articles.articleID AND articles.writerID=writers.writerID WHERE
(series.keywords rLike "relationship") AND (series.keywords rLike "man") A
ND (series.keywords rLike "woman");

This Select query looks into the database for article data (rows) of articles that are based on “relationship between man and woman”. This quotation is the search phrase the user would type. The Perl file would remove the preposition, “between”, leaving you with an array consisting of the keywords, “relationship”, “man” and “woman”.

You should be able to understand the part of the Select query before the WHERE clause. The WHERE clause is:

    WHERE (series.keywords rLike "relationship") AND (series.keywords rLike "man") AND (series.keywords rLike "woman");

So, for each cell in the keywords column of the series table in the database, the query checks if “relationship” is present and if “man” is present and if “woman” is present. In the statement, “rLike” is of MySQL regular expression technique. The two same Boolean logic operators of “AND” are of SQL. Because of the ANDs, all three keywords, "relationship",  "man" and "woman" must be present in the cell. If any of these words is absent, the row is not selected. Remember, a cell of the keywords column of the series table, has article keywords separated by commas.

So, the more keywords the user has in his search phrase, the less the number of relevant rows selected.

I hope at this point, you can go back to the third small code segment above and understand how the WHERE clause is produced. Note, the user can type more than three keywords or less than three keywords. Above, there is an array that holds the keywords typed. The number of keywords in the search phrase determines the array length and so the length (number of ANDs) of the WHERE clause.

The Third Code Segment
The skeleton of the third major code segment of the Perl file is:

        #connect to DB and obtain resultset
        if (!Mysql::connect("user", "psswrd", "localhost", 3306))
         {
                print "<body style='background-color:bisque'><h3>$Mysql::Error_msg</h3>";
         }
        else
         {
                #select database
                 if (!Mysql::select_db("dbase"))
                 {
                        print "<body style='background-color:bisque'><h3>$Mysql::Error_msg</h3>";
                 }
                else
                 {
                         my $resultFound = 0;

                         #for no result found
                         if ($resultFound == 0)
                         {
                                print "<h3>No result found.</h3>";
                         }

                 }
                Mysql::close();
         }

This segment begins by attempting the connection. If that fails it sends an error message to the user and no database search is done. The else part of the if-construct here is what does the search. At the end of the segment, if no rows match the search phrase (keywords) the user is told so. At the end of the segment, the connection is closed.

Note, the feedback page sent to the client’s browser is formed by the Perl file using a number of print statements.

The part of the code that does the searching and sends the resultset HTML list to the user’s browser consists of two small code segments.

The first small code segment is:

                        if (!Mysql::query($selectStr))
                         {
                                print $Mysql::Error_msg, "<br>";
                                $resultFound = 0;
                         }
                        else
                         {

                         }

The query is sent to the database. The variable, $resultFound is declared. If the resultset is empty, the value of this variable is false, 0; if it is not empty, the value of the variable is true, 1.

The second small code segment is:

                                if ($Mysql::No_of_Rows > 0)
                                 {
                                 $resultFound = 1;

                                 print "<h3>Data Found</h3>";

                                 for (my $i=0; $i<$Mysql::No_of_Rows; ++$i)
                                 {
                                 print "<p>";
                                 print "<a href=\"http://www.somesite.com/$Mysql::Result[$i]{'directoryName'}/$Mysql::Result[$i]{'filename'}\"><strong>$Mysql::Result[$i]{'title'}</strong></a><br>";
                                 print $Mysql::Result[$i]{'foreword'};
                                 print "<br>";
                                 print "<i>by: $Mysql::Result[$i]{'firstname'}</i>";
                                 print "</p>";
                                 }
                                 }

This segment only operates if the resultset has some rows (more than zero). This is tested in the if-condition ($Mysql::No_of_Rows > 0). If that is true, the first thing the if-block does is to assign the value of true to the variable, $resultFound, indicating that rows (result) have been found. The next line sends the heading of the result list to the user’s browser.

After that you have a for-loop. The for-loop fetches the rows of the resultset. The body of the for-loop deals with one resultset row at a time. The HTML list item is prepared and sent to the browser using the row values. Note that here, the title of the article is sent as a hyperlink. The href value of the hyperlink uses the address of the site, which the programmer should already know, the name of the subdirectory from 'directoryName' and the value of the filename from 'firstname'. Remember, a Perl variable such as Result[$i]{'directoryName'} in a double quoted string expands (is replaced by) to its value.

Regular Expression Techniques used
The Perl file uses regular expression technique in three places. In two of these places Perl regular expression operations are use. In the first place, an operation is used to remove the non-keywords from the search phrase. In the second place, the keywords of the new search phrase are placed in an array. In the third place, Perl “wraps” a MySQL regular expression; this happens at the formation of the MySQL Select Query.

Database to be created
As you can see from this and the previous parts of the tutorial series, the database has to be created. For each HTML article, rows must be inserted into the series, articles, and writers tables. So, you or someone else has to create the database.

Wow, impressive, and not as difficult as it seemed, prerequisite haven accomplished! Time to take a break. See you in the next part of the series.

Chrys

Broad Network

Related Articles

Code for Search within a Site using Perl and MySQL

Search Within a Site using Perl and MySQL – Part 2

Web Development with Perl and MySQL

Introduction

Related Links

Comments