Overview of Search Within a Site using Perl and MySQL

Search Within a Site using Perl and MySQL – Part 1

Web Development with Perl and MySQL

Foreword: In this part of the series, I give you an overview for Searching within a Site using Perl and MySQL.

By: Chrysanthus Date Published: 10 Sep 2016

Introduction

This is part 1 of my series, Search Within a Site using Perl and MySQL. In this part of the series, I give you an overview for Searching within a Site using Perl and MySQL. We are dealing with one website, whose web server uses Perl, the PurePerl MySQL API, and has a MySQL database. The data searched are data on web pages. The data is either searched from the database or from the web pages themselves. The links to the HTML files are created and displayed at the user’s web page.

The first 3 parts of the series talks about the search from a database. The last 2 parts talks about the search from the web pages themselves without the use of a database.

Prerequisite
Here, I talk about what you should have studied in order to understand this series. This series is part of my Web Development Course with Perl and MySQL. Below you have the links to all the series in the volume, in the order you should be reading them.

User Requirements
- The user is in his house or office with his browser connected to the Internet.
- There is a website with a database at a web server.
- The website is a social network site consisting mainly of articles in HTML format.
- When the user wants to search for a particular kind of articles, he would type at his browser, the search phrase in a form text field in any of the web pages of the site and then click the Search Button.
- The search phrase goes to the server, and after a while, the user would have a return web page indicating whether or not any articles where found.
- If articles where found, he would see a list of possible articles of his interest.
- When he clicks the title of an article found, the actual article would be downloaded from server and displayed at his browser.

Technical Requirements
- The web server has a home directory with sub directories of the home directory.
- In the home directory, you have the index HTML file and a few other HTML files.
- Each subdirectory to the home directory belongs to a writer of articles and the subdirectory has all the articles of that particular writer.
- The articles are HTML files.
- Each HTML file in the home page and each HTML file in a subdirectory has a form text field in which the user can type in a new search phrase to search for a particular kind of articles.
- The subdirectory name, the article HTML file name, the title of article, the first name of writer, the keywords and the foreword for each article, are kept in the database.
- The search process at the server, searches the database and not the HTML article files.
- When relevant data related to the articles are found, hyperlinks to the articles are formed with associated information and placed in a web page and sent to the user’s browser. The last 2 parts of this series will talk about searching the web pages and not the database.
- The link the user can click to obtain the article has the title of the article as link text.
- The client script is ECMAScript, the server script is Perl script and the database is MySQL. The application programming interface that connects the client to the database, is PurePerl MySQL API.

Summary of Application Code
At the client computer, there is a web page with an HTML Form having one text field and one button with the title, “Search This Site”. The form has a supporting ECMAScript. At the server, you have a Perl script. This Perl script receives the search phrase, searches the database, prepares the feedback page and sends the feedback page to the user’s browser. So far as the search is concerned, the database has three tables, which are called, series, articles and writers.

The series table has the following columns: a column of subdirectory names, a column of article HTML file names, a column of titles for articles, a column of forewords for each article and a column of keywords for each article. In the column of keywords, each cell has a number of keywords for the corresponding article. The writers table has a column for the first name of writers. I talk about the articles table and the other two tables in more detail below.

The Search Form
The search form is on each article (HTML file) and on each of the HTML files of the home directory. The controls in the form are, the text field to type in the search phrase and the button to send the search phrase to the server. The code is:

    <form action='searchThisSite.pl' method='post' style='display:inline; float:right'>
        <span id='S1' style='display:none'><b>Please Wait . . . </b></span>
        <input type='search' size='40' name='searchStr'>
        <button type='submit' onclick="document.getElementById('S1').style.display = 'inline'">Search this Site</button>
    </form>

There are actually three HTML elements in the form. You have the span element, the input text element and the button element. The span element has the content, “Please Wait . . .” This span element is displayed only when the user has clicked the button, which is the search button. Note that the value of the control type attribute for the text field is ‘search’ and not ‘text’. If you type this today, some browsers will give you a better looking text field, better than if you typed, “text”.

The ECMAScript is the button’s onclick event’s code. This very short script displays the span element, and the user sees, “Please Wait . . .” when the button is clicked. The form uses the HTTP POST method to send the search phrase to the server. At the server, the search phrase is received by the Perl file, searchThisSite.pl. You can call a Perl file in this context, a Perl script or server script. Above is all the code for the form.

The Database
The actual database has many tables, but only three of them are used for the search. The three tables are called, series, articles and writers. The table notation for the series table is:

    series(articleID, title1, keywords, forward, directoryName, htmlfilename, …)

The table notation for the articles table is:

    articles(articleID, writerID, …)

The table notation for the writers table is:

    writers(writerID, firstname, ...)

In this application, the articles table serves to link the series table to the writers table (with the ON clause of the SQL select query). The use of the writers table is to obtain the first name of the writer. So the query will know the articleID from the series table. From there it will use the articleID to know the writerID from the articles table. From there it will use the writerID to know (obtain) the first name of the writer in the writers table. The data we want are in the series and writers table.

The Perl Script
The Perl script is at the server. It receives the search phrase and removes the non-keywords with regular expression techniques. It places the keywords of the phrase in an array still using regular expression techniques. The script goes on to form the MySQL Select query in Perl terms. Then it sends the query to the MySQL database.

The purpose of the query is to obtain a resultset for series.directoryName, series.htmlfilename, series.title1, series.forward, and writers.firstname. From the selected rows of these headings, the result list in HTML to be displayed to the user’s browser is produced still by the Perl script. After formation, this same Perl script sends the result HTML list to the browser.

The same Perl script is placed in the home directory and in each subdirectory belonging to a writer. This is because the value of the form action attribute above, does not have a preceding URL path.

The Perl file uses the Perl regular expression in its normal code, and the MySQL regular expression in the MySQL Query statement, written as a Perl string (Perl terms).

MySQL Regular Expression
Perl uses its regular expression techniques to place the keywords in a Perl array. In the query string formed by Perl, MySQL’s regular expression is used (typed) to search all the cells of the keywords column of the series table. Each cell of the keywords column has a number of keywords for the corresponding article; these cell keywords are separated by commas. For each cell, the query checks if all the keywords of the search phrase are found in the cell. If all the keywords are not found, the row and corresponding article are not selected.

This means that the more keywords in the search phrase the less the number of articles (article rows) selected. The query uses a combination of MySQL regular expression technique and SQL logic to achieve this – see details in the next part of the series.

Non-Keywords
The server Perl script (file) has to remove the non-keywords from the search phrase. So, what are non-keywords. In this section I give you the English non-keywords that I know, in the categories that I know: you have the preposition category, the grammatical article category and a few others.

Prepositions
Prepositions that I know for now are: about, along, among, before, after, by, for, in, from, on, of, since, to until, till, up, with, between. Some of these prepositions convey more meaning than the others, but not as much meaning as the other grammatical words (nouns, adjectives, etc.).

Grammatical Articles
The grammatical articles are: an, a, the.

Conjunctions
Conjunctions that I know are: while, whereas, since, as, for, therefore, but, and, or.

Pronouns
Pronouns that I know for now are: I, you, he, she, we, they, me, him, her, us, them, my, your, his, her, our, their, mine, yours, hers, ours, theirs, it, its.

Words of Quantity
Words of quantity that I know for now are: some, few, many, much, little

And so, the Perl script at the server removes all the prepositions, articles, conjunctions, pronouns and words of quantity from the search phrase. These non-keywords should not be searched for, because in a written article, they are many and do not indicate to the user what the written article is about.

Wow! At this point I guess you are hungry to see the code. You already have the code for the form at the client web page. You next have to see the Perl code and the database query. Rendezvous in the next part of the series.

Chrys

Broad Network

Related Articles

Overview of Search Within a Site using Perl and MySQL

Search Within a Site using Perl and MySQL – Part 1

Web Development with Perl and MySQL

Introduction

Related Links

Comments