Broad Network


Regular Expression Building in MySQL

Regular Expressions in MySQL - Part 4

Foreword: In this part of the series, we learn how to build a MySQL Regular Expression.

By: Chrysanthus Date Published: 28 Aug 2015

Introduction

This is part 4 of my series, Regular Expressions in MySQL. I assume you have read the previous parts of the series. This is the continuation. In this part of the series, we learn how to build a MySQL Regular Expression.

Steps required to build a Regex
Here, regex, means pattern. These are the steps required to build a regex:

- You specify the task in detail,
- You break down the problem into smaller parts,
- You translate the small parts into regexes,
- You combine the regexes,
- You optimize the final combined regexes.

Selecting Rows that have URLs
I will use one example to illustrate the above points. Let us build a pattern to match a Uniform Resource Locator (URL). A URL is the address you type in the address bar of a browser to display an Internet web page.

Specifying the Task in Detail
An Example of a URL is https://www.somewebsite.com/dir1/dir2/file.htm .

- A URL begins with a protocol text.
- This is followed by, ://.
- That is followed by an optional sub domain name and a dot .
- That is followed by the domain name.
- That is followed by an optional path.
- That is followed by an optional filename.

Breaking Down the Problem into Smaller Parts
- The protocol text is commonly, http or ftp,
- That is followed by an optional s,
- Then you have ://.
- The optional sub domain name that follows, is alphanumeric characters that can be from one character to any length; if present, it has to be followed by a dot.
- The domain name that follows consists of alphanumeric characters from one to any length, then a dot and then an extension of from 2 to 4 alphabetic characters.
- The optional path consists of a series of forward slash and alphanumeric characters of any length.
- This is followed by an optional filename of any length, a dot and an extension. The filename consists of alphanumeric characters of any length, a dot and an extension, which is typically, htm or html.

Translating into Regexes
The protocol and the optional s is typed as,

    (http)|(ftp)s?

The :// is typed as,

    \://

The sub domain name is typed as,

    ([[:alnum:]]+\.)?

The domain name is typed as,

    [[:alnum:]]+\.[[:alpha:]]{2,4}

The path is typed as,

    (/[[:alnum:]]+)*

The optional filename is typed as,

    ([[:alnum:]]+\.(htm)|(html))?

All the special characters and constructs above, have been taught in the previous parts of the series.

Combining the Regexes
Combining (joining) the regexes gives the pattern:

"(http)|(ftp)s?\://([[:alnum:]]+\.)?[[:alnum:]]+\.[[:alpha:]]{2,4}(/[[:alnum:]]+)*([[:alnum:]]+\.(htm)|(html))?"

Optimizing the Combined Regexes
Optimizing the combined regexes involves factorization. Factorization of a pattern is a topic that is not well developed. So, let us allow the regex (pattern) as it is above.

Illustration
Assume that you have a database table called, Credentials. Also assume that one of the columns of the table has the name URL. Assume that some cells of this column have URLs and others are empty. Under this situation, the following SELECT statement will return all the rows that have URLs:

    SELECT * from Pets where species rLike "(http)|(ftp)s?\://([[:alnum:]]+\.)?[[:alnum:]]+\.[[:alpha:]]{2,4}(/[[:alnum:]]+)*([[:alnum:]]+\.(htm)|(html))?";

Now, assume that some cells in the column have text without URLs and others have text with URLs. You can add ".*" in front and behind the above pattern to match only those cells that have URLs.

Summary of Regular Expressions in MySQL
Ordinarily, the SQL SELECT statement selects rows bases on the type of values in table cells. It is possible to go to a type of values and choose a sub set based on the nature of their text. This is where regular expressions come in. Regular expressions has 3 important aspects: the subject, the operator and the pattern. The subject is a string. The text cell values of a database table are subjects. The operator is, rLike or NOT rLike . The pattern is made up of ordinary characters, special characters and constructs combined in a special way. The pattern determines the criterion for selection. Regular expressions are used to improve on the selection criteria of the SELECT statement.

We have come to the end of the series. I hope you appreciated it.

Chrys

Related Links

Implementing Database in MySQL
Programming in MySQL
Backup Basics in MySQL
MySQL Access Privileges
Regular Expressions in MySQL
Date and Time in MySQL
Event in MySQL
MySQL Transaction
PurePerl MySQL API Prepared Statements
More Related Links
PurePerl MySQL Command Line Tool
Major in Website Design
Perl Course - Optimized
Web Development Course

BACK

Comments

Become the Writer's Fan
Send the Writer a Message