Broad Network


Regular Expressions Metacharacters in MySQL

Regular Expressions in MySQL - Part 2

Foreword: In this part of the series we look at the uses of metacharacters in MySQL Regular Expressions.

By: Chrysanthus Date Published: 28 Aug 2015

Introduction

This is part 2 of my series, Regular Expressions in MySQL. I assume you have read the previous part of the series. This is the continuation. In this part of the series we look at the uses of metacharacters in MySQL Regular Expressions.

Database table for Illustration
We shall use the following database table called Pets, for illustration:

| name   | owner  | species | sex  | birth      | death      |
+--------+--------+---------+------+------------+------------+
| Buffy  | Harold | dog     | f    | 1999-05-13 | NULL       |
| Bowser | Diane  | dog     | m    | 1999-08-31 | 2005-07-29 |
| Fluffy | Harold  Taylor | cat     | f    | 2003-02-04 | NULL  |
| Claws    | Gwen  Jones | cat     | m    | 2004-03-17 | NULL       |
| Whistler | Gwen  | bird    | NULL | 2007-12-09 | NULL       |

You should test the SELECT statements given below with this table.

The ^ Character
The ^ character is used to match the beginning of the subject. The following SELECT statement will return the first and second rows, matching “Buffy” and “Bowser” in the first and second cells of the first column:

    SELECT * from Pets where name rLike "^B";

The pattern (regex) is "^B". In the subjects, “Buffy” and “Bowser”, ‘B’ is at the beginning. Note the position of ^ (at the beginning) in the pattern.

The $ Character
The $ character is used to match the end of the subject. The following SELECT statement will return the first and third rows, matching “Buffy” and “Fluffy” in the first and third cells of the first column:

    SELECT * from Pets where name rLike "fy$";

The pattern is "fy$". In the subjects, “Buffy” and “Fluffy”, ‘fy’ is at the end. Note the position of $ (at the end) in the pattern.

The . Character
The dot character is used to match any character in the subject. It would also match the carriage return or the newline character. The following SELECT statement will return the fourth and fifth rows, matching “Gwen  Jones” and “Gwen” in the fourth and fifth cells of the second column:

    SELECT * from Pets where owner rLike "Gw.n";

The pattern is "Gw.n". In the subjects, “Gwen  Jones” and “Gwen”, ‘e’ corresponds to the dot in the pattern. So, any subject that has Gwen or Gwan or Gwsn or Gwin, etc is matched. In these words, the dot corresponds to ‘e’ or ‘a’ or ‘s’ or ‘i’ respectively. The dot corresponds to any character, in a particular position, in a pattern.

The subject, “Gwen  Jones” has been matched because the statement looks for the word, "Gw.n" within the subject that might be a long string. Once the statement sees the pattern within a subject (string), no matter how long the subject is, it matches it.

The x* Construct
Here, ‘x’ is a variable (place holder) for any character. The * just after ‘x’ means, match any sequence (consecutive) of zero or more ‘x’ characters. The pattern, "se*n" for example, would match “seen”; it would match “sen”; and it would match “sn”. That is, “e*” would match zero or more ‘e’ in sequence. The following SELECT statement will return the first and third rows, matching “Buffy” and “Fluffy” in the first and third cells of the first column:

    SELECT * from Pets where name rLike ".*f*y";

The pattern is ".*f*y" . In the subjects, “Buffy” and “Fluffy”, “.*” corresponds to “Bu” and “Flu” in “Buffy” and “Fluffy” respectively. “.*” means match any character, zero or more times. Still in the subjects, “Buffy” and “Fluffy”, “f*y” corresponds to “ffy” in “Buffy” and “Fluffy”. “f*” means match ‘f’ zero or more times.

The x+ Construct
Here, ‘x’ is a variable for any character. The + just after ‘x’ means match any sequence (consecutive) of one or more ‘x’ characters; as opposed to * which matches zero or more characters. The pattern, "se+n" , for example, would match “seen”; it would match “sen”; but it would not match “sn”, which does not have at least one ‘e’. The following SELECT statement will return the fifth row, matching “bird” in the fifth cell of the third column:

    SELECT * from Pets where species rLike "b.+d";

The pattern is "b.+d" . In the subject, “bird”, “ir” are any two characters in sequence matched by “.+”. “.+” matches any character (not necessarily the same character) that occurs in sequence one or more times. If you want a particular character in sequence, then type the character in place of the dot, as in “e+” against “seen”.

The x? Construct
Here, ‘x’ is a variable for any character. The ? just after ‘x’ means match either zero or one ‘x’ character. The following SELECT statement will return the third, fourth and fifth rows, matching “Fluffy”, “Claws”, and “Whistler” in the third, fourth and fifth cells of the first column:

    SELECT * from Pets where name rLike "l";

The pattern is "l". Each of the subjects, “Fluffy”, “Claws”, and “Whistler” has ‘l’. So matching occurs. Remember, once a pattern is found in a subject, matching occurs. Now the following SELECT statement will select all the five rows of the table, matching the first, second, third, fourth and fifth cells of the first column:

    SELECT * from Pets where species rLike "l?";

The pattern is "l?". The subjects are, “Buffy”, “Bowser”, “Fluffy”, “Claws” and “Whistler”. Each of these subjects has zero or one ‘l’; and so each is matched and so all five rows are selected.

The | Character
The | character means OR. It means match either what is on the left of | or what is on the right. The following SELECT statement will select the rows, whose cell in the second column has either “Harold” or “Diane”.

    SELECT * from Pets where owner rLike "Harold|Diane";

The pattern is "Harold|Diane". Note that the statement returns 3 rows. The three subjects are: “Harold”, “Diane” and “Harold  Taylor” found in the cells of the first 3 rows, in the second column. Each of the subjects has “Harold” or “Diane”. The third subject has more text than just “Harold”, but it still has “Harold” or “Diane”.

The alternative words (characters) may exist only in one subject or in different subjects. Let us look at another example. The following SELECT statement will select the rows, whose cell of the second column has either “Gwen” or “Jones”.

    SELECT * from Pets where owner rLike "Gwen|Jones";

The pattern is "Gwen|Jones". The fourth and fifth rows were selected. The subjects are, “Gwen  Jones” and “Gwen”. Note that “Gwen” is found in either of the subjects and it is enough for the two subjects to be selected. “Jones” is found only in “Gwen  Jones” and it is enough for “Gwen  Jones” to be selected. If the fifth row did not exist, only the fourth row with “Gwen  Jones” would have been selected; that is, “Gwen” or “Jones” would have selected  “Gwen  Jones”. So the | (OR) character means either words can be chosen from the same subject or different subjects.

You have seen MySQL regular expressions metacharacters and simple pattern constructs in this part of the series Let us stop here and continue in the next part with more pattern constructs.

Chrys

Related Links

Implementing Database in MySQL
Programming in MySQL
Backup Basics in MySQL
MySQL Access Privileges
Regular Expressions in MySQL
Date and Time in MySQL
Event in MySQL
MySQL Transaction
PurePerl MySQL API Prepared Statements
More Related Links
PurePerl MySQL Command Line Tool
Major in Website Design
Perl Course - Optimized
Web Development Course

BACK NEXT

Comments

Become the Writer's Fan
Send the Writer a Message