Introduction to regular expressions
Regular expressions, also known as regex, are used to search a piece of text (usually a line). Unlike a traditional search for a fixed word, a regular expression can include special characters that describe a pattern.
hello
This regex will match “hello”
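Any regex engine can run the patterns in this tutorial. The sketches below assume Python and its standard `re` module. The tutorial itself is not tied to any language. `re.search` looks for the pattern anywhere in the text and returns a match object, or `None` if nothing matches.

```python
import re

# re.search scans the text for the first place where the pattern matches.
match = re.search(r"hello", "Oh, hello there")
print(match.group())  # hello

# A plain pattern is case-sensitive, so "Hello" is not found.
print(re.search(r"hello", "Hello there"))  # None
```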
The dot (.)
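The dot is one of the most used special characters in a regex. A dot matches any single character: a letter, digit, space, underscore or any other character you can think of, including the dot itself. (In most regex engines the one exception is the newline, which the dot does not match by default.)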
G-.ava
The above regex would match G-java, but also G-Java, G-Tava, G-%ava, G-.ava, G-?ava, G- ava, and many more.
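Here is a quick check of the G-.ava pattern, assuming Python's `re` module. `re.fullmatch` requires the whole string to match. The examples show that the dot stands for exactly one character, never zero or two.

```python
import re

samples = ["G-java", "G-Java", "G-%ava", "G-.ava", "G- ava", "G-ava", "G-jjava"]

# fullmatch: the whole string must fit the pattern, so "." is exactly one character.
results = {text: bool(re.fullmatch(r"G-.ava", text)) for text in samples}
for text, ok in results.items():
    print(repr(text), ok)

# By default the dot does not match a newline.
print(bool(re.fullmatch(r"G-.ava", "G-\nava")))  # False
```

The last two samples fail. "G-ava" has no character in the dot's position, and "G-jjava" has two.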
The literal sign (\)
Imagine we wanted to match a dot.
.
would not work, because it matches any character, not only the dot. The solution is the literal sign (\), more commonly called the backslash or escape character. Placed before a special character such as the dot, it makes that character match itself.
For example
\.
matches only a dot. The literal sign works on every special character, including itself, so
\\
would match only \
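The difference between an escaped and an unescaped dot is easy to see with Python's `re.findall`, which returns every non-overlapping match. The sample text is made up for illustration. Python raw strings (`r"..."`) are used so that Python passes backslashes to the regex engine unchanged.

```python
import re

text = "Version 3.1 or 341"

# Unescaped, the dot matches any character, so "3.1" also finds "341".
print(re.findall(r"3.1", text))   # ['3.1', '341']

# Escaped with a backslash, the dot only matches a literal ".".
print(re.findall(r"3\.1", text))  # ['3.1']

# To match a backslash itself, escape it too: the pattern \\ matches one \.
path = "C:\\temp"                 # this Python string contains a single backslash
print(re.findall(r"\\", path))    # ['\\'] (a list holding one backslash)
```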
Spaces and other whitespace
We might want to match a space, but also a tab and other whitespace characters. To do that, we can use \s.
Hello\sworld!
would match Hello world! whether the two words are separated by a space, a tab or another single whitespace character, such as a newline.
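The sketch below tries Hello\sworld! on a few strings with Python's `re.fullmatch`. It also shows that \s stands for exactly one whitespace character. Two spaces in a row, or none at all, do not match.

```python
import re

pattern = r"Hello\sworld!"
texts = ["Hello world!", "Hello\tworld!", "Hello\nworld!", "Helloworld!", "Hello  world!"]

# \s matches exactly one whitespace character (space, tab, newline, ...).
results = [bool(re.fullmatch(pattern, text)) for text in texts]
for text, ok in zip(texts, results):
    print(repr(text), ok)
```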
The unlimited *
One of the most used symbols in regular expressions is the *. The * means repetition and applies to the previous character: it matches that character zero times, once, twice, or any number of times.
.*
matches any sequence of characters, including an empty one.
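To see * on a character other than the dot, here is a sketch using Python's `re.fullmatch`. The pattern ab*c is a made-up example: an "a", then any number of "b"s (even none), then a "c".

```python
import re

# "ab*c": an "a", then zero or more "b"s, then a "c".
results = {text: bool(re.fullmatch(r"ab*c", text)) for text in ["ac", "abc", "abbbbc", "adc"]}
for text, ok in results.items():
    print(text, ok)

# ".*" matches any run of characters, even an empty one.
print(bool(re.fullmatch(r".*", "")))                 # True
print(bool(re.fullmatch(r".*", "anything at all")))  # True
```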
The strict unlimited +
Suppose you wanted to rewrite the above regex, requiring at least one character:
..*
This can get tedious in some circumstances, so there is a shortcut: the +, which matches the previous character one or more times.
.+
Both * and + work with any character, not only the dot (.).
For example,
aa*
is a synonym of
a+
Both match one or more “a”s in a row. When the whole text has to match, they accept any text made only of “a”s, as long as there is at least one.
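The claim that aa* and a+ are interchangeable can be checked on a few sample strings in Python. The sketch compares the two patterns side by side with `re.fullmatch`. It then shows that `re.search`, which does not need the whole text to match, finds the first run of "a"s inside a longer word.

```python
import re

texts = ["", "a", "aaaa", "aab"]

# The two columns should always agree.
pairs = [(bool(re.fullmatch(r"aa*", t)), bool(re.fullmatch(r"a+", t))) for t in texts]
for text, (star, plus) in zip(texts, pairs):
    print(repr(text), star, plus)

# ".*" accepts an empty string; ".+" (like "..*") needs at least one character.
print(bool(re.fullmatch(r".*", "")), bool(re.fullmatch(r".+", "")))  # True False

# With search, "a+" grabs the whole first run of "a"s.
print(re.search(r"a+", "baaad").group())  # aaa
```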
The ? sign
The ? sign has two meanings, depending on whether or not it follows a * or +.
I will start with the unlinked meaning.
cats?
would match both cat and cats. On its own, ? makes the previous character optional: it matches that character zero times or once.
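A short Python check of cats? with `re.fullmatch`. The ? allows at most one "s", so "catss" is rejected.

```python
import re

# "s?" means: zero or one "s".
results = {text: bool(re.fullmatch(r"cats?", text)) for text in ["cat", "cats", "catss"]}
for text, ok in results.items():
    print(text, ok)
```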
When it is after * or +, it has another meaning.
Consider this regular expression, meant to match an HTML tag,
<.+>
applied to this piece of text:
ee<b><u>a</u></b>ee
Instead of matching only <b> or only <u>, it matches <b><u>a</u></b>!
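The greedy result can be reproduced with Python's `re.search`, assuming the same sample text. The .+ first runs to the end of the text and then backs off only as far as the last >.

```python
import re

html = "ee<b><u>a</u></b>ee"

# ".+" is greedy: it runs to the end of the text, then backs off to the last ">".
greedy = re.search(r"<.+>", html)
print(greedy.group())  # <b><u>a</u></b>
```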
That happens because * and + are greedy: they match as much as they can. Adding ? right after them offers a way out, by making them lazy.
<.+?>
matches as little as possible, stopping at the first > it finds. This also means that, given
<input type="submit" value="Hi>There">
it would match only <input type="submit" value="Hi>
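The lazy version and its pitfall can both be seen in Python. `re.findall` collects every tag in the first sample. `re.search` on the input tag stops at the > inside the quoted value, exactly as described.

```python
import re

html = "ee<b><u>a</u></b>ee"

# "+?" is lazy: it stops at the first ">" that lets the pattern succeed.
tags = re.findall(r"<.+?>", html)
print(tags)  # ['<b>', '<u>', '</u>', '</b>']

tag = '<input type="submit" value="Hi>There">'

# The lazy pattern also stops at the ">" inside the quoted attribute value.
first = re.search(r"<.+?>", tag).group()
print(first)  # <input type="submit" value="Hi>
```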
These situations are a reason to be very careful when using regular expressions.
The end of this tutorial
There is much more to regular expressions, but this is an introductory tutorial, and it has covered enough to get you started.