Regular Expressions in JavaScript

What are Regular Expressions?

Intro

A regular expression is a string pattern that other strings either match or do not match.
You can think of it as a kind of search mechanism.
Synonyms for regular expression:
Regex, RegEx, RegExp

Example


			var userEmail = "prefix@domain.com";

			//the RegEx to find if the userEmail contains '@' symbol:
			var re = /@/;

			// do the test:
			if ( re.test(userEmail) ){
				console.log(`Match`);
			}else{
				console.log(`No match!`);
			}
		

The Language

You can think of regular expressions as a separate language, with its own rules and specs.
In fact, regular expressions come from the regular languages defined by Kleene in the early 1950s.
Nowadays, almost all programming languages implement the concept of regex.

The Grammar

A regex grammar includes 2 types of symbols:
Regular symbols: they are matched literally against the string being tested
Meta-characters: they have a special meaning and give regex its power

Example


			var strings = [
				'alabala',
				' alabala',
				'Astronomy',
				'the apple'
			];
			var re = /^a/;

			strings.forEach((str)=>
				re.test(str) ?
					console.log(str+' -> match!') :
					console.log(str+'  -> NO match!')
			)
		
The regex /^a/ matches each string starting with a lowercase 'a'
the a is a regular symbol
the ^ is a meta-character (it anchors the match to the start of the string)

Basic Regex Syntax

Special Characters

The following characters have a special meaning in regex:

^ $ \ . * + ? ( ) [ ] { } |

The backslash can also be combined with some ordinary characters to give them a special meaning, e.g. \d matches a digit.

If we want to match a special character literally, we have to escape it with a backslash '\'
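
The effect of escaping can be seen on the dot. This is a minimal sketch; the sample strings and variable names are made up for illustration:

```javascript
// Unescaped '.' is a meta-character: it matches any character except a line terminator.
const anyChar = /1.5/;
// Escaped '\.' matches only a literal dot.
const literalDot = /1\.5/;

console.log(anyChar.test('1.5'));    // true
console.log(anyChar.test('1x5'));    // true ('.' matched 'x')
console.log(literalDot.test('1.5')); // true
console.log(literalDot.test('1x5')); // false
```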

Modifiers/Flags

They affect how the regular expression is executed.

Modifier  Description
i         case-insensitive matching
g         global match (find all matches rather than stopping after the first match)
m         multiline matching (^ and $ also match at the start and end of each line)

Modifiers/Flags example


			var matched, str = `alAbAla`;

			matched = str.match(/a/); // no flags
			console.log(`matched: ${matched}`);
			// matched: a (the first one)

			matched = str.match(/a/g); // g flag added
			console.log(`matched: ${matched}`);
			//matched: a,a

			matched = str.match(/a/gi); // g and i flags
			console.log(`matched: ${matched}`);
			//matched: a,A,A,a
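
The example above covers g and i. A similar sketch for the m flag might look like this (the sample text is an assumption chosen for illustration):

```javascript
const text = 'apple\nbanana\navocado';

// Without m, ^ matches only at the very start of the string.
const firstOnly = text.match(/^a\w+/g);
console.log(firstOnly); // [ 'apple' ]

// With m, ^ also matches right after each line break.
const perLine = text.match(/^a\w+/gm);
console.log(perLine); // [ 'apple', 'avocado' ]
```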
		

More on Modifiers/Flags

more on Flags at MDN

Character Sets/Character Classes

Character Sets

Square brackets are used to define a character set, like: [abc].
The symbols inside the brackets are the elements of the set.
The hyphen (-), when it is between 2 symbols, has a special meaning inside the character set: it defines a range, like: [0-9]. If it is at the start or the end of the set, it is treated as a literal hyphen.
The character set itself matches only one symbol: any one of those defined in the set.
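
The two roles of the hyphen can be compared side by side. This is only a sketch with made-up sample data:

```javascript
// Between two symbols the hyphen defines a range: any one character from 'a' to 'c'.
const range = /[a-c]/g;
// At the end of the set the hyphen is a literal: 'a', 'c' or '-'.
const literal = /[ac-]/g;

const sample = 'a-b-c';
const byRange = sample.match(range);
const byLiteral = sample.match(literal);
console.log(byRange);   // [ 'a', 'b', 'c' ]
console.log(byLiteral); // [ 'a', '-', '-', 'c' ]
```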

Character Sets

Character set  Description
[abc]          Matches any one of the symbols listed ('a' or 'b' or 'c')
[a-z]          Matches any symbol from 'a' to 'z' (i.e. any lowercase Latin letter)
[^abc]         Matches any symbol except 'a' or 'b' or 'c' (i.e. the ^ negates the characters in the set)

Character Sets examples


			// match single vowels
			matched = "asteroid".match(/[aeiouy]/g);
			console.log(`matched: ${matched}`);
			// matched: a,e,o,i

			// match any run of consecutive vowels
			matched = "asteroid".match(/[aeiouy]+/g);
			console.log(`matched: ${matched}`);
			// matched: a,e,oi

			// match bg mobile phone numbers
			matched = "+359888123456".match(/\+3598[7-9][0-9]{7}/g);
			console.log(`matched: ${matched}`);
			// matched: +359888123456
		

Character classes

Character classes can be regarded as shorthands for some of the most used character sets. Note that \w and \d cover only ASCII letters and digits.

Character classes

Char class  Description
.           Matches any character except a newline/line terminator.
\w          Matches a word character (a-z, A-Z, 0-9 or the underscore _).
\d          Matches any Arabic digit (0 to 9).
\s          Matches any whitespace character (space, tab, form feed, line ending, etc.).

Note that the terms character set and character class are often used as synonyms.
Any character class can be represented by a character set!
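
As a rough illustration of that last point, the sketch below runs a class and its equivalent set on the same (made-up) string:

```javascript
const sample = 'Room_42 is free';

// \d and [0-9] pick the same characters.
const digitsByClass = sample.match(/\d/g);
const digitsBySet = sample.match(/[0-9]/g);
console.log(digitsByClass); // [ '4', '2' ]
console.log(digitsBySet);   // [ '4', '2' ]

// \w and [A-Za-z0-9_] pick the same characters.
const wordsByClass = sample.match(/\w+/g);
const wordsBySet = sample.match(/[A-Za-z0-9_]+/g);
console.log(wordsByClass); // [ 'Room_42', 'is', 'free' ]
console.log(wordsBySet);   // [ 'Room_42', 'is', 'free' ]
```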

Character classes example


			// match bg mobile phone numbers
			matched = "+359888123456".match(/\+3598[7-9]\d{7}/g);
			console.log(`matched: ${matched}`);
			// matched: +359888123456
		

Character classes example


			var re = /[a-z]\w+/;
			var strings = [
				'petrov42',
				'42petrov',
				'ivan_pterov',
			]
			strings.forEach(str=>console.log(`${str.match(re)} matched in ${str}:`));

			// petrov42 matched in petrov42:
			// petrov matched in 42petrov:
			// ivan_pterov matched in ivan_pterov:
		

More on Character Classes

More Character classes on MDN

Quantifiers

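Quantifier  Description
r*          r matches 0 or more times
r+          r matches 1 or more times
r?          r matches 0 or 1 time
r{n}        r matches exactly n times
r{n,}       r matches n or more times
r{n,m}      r matches between n and m times (n and m are non-negative integers, n <= m)

r can be a single character, a character set/class, or a group!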
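
A small sketch of the table above. The patterns and test strings are assumptions chosen only to show each quantifier in isolation:

```javascript
// Each pattern is anchored with ^ and $ so the whole string has to fit.
const star = /^ab*$/;       // 'a' followed by zero or more 'b'
const plus = /^ab+$/;       // 'a' followed by one or more 'b'
const optional = /^ab?$/;   // 'a' followed by at most one 'b'
const twice = /^(ab){2}$/;  // the group 'ab' exactly two times

console.log(star.test('a'));       // true
console.log(plus.test('a'));       // false
console.log(optional.test('abb')); // false
console.log(twice.test('abab'));   // true
console.log(twice.test('ababab')); // false
```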

Quantifiers (greedy and non-greedy match)

Quantifiers are greedy by default, meaning they match as much of the string as they can:

				var matched, str = `ala bala`;

				matched = str.match(/a.*a/);
				console.log(`matched: ${matched}`); //matched: ala bala
			

Quantifiers (greedy and non-greedy match)

We can make them non-greedy (lazy) by suffixing them with '?'

				var matched, str = `ala bala`;

				matched = str.match(/a.*?a/);
				console.log(`matched: ${matched}`); //matched: ala
			

Quantifiers example


			matched = "ala aa bala".match(/a.?a/g);
			console.log(`matched: ${matched}`);
			// matched: ala,aa,ala

			matched = "ala aa bala".match(/a.{3,5}a/g);
			console.log(`matched: ${matched}`);
			// matched: ala aa

			matched = "ala aa bala".match(/a.{3,}a/g);
			console.log(`matched: ${matched}`);
			// matched: ala aa bala

			matched = "ala aa bala".match(/a.{3,}?a/g);
			console.log(`matched: ${matched}`);
			// matched: ala a,a bala
		

Quantifiers

more on Quantifiers at MDN

Anchors and Boundaries

They specify a position in the string where a match should occur.
They are zero-width, i.e. when matched they do NOT consume characters from the string.

Anchor  Description
^       Matches the beginning of the string (or of the line, if the m flag is used)
$       Matches the end of the string (or of the line, if the m flag is used)
\b      Matches at word boundaries, i.e. between a word (\w) and a non-word (\W) character.
        Note that the start and end of the string are treated as non-word characters.

Example


			var re = /\b/g;
			var strings = [
				'',
				'a',
				'@',
				'aa',
				'a!',
				'a,a',
			]

			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 2 matches in 'a'
			// 2 matches in 'aa'
			// 2 matches in 'a!'
			// 4 matches in 'a,a'
		

Example


			var re = /^a\w+a$/g;
			var strings = [
				'ana',
				'ana bel',
			]
			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 1 matches in 'ana'
		

Example


			var re = /\b[\w-]+\b/gi;
			var strings = [
				'one two three four, five, six. Seven!',
				'one-two,three!',
			];
			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 7 matches in 'one two three four, five, six. Seven!'
			// 2 matches in 'one-two,three!'
		

More on Boundaries

Alternation

With alternation we can match one regexp or another!

Alternation  Description
r1|r2        Matches if r1 OR r2 is matched

Alternation example


			// NB: this is not an example of good practice for grouping regexes. Why? => check the next slides
			var re = /\b(straw|rasp)?berries/;
			var strings = [
				'Icecream with strawberries? Yes!',
				'Icecream with blueberries? No!',
				'Icecream with raspberries? Yes!',
				'Icecream with berries? Yes!',
			]

			strings.forEach(str=> str.match(re) ?
				console.log(`${str} YES! YES!`) : console.log(`${str} NO! NO!`)
			)
			// Icecream with strawberries? Yes! YES! YES!
			// Icecream with blueberries? No! NO! NO!
			// Icecream with raspberries? Yes! YES! YES!
			// Icecream with berries? Yes! YES! YES!
		

More on Alternation

Alternation

Grouping and back references

Parentheses, ( and ), play a dual role in regex!
They can be used for grouping regexes. Like:
/(r1|r2)r3/ => matches r1r3 OR r2r3, but not r1r2r3
Or they can be used to capture (remember) the matched part of the string. Like:
/(r1)r2/ => matches r1r2 and captures the part of the string that matched r1
If you just want to group regexes, without capturing the match, you should state that explicitly with:
(?:r1|r2) => matches r1 or r2 but does not capture the match
NB! Capturing costs some extra time and memory. If you need the parentheses just for grouping, prefer the ?: prefix.
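
Capturing and back references could be sketched as follows. The sample strings are made up; \1 refers back to whatever the first capturing group matched:

```javascript
// (\d{4})-(\d{2}) captures the year and the month separately.
const dateMatch = '2024-05'.match(/(\d{4})-(\d{2})/);
console.log(dateMatch[0]); // 2024-05 (the whole match)
console.log(dateMatch[1]); // 2024 (first group)
console.log(dateMatch[2]); // 05 (second group)

// \1 is a back reference: it must match the same text that group 1 captured.
const doubled = /\b(\w+) \1\b/;
console.log(doubled.test('it is is fine')); // true ('is is')
console.log(doubled.test('it is fine'));    // false
```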

Grouping regexes example


			var re = /\b(?:straw|rasp)?berries/;
			var strings = [
				'Icecream with strawberries?',
				'Icecream with blueberries?',
				'Icecream with raspberries?',
				'Icecream with strawraspberries?',
				'Icecream with berries?',
			];
			strings.forEach(str=>str.match(re) ?
				console.log(`${str} YES!`) : console.log(`${str} NO!`)
			);
			// Icecream with strawberries? YES!
			// Icecream with blueberries? NO!
			// Icecream with raspberries? YES!
			// Icecream with strawraspberries? NO!
			// Icecream with berries? YES!
		

Grouping regexes: to group or not to group?


			var regexes = [
				// task: match only 'strawberries' or 'raspberries':
				/\bstraw|rasp{1}berries/,         // not what we want
				/(?:\bstraw)|(?:rasp{1}berries)/, // the same as above!!!
				/\b(?:straw|rasp){1}berries/,     // That's it!
			];
			var strings = [
				'Icecream with strawberries?',
				'Icecream with raspberries?',
				'Icecream with straw?',
				'Icecream with whateverraspberries?',
			];

			regexes.forEach(re=>{
				console.log(`\nMatched with: ${re}`);
				strings.forEach(str=>str.match(re) ?
					console.log(`${str} YES!`) : console.log(`${str} NO!`)
				)
			});
			// Matched with: /\bstraw|rasp{1}berries/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? YES!
			// Icecream with whateverraspberries? YES!

			// Matched with: /(?:\bstraw)|(?:rasp{1}berries)/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? YES!
			// Icecream with whateverraspberries? YES!

			// Matched with: /\b(?:straw|rasp){1}berries/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? NO!
			// Icecream with whateverraspberries? NO!
		

More on Grouping and back references

Grouping and back references

Assertions

Assertions make it possible to match a regex only if it is (or is not) followed or preceded by something, i.e. we can do a lookahead or a lookbehind!

Assertions on MDN
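
A minimal sketch of the assertion kinds on made-up data. Lookbehind assumes an ES2018-capable engine (a current Node.js or browser):

```javascript
const prices = 'apple 30USD, pear 25EUR, plum 40USD';

// Positive lookahead (?=...): digits that are followed by 'USD'.
// 'USD' is checked but is not part of the match.
const usd = prices.match(/\d+(?=USD)/g);
console.log(usd); // [ '30', '40' ]

// Negative lookahead (?!...): 'q' that is NOT followed by 'u'.
const qNoU = /q(?!u)/;
console.log(qNoU.test('Iraq')); // true
console.log(qNoU.test('quit')); // false

// Positive lookbehind (?<=...): digits that are preceded by '$'.
const dollars = 'cost $15, qty 3'.match(/(?<=\$)\d+/g);
console.log(dollars); // [ '15' ]
```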

Assertions - use case


			const passwords = [
				'alabala', 	// false
				'alaba#a', 	// false
				'alaba9a', 	// false
				'a1aba#9', 	// true
				'a1@', 		// false
			];

			const re = /^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*]).{6,}$/;

			passwords.forEach(str=>{
				let matched = re.test(str);
				console.log(`'${str}' => ${matched}`)
			});
		

Create Regex in JavaScript

How to create the RegExp object

Each regular expression is a RegExp object
2 ways to create a RegExp object:
By RegExp literal:

				var myNameRE = /iva/gi;
			
By RegExp Constructor:

				var myNameRE = new RegExp('iva','gi');
			

RegExp literal

Place the pattern between 2 slashes
Place the modifiers after the second slash

			var str = 'Maria, ivan, eli, zdravka, stoyan';

			// match the first word which ends with 'a':
			var re1 = /\b\w+a\b/;

			// match all words which end with 'a':
			var re2 = /\b\w+a\b/g;

			console.log( str.match(re1).toString() ); // Maria
			console.log( str.match(re2).toString() ); // Maria,zdravka
		

RegExp Constructor


			var str = 'Maria, ivan, eli, zdravka, stoyan';

			// match the first word which ends with 'a':
			var re1 = new RegExp('\\b\\w+a\\b');

			// match all words which end with 'a':
			var re2 = new RegExp('\\b\\w+a\\b', 'g');

			console.log( str.match(re1).toString() ); // Maria
			console.log( str.match(re2).toString() ); // Maria,zdravka
		

As the regex is given as a string, we have to escape each backslash, i.e. \b => \\b
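
What goes wrong without the double backslash can be sketched like this (the pattern and test string are assumptions for illustration):

```javascript
// In a string literal '\b' is the backspace character, so the regex never sees a word boundary.
const wrong = new RegExp('\bcat\b');
// '\\b' leaves a real backslash in the string, so the regex receives \b.
const right = new RegExp('\\bcat\\b');

console.log(wrong.test('a cat')); // false
console.log(right.test('a cat')); // true
console.log(right.source);        // \bcat\b
```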

RegEx Literal vs Constructor

Literal:
Compiled once, when the script is loaded.
Use literal notation when you know the regular expression in advance.
Constructor:
Compiled at runtime, when the constructor is called (not on every use of the regex object).
Use the constructor function when you don't know the pattern in advance and receive it from a variable.
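
When the pattern comes from a variable, special characters in it usually need escaping first. The sketch below assumes a hypothetical helper named escapeRegExp (it is not part of the slides or of the language) and a made-up input value:

```javascript
// Hypothetical helper: put a backslash before every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1'; // assumed to arrive at runtime, e.g. from a form field
const dynamicRe = new RegExp(escapeRegExp(userInput), 'g');

console.log(dynamicRe.source);             // 1\+1
console.log('1+1=2, 11'.match(dynamicRe)); // [ '1+1' ]
```

Without the escaping step, the pattern 1+1 would mean "one or more 1s followed by 1" and would match '11' instead.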

Regex in a loop


			var words = ["ябълка", "ария", "ягода", "ясен"];

			// Match string starting with 'я' and ending with 'а':
			//  RegExp literals:
			var re1 = /^я.*а$/i;
			words.forEach(word=>{
				// re1 is compiled only once !!!:
				let re1Matched = word.match(re1);
				re1Matched && console.log('re1: ' + re1Matched);
			});

			// RegExp Constructor
			var re2 = new RegExp('^я.*а$','i');
			words.forEach(word=>{
				// re2 was compiled once, when the constructor ran above; it is reused here:
				let re2Matched = word.match(re2);
				re2Matched && console.log('re2: ' + re2Matched);
			})
		

Array of RegExp


			var commentsREs = [/\/\/.*/gm, /\/\*[^]+?\*\//g];
			var str = `
				// single line comment 1
				var x = 5;
				// single line comment 2
				var y = 10;
				/*this is multiline
				comment in JS */
				const pi = 3.14;
				for (let i = 0; i< x; i++) console.log(i);
			`;
			commentsREs.forEach( re=>{
				var matched = str.match(re);
				matched && matched.forEach(m=>
					console.log(m.toString())
				);
			});
		

Use Regex in String object methods

match()
replace()
search()
split()

The match() String method: syntax


			str.match(regexp)
		
Argument:
A RegExp object, or a string which will be converted to a RegExp object

The match() String method: Return value

If no match is found, the method returns null.
Without the g modifier, match() returns an Array with the following items and extra properties:

Property/index  Description
[0]             The part of the string that matched the regex
[1]..[n]        The matches of the captured groups, if any
index           The index in the string where the match starts. String indexes are 0-based!
input           The original string

The match() String method: Return value

When the g modifier is present, the result is an Array containing all matched substrings (captured groups, index and input are not included).

The match() String method


			var str = 'abracadabra';

			var resultSimple = str.match(/r/);
			console.dir(resultSimple);

			var resultGlobal = str.match(/r/g);
			console.dir(resultGlobal);
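
The same string can be used to read the individual properties directly. This is a sketch; the pattern with a capturing group is an assumption added to show the [1] item:

```javascript
const str = 'abracadabra';

// Without g: one match, plus captured groups, index and input.
const single = str.match(/c(a)d/);
console.log(single[0]);    // cad
console.log(single[1]);    // a
console.log(single.index); // 4
console.log(single.input); // abracadabra

// With g: only the matched substrings, no index or input.
const all = str.match(/r/g);
console.log(all);       // [ 'r', 'r' ]
console.log(all.index); // undefined
```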
		

The replace() String method


			// Goal: remove all vowels from a string

			// the input string:
			let str = 'The quick brown fox jumps over the lazy dog';

			// the regex:
			const re = /[aeiouy]/gi; // global and case-insensitive

			// replace all matches with empty string:
			let newStr = str.replace(re, '');

			// print output:
			console.log(`str: ${str}`);
			console.log(`newStr: ${newStr}`);

		

Reference: replace @mdn

The split() String method


			// Goal: split a string by any whitespace, or ',' or '.' sequences

			// the input string:
			let str = `word1, word2
			word3	word4. Word5`;

			// the regex: match any whitespace, or ',' or '.' sequences:
			const re = /[\s,.]+/;
			let words = str.split(re);

			// print the words array:
			console.log(words);
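
search() appears in the list of String methods at the start of this section. A minimal sketch of how it behaves, with a made-up sentence:

```javascript
const sentence = 'The year 2024 was busy';

// search() returns the index of the first match...
const firstDigitAt = sentence.search(/\d+/);
console.log(firstDigitAt); // 9

// ...or -1 when nothing matches.
const punctuationAt = sentence.search(/[!?]/);
console.log(punctuationAt); // -1
```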
		

Use Regex by RegExp object methods

exec()
test()
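
A rough sketch of both methods, with a made-up input string. With the g flag, exec() resumes from the regex's lastIndex on every call, which allows looping over all matches:

```javascript
const str = 'a1b22c333';

// test() only answers whether there is a match.
const hasDigit = /\d/.test(str);
console.log(hasDigit); // true

// exec() returns one match per call, or null when there are no more matches.
const re = /\d+/g;
const found = [];
let m;
while ((m = re.exec(str)) !== null) {
  found.push(`${m[0]}@${m.index}`);
}
console.log(found); // [ '1@1', '22@3', '333@6' ]
```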

Regex UseCases

For "Hangman" game board

The code below is a very simple demo of using a regexp in a hangman game.
A full implementation of the game can be checked here: hangman @github

			function guess(letter) {
				// find all matches (and their indexes) of letter in wordToGuess
				let matches = wordToGuess.matchAll(new RegExp(letter,"gi"));

				// replace each matched position in board array with the letter
				for (const match of matches) {
					gameBoard.splice(match.index,1,letter);
				}

				console.log(gameBoard);
			}

			const wordToGuess = "orinoko";

			// array of currently guessed letters. Each un-guessed letter is displayed as '_'
			let gameBoard = wordToGuess.replace(/\w/g,'_').split('');

			// Test
			guess('o');
			guess('m');
			guess('r');
			guess('n');
			guess('p');
			guess('k');


		

Useful Resources

Sites:

regex101 Online Regex tester
JavaScript RegExp Reference on w3schools

YouTube

YouTube: Best of Fluent 2012: /Reg(exp){2}lained/: Demystifying Regular Expressions

HW

TASK: replace_numbers

Given the string: 'a1b2c3d'
Write a program which replaces each digit in the string with '-', i.e. the resulting string should be 'a-b-c-d'
Reference: replace @mdn

These slides are based on a customised version of Hakimel's reveal.js framework