Regular Expressions in JavaScript

Created by Iva E. Popova

What are Regular Expressions?

Intro

A regular expression is a string pattern which other strings either match or do not match.
You can think of it as a kind of search mechanism.
Regular Expression synonyms: Regex, RegEx, RegExp

Example


			var userEmail = "prefix@domain.com";

			//the RegEx to find if the userEmail contains '@' symbol:
			var re = /@/;

			// do the test:
			if ( re.test(userEmail) ){
				console.log(`Match`);
			}else{
				console.log(`No match!`);
			}

The Language

You can think of Regular Expressions as a separate language, with its own rules and specs.
In fact, regular expressions originate from the regular languages defined by Kleene in the early 1950s.
Nowadays, almost all programming languages implement the concept of regex.

The Grammar

A regex grammar includes 2 types of symbols:
Regular symbols: they are matched literally against the string being searched.
Meta-characters: they have a special meaning and give regex its power.

Example


			var strings = [
				'alabala',
				' alabala',
				'Astronomy',
				'the apple'
			];
			var re = /^a/;

			strings.forEach((str)=>
				re.test(str) ?
					console.log(str+' -> match!') :
					console.log(str+'  -> NO match!')
			)

The regex /^a/ matches every string that starts with a lowercase 'a': the a is a regular symbol; the ^ is a special symbol (meta-character).

Basic Regex Syntax

Special Characters

The following characters have a special meaning in regex:

^ $ \ . * + ? ( ) [ ] { } |

Some of them can also be combined with ordinary characters to change the meaning of those characters (for example, \d matches a digit rather than the letter 'd').

To match a special character literally, we have to escape it with a backslash '\'.
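
To illustrate escaping, here is a minimal sketch. The sample strings and variable names are made up for this example: an unescaped + acts as a quantifier, while \+ matches a literal plus sign.

```javascript
// Unescaped: '+' is a quantifier, so /1+1/ means "one or more '1', then '1'".
const unescaped = /1+1/;
// Escaped: '\+' matches a literal plus sign.
const escaped = /1\+1/;

console.log(unescaped.test("1+1")); // false
console.log(unescaped.test("111")); // true
console.log(escaped.test("1+1"));   // true
console.log(escaped.test("111"));   // false
```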

Modifiers/Flags

They affect how the regular expression is executed.

Modifier	Description
i	case-insensitive matching
g	global match (find all matches rather than stopping after the first match)
m	multiline matching

Modifiers/Flags example


			var matched, str = `alAbAla`;

			matched = str.match(/a/); // no flags
			console.log(`matched: ${matched}`);
			// matched: a (the first one)

			matched = str.match(/a/g); // g flag added
			console.log(`matched: ${matched}`);
			//matched: a,a

			matched = str.match(/a/gi); // g and i flags
			console.log(`matched: ${matched}`);
			//matched: a,A,A,a
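
The m flag is not covered by the example above, so here is a small sketch of its effect. The sample text and variable names are assumptions made for illustration only.

```javascript
const lines = "first line\nsecond line\nthird line";

// Without m, ^ matches only at the very start of the string.
const withoutM = lines.match(/^\w+/g);
// With m, ^ also matches right after each line break.
const withM = lines.match(/^\w+/gm);

console.log(withoutM); // [ 'first' ]
console.log(withM);    // [ 'first', 'second', 'third' ]
```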

More on Modifiers/Flags

Character Sets/Character Classes

Character Sets

Square brackets are used to define a character set, for example [abc].
The symbols inside the brackets are the elements of the set.
A hyphen (-) between 2 symbols has a special meaning inside the character set: it defines a range, for example [0-9]. At the start or the end of the set, it is treated as a literal hyphen.
The character set itself matches exactly one symbol: one of those defined in the set.

Character Sets

Character set	Description
`[abc]`	Match any one of the symbols listed ('a' or 'b' or 'c')
`[a-z]`	Match any symbol from 'a' to 'z' (i.e. any lowercase Latin letter)
`[^abc]`	Match any symbol except 'a' or 'b' or 'c' (i.e. the `^` negates the characters in the set)
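
The range, literal-hyphen and negation rules can be compared side by side. This is only a sketch; the date-like sample string and the variable names are assumptions chosen for the example.

```javascript
const sample = "2024-05-17";

// [0-9] is a range: it matches any single digit.
const digits = sample.match(/[0-9]/g);
// [09-] has the hyphen at the end, so it is literal: matches '0', '9' or '-'.
const literalHyphen = sample.match(/[09-]/g);
// [^0-9] is negated: it matches any single character that is not a digit.
const nonDigits = sample.match(/[^0-9]/g);

console.log(digits.join(""));        // 20240517
console.log(literalHyphen.join("")); // 0-0-
console.log(nonDigits.join(""));     // --
```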

Character Sets examples


			// match single vowels
			matched = "asteroid".match(/[aeiouy]/g);
			console.log(`matched: ${matched}`);
			// matched: a,e,o,i

			// match any consecutive vowels
			matched = "asteroid".match(/[aeiouy]+/g);
			console.log(`matched: ${matched}`);
			// matched: a,e,oi

			// match bg mobile phone numbers
			matched = "+359888123456".match(/\+3598[7-9][0-9]{7}/g);
			console.log(`matched: ${matched}`);
			// matched: +359888123456

Character classes

Character classes can be regarded as shorthands for some of the most commonly used character sets. In JavaScript, \w and \d cover only ASCII symbols.

Character classes

Char class	Description
`.`	Match any character, except newline/line terminator.
`\w`	Match a word character (a-z, A-Z, 0-9, or the underscore _)
`\d`	Match any Arabic digit (from 0 to 9)
`\s`	Match any whitespace character (space, tab, form feed, line ending, etc.)

Note that the terms character set and character class are often used as synonyms.
Any character class can be represented by a character set!
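
As a quick check of that equivalence, the sketch below runs \d against [0-9] and \w against [A-Za-z0-9_] on the same text. The sample text and variable names are invented for illustration.

```javascript
const text = "user_42 paid 15.50 EUR";

// \d is shorthand for [0-9]
const viaClassDigits = text.match(/\d+/g);
const viaSetDigits = text.match(/[0-9]+/g);

// \w is shorthand for [A-Za-z0-9_]
const viaClassWords = text.match(/\w+/g);
const viaSetWords = text.match(/[A-Za-z0-9_]+/g);

console.log(viaClassDigits); // [ '42', '15', '50' ]
console.log(viaClassWords);  // [ 'user_42', 'paid', '15', '50', 'EUR' ]
```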

Character classes example


			// match bg mobile phone numbers
			matched = "+359888123456".match(/\+3598[7-9]\d{7}/g);
			console.log(`matched: ${matched}`);
			// matched: +359888123456

Character classes example


			var re = /[a-z]\w+/;
			var strings = [
				'petrov42',
				'42petrov',
				'ivan_pterov',
			]
			strings.forEach(str=>console.log(`${str.match(re)} matched in ${str}:`));

			// petrov42 matched in petrov42:
			// petrov matched in 42petrov:
			// ivan_pterov matched in ivan_pterov:

More on Character Classes

More Character classes on MDN

Quantifiers

Quantifier	Description
r*	Matches r 0 or more times
r+	Matches r 1 or more times
r?	Matches r 0 or 1 time
r{n}	Matches r exactly n times
r{n,}	Matches r at least n times
r{n,m}	Matches r between n and m times (n and m are non-negative integers, n ≤ m)

r can be any regex!

Quantifiers (greedy and non-greedy match)

Quantifiers are greedy, meaning they match the longest part of the string they can.

Quantifiers (greedy and non-greedy match)

We can make them non-greedy by suffixing them with '?'.
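
A minimal sketch of the difference, assuming an HTML-like sample string invented for this example:

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as possible, up to the last '>'.
const greedy = html.match(/<.+>/)[0];
// Non-greedy: .+? stops at the first '>' that lets the match succeed.
const lazy = html.match(/<.+?>/)[0];

console.log(greedy); // <b>bold</b> and <i>italic</i>
console.log(lazy);   // <b>
```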

Quantifiers example


			matched = "ala aa bala".match(/a.?a/g);
			console.log(`matched: ${matched}`);
			// matched: ala,aa,ala

			matched = "ala aa bala".match(/a.{3,5}a/g);
			console.log(`matched: ${matched}`);
			// matched: ala aa

			matched = "ala aa bala".match(/a.{3,}a/g);
			console.log(`matched: ${matched}`);
			// matched: ala aa bala

			matched = "ala aa bala".match(/a.{3,}?a/g);
			console.log(`matched: ${matched}`);
			// matched: ala a,a bala

Quantifiers

Anchors and Boundaries

They specify a position in the string where a match should occur.
They are zero-width, i.e. when matched they do NOT consume characters from the string.

Anchor	Description
^	Matches the beginning of the string (or the line, if m flag is used)
$	Matches the end of the string (or the line, if m flag is used)
\b	Matches at word boundaries, i.e. between a word (\w) and a non-word (\W) character. Note that the start and end of the string are treated as non-word characters.

Example


			var re = /\b/g;
			var strings = [
				'',
				'a',
				'@',
				'aa',
				'a!',
				'a,a',
			]

			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 2 matches in 'a'
			// 2 matches in 'aa'
			// 2 matches in 'a!'
			// 4 matches in 'a,a'

Example


			var re = /^a\w+a$/g;
			var strings = [
				'ana',
				'ana bel',
			]
			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 1 matches in 'ana'

Example


			var re = /\b[\w-]+\b/gi;
			var strings = [
				'one two three four, five, six. Seven!',
				'one-two,three!',
			];
			strings.forEach(str=>{
				var res = str.match(re);
				res && console.log(`${res.length} matches in '${str}'`)
			});
			// 7 matches in 'one two three four, five, six. Seven!'
			// 2 matches in 'one-two,three!'

Alternation

With alternation we can match one or another regexp!

Alternation	Description
r1|r2	Matches if r1 OR r2 is matched

Alternation example


			// NB: this is not an example of good practice for grouping regexes. Why? => check the next slides
			var re = /\b(straw|rasp)?berries/;
			var strings = [
				'Icecream with strawberries? Yes!',
				'Icecream with blueberries? No!',
				'Icecream with raspberries? Yes!',
				'Icecream with berries? Yes!',
			]

			strings.forEach(str=> str.match(re) ?
				console.log(`${str} YES! YES!`) : console.log(`${str} NO! NO!`)
			)
			// Icecream with strawberries? Yes! YES! YES!
			// Icecream with blueberries? No! NO! NO!
			// Icecream with raspberries? Yes! YES! YES!
			// Icecream with berries? Yes! YES! YES!

More on Alternation

Alternation

Grouping and back references

Parentheses, ( and ), play a dual role in regex!
They can be used for grouping regexes, for example: /(r1|r2)r3/ matches r1r3 OR r2r3, but not r1r2r3.
They can also be used to capture (remember) the matched part of the string, for example: /(r1)r2/ matches r1r2 and captures the part of the string that matched r1.
If you just want to group regexes without capturing the match, you should state that explicitly: (?:r1|r2) matches r1 or r2 but does not capture the match.
NB! Capturing costs extra time and memory! If you need the parentheses just for grouping, prefer the ?: prefix.
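
The sketch below contrasts capturing and non-capturing groups and shows a back reference (\1). The sample strings and variable names are assumptions made for this illustration.

```javascript
const date = "2024-05-17";

// Capturing groups: the parts matched by (...) are remembered.
const captured = date.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(captured[0]);     // 2024-05-17
console.log(captured[1]);     // 2024
console.log(captured.length); // 4

// Non-capturing groups: the same match, but nothing is remembered.
const notCaptured = date.match(/(?:\d{4})-(?:\d{2})-(?:\d{2})/);
console.log(notCaptured.length); // 1

// Back reference: \1 matches the same text that group 1 captured.
const doubled = /\b(\w+) \1\b/;
console.log(doubled.test("it is is here")); // true
console.log(doubled.test("it is here"));    // false
```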

Grouping regexes example


			var re = /\b(?:straw|rasp)?berries/;
			var strings = [
				'Icecream with strawberries?',
				'Icecream with blueberries?',
				'Icecream with raspberries?',
				'Icecream with strawraspberries?',
				'Icecream with berries?',
			];
			strings.forEach(str=>str.match(re) ?
				console.log(`${str} YES!`) : console.log(`${str} NO!`)
			);
			// Icecream with strawberries? YES!
			// Icecream with blueberries? NO!
			// Icecream with raspberries? YES!
			// Icecream with strawraspberries? NO!
			// Icecream with berries? YES!

Grouping regexes: to group or not to group?


			var regexes = [
				// task: match only 'strawberries' or 'raspberries':
				/\bstraw|rasp{1}berries/,         // not what we want
				/(?:\bstraw)|(?:rasp{1}berries)/, // the same as above!!!
				/\b(?:straw|rasp){1}berries/,     // That's it!
			];
			var strings = [
				'Icecream with strawberries?',
				'Icecream with raspberries?',
				'Icecream with straw?',
				'Icecream with whateverraspberries?',
			];

			regexes.forEach(re=>{
				console.log(`\nMatched with: ${re}`);
				strings.forEach(str=>str.match(re) ?
					console.log(`${str} YES!`) : console.log(`${str} NO!`)
				)
			});
			// Matched with: /\bstraw|rasp{1}berries/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? YES!
			// Icecream with whateverraspberries? YES!

			// Matched with: /(?:\bstraw)|(?:rasp{1}berries)/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? YES!
			// Icecream with whateverraspberries? YES!

			// Matched with: /\b(?:straw|rasp){1}berries/
			// Icecream with strawberries? YES!
			// Icecream with raspberries? YES!
			// Icecream with straw? NO!
			// Icecream with whateverraspberries? NO!

More on Grouping and back references

Grouping and back references

Assertions

Assertions make it possible to match a regex only if it is (or is not) followed or preceded by something, i.e. we can look ahead or look behind!

Assertions on MDN
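
A small sketch of lookahead and lookbehind, assuming a runtime that supports lookbehind (ES2018 or later). The price string and variable names are invented for the example.

```javascript
const prices = "Total: $42 or 39 EUR";

// Positive lookahead (?=...): digits only if followed by ' EUR'.
const beforeEur = prices.match(/\d+(?= EUR)/g);
// Negative lookahead (?!...): whole numbers NOT followed by ' EUR'.
const notBeforeEur = prices.match(/\b\d+\b(?! EUR)/g);
// Positive lookbehind (?<=...): digits only if preceded by '$'.
const afterDollar = prices.match(/(?<=\$)\d+/g);

console.log(beforeEur);    // [ '39' ]
console.log(notBeforeEur); // [ '42' ]
console.log(afterDollar);  // [ '42' ]
```

In each case the text inside the assertion is checked but not included in the match, because assertions are zero-width.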

Assertions - use case


			const passwords = [
				'alabala', 	// false
				'alaba#a', 	// false
				'alaba9a', 	// false
				'a1aba#9', 	// true
				'a1@', 		// false
			];

			const re = /^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*]).{6,}$/;

			passwords.forEach(str=>{
				let matched = re.test(str);
				console.log(`'${str}' => ${matched}`)
			});

Create Regex in JavaScript

How to create the RegExp object

Each regular expression is a RegExp object.
There are 2 ways to create a RegExp object: with a RegExp literal or with the RegExp constructor.

RegExp literal

Place the pattern between 2 slashes
Place the modifiers after the second slash


			var str = 'Maria, ivan, eli, zdravka, stoyan';

			// match the first word which ends with 'a':
			var re1 = /\b\w+a\b/;

			// match all words which end with 'a':
			var re2 = /\b\w+a\b/g;

			console.log( str.match(re1).toString() ); // Maria
			console.log( str.match(re2).toString() ); // Maria,zdravka

RegExp Constructor


			var str = 'Maria, ivan, eli, zdravka, stoyan';

			// match the first word which ends with 'a':
			var re1 = new RegExp('\\b\\w+a\\b');

			// match all words which end with 'a':
			var re2 = new RegExp('\\b\\w+a\\b', 'g');

			console.log( str.match(re1).toString() ); // Maria
			console.log( str.match(re2).toString() ); // Maria,zdravka

As the regex is given as a string, we have to escape the backslash, i.e. \b => \\b

RegEx Literal vs Constructor

Literal: compiled once, when the script is loaded. Use literal notation when you know the regular expression in advance.
Constructor: compiled at runtime, each time the constructor is called. Use the constructor function when you don't know the pattern in advance and will receive it from a variable.
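
When the pattern arrives in a variable, any special characters in it usually need escaping before it is passed to the constructor. The sketch below assumes two hypothetical helpers, escapeRegExp and countOccurrences; neither is built into JavaScript.

```javascript
// Hypothetical helper: escapes the special characters so that
// the given text is matched literally.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: counts case-insensitive occurrences of needle.
function countOccurrences(haystack, needle) {
	// The pattern is only known at runtime, so the constructor is used.
	const re = new RegExp(escapeRegExp(needle), "gi");
	const found = haystack.match(re);
	return found ? found.length : 0;
}

console.log(countOccurrences("A.b a.b axb", "a.b")); // 2
```

Without the escaping step, the dot in "a.b" would act as a meta-character and "axb" would be counted as well.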

Regex in a loop


			var words = ["ябълка", "ария", "ягода", "ясен"];

			// Match string starting with 'я' and ending with 'а':
			//  RegExp literals:
			var re1 = /^я.*а$/i;
			words.forEach(word=>{
				// re1 is compiled only once !!!:
				let re1Matched = word.match(re1);
				re1Matched && console.log('re1: ' + re1Matched);
			});

			// RegExp Constructor
			var re2 = new RegExp('^я.*а$','i');
			words.forEach(word=>{
				// re2 was compiled once, when the constructor ran above; it is NOT re-compiled in each iteration:
				let re2Matched = word.match(re2);
				re2Matched && console.log('re2: ' + re2Matched);
			})

Array of RegExp


			var commentsREs = [/\/\/.*/gm, /\/\*[^]+?\*\//g];
			var str = `
				// single line comment 1
				var x = 5;
				// single line comment 2
				var y = 10;
				/*this is multiline
				comment in JS */
				const pi = 3.14;
				for (let i = 0; i< x; i++) console.log(i);
			`;
			commentsREs.forEach( re=>{
				var matched = str.match(re);
				matched && matched.forEach(m=>
					console.log(m.toString())
				);
			});

Use Regex in `String` object methods

match()
replace()
search()
split()

The match() String method: syntax


			str.match(regexp)

Argument: a RegExp object, or a string which will be converted to a RegExp object.

The match() String method: Return value

If no match is found, the method returns null.
Without the g modifier, match() returns an array with the following indexes and properties:

Property/index	Description
`[0]`	The part of the string that matched the regex
`[1]..[n]`	The matches of the captured groups, if any
`index`	The string index where the match starts. String indexes are 0-based!
`input`	The original string

The match() String method: Return value

When the g modifier is present, the result is an array containing all matched substrings; captured groups are not included.

The match() String method


			var str = 'abracadabra';

			var resultSimple = str.match(/r/);
			console.dir(resultSimple);

			var resultGlobal = str.match(/r/g);
			console.dir(resultGlobal);
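
To make the return values concrete, here is a sketch that prints the individual fields instead of using console.dir. The pattern with two capture groups and the variable names are assumptions chosen for this example.

```javascript
const sentence = "abracadabra";

// Without g: the first match only, plus capture groups, index and input.
const single = sentence.match(/(r)(a)/);
console.log(single[0]);    // ra
console.log(single[1]);    // r
console.log(single[2]);    // a
console.log(single.index); // 2
console.log(single.input); // abracadabra

// With g: all matched substrings, without groups or index.
const all = sentence.match(/(r)(a)/g);
console.log(all);          // [ 'ra', 'ra' ]

// No match: null, so check the result before using it.
const none = sentence.match(/z/);
console.log(none);         // null
```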

The replace() String method


			// Goal: remove all vowels from a string

			// the input string:
			let str = 'The quick brown fox jumps over the lazy dog';

			// the regex:
			const re = /[aeiouy]/gi; // global and case-insensitive

			// replace all matches with empty string:
			let newStr = str.replace(re, '');

			// print output:
			console.log(`str: ${str}`);
			console.log(`newStr: ${newStr}`);

Reference: replace @mdn
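
replace() can also reuse captured groups or compute the replacement with a function. The following is only a sketch; the sample strings and variable names are made up for illustration.

```javascript
const fullName = "Doe, Jane";

// $1 and $2 in the replacement refer to the first and second captured group.
const swapped = fullName.replace(/(\w+), (\w+)/, "$2 $1");
console.log(swapped); // Jane Doe

// A function can compute each replacement from the matched text.
const capitalized = "the quick fox".replace(/\b\w/g, (ch) => ch.toUpperCase());
console.log(capitalized); // The Quick Fox
```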

The split() String method


			// Goal: split a string by any whitespace, or ',' or '.' sequences

			// the input string:
			let str = `word1, word2
			word3	word4. Word5`;

			// the regex: match any whitespace, or ',' or '.' sequences:
			const re = /[\s,.]+/;
			let words = str.split(re);

			// print the words array:
			console.log(words);

Use Regex with `RegExp` object methods

exec()
test()
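
A minimal sketch of both methods. The sample string and variable names are assumptions made for this example.

```javascript
const csv = "id=7; id=42; id=105";

// test(): returns true or false.
console.log(/id=\d+/.test(csv)); // true

// exec(): returns one match per call (with groups and index), or null.
// With the g flag the regex remembers its position in lastIndex,
// so repeated calls walk through all matches.
const idRe = /id=(\d+)/g;
const ids = [];
let m;
while ((m = idRe.exec(csv)) !== null) {
	ids.push({ value: m[1], at: m.index });
}

console.log(ids);
// [ { value: '7', at: 0 }, { value: '42', at: 6 }, { value: '105', at: 13 } ]
```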

Regex UseCases

For "Hangman" game board

The code below is a very simple demo of using a regexp in a hangman game.
Full implementation of the game can be checked here: hangman @github


			function guess(letter) {
				// find all matches (and their indexes) of letter in wordToGuess
				let matches = wordToGuess.matchAll(new RegExp(letter,"gi"));

				// replace each matched position in board array with the letter
				for (const match of matches) {
					gameBoard.splice(match.index,1,letter);
				}

				console.log(gameBoard);
			}

			const wordToGuess = "orinoko";

			// array of currently guessed letters. Each un-guessed letter is displayed as '_'
			let gameBoard = wordToGuess.replace(/\w/g,'_').split('');

			// Test
			guess('o');
			guess('m');
			guess('r');
			guess('n');
			guess('p');
			guess('k');

Useful Resources

Sites:

regex101 Online Regex tester
JavaScript RegExp Reference on w3schools

YouTube

YouTube: Best of Fluent 2012: /Reg(exp){2}lained/: Demystifying Regular Expressions

HW

TASK: replace_numbers

Given the string: 'a1b2c3d'
Write a program which replaces each digit in the string with '-', i.e. the resulting string should be 'a-b-c-d'
Reference: replace @mdn

These slides are based on a customised version of Hakimel's reveal.js framework.
