Wednesday, 10 April 2019

Introduction to RegEx

RegEx is used often when working with strings, paths, configurations etc...so here is a little breakdown of commonly used RegEx expressions. I will be adding examples I come across in my daily work and down below I'll be adding explanations on how to interpret them.

RegEx examples:


'/.*?\\.(test|spec)\\.js$'


RegEx Special Characters


\d = matches any single digit in most regex grammar styles and is equivalent to [0-9]


RegEx Expressions & Interpretation:


.
Dot matches any single character except the newline character, by default.
If s flag ("dotAll") is true, it also matches newline characters.

*
Asterisk matches the preceding expression 0 or more times.

Example: Find any text between two digits OR a single digit:

"\\d(.*\\d)*"

In string LeadingText-1-TrailingText found pattern 1
In string LeadingText-12-TrailingText found pattern 12
In string LeadingText-1.2-TrailingText found pattern 1.2
In string LeadingText-11.2-TrailingText found pattern 11.2
In string LeadingText-1.22-TrailingText found pattern 1.22
In string LeadingText-11.22-TrailingText found pattern 11.22
In string LeadingText-1234-TrailingText found pattern 1234 

.*
Matches any character greedily - as many characters as possible.

Example:
1.*1 in 101000001 will match 101000001

?
Matches the preceding expression 0 or 1 time.
If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the fewest possible characters), as opposed to the default, which is greedy (matching as many characters as possible).

.*?
Matches any character in non-greedy mode - as little as enough to match the pattern.

Example:
1.*1 in 101000001 will match 101

What is the difference between .*? and .* regular expressions?
(this answer also contains nice explanation of backtracking and how non-greedy expression can return multiple matches within a string)


+
Matches the preceding expression 1 or more times.

\
A backslash that precedes a non-special character indicates that the next character is special and is not to be interpreted literally.
A backslash that precedes a special character indicates that the next character is not special and should be interpreted literally (this is called escaping).

\\
The first backslash escapes the one after it, so the expression searches for a single literal backslash.

^
Caret.

  • ^ means "not the following" when inside and at the start of [], so [^...].
  • When it's inside [] but not at the start, it means the actual ^ character.
  • When it's escaped (\^), it also means the actual ^ character.
  • In all other cases it means start of the string / line (which one is language / setting dependent).

So in short:

  • [^abc] -> not a, b or c
  • [ab^cd] -> a, b, ^ (character), c or d
  • \^ -> a ^ character
  • Anywhere else -> start of string / line.


So ^[b-d]t$ means:

  • Start of line
  • b/c/d character
  • t character
  • End of line

Carets in Regular Expressions

TBC...


References:


Regular Expressions (MDN)
Quick-Start: Regex Cheat Sheet

No comments: