Regular expressions (RegEx) come up often when working with strings, paths, configuration files and so on, so here is a short breakdown of commonly used expressions. I will keep adding examples I come across in my daily work, and further down I'll add explanations of how to interpret them.

## RegEx examples:

'/.*?\\.(test|spec)\\.js$'
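
To see what this pattern accepts, here is a minimal JavaScript sketch. It assumes the pattern is a plain string handed to the built-in RegExp constructor, which is why the backslashes are doubled; the sample paths are made up for illustration. In that reading the leading / is part of the pattern and matches a literal path separator.

```javascript
// In a string literal "\\." becomes the regex token \. (a literal dot).
const testFilePattern = new RegExp('/.*?\\.(test|spec)\\.js$');

// Hypothetical sample paths, made up for this sketch.
const samplePaths = [
  '/src/app.test.js',
  '/src/app.spec.js',
  '/src/app.js',
  '/src/app.test.jsx',
  'app.test.js',
];

// Keep only the paths the pattern matches.
const matchingPaths = samplePaths.filter((p) => testFilePattern.test(p));
console.log(matchingPaths); // [ '/src/app.test.js', '/src/app.spec.js' ]
```

The last sample is rejected only because it contains no / for the pattern's first character to match.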

## RegEx Special Characters

**\d** matches any single digit in most regex grammar styles and is equivalent to [0-9].

## RegEx Expressions & Interpretation:

**.**

Dot matches any single character except the newline character, by default.

If the s flag ("dotAll") is set, it also matches newline characters. If we want to match a dot character itself, we need to escape it:

**\.**
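
A minimal JavaScript sketch of the three behaviours of the dot described here: the default, the s flag, and the escaped form. The sample strings are made up for illustration.

```javascript
// By default the dot matches any single character except a newline.
const dotDefault = /a.c/;
const abcMatches = dotDefault.test('abc');      // true
const newlineMatches = dotDefault.test('a\nc'); // false

// With the s (dotAll) flag the dot also matches a newline.
const dotWithSFlag = /a.c/s;
const newlineMatchesWithS = dotWithSFlag.test('a\nc'); // true

// An escaped dot matches only a literal dot character.
const literalDot = /a\.c/;
const literalDotMatchesDot = literalDot.test('a.c');    // true
const literalDotMatchesLetter = literalDot.test('abc'); // false

console.log(abcMatches, newlineMatches, newlineMatchesWithS,
  literalDotMatchesDot, literalDotMatchesLetter);
```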

**\***

This quantifier (asterisk) matches the preceding expression 0 or more (unlimited) times, as many times as possible, giving back as needed (**greedy**).

Example: Find any text between two digits OR a single digit:

"\\d(.*\\d)*"

In string LeadingText-1-TrailingText found pattern 1

In string LeadingText-12-TrailingText found pattern 12

In string LeadingText-1.2-TrailingText found pattern 1.2

In string LeadingText-11.2-TrailingText found pattern 11.2

In string LeadingText-1.22-TrailingText found pattern 1.22

In string LeadingText-11.22-TrailingText found pattern 11.22

In string LeadingText-1234-TrailingText found pattern 1234
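
The results listed above can be reproduced with a short JavaScript sketch. It assumes the built-in String.prototype.match and writes the pattern as a regex literal, so the backslashes are not doubled.

```javascript
// A digit, optionally followed by "anything ending in a digit", repeated.
const digitsPattern = /\d(.*\d)*/;

const inputs = [
  'LeadingText-1-TrailingText',
  'LeadingText-12-TrailingText',
  'LeadingText-1.2-TrailingText',
  'LeadingText-11.2-TrailingText',
  'LeadingText-1.22-TrailingText',
  'LeadingText-11.22-TrailingText',
  'LeadingText-1234-TrailingText',
];

// match() returns an array whose first item is the whole match.
const found = inputs.map((s) => s.match(digitsPattern)[0]);
console.log(found);
// [ '1', '12', '1.2', '11.2', '1.22', '11.22', '1234' ]
```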

**.***

Matches any character greedily - as many characters as possible.

Example:

1.*1 in 101000001 will match 101000001

**?**

Matches the preceding expression 0 or 1 time.

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the fewest possible characters), as opposed to the default, which is greedy (matching as many characters as possible).
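
As a small illustration of ? used as "0 or 1 time", here is a JavaScript sketch; the word pair is just a made-up sample.

```javascript
// The u is optional: it may appear once or not at all.
const optionalU = /^colou?r$/;

const matchesWithoutU = optionalU.test('color');   // true
const matchesWithOneU = optionalU.test('colour');  // true
const matchesWithTwoU = optionalU.test('colouur'); // false, ? allows at most one

console.log(matchesWithoutU, matchesWithOneU, matchesWithTwoU);
```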

**.*?**

Matches any character in non-greedy (lazy) mode - as few characters as needed to match the pattern.

Example:

1.*?1 in 101000001 will match 101
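
The difference between the greedy and the non-greedy form is easy to check side by side. A minimal JavaScript sketch; the second input string is made up to show how the non-greedy form can yield several matches when the g flag is used.

```javascript
const text = '101000001';

// Greedy: takes as much as possible, then gives back until the final 1 fits.
const greedyMatch = text.match(/1.*1/)[0]; // '101000001'

// Non-greedy: takes as little as possible before the next 1.
const lazyMatch = text.match(/1.*?1/)[0];  // '101'

// With the g flag, match() returns every match found in the string.
const greedyAll = '101101'.match(/1.*1/g); // [ '101101' ]
const lazyAll = '101101'.match(/1.*?1/g);  // [ '101', '101' ]

console.log(greedyMatch, lazyMatch, greedyAll, lazyAll);
```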

See also: "What is the difference between .*? and .* regular expressions?" (the answer there also contains a nice explanation of backtracking and of how a non-greedy expression can return multiple matches within a string).

**+**

This quantifier matches the preceding expression 1 or more (unlimited) times, as many times as possible, giving back as needed (**greedy**).
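
A short JavaScript sketch contrasting + with *; the sample strings are made up. The point it tries to show is that + needs at least one occurrence, while * is satisfied by none.

```javascript
const oneOrMore = /\d+/;
const zeroOrMore = /\d*/;

// + consumes the whole run of digits.
const digitsFound = 'ab123'.match(oneOrMore)[0]; // '123'

// + fails when there is no digit at all.
const plusMatchesLetters = oneOrMore.test('abc'); // false

// * succeeds immediately with an empty match at the start of the string.
const starOnLetters = 'abc'.match(zeroOrMore)[0];          // ''
const starOnMixed = 'ab123'.match(zeroOrMore)[0];          // '' (empty match at index 0)

console.log(digitsFound, plusMatchesLetters, starOnLetters, starOnMixed);
```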

**\**

A backslash that precedes a non-special character indicates that the next character is special and is not to be interpreted literally.

A backslash that precedes a special character indicates that the next character is not special and should be interpreted literally (this is called *escaping*).

Example: \. matches the character . literally (case sensitive)
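
Both directions of the backslash rule can be checked with a small JavaScript sketch; the sample strings are made up.

```javascript
// d is an ordinary character; \d is special and means "any digit".
const plainD = /d/.test('5');      // false
const specialD = /\d/.test('5');   // true

// . is special (any character); \. is a literal dot.
const anyChar = /1.5/.test('1x5');       // true
const literalDot = /1\.5/.test('1x5');   // false
const literalDotOk = /1\.5/.test('1.5'); // true

console.log(plainD, specialD, anyChar, literalDot, literalDotOk);
```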

**\\**

The first backslash escapes the one after it, so the expression searches for a single literal backslash.
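
A minimal JavaScript sketch of matching a literal backslash. The Windows-style path is a made-up sample; note that JavaScript string literals need their own escaping, so a single backslash in the text is written as two in the source code.

```javascript
// Regex literal: two backslashes match one literal backslash.
const backslash = /\\/;

// This string holds the text C:\temp\notes.txt
const windowsPath = 'C:\\temp\\notes.txt';

const hasBackslash = backslash.test(windowsPath); // true
const parts = windowsPath.split(backslash);       // [ 'C:', 'temp', 'notes.txt' ]

// Built from a string, the same pattern needs four backslashes in source code.
const fromString = new RegExp('\\\\');
const sameBehaviour = fromString.test(windowsPath); // true

console.log(hasBackslash, parts, sameBehaviour);
```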

**^**

Caret. Its meaning depends on where it appears:

- ^ means "not the following" when inside and at the start of [], so [^...].
- When it's inside [] but not at the start, it means the actual ^ character.
- When it's escaped (\^), it also means the actual ^ character.
- In all other cases it means start of the string / line (which one is language / setting dependent). If a caret (^) is at the beginning of the entire regular expression, it matches the beginning of a line.

So in short:

- [^abc] -> not a, b or c
- [ab^cd] -> a, b, ^ (character), c or d
- \^ -> a ^ character
- Anywhere else -> start of string / line.

So ^[b-d]t$ means:

- Start of line
- b/c/d character
- t character
- End of line
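
The four meanings of the caret listed above can be checked with a small JavaScript sketch; the test strings are made up.

```javascript
// [^abc]: any character other than a, b or c.
const notAbc = /[^abc]/;
const onlyAbc = notAbc.test('abc');   // false, nothing outside a/b/c
const withD = notAbc.test('abcd');    // true, d qualifies

// [ab^cd]: the caret is not first, so it is just the ^ character.
const caretInClass = /[ab^cd]/.test('^'); // true

// \^: an escaped caret is also the ^ character.
const escapedCaret = /\^/.test('2^3');    // true

// ^[b-d]t$: start, one of b/c/d, then t, then end.
const wholeLine = /^[b-d]t$/;
const results = ['bt', 'ct', 'dt', 'at', 'btt'].map((s) => wholeLine.test(s));

console.log(onlyAbc, withD, caretInClass, escapedCaret, results);
// false true true true [ true, true, true, false, false ]
```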

See also: "Carets in Regular Expressions".

**$**

If a dollar sign ($) is at the end of the entire regular expression, it matches the end of a line.

If an entire regular expression is enclosed by a caret and dollar sign (^lorem ipsum$), it matches an entire line.
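
Whether ^ and $ anchor to the whole string or to each line depends on the language and its settings. A minimal sketch, assuming JavaScript, where the m (multiline) flag switches to per-line anchoring; the sample text is made up.

```javascript
const wholeInput = /^lorem ipsum$/;

const exact = wholeInput.test('lorem ipsum');          // true
const withExtra = wholeInput.test('lorem ipsum dolor'); // false, $ must follow "ipsum"

// Hypothetical three-line text.
const multiLineText = 'first line\nlorem ipsum\nlast line';

// Without the m flag, ^ and $ anchor to the whole string.
const withoutM = wholeInput.test(multiLineText);        // false

// With the m flag, ^ and $ also match at line breaks.
const withM = /^lorem ipsum$/m.test(multiLineText);     // true

console.log(exact, withExtra, withoutM, withM);
```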

## Capturing Groups

Part of a pattern can be enclosed in parentheses (...). This is called a **capturing group**. Multiple characters in that group are treated as a single unit that we want to match.

It also allows us to get a part of the match as a separate item in the result array.

If we put a quantifier after the parentheses, it applies to the parentheses as a whole.
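
A minimal JavaScript sketch of both points, assuming String.prototype.match without the g flag, which returns the whole match at index 0 and each capturing group after it.

```javascript
// The + applies to the whole group (ab), not just to b.
const groupResult = 'abababa'.match(/(ab)+/);
const wholeMatch = groupResult[0];    // 'ababab'
const capturedGroup = groupResult[1]; // 'ab' (the last repetition of the group)

// Without parentheses the + applies only to b.
const withoutGroup = 'abababa'.match(/ab+/)[0]; // 'ab'

console.log(wholeMatch, capturedGroup, withoutGroup);
```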

String: abababa

Goal: find all matches of sequence ab.

Result: There are 3 matches.

Regex: (ab)

String: ab123cd345ef785

Goal: find all sequences of numbers

Result: 123, 345, 785

Regex: (\d+)
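
Both examples can be reproduced with a short JavaScript sketch. It assumes the g (global) flag, which makes match() return all matches rather than only the first one.

```javascript
// All occurrences of the sequence ab.
const abMatches = 'abababa'.match(/(ab)/g);
console.log(abMatches.length); // 3

// All sequences of digits.
const numberMatches = 'ab123cd345ef785'.match(/(\d+)/g);
console.log(numberMatches); // [ '123', '345', '785' ]
```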

String: abc345-1.23.456.7890+whatever.ext

Goal: extract only numbers which form a valid version number (greedy - M.m.r.b or M.m.r or M.m )

Result: 1.23.456.7890

Regex (\d+) returns 5 groups: 345, 1, 23, 456, 7890

Regex (\d+)\. returns all groups of numbers that are followed by dot. There are 3 such groups: 1, 23 and 456.
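
These two results can be checked with a small JavaScript sketch. It assumes String.prototype.matchAll (which requires the g flag) to read the captured group of every match.

```javascript
const input = 'abc345-1.23.456.7890+whatever.ext';

// Every sequence of digits.
const allNumbers = input.match(/(\d+)/g);
console.log(allNumbers); // [ '345', '1', '23', '456', '7890' ]

// Only digit sequences followed by a dot; m[1] is the captured group,
// so the dot itself is not part of the result.
const beforeDot = [...input.matchAll(/(\d+)\./g)].map((m) => m[1]);
console.log(beforeDot); // [ '1', '23', '456' ]
```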

Let's look at some examples:

1.23

11.23

123.45

1.23.456

1.23.456.7890

We can see that all version numbers:

- start with a sequence of 1 or more digits which are followed by dot: \d+\.
- end with a sequence of 1 or more digits: \d+

So far we have: \d+\.\d+

Regex \d+\.\d+ returns 2 matches: 1.23 and 456.7890

Between these two sequences there can be 0 or more (at most 2, but let's ignore this) sequences of 1 or more digits followed by a dot: \d+\.

This sequence is optional, so let's put it in brackets to form a **group** and append * to it:

Regex \d+\.(\d+\.)*\d+ does match 1.23.456.7890, but as **(...)** is a **capturing group**, it captures the repeated sequence and the result contains a single group: 456. (the last repetition, trailing dot included).

Here we just want the regex to match this group but not to capture it (not to return it in results). We want this group to be a **non-capturing group** and there is a special syntax for it: **(?: ... )**.

Regex \d+\.(?:\d+\.)*\d+ fully matches 1.23.456.7890
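
The steps of this walkthrough can be replayed in a short JavaScript sketch, assuming String.prototype.match. The variable names are made up for illustration.

```javascript
const input = 'abc345-1.23.456.7890+whatever.ext';

// Step 1: digits, a dot, digits. Finds two separate pieces.
const partial = input.match(/\d+\.\d+/g);
console.log(partial); // [ '1.23', '456.7890' ]

// Step 2: capturing group for the optional middle parts.
// Index 0 is the whole match, index 1 the last repetition of the group.
const capturing = input.match(/\d+\.(\d+\.)*\d+/);
console.log(capturing[0], capturing[1]); // 1.23.456.7890 456.

// Step 3: non-capturing group. Only the whole match is returned.
const versionPattern = /\d+\.(?:\d+\.)*\d+/;
const nonCapturing = input.match(versionPattern);
console.log(nonCapturing.length, nonCapturing[0]); // 1 '1.23.456.7890'

// The sample version numbers from the text are each matched in full.
const samples = ['1.23', '11.23', '123.45', '1.23.456', '1.23.456.7890'];
const allFullyMatched = samples.every((v) => v.match(versionPattern)[0] === v);
console.log(allFullyMatched); // true
```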
