Regular expressions (RegEx) come up often when working with strings, paths, configuration files and so on, so here is a short breakdown of commonly used expressions. I will keep adding examples I come across in my daily work, and further down I explain how to interpret them.
RegEx examples:
'/.*?\\.(test|spec)\\.js$'
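Here is a minimal sketch of how this pattern behaves, using Python's re module. The doubled backslashes in the quoted form are string escaping, so the actual regex is /.*?\.(test|spec)\.js$. The file paths below are made up for illustration.

```python
import re

# The same pattern written as a Python raw string, so each backslash is typed once.
TEST_FILE = re.compile(r"/.*?\.(test|spec)\.js$")

# Hypothetical paths, not taken from any real project.
paths = [
    "/src/utils/date.test.js",
    "/src/utils/date.spec.js",
    "/src/utils/date.js",
    "/src/utils/date.test.jsx",
]

# Keep only the paths that end in .test.js or .spec.js.
matching = [p for p in paths if TEST_FILE.search(p)]
print(matching)
```

Only the first two paths are kept: the third has no .test or .spec part, and the fourth does not end in .js.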
RegEx Special Characters (Metacharacters)
. (dot) = any character apart from new line
\d = matches any single digit in most regex grammar styles and is equivalent to [0-9]
\t = match tab
\s = any whitespace character (space, tab \t, carriage return \r, and newline \n)
[ ] = Square brackets define a character set: a set of characters, any one of which we want to match.
- [aeiou] matches any single vowel.
Some characters have special meanings inside square brackets:
- caret (^) before any other character inside square brackets means we’re negating the characters that follow the caret. We’re telling the regex engine not to match those characters.
- [^aeiou] matches all characters apart from vowels.
- hyphen (-) between two characters inside square brackets means range. [a-z] means "match all characters between a and z, inclusive"
If we want to match square brackets, we need to escape them with backslash. \[|\] means "match [ or ]".
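The metacharacters and character sets above can be tried out with a short sketch like this one. It assumes Python's re module, and the sample strings are invented for illustration.

```python
import re

digits = re.findall(r"\d", "room 42")                    # every single digit
parts = re.split(r"\s+", "one two\tthree\nfour")         # split on any whitespace
vowels = re.findall(r"[aeiou]", "regex")                 # any one vowel
non_vowels = re.findall(r"[^aeiou]", "regex")            # anything except a vowel
in_range = re.findall(r"[a-c]", "abcdef")                # range a..c, inclusive
brackets = re.findall(r"\[|\]", "items[0]")              # literal [ or ]

print(digits, parts, vowels, non_vowels, in_range, brackets)
```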
RegEx Expressions & Interpretation:
.
Dot matches any single character except the newline character, by default.
If the s flag ("dotAll") is set, it also matches newline characters. To match a literal dot character, escape it: \.
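A small sketch of the dot behaviour, assuming Python's re module, where the equivalent of the s flag is re.DOTALL. The sample strings are invented.

```python
import re

text = "a\nb"

# By default the dot does not match a newline, so there is no match here.
default = re.search(r"a.b", text)

# With DOTALL the dot also matches the newline.
dotall = re.search(r"a.b", text, flags=re.DOTALL)

# An unescaped dot matches any character; an escaped dot matches only ".".
any_char = re.findall(r"v.1", "v-1 v.1 v_1")
literal = re.findall(r"v\.1", "v-1 v.1 v_1")

print(default, dotall is not None, any_char, literal)
```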
*
This quantifier (asterisk) matches the preceding expression 0 or more (unlimited) times, as many times as possible, giving back as needed (greedy)
Example: find a span of text that starts and ends with a digit, or a single digit:
"\\d(.*\\d)*"
In string LeadingText-1-TrailingText found pattern 1
In string LeadingText-12-TrailingText found pattern 12
In string LeadingText-1.2-TrailingText found pattern 1.2
In string LeadingText-11.2-TrailingText found pattern 11.2
In string LeadingText-1.22-TrailingText found pattern 1.22
In string LeadingText-11.22-TrailingText found pattern 11.22
In string LeadingText-1234-TrailingText found pattern 1234
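The results above can be reproduced with a sketch like the following. It assumes Python's re module (the original output may well have come from another language), and it uses a raw string, so the backslashes are not doubled.

```python
import re

PATTERN = re.compile(r"\d(.*\d)*")

inputs = [
    "LeadingText-1-TrailingText",
    "LeadingText-12-TrailingText",
    "LeadingText-1.2-TrailingText",
    "LeadingText-11.2-TrailingText",
    "LeadingText-1.22-TrailingText",
    "LeadingText-11.22-TrailingText",
    "LeadingText-1234-TrailingText",
]

# group(0) is the whole match: from the first digit to the last digit.
found = [PATTERN.search(s).group(0) for s in inputs]

for s, f in zip(inputs, found):
    print(f"In string {s} found pattern {f}")
```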
.*
Matches any character greedily - as many characters as possible.
Example:
1.*1 in 101000001 will match 101000001
?
Matches the preceding expression 0 or 1 time.
If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the fewest possible characters), as opposed to the default, which is greedy (matching as many characters as possible).
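Both roles of ? can be seen in a short sketch, again assuming Python's re module and made-up sample strings.

```python
import re

# ? as a quantifier: the "u" is optional (0 or 1 time).
optional = re.findall(r"colou?r", "color colour colouur")

# ? after another quantifier makes that quantifier non-greedy.
greedy_digits = re.search(r"\d+", "12345").group(0)
lazy_digits = re.search(r"\d+?", "12345").group(0)

print(optional, greedy_digits, lazy_digits)
```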
.*?
Matches any character in non-greedy (lazy) mode: as few characters as needed for the overall pattern to match.
Example:
1.*?1 in 101000001 will match 101
Further reading: "What is the difference between .*? and .* regular expressions?"
(the answer there also has a nice explanation of backtracking and of how a non-greedy expression can return multiple matches within a string)
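To compare greedy and non-greedy matching side by side, here is a sketch assuming Python's re module. The HTML-like string is an invented example that shows how the non-greedy form can return several matches within one string.

```python
import re

greedy = re.search(r"1.*1", "101000001").group(0)   # runs to the last 1
lazy = re.search(r"1.*?1", "101000001").group(0)    # stops at the first possible 1

html = "<b>bold</b>"
greedy_tags = re.findall(r"<.*>", html)    # one match spanning everything
lazy_tags = re.findall(r"<.*?>", html)     # one match per tag

print(greedy, lazy, greedy_tags, lazy_tags)
```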
+
This quantifier matches the preceding expression 1 or more (unlimited) times, as many times as possible, giving back as needed (greedy)
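The difference between + and * is easiest to see on the same input. A minimal sketch, assuming Python's re module and an invented sample string:

```python
import re

# + needs at least one "b" after the "a".
one_or_more = re.findall(r"ab+", "a ab abbb")

# * also accepts zero "b" characters, so the lone "a" matches too.
zero_or_more = re.findall(r"ab*", "a ab abbb")

# Greedy: \d+ takes all the digits it can.
all_digits = re.search(r"\d+", "id=12345;").group(0)

print(one_or_more, zero_or_more, all_digits)
```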
\
A backslash that precedes a non-special character indicates that the next character is special and is not to be interpreted literally (for example, \d is a digit and \t is a tab, not the letters d and t).
A backslash that precedes a special character indicates that the next character is not special and should be interpreted literally (this is called escaping).
Example: \. matches the character . literally
\\
The first backslash escapes the one after it, so the expression searches for a single literal backslash.
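A short sketch of the backslash rules, assuming Python's re module. The path is a hypothetical Windows-style path. Note that in a normal (non-raw) string literal each regex backslash has to be doubled again, which is why patterns quoted in code often show twice as many backslashes.

```python
import re

path = r"C:\temp\file.txt"   # hypothetical path containing two backslashes

# The regex \\ matches one literal backslash.
backslashes = re.findall(r"\\", path)

# The raw string r"\\" and the normal string "\\\\" are the same two-character pattern.
same_pattern = ("\\\\" == r"\\")

# Backslash + special character = literal; backslash + d = "any digit".
dots = re.findall(r"\.", "a.b.c")
digits = re.findall(r"\d", "d1d2")

print(len(backslashes), same_pattern, dots, digits)
```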
^
Caret.
- ^ means "not the following" when inside and at the start of [], so [^...].
- When it's inside [] but not at the start, it means the actual ^ character.
- When it's escaped (\^), it also means the actual ^ character.
- In all other cases it means start of the string or line (which one is language / setting dependent). If a caret (^) is at the beginning of the entire regular expression, it matches the beginning of a line.
- [^abc] = not a, b or c
- [ab^cd] = a, b, ^ (character), c or d
- \^ = a ^ character
- Anywhere else -> start of string / line. For example:
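- ^.*".ds-metrics-apm.* matches any line that contains ".ds-metrics-apm preceded by any characters (the unescaped dots match any character), for example a line that starts with space characters.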
- ^[ \t]* = match all spaces and tabs at the beginning of line
- ^\s* = match all whitespace characters at the beginning of line
- ^\n = match all empty lines (which only contain \n character)
- ^[b-d]t$ means:
- Start of line
- b/c/d character
- t character
- End of line
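The caret cases listed above can be checked with a sketch like this. It assumes Python's re module, where ^ means start of string by default and start of each line with re.MULTILINE. The sample strings are invented.

```python
import re

not_abc = re.findall(r"[^abc]", "abcd")            # negated set
literal_in_set = re.findall(r"[ab^cd]", "a^e")     # ^ not at the start of the set
escaped = re.findall(r"\^", "2^3")                 # escaped ^

# ^ as an anchor: strip leading spaces and tabs.
stripped = re.sub(r"^[ \t]*", "", "  \tindented")
stripped_lines = re.sub(r"^[ \t]*", "", "  a\n\tb", flags=re.MULTILINE)

# ^\n with MULTILINE removes lines that contain only a newline.
no_empty_lines = re.sub(r"^\n", "", "a\n\nb\n", flags=re.MULTILINE)

# ^[b-d]t$ : start, one of b/c/d, then t, then end.
words = ["bt", "ct", "dt", "at", "cat", "cta"]
anchored = [w for w in words if re.search(r"^[b-d]t$", w)]

print(not_abc, literal_in_set, escaped, stripped, anchored)
```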
Carets in Regular Expressions
$
If a dollar sign ($) is at the end of the entire regular expression, it matches the end of a line.
If an entire regular expression is enclosed by a caret and a dollar sign (^lorem ipsum$), it matches an entire line.
Caret and dollar are known as anchors because they denote the beginning and the end of the string (or line). A string that matches the regex "^$" is an empty string.
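A minimal sketch of the two anchors working together, assuming Python's re module. By default they anchor to the whole string, and with re.MULTILINE to each line.

```python
import re

# $ at the end: the text must end with "ipsum".
ends_with = bool(re.search(r"ipsum$", "lorem ipsum"))

# ^...$ around the whole pattern: the entire string must match.
whole = bool(re.search(r"^lorem ipsum$", "lorem ipsum"))
partial = bool(re.search(r"^lorem ipsum$", "lorem ipsum dolor"))

# ^$ matches the empty string.
empty = bool(re.search(r"^$", ""))

# With MULTILINE the anchors apply to every line of the text.
single_word_lines = re.findall(r"^\w+$", "one\ntwo words\nthree", flags=re.MULTILINE)

print(ends_with, whole, partial, empty, single_word_lines)
```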
Capturing Groups
Part of a pattern can be enclosed in parentheses (...). This is called a capturing group.
Multiple characters in that group are treated as a single unit that we want to match.
It lets us get a part of the match as a separate item in the result array.
If we put a quantifier after the parentheses, it applies to the parentheses as a whole.
String: abababa
Goal: find all matches of sequence ab.
Result: There are 3 matches.
Regex: (ab)
String: ab123cd345ef785
Goal: find all sequences of numbers
Result: 123, 345, 785
Regex: (\d+)
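The two examples above can be reproduced with a sketch like this one, assuming Python's re module (where findall returns the captured groups as a list). The "range 10-25" string is an extra invented example that shows groups as separate items.

```python
import re

ab_matches = re.findall(r"(ab)", "abababa")
numbers = re.findall(r"(\d+)", "ab123cd345ef785")

# Each group is available as a separate item of the match.
m = re.search(r"(\d+)-(\d+)", "range 10-25")
first, second = m.group(1), m.group(2)

# A quantifier after the parentheses applies to the group as a whole.
repeated_ok = re.fullmatch(r"(ab)+", "ababab") is not None

print(ab_matches, numbers, first, second, repeated_ok)
```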
String: abc345-1.23.456.7890+whatever.ext
Goal: extract only numbers which form a valid version number (greedy - M.m.r.b or M.m.r or M.m )
Result: 1.23.456.7890
Regex (\d+) returns 5 groups: 345, 1, 23, 456, 7890
Regex (\d+)\. returns all groups of numbers that are followed by dot. There are 3 such groups: 1, 23 and 456.
Let's look at some example version numbers:
1.23
11.23
123.45
1.23.456
1.23.456.7890
We can see that all version numbers:
- start with a sequence of 1 or more digits which are followed by dot: \d+\.
- end with a sequence of 1 or more digits: \d+
So far we have: \d+\.\d+
Regex \d+\.\d+ returns 2 groups: 1.23 and 456.7890
Between these two sequences there can be 0 or more (max 2, but let's ignore this) sequences of 1 or more digits that are followed by a dot: \d+\.
This sequence is optional, so let's put it in parentheses to form a group and append * to it:
Regex \d+\.(\d+\.)*\d+ does match 1.23.456.7890 in full, but because (...) is a capturing group, the result contains a single captured group that holds only its last repetition: "456." (including the trailing dot).
Here we just want the regex to match this group but not to capture it (not return it in the results and not store it in a backreference, so $1 will be empty). We want this group to be a non-capturing group, and there is a special syntax for it: (?: ... ).
Regex \d+\.(?:\d+\.)*\d+ fully matches 1.23.456.7890
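The whole walk-through can be replayed step by step. This is a sketch assuming Python's re module. Other engines report groups in slightly different ways, but the matches themselves should be the same.

```python
import re

s = "abc345-1.23.456.7890+whatever.ext"

all_numbers = re.findall(r"(\d+)", s)           # every run of digits
before_dot = re.findall(r"(\d+)\.", s)          # runs of digits followed by a dot
pairs = re.findall(r"\d+\.\d+", s)              # digits.digits only

# Capturing group: the full match is right, but group 1 keeps only
# the last repetition of (\d+\.).
m = re.search(r"\d+\.(\d+\.)*\d+", s)
full_match, last_group = m.group(0), m.group(1)

# Non-capturing group: findall now returns the full match.
version = re.findall(r"\d+\.(?:\d+\.)*\d+", s)

print(all_numbers, before_dot, pairs, full_match, last_group, version)
```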
A capturing group can also be used to match characters between delimiters while excluding the delimiters themselves.
E.g. extract file name from: https://raw.example.com/aws/path/to/karpenter.sh_nodeclaims.yaml
/[^/]*$ matches the last /, followed by any characters other than / (0 or more times), up to the end of the string:
/karpenter.sh_nodeclaims.yaml
If we use a capturing group, we can capture everything between the last / and the end of the string:
/([^/]*)$ matches /karpenter.sh_nodeclaims.yaml, and its capturing group contains karpenter.sh_nodeclaims.yaml. The captured group is the desired file name.
^(.*)$ <-- capturing group captures entire string; entire string will be in the backreference $1
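Here is the file name extraction as a short sketch, assuming Python's re module. Note that Python writes the backreference as \1 in the replacement string rather than $1.

```python
import re

url = "https://raw.example.com/aws/path/to/karpenter.sh_nodeclaims.yaml"

# Whole match: includes the leading slash.
with_slash = re.search(r"/[^/]*$", url).group(0)

# Group 1: only what is between the last / and the end of the string.
file_name = re.search(r"/([^/]*)$", url).group(1)

# ^(.*)$ captures the entire string; \1 refers to it in the replacement.
wrapped = re.sub(r"^(.*)$", r"<\1>", "entire string")

print(with_slash, file_name, wrapped)
```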
Negative Lookahead
(?!pattern)
It matches a position in the text that is not immediately followed by the given subpattern. It is a zero-width assertion, meaning it only asserts a condition without consuming any characters from the input string.
Examples:
foo(?!bar)
Description: Matches foo only if it is not followed by bar
Example match: In foobar, no match; in foobaz, matches foo
a(?!b)
Description: Matches a when it is not followed by b
Example match: In ac, matches a; in ab, no match
^(?!.*Error).*
Description: Matches an entire line that does not contain the word Error
Use case: commonly used when monitoring or filtering logs
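The three lookahead examples above can be tried with a sketch like this, assuming Python's re module. The log lines are hypothetical.

```python
import re

# foo(?!bar): no match in "foobar", matches "foo" in "foobaz".
no_match = re.search(r"foo(?!bar)", "foobar")
foo = re.search(r"foo(?!bar)", "foobaz").group(0)

# a(?!b): only the "a" in "ac" qualifies; the lookahead consumes nothing.
a_positions = [m.start() for m in re.finditer(r"a(?!b)", "ac ab")]

# ^(?!.*Error).* : keep only lines that do not contain "Error".
log_lines = ["Started worker", "Error: disk full", "Finished OK"]  # made-up log
clean = [line for line in log_lines if re.match(r"^(?!.*Error).*", line)]

print(no_match, foo, a_positions, clean)
```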
(?!XYZ)
It asserts that at the current position in the string, the text must not be followed by the pattern XYZ. In other words, it prevents a match if the specified sequence appears next.
How to select all lines apart from those containing some word e.g. XYZ:
^((?!XYZ).)*$\n
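In a text editor this pattern is typically used with find and replace. The same idea can be sketched in Python, assuming the re module with re.MULTILINE so that ^ and $ apply to each line. The sample text is invented.

```python
import re

text = "alpha\nhas XYZ here\nbeta\n"

# Select every line without XYZ (including its newline) and delete it,
# which leaves only the lines that do contain XYZ.
only_xyz = re.sub(r"^((?!XYZ).)*$\n", "", text, flags=re.MULTILINE)

# Without the trailing \n the pattern can be used to list the selected lines.
selected = [
    m.group(0)
    for m in re.finditer(r"^((?!XYZ).)*$", "alpha\nhas XYZ here\nbeta", flags=re.MULTILINE)
]

print(repr(only_xyz), selected)
```

The (?!XYZ). part is checked before every single character, which is why a line is rejected no matter where XYZ appears in it.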