String Pattern Matching

Applies to: ✅ Data 360 SQL ✅ Tableau Hyper API

Operators and functions for matching strings against patterns. Use pattern matching to:

  • Search with wildcards using LIKE for simple pattern matching with % and _.
  • Match complex patterns using regular expressions for advanced text processing.
  • Extract and replace substrings based on regex patterns.
  • Count occurrences of patterns within strings.
  • LIKE vs regex: Use LIKE for simple wildcard matching. Use regular expressions for complex patterns like repetition, alternation, and character classes.
  • Case sensitivity: Both LIKE and regex operators have case-insensitive variants (ILIKE, ~*).
  • Full string matching: LIKE matches the entire string by default. Regular expressions match anywhere within the string unless anchored with ^ and $.

Simple pattern matching with wildcards:

  • LIKE / ILIKE - Match patterns using % (any sequence) and _ (single character) wildcards.

Match strings against regular expression patterns:

Functions for searching, extracting, and manipulating text with regular expressions:

Most characters match themselves. The following metacharacters have special meaning:

MetacharacterDescription
.Matches any single character (except newline).
|Matches expressions to either side of it.
\Match a metacharacter with this escape character

Quantifiers specify how many times the preceding pattern must occur to match. By default, quantifiers are greedy, meaning they try to match as much text as possible.

QuantifierDescription
*Zero or more
+One or more
?Zero or one
{n}Exactly n times
{n,}At least n times
{n,m}Between n and m times

Add ? after a quantifier for lazy matching: *?, +?, ??, {n,}?, {n,m}?. Lazy matching matches the least number of characters possible. For example, in the string <b>bold</b>, the greedy pattern <.*> matches the entire string. Whereas, the lazy pattern <.*?> only matches <b>.

The maximum number of repetitions for {n}, {n,}, {n,m} is 1000.

Groups ( (...) ) combine characters and metacharacters into a single unit. You can use them to apply quantifiers and flags to sub-patterns. Use (?flags:pattern) syntax to apply flags to a group.

FlagDescription
iCase-insensitive matching.
cCase-sensitive matching.
qTreats the pattern as a literal string.
gTests against all characters in the string, not just the first match.

For example, (?i:abc) sets the case-insensitive flag, so it matches abc, ABC, aBc, and more.

Character classes are a shorthand for common group patterns.

The most general form of a character class is [ ], matching any character contained in the square brackets. For example, you can abbreviate the pattern (a|e|i|o|u) to [aeiou]. Contiguous ranges of characters can be shortened by using a hyphen. For example, [a-e] is equivalent to [abcde]. The range notation disregards locale-specific collation.

You can negate character classes by placing a caret (^) after the opening bracket. For example, [^aeiou] matches any character that is not a vowel.

Most metacharacters have no special meaning inside a character class (For example, [.] matches a literal dot). The exceptions are \\, ^ (when at the start for negation), - (for ranges), and ]. To include a literal ] in a character class, put it first (for example, []abc]) or escape it [ab\\]c]. To include a literal -, put it first or last, or escape it.

PatternDescription
[[:alpha:]]alphabetic characters ([A-Za-z])
[[:upper:]]upper case letters ([A-Z])
[[:lower:]]lower case letters ([a-z])
[[:digit:]], \ddigits ([0-9])
[[:xdigit:]]hexadecimal digits ([0-9A-Fa-f])
\Dnon-digits ([^0-9])
[[:alnum:]]alphanumeric characters ([0-9A-Za-z])
[[:word:]], \wword characters ([0-9A-Za-z_])
\Wnon-word characters ([^0-9a-za-z_])
[[:space:]], \swhitespace ([\t\n\f\r ])
\Snon-whitespace ([^ \t\n\f\r])
[[:blank:]]blankspace ([\t ])
[[:punct:]]punctuation ([!-/:-@[\-\`{-~;])

Anchors match a position rather than a character. These patterns can be used to anchor a match to a specific position in a string. For example, you can ensure that a given string fully matches a pattern by surrounding it with ^ and $ anchors.

AnchorDescription
^Start of string (or line, if m flag is set)
\AStart of string
$End of string (or line, if m flag is set)
\zEnd of string
\bASCII word boundary (\w on one side; \W, \A, or \z on the other)
\BNot an ASCII word boundary

Find products starting with a specific prefix.

Search regardless of case.

Extract phone numbers from text.

Check whether values match an expected format.

Remove unwanted characters.