String Pattern Matching
Applies to: ✅ Data 360 SQL ✅ Tableau Hyper API
Operators and functions for matching strings against patterns. Use pattern matching to:
- Search with wildcards using
LIKEfor simple pattern matching with%and_. - Match complex patterns using regular expressions for advanced text processing.
- Extract and replace substrings based on regex patterns.
- Count occurrences of patterns within strings.
- LIKE vs regex: Use
LIKEfor simple wildcard matching. Use regular expressions for complex patterns like repetition, alternation, and character classes. - Case sensitivity: Both
LIKEand regex operators have case-insensitive variants (ILIKE,~*). - Full string matching:
LIKEmatches the entire string by default. Regular expressions match anywhere within the string unless anchored with^and$.
Simple pattern matching with wildcards:
- LIKE / ILIKE - Match patterns using
%(any sequence) and_(single character) wildcards.
Match strings against regular expression patterns:
- Regular Expression Match Operators - The
~,~*,!~, and!~*operators.
Functions for searching, extracting, and manipulating text with regular expressions:
- REGEXP_LIKE - Check whether a string matches a pattern.
- REGEXP_REPLACE - Replace substrings matching a pattern.
- REGEXP_INSTR - Find the position of a pattern.
- REGEXP_SUBSTR - Extract the substring matching a pattern.
- REGEXP_COUNT - Count occurrences of a pattern.
- REGEXP_SPLIT_TO_ARRAY - Split a string into an array using a regex delimiter.
Most characters match themselves. The following metacharacters have special meaning:
| Metacharacter | Description |
|---|---|
. | Matches any single character (except newline). |
| | Matches expressions to either side of it. |
\ | Match a metacharacter with this escape character |
Quantifiers specify how many times the preceding pattern must occur to match. By default, quantifiers are greedy, meaning they try to match as much text as possible.
| Quantifier | Description |
|---|---|
* | Zero or more |
+ | One or more |
? | Zero or one |
{n} | Exactly n times |
{n,} | At least n times |
{n,m} | Between n and m times |
Add ? after a quantifier for lazy matching: *?, +?, ??, {n,}?, {n,m}?. Lazy matching matches the least number of characters possible. For example, in the string <b>bold</b>, the greedy pattern <.*> matches the entire string. Whereas, the lazy pattern <.*?> only matches <b>.
The maximum number of repetitions for {n}, {n,}, {n,m} is 1000.
Groups ( (...) ) combine characters and metacharacters into a single unit. You can use them to apply quantifiers and flags to sub-patterns. Use (?flags:pattern) syntax to apply flags to a group.
| Flag | Description |
|---|---|
i | Case-insensitive matching. |
c | Case-sensitive matching. |
q | Treats the pattern as a literal string. |
g | Tests against all characters in the string, not just the first match. |
For example, (?i:abc) sets the case-insensitive flag, so it matches abc, ABC, aBc, and more.
Character classes are a shorthand for common group patterns.
The most general form of a character class is [ ], matching any character contained in the square brackets. For example, you can abbreviate the pattern (a|e|i|o|u) to [aeiou]. Contiguous ranges of characters can be shortened by using a hyphen. For example, [a-e] is equivalent to [abcde]. The range notation disregards locale-specific collation.
You can negate character classes by placing a caret (^) after the opening bracket. For example, [^aeiou] matches any character that is not a vowel.
Most metacharacters have no special meaning inside a character class (For example, [.] matches a literal dot). The exceptions are \\, ^ (when at the start for negation), - (for ranges), and ]. To include a literal ] in a character class, put it first (for example, []abc]) or escape it [ab\\]c]. To include a literal -, put it first or last, or escape it.
| Pattern | Description |
|---|---|
[[:alpha:]] | alphabetic characters ([A-Za-z]) |
[[:upper:]] | upper case letters ([A-Z]) |
[[:lower:]] | lower case letters ([a-z]) |
[[:digit:]], \d | digits ([0-9]) |
[[:xdigit:]] | hexadecimal digits ([0-9A-Fa-f]) |
\D | non-digits ([^0-9]) |
[[:alnum:]] | alphanumeric characters ([0-9A-Za-z]) |
[[:word:]], \w | word characters ([0-9A-Za-z_]) |
\W | non-word characters ([^0-9a-za-z_]) |
[[:space:]], \s | whitespace ([\t\n\f\r ]) |
\S | non-whitespace ([^ \t\n\f\r]) |
[[:blank:]] | blankspace ([\t ]) |
[[:punct:]] | punctuation ([!-/:-@[\-\`{-~;]) |
Anchors match a position rather than a character. These patterns can be used to anchor a match to a specific position in a string. For example, you can ensure that a given string fully matches a pattern by surrounding it with ^ and $ anchors.
| Anchor | Description |
|---|---|
^ | Start of string (or line, if m flag is set) |
\A | Start of string |
$ | End of string (or line, if m flag is set) |
\z | End of string |
\b | ASCII word boundary (\w on one side; \W, \A, or \z on the other) |
\B | Not an ASCII word boundary |
Find products starting with a specific prefix.
Search regardless of case.
Extract phone numbers from text.
Check whether values match an expected format.
Remove unwanted characters.