String Pattern Matching
Applies to: ✅ Data Cloud SQL ✅ Tableau Hyper API
There are two approaches to pattern matching.
- SQL
LIKE
operator SIMILAR TO
operator
The LIKE
expression returns true if the <string>
matches the supplied <pattern>
. (As expected, the NOT LIKE
expression returns false if LIKE
returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern)
.)
If <pattern>
doesn’t contain percent signs or underscores, then the pattern only represents the string itself. In that case, LIKE
acts like the equals operator. An underscore (_
) in <pattern>
matches any single character. A percent sign (%
) matches any sequence of zero or more characters.
Some examples:
LIKE
pattern matching always covers the entire string. To match a sequence anywhere within a string, start and end the pattern with a percent sign.
To match a literal underscore or percent sign without matching other characters, precede respective character in <pattern>
by the escape character. The default escape character is the backslash, but a different one can be selected by using the ESCAPE
clause. To match the escape character itself, write two escape characters.
It’s also possible to use no escape character by writing ESCAPE ''
, which disables the ability to escape characters.
The keyword ILIKE
can be used instead of LIKE
to make the match case-insensitive according to the active locale.
The operator ~~
is equivalent to LIKE
, and ~~*
corresponds to ILIKE
. There are also !~~
and !~~*
operators that represent NOT LIKE
and NOT ILIKE
, respectively.
The available operators for pattern matching that use POSIX regular expressions are:
Operator | Description | Example |
---|---|---|
~ | Matches regular expression, case sensitive | 'thomas' ~ '.*thomas.*' |
~* | Matches regular expression, case insensitive | 'thomas' ~* '.*Thomas.*' |
!~ | Doesn’t match regular expression, case sensitive | 'thomas' !~ '.*Thomas.*' |
!~* | Does not match regular expression, case insensitive | 'thomas' !~* '.*vadim.*' |
Regular expressions provide a more powerful means for pattern matching than the LIKE
operator. Many Unix tools such as egrep
, sed
, or awk
use a pattern-matching language that is similar to the one used here, which is briefly described in the Regular Expression Syntax section.
A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it’s a member of the regular set described by the regular expression. Pattern characters match string characters exactly unless they’re special characters in the regular expression language. A regular expression can match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
Some examples:
The regexp_replace
function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement[, flags ])
. The source
string is returned unchanged if there isn’t a match to the pattern. If there is a match, the source
string is returned with the replacement string substituted for the matching substring.
The replacement string can contain \N
, where N
is 1
through 9
, to show where the pattern is inserted into the source substring. Write \\
to put a literal backslash in the replacement text.
pattern
is searched from the beginning of the string. By default, only the first match of the pattern is replaced. If the g
flag is given, then all matches at or after the start position are replaced. The i
flag enables case-insensitive matching. Flags can be combined in a single string.
Some examples:
A regular expression is defined as one or more branches, separated by |
. It matches anything that matches one of the branches.
A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc. An empty branch matches the empty string.
A quantified atom is an atom sometimes followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. The tables show the available atoms and quantifiers.
Atoms:
Atom | Description |
---|---|
. | matches any single character |
[<chars>] | a bracket expression, matching any one of the <chars> |
<k> | (where <k> is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \\ matches a backslash character |
{ | when followed by a character other than a digit, matches the left-brace character { ; when followed by a digit, it is the beginning of a <bound> (see below) |
<x> | where <x> is a single character with no other significance, matches that character |
Quantifiers:
Quantifier | Matches |
---|---|
* | a sequence of 0 or more matches of the atom |
+ | a sequence of 1 or more matches of the atom |
? | a sequence of 0 or 1 matches of the atom |
{<m>} | a sequence of exactly <m> matches of the atom |
{<m>,} | a sequence of <m> or more matches of the atom |
{<m>,<n>} | a sequence of <m> through <n> (inclusive) matches of the atom; <m> cannot exceed <n> |
The forms using {<...>}
are known as bounds. The numbers <m>
and <n>
within a bound are unsigned decimal integers with allowed values from 0 to 255 inclusive.
A constraint matches an empty string, but matches only if specific conditions are met. A constraint can be used where an atom can be used, except a quantifier can’t follow it. The simple constraints are:
Constraint | Description |
---|---|
^ | matches at the beginning of the string |
$ | matches at the end of the string |