Search like google

Oracle Database 23ai Fuzzy Matching and Similarity Searches

Search like Google in Oracle

On my blog post (Understanding pg_trgm in PostgreSQL) , I showed how you can make similarity searches in PostgreSQL using pg_trgm extension.

Oracle’s UTL_MATCH package includes functions for performing fuzzy matching between two strings, with support starting from version 11g. The 23ai release introduces several additional algorithms to complement the existing ones (Jaro-Winkler Distance and Levenshtein Distance). New algorithms include bigram, trigram, whole_word_match, and longest_common_substring.

First i will create the sample data for my tests.

Now, I will show you how to perform similarity searches in Oracle versions 11g and later. The results vary depending on the chosen algorithm and are case-sensitive. You can use the UPPER and LOWER functions to handle case sensitivity.

Oracle Utl_match usage
Oracle Utl_Match

The newly added algorithms on Oracle Database 23ai are:

LONGEST_COMMON_SUBSTRING finds the longest common substring between the two strings.

LEVENSHTEIN corresponds to UTL_MATCH.EDIT_DISTANCE or UTL_MATCH.EDIT_SIMILARITY and gives a measure of character edit distance or similarity.

DAMERAU_LEVENSHTEIN distance differs from the classical LEVENSHTEIN distance by including transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions).

JARO_WINKLER corresponds to UTL_MATCH.JARO_WINKLER (a percentage between 0-1) or UTL_MATCH.JARO_WINKLER_SIMILARITY (the same but scaled from 0-100).

BIGRAM and TRIGRAM are instances of the N-gram matching technique, which counts the number of common contiguous sub-strings (grams) between the two strings.

WHOLE_WORD_MATCH corresponds to Word Match Percentage or Count comparison in Oracle Enterprise Data Quality. It calculates the LEVENSHTEIN or edit distance of two phrases with words (instead of letters) as matching units.

With the Oracle 23ai release, you can take advantage of the new algorithms by using the newly added FUZZY_MATCH function. While the UTL_MATCH package is still available, the FUZZY_MATCH function offers a broader range of algorithms, providing greater flexibility and better alignment with developer needs.

Oracle 23ai Fuzzy Match usage
Oracle 23ai Fuzzy Match usage

Much more information available Data Quality Operators section of Oracle 23ai SQL Language Reference.

Hope it helps.


Discover More from Osman DİNÇ


Comments

Leave your comment