How Does Spell Correction Work?

NoChannel Search offers two spell correction solutions, each with its own advantages:

Option 1: Like a Built-in Dictionary (OpenSearchBased)

  • This option acts like a helpful store employee, checking dictionaries and suggesting alternatives.
  • It uses a built-in "dictionary" within your search engine to propose corrections for misspelled searches.
  • This straightforward approach is effective for common typos.

Option 2: Smarter and Faster (NoChannel Search MLBased)

  • Similar to a tech-savvy assistant, this option utilizes NoChannel Search’s artificial intelligence (AI) and machine learning (ML) technology.
  • It tackles complex typos, understanding the intended search term even if the misspelling is a real word.
  • This advanced approach aims for precise results, particularly for unusual typos.

Both options have their strengths:

  • OpenSearchBased: Easier to set up and works well for most cases.
  • MLBased: More accurate and handles even unusual typos, but requires a bit more technical work.

When a user searches for a term, NoChannel Search generates a list of candidate words by applying small modifications to the term, such as adding or removing characters.
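As a rough illustration of candidate generation, the sketch below produces every string that is one character edit away from the search term. The exact set of modifications NoChannel Search applies is not documented here, so the four edit types (delete, transpose, replace, insert) and the function name `single_edit_candidates` are assumptions chosen for illustration.

```python
import string


def single_edit_candidates(term: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Return every string exactly one character edit away from `term`.

    Hypothetical helper: the edit types below are an assumption, not the
    documented behavior of NoChannel Search.
    """
    # Every way to split the term into a left and right part.
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]

    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {left + ch + right[1:] for left, right in splits if right for ch in alphabet}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}

    # The original term is not a correction of itself.
    return (deletes | transposes | replaces | inserts) - {term}


candidates = single_edit_candidates("shoel")
print("shoe" in candidates)   # reachable by deleting the trailing "l"
print("shoes" in candidates)  # reachable by replacing "l" with "s"
```

Most generated strings are not real words, which is why the candidates are later checked against a dictionary built from the site's own data.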

Example:

Imagine you search for "shoel" on a website. Since "shoel" isn't found in the site's specific dictionary (built from the content in product types and attributes), NoChannel Search uses a custom approach to suggest corrections:

  • shoe: Removing the “l” is one possibility; typing an extra letter is a common typo.
  • shoelace: Adding “a”, “c” and “e” is another possibility.
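One way to compare the two suggestions above is to count how many character edits separate each from the typed term. The sketch below uses the standard Levenshtein distance for that; whether NoChannel Search uses this exact measure is an assumption, and `site_dictionary` is a hypothetical stand-in for the dictionary built from product types and attributes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest inserts, deletes, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep on a match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical site dictionary; real entries would come from the catalog.
site_dictionary = ["shoe", "shoelace", "sandal"]

for word in sorted(site_dictionary, key=lambda w: (edit_distance("shoel", w), w)):
    print(word, edit_distance("shoel", word))
```

Here "shoe" is one edit away from "shoel" (remove the "l"), while "shoelace" needs three insertions, which matches the two bullets above.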

NoChannel Search doesn't rely on a traditional dictionary but builds a custom one from the website's data. This ensures corrections are relevant to the website's content. Additionally, context is considered through a process called n-gram modeling. This technique analyzes sequences of words (n-grams) to predict the most likely next word based on the surrounding text.
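To make the custom dictionary and the n-gram idea concrete, here is a minimal sketch that builds both from a handful of product strings. The sample data, the choice of bigrams (n = 2), and the helper `most_likely_next` are all assumptions for illustration; the actual model used by NoChannel Search may differ.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical product data; a real deployment would read this from the catalog.
product_texts = [
    "running shoe",
    "leather shoe",
    "running shoe laces",
    "shoelace pack",
]

vocabulary = Counter()          # the custom dictionary: word -> frequency
bigrams = defaultdict(Counter)  # previous word -> counts of the words that follow it

for text in product_texts:
    words = text.lower().split()
    vocabulary.update(words)
    for previous_word, next_word in zip(words, words[1:]):
        bigrams[previous_word][next_word] += 1


def most_likely_next(word: str) -> Optional[str]:
    """Return the word that most often follows `word` in the sample data, if any."""
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None


print("shoel" in vocabulary)        # the typo is not part of the site's vocabulary
print(most_likely_next("running"))  # context suggests what usually follows "running"
```

Because the vocabulary comes only from the site's own text, a word that is valid in general English but absent from the catalog would not be offered as a correction.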

How Does It Work?

Here's a breakdown of how NoChannel Search suggests corrections:

  1. All enabled spell-check fields and their data are used to create a corpus, essentially a collection of text used for analysis.
  2. The corpus goes through preprocessing, which might involve removing duplicates and preparing the text for analysis.
  3. An n-gram model is built from the corpus. This model analyzes sequences of words (n-grams) to understand the relationships between them.
  4. Assuming the first character is correct, Spell Correction finds similar words based on the n-gram model.
  5. Each suggested correction receives a score reflecting its likelihood under the n-gram model.
  6. NoChannel Search suggests the corrections with the highest scores, even if they might seem nonsensical in a general context.
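The six steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the product's implementation: the field names and sample documents are hypothetical, the n-gram model is reduced to word frequencies (n = 1), and Python's standard `difflib.get_close_matches` stands in for whatever similarity search NoChannel Search actually uses.

```python
import difflib
from collections import Counter

# Hypothetical catalog documents and spell-check fields.
documents = [
    {"product_type": "Shoe", "attributes": "leather running shoe"},
    {"product_type": "Shoe", "attributes": "canvas shoe"},
    {"product_type": "Shoelace", "attributes": "cotton shoelace pack"},
    {"product_type": "Shirt", "attributes": "cotton shirt"},
]
SPELL_CHECK_FIELDS = ("product_type", "attributes")


def build_corpus(docs, fields):
    """Steps 1-2: collect field values, lowercase them, and drop duplicate strings."""
    seen, corpus = set(), []
    for doc in docs:
        for field in fields:
            text = doc.get(field, "").lower().strip()
            if text and text not in seen:
                seen.add(text)
                corpus.append(text.split())
    return corpus


def build_unigram_model(corpus):
    """Step 3, simplified: count single words instead of longer n-grams."""
    return Counter(word for sentence in corpus for word in sentence)


def suggest(term, model, limit=3):
    term = term.lower()
    if not term or term in model:
        return []  # nothing to correct
    # Step 4: keep words sharing the first character, then find similar ones.
    same_initial = [word for word in model if word[0] == term[0]]
    similar = difflib.get_close_matches(
        term, same_initial, n=len(same_initial) or 1, cutoff=0.6
    )
    # Step 5: score each candidate by its relative frequency in the corpus.
    total = sum(model.values())
    scored = [(model[word] / total, word) for word in similar]
    # Step 6: highest score first, ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:limit]]


model = build_unigram_model(build_corpus(documents, SPELL_CHECK_FIELDS))
print(suggest("shoel", model))
```

With this sample data, "shirt" shares the first character but is too dissimilar to pass the cutoff, and "shoe" ranks ahead of "shoelace" because it appears more often in the corpus. A fuller version would score candidates with bigrams or trigrams so that surrounding query words influence the ranking.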