How Does Spell Correction Work?
NoChannel Search offers two spell correction solutions, each with its unique advantages:
Option 1: Like a Built-in Dictionary (OpenSearchBased)
- This option acts like a helpful store employee, checking dictionaries and suggesting alternatives.
- It uses a built-in "dictionary" within your search engine to propose corrections for misspelled searches.
- This straightforward approach is effective for common typos.
Option 2: Smarter and Faster (NoChannel Search MLBased)
- Similar to a tech-savvy assistant, this option utilizes NoChannel Search’s artificial intelligence (AI) and machine learning (ML) technology.
- It tackles complex typos, understanding the intended search term even if the misspelling is a real word.
- This advanced approach ensures precise results, particularly for unusual typos.
Both options have their strengths:
- OpenSearchBased: Easier to set up and works well for most cases.
- MLBased: More accurate and handles even unusual typos, requiring a bit more technical work.
When a user searches for a term that is not in the dictionary, NoChannel Search generates a list of candidate words by making small modifications to the search term, such as removing or adding characters.
Example:
Imagine you search for "shoel" on a website. Since "shoel" isn't found in the site's specific dictionary (built from the content in product types and attributes), NoChannel Search uses a custom approach to suggest corrections:
- shoe: Removing the extra “l” is one possibility, since an accidentally added letter is a common typo.
- shoelace: Adding “a”, “c” and “e” is another possibility.
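The candidates in this example can be reproduced with a short sketch. It is an illustration only: rather than enumerating every modified spelling, it computes the Levenshtein (edit) distance from the query to each word in a small, hypothetical site dictionary and keeps the words within a chosen number of edits. The names `edit_distance`, `candidates`, and `site_dictionary`, and the `max_distance` threshold of 3, are assumptions and not part of NoChannel Search.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def candidates(query: str, dictionary: set, max_distance: int = 3) -> list:
    """Return (distance, word) pairs within max_distance edits, closest first."""
    if query in dictionary:
        return [(0, query)]
    scored = ((edit_distance(query, word), word) for word in dictionary)
    return sorted((d, w) for d, w in scored if d <= max_distance)


# Hypothetical site dictionary built from product types and attributes.
site_dictionary = {"shoe", "shoelace", "boot", "sandal"}
print(candidates("shoel", site_dictionary))  # [(1, 'shoe'), (3, 'shoelace')]
```

Removing one character gives "shoe" (distance 1), while the three added letters of "shoelace" give distance 3. Words such as "boot" and "sandal" share too few letters with "shoel" and are filtered out.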
NoChannel Search doesn't rely on a traditional dictionary but builds a custom one from the website's data. This ensures corrections are relevant to the website's content. Additionally, context is considered through a process called n-gram modeling. This technique analyzes sequences of n consecutive words (n-grams) to predict the most likely next word from the words that precede it.
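As an illustration of n-gram modeling, the sketch below builds the simplest case, a word bigram model (n = 2), from a few hypothetical catalog phrases and predicts the word most often seen after a given word. The functions `build_bigram_model` and `predict_next` and the sample data are assumptions; the model order and data format NoChannel Search actually uses are not specified here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def build_bigram_model(texts: List[str]) -> Dict[str, Counter]:
    """For every word, count which words follow it (word bigrams)."""
    followers: Dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(model: Dict[str, Counter], prev_word: str) -> Optional[str]:
    """Most frequent word seen after prev_word, or None if prev_word is unseen."""
    counts = model.get(prev_word.lower())  # .get() does not add keys to the defaultdict
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical product text from a site's catalog.
catalog = [
    "running shoes for men",
    "running shoes for women",
    "running socks",
    "leather shoelace",
]
model = build_bigram_model(catalog)
print(predict_next(model, "running"))  # shoes
```

Because "running" is followed by "shoes" twice and by "socks" once in this sample, "shoes" is the predicted next word.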
Here's a breakdown of how NoChannel Search suggests corrections:
- All enabled spell-check fields and their data are used to create a corpus, essentially a collection of text used for analysis.
- The corpus goes through preprocessing, which might involve removing duplicates and preparing the text for analysis.
- An n-gram model is built from the corpus. This model analyzes sequences of words (n-grams) to understand the relationships between them.
- Assuming the first character is correct, Spell Correction finds similar words based on the n-gram model.
- Each candidate correction receives a score that reflects its likelihood under the n-gram model.
- NoChannel Search suggests the corrections with the highest scores, even if they might seem nonsensical in a general context.
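The steps above can be sketched end to end as follows. This is a minimal, hypothetical pipeline, not NoChannel Search's implementation. The field names, the add-one smoothing, the edit-distance threshold of 2, and the use of a single preceding word as context are all assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import Dict, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_corpus(fields: Dict[str, List[str]]) -> List[str]:
    """Steps 1-2: gather text from enabled fields, lowercase it, drop duplicates."""
    cleaned = (v.strip().lower() for values in fields.values() for v in values)
    return list(dict.fromkeys(t for t in cleaned if t))


class NgramModel:
    """Step 3: word unigram (n=1) and bigram (n=2) counts over the corpus."""

    def __init__(self, corpus: List[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for text in corpus:
            words = text.split()
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab = len(self.unigrams)

    def score(self, word: str, prev: Optional[str] = None) -> float:
        """Step 5: add-one-smoothed P(word), or P(word | prev) when prev is given."""
        if prev is None:
            return (self.unigrams[word] + 1) / (self.total + self.vocab)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab)


def suggest(query: str, model: NgramModel, prev: Optional[str] = None,
            max_distance: int = 2, top_k: int = 3) -> List[str]:
    query = query.lower()
    prev = prev.lower() if prev else None
    if query in model.unigrams:
        return [query]  # already a known word; nothing to correct
    # Step 4: keep known words that share the first character and are few edits away.
    candidates = [w for w in model.unigrams
                  if w[:1] == query[:1] and edit_distance(query, w) <= max_distance]
    # Step 6: highest likelihood first, judged only by the site's own corpus.
    candidates.sort(key=lambda w: model.score(w, prev), reverse=True)
    return candidates[:top_k]


# Hypothetical values from two enabled spell-check fields.
fields = {
    "product_type": ["Running Shoes", "Shoe Polish", "Shoelace"],
    "attributes": ["running shoes", "shoe size 10", "waterproof boots"],
}
model = NgramModel(build_corpus(fields))
print(suggest("shoee", model))                  # ['shoe', 'shoes']
print(suggest("shoee", model, prev="running"))  # ['shoes', 'shoe']
```

Without context, "shoe" ranks first because it appears more often in this small corpus. After "running", the bigram count favors "shoes". Both suggestions come only from the site's own vocabulary, which is why the ranking can differ from what a general-purpose dictionary would suggest.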