N-gram Language Model Based Next Word Suggestion — Part 2
My previous article outlined N-gram language models and provided a simple approach to next-word suggestion centred on the use of bigrams. This article outlines the use of a higher-order n-gram model to extract meaningful associations between words and phrases and provide context-appropriate next-word suggestions.
Let’s Put This Word Into Context
To recap the previously mentioned approach: using a bigram model, it is possible to suggest the next word given one previous word. For example, take the corpus:
“The Tortoise never stopped for a moment, right to the end of the course.”
Given the word ‘the’, the possible next words that are suggested are ‘tortoise’, ‘end’, and ‘course’, each with a probability of 0.33. Frankly, this suggestion is not very helpful. Furthermore, training on a larger corpus will lead to many more possible words after ‘the’, making the suggestion even less helpful. One way to solve this problem is to use more than one previous word when suggesting the next.
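As a quick refresher, here is a minimal sketch of that bigram counting, assuming the corpus is lowercased and stripped of punctuation (the tokenization and names here are illustrative, not the exact code from the previous article):

```python
import re
from collections import Counter, defaultdict

corpus = "The Tortoise never stopped for a moment, right to the end of the course."

# Lowercase and strip punctuation so 'The' and 'the' count as the same token.
tokens = re.findall(r"\w+", corpus.lower())

# Count how often each word follows each previous word.
followers = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    followers[prev][nxt] += 1

# Convert the counts for 'the' into probabilities.
counts = followers["the"]
total = sum(counts.values())
print({word: count / total for word, count in counts.items()})
# {'tortoise': 0.333..., 'end': 0.333..., 'course': 0.333...}
```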
Providing More Context
Increasing the order of the n-gram language model makes it possible to extract meaningful associations between ordered collections of words (more commonly known among humans as phrases). Take, for instance, a trigram model, which requires two previous words to make a suggestion. Given the words ‘to the’, the word suggested next is ‘end’. Likewise, given the words ‘of the’, the next word is ‘course’. That’s looking much better, wouldn’t you say?
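Extending the same counting idea to two previous words gives exactly this behaviour. A minimal, self-contained sketch (again with an illustrative, lowercased tokenization):

```python
from collections import Counter, defaultdict

tokens = ("the tortoise never stopped for a moment "
          "right to the end of the course").split()

# Count which word follows each pair of previous words (a trigram model).
followers = defaultdict(Counter)
for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
    followers[(w1, w2)][w3] += 1

print(followers[("to", "the")])  # Counter({'end': 1})
print(followers[("of", "the")])  # Counter({'course': 1})
```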
One obvious pitfall of increasing the order of the model is that it requires more previous words to make a suggestion. While this might not be a problem when suggesting the ending word of a common phrase, it becomes a challenge when suggesting words near the beginning of a sentence or when common phrases are slightly altered. Consider a 4-gram model trained on the above corpus. While it will have no problem suggesting that ‘end’ comes after the phrase ‘right to the’, it will yield no suggestions for phrases like ‘to the’ or ‘next to the’. In some use cases this is perfectly acceptable, but I would prefer the model to be a little more robust.
Combining the Best of Both Worlds
Low-order models capture associations between individual words and can provide a next-word suggestion given only one or two previous words. Higher-order models capture associations between entire phrases and the word that follows them. Combining the two yields a robust model that can provide suggestions given limited input while refining those suggestions when longer phrases are available.
Let’s Take a Look at Some Code
The first step is to calculate the n-grams for the input corpus. Instead of creating n-grams of a single order, the model will make use of every order from 1 to n. For example, for an n=3 model, unigrams, bigrams and trigrams all need to be calculated. The following code calculates the probabilities of the 1..n-grams from the input tokens.
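Here is a minimal sketch of that calculation (the function name and the exact return format are illustrative; the full version lives in the repo linked at the end of the article):

```python
from collections import Counter


def ngram_probabilities(tokens, n):
    """Calculate the probability of every n-gram of order 1..n.

    Returns a dict mapping each n-gram tuple to its probability among
    n-grams of the same order.
    """
    probabilities = {}
    for order in range(1, n + 1):
        # Collect all n-grams of this order as tuples of tokens.
        grams = [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]
        counts = Counter(grams)
        total = sum(counts.values())
        for gram, count in counts.items():
            probabilities[gram] = count / total
    return probabilities
```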
Next, a dictionary structure is created and populated that will serve as a means of looking up n-grams and their next words.
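One way to build that structure, sketched below: each context tuple of length 1..n-1 maps to a 'next' dictionary of follow-up words and their conditional probabilities, mirroring the lookups shown in the worked example further down (the exact numbers depend on how the corpus is tokenized):

```python
from collections import defaultdict


def build_ngram_dict(tokens, n):
    """Map each context tuple (length 1..n-1) to the words that follow it."""
    ngram_dict = defaultdict(lambda: {"next": defaultdict(int)})
    for order in range(1, n):  # context lengths 1 .. n-1
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            next_word = tokens[i + order]
            ngram_dict[context]["next"][next_word] += 1
    # Normalize the raw counts into conditional probabilities.
    for entry in ngram_dict.values():
        total = sum(entry["next"].values())
        entry["next"] = {word: count / total for word, count in entry["next"].items()}
    return dict(ngram_dict)
```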
While it may seem strange to create this convoluted dictionary structure when the probability data is readily available, preprocessing the data into a dictionary allows the possible next words for a given tuple of previous words to be looked up in O(1) time instead of iterating through the n-gram list in O(n) time.
With all the n-gram probabilities calculated and the next words preprocessed into a nice dictionary, it is possible to start suggesting! The next question becomes, how can these n-grams be combined to provide a useful suggestion?
Making Suggestions
The approach to making a suggestion is as follows, assuming an n-order model and an input sequence of n-1 words:
- Generate n-1 sequences by removing the first word from the input sequence until only 1 word remains
- Iterate over the generated sequences and find the next words, if any, by using the generated n-gram dictionary
- For each found next word, add it to a dictionary of suggestions along with its probability as a score. If the word already exists, take the maximum between the existing and new probability
For example, given the n=3 order model built above and the input ['of', 'the']:
Generate the sequences
[('of', 'the'), ('the',)]
Get the next word probabilities for each sequence
> ngram_dict[('of', 'the')]['next']
{
  'course': 1
}

> ngram_dict[('the',)]['next']
{
  'course': 0.5,
  'end': 0.5
}
Build a dictionary of maximum next-word probabilities
{
'course': 1,
'end': 0.5
}
The next bit of code works just as advertised, iterating through the subsequences and building a dictionary of the next words. It is worth noting that the likelihood of a word occurring is no longer referred to as a probability but rather as a score. I call it a score because these values no longer sum to one, so, to be honest, I am not sure I can call it a probability anymore ¯\_(ツ)_/¯
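A sketch of that loop, assuming the build_ngram_dict structure from earlier:

```python
def suggest(ngram_dict, words):
    """Score candidate next words given up to n-1 previous words."""
    suggestions = {}
    context = tuple(words)
    while context:
        entry = ngram_dict.get(context)
        if entry:
            for word, probability in entry["next"].items():
                # Keep the highest score seen for each candidate word.
                suggestions[word] = max(suggestions.get(word, 0), probability)
        context = context[1:]  # drop the first word and retry with less context
    return suggestions
```

Calling suggest(ngram_dict, ['of', 'the']) on a model trained on the corpus above then reproduces the maximum-score dictionary from the worked example, with the exact values depending on how the corpus was tokenized.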
Conclusion
A simple bigram language model is able to suggest the next word(s) given a previous word. Often, though, this simple language model suggests too many possible next words to be useful. On the other hand, a higher-order n-gram model is able to extract meaningful associations between words and phrases, making it possible to provide context-appropriate suggestions. This comes at the cost of needing to provide the model with more previous words as context, which is not always available. Combining multiple n-gram orders in the same language model results in a robust model which can provide suggestions appropriate to however much or little context is available.
That’s All Folks
If this article has interested you enough to want to see some code, feel free to check out the GitHub repo below.
Like Share and Subscribe
If you enjoyed this article, I would really appreciate it if you shared it. If you would like to read more of my content in the future, please follow me.