Hello, friends.
I've written code to auto-complete words in editboxes, using the global environment to scan for and break apart strings to generate a weighted dictionary. On an enUS client, the initial run generates approximately 5000 words, which is a pretty good starting point.
Now, when iterating the table of words to find potential matches, I'm using these conditions:
- Literal matches in words
- The union of characters in the current input as a length barrier
- The existence of the union of characters of the current input
- The dictionary word is not exactly equal to the input word
- The priority of suggestions is based in similarity, length difference and frequency
The downside to my current approach is:
- If the input word is misspelled, suggestion for anticipated word will be omitted
- If the input characters are in the wrong order, the first suggestion might be unexpected
Example:
- Misspell: "abhieve" -> "achieve", "achievement", "achievements" (these are omitted)
- Skewed: "abhieve" -> "behaving", "behavior", "behaviors" (these are suggested)
Also, to reduce "unnecessary" calculations, I've put a 20 index max on the generated suggestion list, meaning it will stop comparing words of the dictionary to the suggestion list after hitting the 20th index. The word will still be pushed to the list, though. Without optimizing to some degree, there will be a small FPS stutter when using extremely common combinations of characters, such as "es", "er", "et", "ae", "an", etc.
I was wondering if anyone has ideas to improve this algorithm and make it smarter.
This is the code used to generate suggestions:
Lua Code:
function Keyboard:GetSuggestions()
local word = self:GetCurrentWord() -- get the currently chosen word in the editbox
wipe(suggestions) -- remove all old suggestions before proceeding
if word then
word = strlower(word)
local length = word:len()
local dictionary = self.Dictionary
local chars, numChars = Union(word) -- returns the union of characters in a word
local this, priority, valid
-- Iterate through dictionary and push valid words to suggestion list
for thisWord, thisWeight in pairs(dictionary) do
valid = true
-- skip exact matches and matches that are vastly longer than the union of input characters
if thisWord == word or numChars * 4 < thisWord:len() then
valid = false
end
-- check if word contains all characters to elicit a match
for c in chars:gmatch(".") do
if not strfind(thisWord, c) then
valid = false
break
end
end
if valid then
priority = 1
this = {
word = thisWord,
weight = thisWeight,
match = strfind( thisWord, word ),
length = abs( length - thisWord:len() )
}
-- calculate priority in relation to already suggested words
for index, compare in pairs(suggestions) do
-- don't calculate order beyond the 20th index, just push to list
if ( index >= 20 ) then
break
end
-- fix: if the best suggestion was pushed first, make sure its priority isn't nudged down
priority = #suggestions > 1 and index or 2
-- words with literal matches have higher priority
if ( this.match and not compare.match ) then
break
-- if both have literal match and this word is more favorable, or neither have literal match
elseif ( this.match and compare.match and ( this.match <= compare.match ) ) or
( not this.match and not compare.match ) then
-- if this word is shorter, or equally long but has a higher weight
if ( this.length < compare.length or
( this.length == compare.length and this.weight > compare.weight ) ) then
break
end
end
end
tinsert( suggestions, priority, this )
end
end
end
self:SetSuggestions(1)
end
This is the code used to generate the dictionary:
Lua Code:
function Keyboard:GenerateDictionary()
-- generates a localized game-oriented dictionary (5000+ words on enUS clients)
-- scan the global environment for strings
local genv = getfenv(0)
local dictionary = {}
for index, object in pairs(genv) do
if type(object) == "string" then
-- remove escape sequences, uses a table of patterns for gsub
object = Unescape(object)
-- scan each string for individual words
for word in object:gmatch("[%a][%w']*[%w]+") do
word = strlower(word)
if not dictionary[word] then -- first occurence
dictionary[word] = 1
else -- weight increment
dictionary[word] = dictionary[word] + 1
end
end
end
end
return dictionary
end
Pastebin: dictionary dump