Thread Tools Display Modes
Prev Previous Post   Next Post Next
11-06-15, 11:34 AM   #1
MunkDev
A Scalebane Royal Guard
 
MunkDev's Avatar
AddOn Author - Click to view addons
Join Date: Mar 2015
Posts: 431
Efficient string and table manipulation for an in-game dictionary

Hello, friends.

I've written code to auto-complete words in editboxes, using the global environment to scan for and break apart strings to generate a weighted dictionary. On an enUS client, the initial run generates approximately 5000 words, which is a pretty good starting point.

Now, when iterating the table of words to find potential matches, I'm using these conditions:
  • Literal matches in words
  • The union of characters in the current input as a length barrier
  • The existence of the union of characters of the current input
  • The dictionary word is not exactly equal to the input word
  • The priority of suggestions is based in similarity, length difference and frequency

The downside to my current approach is:
  • If the input word is misspelled, suggestion for anticipated word will be omitted
  • If the input characters are in the wrong order, the first suggestion might be unexpected

Example:
  • Misspell: "abhieve" -> "achieve", "achievement", "achievements" (these are omitted)
  • Skewed: "abhieve" -> "behaving", "behavior", "behaviors" (these are suggested)

Also, to reduce "unnecessary" calculations, I've put a 20 index max on the generated suggestion list, meaning it will stop comparing words of the dictionary to the suggestion list after hitting the 20th index. The word will still be pushed to the list, though. Without optimizing to some degree, there will be a small FPS stutter when using extremely common combinations of characters, such as "es", "er", "et", "ae", "an", etc.

I was wondering if anyone has ideas to improve this algorithm and make it smarter.

This is the code used to generate suggestions:
Lua Code:
  1. function Keyboard:GetSuggestions()
  2.     local word = self:GetCurrentWord() -- get the currently chosen word in the editbox
  3.     wipe(suggestions) -- remove all old suggestions before proceeding
  4.     if word then
  5.         word = strlower(word)
  6.         local length = word:len()
  7.         local dictionary = self.Dictionary
  8.         local chars, numChars = Union(word) -- returns the union of characters in a word
  9.  
  10.         local this, priority, valid
  11.  
  12.         -- Iterate through dictionary and push valid words to suggestion list
  13.         for thisWord, thisWeight in pairs(dictionary) do
  14.             valid = true
  15.  
  16.             -- skip exact matches and matches that are vastly longer than the union of input characters
  17.             if thisWord == word or numChars * 4 < thisWord:len() then
  18.                 valid = false
  19.             end
  20.  
  21.             -- check if word contains all characters to elicit a match
  22.             for c in chars:gmatch(".") do
  23.                 if not strfind(thisWord, c) then
  24.                     valid = false
  25.                     break
  26.                 end
  27.             end
  28.  
  29.             if valid then
  30.                 priority = 1
  31.  
  32.                 this = {
  33.                     word = thisWord,
  34.                     weight = thisWeight,
  35.                     match = strfind( thisWord, word ),
  36.                     length = abs( length - thisWord:len() )
  37.                 }
  38.  
  39.                 -- calculate priority in relation to already suggested words
  40.                 for index, compare in pairs(suggestions) do
  41.  
  42.                     -- don't calculate order beyond the 20th index, just push to list
  43.                     if ( index >= 20 ) then
  44.                         break
  45.                     end
  46.  
  47.                     -- fix: if the best suggestion was pushed first, make sure its priority isn't nudged down
  48.                     priority = #suggestions > 1 and index or 2
  49.  
  50.                     -- words with literal matches have higher priority
  51.                     if  ( this.match and not compare.match ) then
  52.                         break
  53.                     -- if both have literal match and this word is more favorable, or neither have literal match
  54.                     elseif  ( this.match and compare.match and ( this.match <= compare.match ) ) or
  55.                             ( not this.match and not compare.match ) then
  56.  
  57.                         -- if this word is shorter, or equally long but has a higher weight
  58.                         if  ( this.length < compare.length or
  59.                             ( this.length == compare.length and this.weight > compare.weight ) ) then
  60.                             break
  61.                         end
  62.  
  63.                     end
  64.                 end
  65.                 tinsert( suggestions, priority, this )
  66.             end
  67.         end
  68.     end
  69.     self:SetSuggestions(1)
  70. end

This is the code used to generate the dictionary:
Lua Code:
  1. function Keyboard:GenerateDictionary()
  2.    -- generates a localized game-oriented dictionary (5000+ words on enUS clients)
  3.    -- scan the global environment for strings
  4.    local genv = getfenv(0)
  5.    local dictionary = {}
  6.    for index, object in pairs(genv) do
  7.       if type(object) == "string" then
  8.          -- remove escape sequences, uses a table of patterns for gsub
  9.          object = Unescape(object)
  10.          -- scan each string for individual words
  11.          for word in object:gmatch("[%a][%w']*[%w]+") do
  12.             word = strlower(word)
  13.             if not dictionary[word] then -- first occurence
  14.                dictionary[word] = 1
  15.             else -- weight increment
  16.                dictionary[word] = dictionary[word] + 1
  17.             end
  18.          end
  19.       end
  20.    end
  21.    return dictionary
  22. end
Pastebin: dictionary dump
__________________

Last edited by MunkDev : 11-07-15 at 09:39 PM.
  Reply With Quote
 

WoWInterface » Developer Discussions » General Authoring Discussion » Efficient string and table manipulation for an in-game dictionary


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off