Unread 05-27-13, 12:36 AM   #10
Phanx
You're getting that ? character because you're cutting the Russian character а (the two bytes \208\176 in UTF-8) in half and keeping only \208, which is not a valid UTF-8 sequence by itself. The string functions in WoW are not Unicode-aware; they only look at bytes. If you want to support languages with multi-byte characters, you can either use the UTF8 library, which provides UTF-8-aware versions of some string functions, or split the string up and count bytes yourself.
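To make the byte problem concrete, here is a rough sketch in plain Lua of the "count bytes yourself" route. The helper names utf8CharLen and firstChar are made up for this example (they are not part of WoW or the UTF8 library), and the code assumes the input is well-formed UTF-8.

```lua
-- "а" (U+0430) is stored as two bytes in UTF-8: 208 and 176.
local a = "\208\176"

-- string.sub works on bytes, so this keeps only the lead byte \208,
-- which is what gets displayed as "?".
local broken = string.sub(a, 1, 1)

-- Hypothetical helper: byte length of the UTF-8 character that starts
-- at byte position i, judged from its lead byte.
local function utf8CharLen(s, i)
    local b = string.byte(s, i or 1)
    if not b then return 0 end      -- empty string or position past the end
    if b < 128 then return 1 end    -- plain ASCII
    if b >= 240 then return 4 end   -- lead byte of a 4-byte character
    if b >= 224 then return 3 end   -- lead byte of a 3-byte character
    if b >= 192 then return 2 end   -- lead byte of a 2-byte character
    return 1                        -- stray continuation byte: treat as one byte
end

-- Hypothetical helper: the first whole character of s, not just its first byte.
local function firstChar(s)
    return string.sub(s, 1, utf8CharLen(s, 1))
end
```

With these, firstChar(a) gives back both bytes of а, while string.sub(a, 1, 1) gives only half of it.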

Either way, it's going to take more than a simple gsub. You'd probably want to just split it up into "Russian clients use this code path" and "everyone else uses this code path", since Korean and Chinese (the other WoW locales with multi-byte characters) generally don't use spaces between words and cannot be meaningfully abbreviated anyway.

Code:
local old, new = "Echo of a Pandaren Monk"
if GetLocale() == "ruRU" then
    -- complicated version
    new = ""
    for word in string.gmatch(old, "(%S+)%s") do
        new = new .. string.utf8sub(word, 1, 1) .. ". " -- first character plus ". "; string.utf8sub comes from the UTF8 lib
    end
    new = new .. strmatch(old, "%S+$")
else
    -- simple version
    new = gsub(old, "(%S[\128-\191]*)%S+%s", "%1. ") -- [\128-\191] matches UTF-8 continuation bytes
end
-- do something with new here
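If you'd rather not depend on the UTF8 library, something along these lines might work for both code paths. It is only a sketch in plain Lua: the function name abbreviate is made up for this example, it assumes the name is well-formed UTF-8, and it relies on the fact that UTF-8 continuation bytes always fall in the range \128-\191.

```lua
-- Hypothetical helper: shorten every word except the last one to its
-- first character followed by a period.
local function abbreviate(name)
    local parts = {}
    for word in string.gmatch(name, "(%S+)%s") do
        -- One non-continuation byte plus any continuation bytes after it
        -- is one whole UTF-8 character.
        local first = string.match(word, "^[^\128-\191][\128-\191]*") or word
        parts[#parts + 1] = first .. "."
    end
    -- Keep the last word in full (nil, and so skipped, if the name ends in whitespace).
    parts[#parts + 1] = string.match(name, "%S+$")
    return table.concat(parts, " ")
end

local new = abbreviate("Echo of a Pandaren Monk")
```

Unlike the gsub version, this also shortens one-letter words like "a" to "a.", so check that it gives the output you actually want before using it.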