05-25-13, 02:04 AM   #8
ravagernl
Proceritate Corporis
Premium Member
Join Date: Feb 2006
Posts: 1,176
Originally Posted by Rainrider View Post
I also don't understand what exactly is .[\128-\191]* supposed to mean but it makes it work for non-english characters (haven't tested chinese and korean though, but I do not look for such functionality).

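Sorry for butting in.
I think the range 128-191 matches the continuation bytes of a multi-byte UTF-8 character (the first 128 values, 0-127, are plain ASCII). So .[\128-\191]* matches one whole character: a single lead byte followed by any continuation bytes that belong to it.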
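
A rough sketch of how that pattern can be used to cut a string after a number of characters rather than bytes. The name utf8_sub is made up for illustration, and it assumes the input is valid UTF-8:

```lua
-- Hypothetical helper: return the first n UTF-8 characters of s.
-- Assumes s is valid UTF-8; malformed input is not detected.
local function utf8_sub(s, n)
  local out = {}
  -- "." grabs one byte (the lead byte), "[\128-\191]*" grabs the
  -- continuation bytes that follow it, so each match is one character.
  for ch in s:gmatch(".[\128-\191]*") do
    if #out >= n then break end
    out[#out + 1] = ch
  end
  return table.concat(out)
end

-- "h\195\169llo" is "héllo" with é encoded as two bytes.
print(utf8_sub("h\195\169llo", 2))
```

Cutting with plain string.sub at byte 2 would split the é in half, which is the problem the pattern avoids.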

EDIT: Found this on http://lua-users.org/wiki/LuaUnicode:
Happily UTF-8 is designed so that it is relatively easy to count the number of unicode symbols in a string: simply count the number of octets that are in the ranges 0x00 to 0x7f (inclusive) or 0xC2 to 0xF4 (inclusive). (In decimal, 0-127 and 194-244.) These are the codes which can start a UTF-8 character code. Octets 0xC0, 0xC1 and 0xF5 to 0xFF (192, 193 and 245-255) cannot appear in a conforming UTF-8 sequence; octets in the range 0x80 to 0xBF (128-191) can only appear in the second and subsequent octets of a multi-octet encoding. Remember that you cannot use \0 in a Lua pattern.
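
Going by that description, counting characters could look something like this. It is only a sketch: utf8_len is a made-up name, and instead of listing the valid start-byte ranges it simply counts every byte that is not a continuation byte, which gives the same result for valid UTF-8:

```lua
-- Hypothetical helper: count UTF-8 characters in s.
-- Assumes s is valid UTF-8, so every byte outside 128-191 starts a character.
local function utf8_len(s)
  -- gsub returns the number of substitutions as its second result;
  -- the substituted string itself is discarded.
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

local s = "h\195\169llo"   -- "héllo", é is two bytes
print(#s, utf8_len(s))     -- byte length vs. character count
```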

Last edited by ravagernl : 05-25-13 at 02:45 AM.