05-10-06, 07:05 PM   #1
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
Parsing combat events

When I tried to create my first addon, I found that parsing WoW's combat events is a very non-trivial problem, especially if I want the parser to work in all languages.


== Lack of information ==

1. You have to know which event carries the message you want.
2. You have to know which different messages an event can produce.
3. You have to know what can be parsed from every single message.

I tried to find this information in internet resources such as wowwiki, but there is very little of it, so I have had to spend a long, long time figuring out which event produces which pattern myself.

== Making it work in all languages ==

1. You must parse against the global string COMBATHITCRITOTHERSELF, not the hard-coded English text "%s crits you for %d."

2. In some languages, trying the patterns in the wrong order gives the wrong result. For example, take: Your greater heal critically heals pig for 500.
With the "Your %s heals %s for %d." pattern, the spell name is parsed "correctly" as "greater heal critically".

3. A parsing order that works in one language might not work in another; conflicts exist. I remember a few very rare patterns that do NOT have any order that works in all languages.

4. To solve 2 and 3, you can try every possible pattern first and then, if there are multiple matches, work out which one is correct. But this has a big impact on performance.

5. Some badly translated patterns simply cannot be parsed with 100% accuracy. For example, I have seen a pattern in Traditional Chinese WoW (I can't remember exactly which) that is something like "You hit %s%d." with no space between %s and %d. So if you hit a mob called "Terminator X-21" (it exists in Gnomeregan) for 123 damage, the message will be:

You hit Terminator X-21123.

Case 5 is very, very rare, so I simply use (.-) for strings to minimize the impact, but the problem exists and I do not know a perfect solution.
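
To make points 1, 2 and 5 concrete, here is a minimal sketch in plain Lua (5.1 or later, run outside the game client). FormatToPattern and ParseFirst are helper names made up for this example, the format strings are typed in by hand instead of being read from the real globals, and only plain %s and %d are handled.

```lua
-- Turn a printf-style format string into an anchored Lua pattern.
-- Sketch only: handles plain %s and %d, not positional forms like %1$s.
local function FormatToPattern(fmt)
  -- Escape every Lua pattern magic character, including '%' itself.
  local pattern = string.gsub(fmt, "([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1")
  -- After escaping, "%s" reads "%%s" and "%d" reads "%%d".
  pattern = string.gsub(pattern, "%%%%s", "(.-)")
  pattern = string.gsub(pattern, "%%%%d", "(%%d+)")
  return "^" .. pattern .. "$"
end

-- Try each format in list order and return the first one that matches.
local function ParseFirst(message, list)
  for _, entry in ipairs(list) do
    local found, _, a, b, c = string.find(message, FormatToPattern(entry.format))
    if found then return entry.name, a, b, c end
  end
  return nil
end

-- Hypothetical names; the format text is the English wording from point 2.
local HEAL = { name = "HEAL", format = "Your %s heals %s for %d." }
local HEAL_CRIT = { name = "HEAL_CRIT", format = "Your %s critically heals %s for %d." }
local message = "Your greater heal critically heals pig for 500."

-- Wrong order: the plain heal pattern swallows "critically" into the spell name.
local wrongName, wrongSpell = ParseFirst(message, { HEAL, HEAL_CRIT })
-- Better order for English: the more specific pattern is tried first.
local rightName, rightSpell = ParseFirst(message, { HEAL_CRIT, HEAL })

-- Point 5: with no separator between %s and %d, the split is only a guess.
local _, mob, damage = ParseFirst("You hit Terminator X-21123.",
  { { name = "HIT", format = "You hit %s%d." } })
```

In this sketch the first call reports HEAL with the spell name "greater heal critically", the second reports HEAL_CRIT with "greater heal", and the last call splits the message into the mob name "Terminator X-" and 21123 damage, which is exactly the kind of wrong answer point 5 is about.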


== Everyone repeatedly doing the same thing ==

I also found that while most addons have to parse the messages to get the information they want, each one does the work on its own.

So if I use SCT, Recap and DamageMeters, my client parses the same patterns 3 times for every single combat message. I think this is very bad for performance.



== So what? ==

1. Create detailed documentation of which combat event produces which patterns, and what the patterns look like in each language.
2. Create a common combat event parser library for everyone to use, so that each event is parsed only once.
3. Keep working on the pattern localization problem, trying to get better performance while still working in all languages.


What I wanted to say is, I have been trying to create these on my own for a while.

I made a simple addon which listens to all combat events and records what patterns they produce, to assist with the documentation. Next I need to read each pattern and decide what information can be retrieved from it, then create a library which parses the patterns and returns useful information.

The library addon is still at a very alpha stage, and I'm just a noob at addon coding.
So while I continue my work, I would like to ask the experienced addon coders for their thoughts.
If I do create the documentation and addon, would they be useful to other addon authors? Is anyone else already doing the same job?
If someone is already working on these, then I'd like to try to contribute something, or maybe I can just stop doing this time-consuming work and wait for the release of the documentation / addon?

Oh, and sorry about my bad English; I hope everyone understands what I was talking about.
05-10-06, 08:05 PM   #2
Gello
A Molten Giant
Join Date: Jan 2005
Posts: 521
I believe Tem was working on a project like this. Maybe she'll have some input. But it was months ago so maybe it's been set aside for a bit. It really is a huge project.

The documentation would be invaluable to a lot of people so that in itself is worth doing, even if the library isn't created.

The library would be useful but I think less as a common parsing point than you may expect:
1. Most mods that watch the combat log only register for a handful of events. Only a few register for almost all of them.
2. string.find is fast. While filing down through a bunch of filters takes time, it's an easy trap to believe that if two mods do the same string.find then it's a monumental waste. When one mod only registers for a few events, it's not, really. Performance is in no way related to perception. The CPU doesn't ask how a user feels when it processes a command. Different things take different amounts of time to process and some things are fast enough to be considered instant. Test and compare always.
3. It would depend on mods adopting the library. There is a natural hesitation to use dependencies. I suggest an embedded library. But even as an embedded library it would be unlikely to be used by established mods that parse combat. Some mods don't even use combat logs but use UNIT_COMBAT instead.

Its greater use would be lowering the barrier of entry for new mods. These are all likely things you've thought of, but I would keep in mind:

1. Mods often care when damage happened as much as what happened.
2. There will be huge demand for sync'ing the data.
3. You'll need to decide where the data lives. Do you store every conceivable effect, resist and combatant in the library's tables? Or do you let the separate mods handle that?

I think a sane approach to a parsing library would be to let the other mods handle the data themselves.

MyMod registers with the library, telling it to send stuff to MyMod.CombatEvent()

Bob's Smite hits Joe for 100 Holy damage (10 resisted).
library calls MyMod.CombatEvent("damage","Bob","Joe","Smite","Holy",100)

The MyMod.CombatEvent can work from generic to specific for its particular needs, ignoring stuff it doesn't care about and stowing away the info it does.

Actually now that I think about it, I think this is the only possible type of library implementation. Otherwise if you reset the data in one mod you reset them all, unless you plan to keep a massive--MASSIVE--amount of information.
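
A rough sketch of that callback style, runnable in plain Lua outside the game. ParserSketch, its Register/OnMessage functions, OtherMod and the single hard-coded English pattern are all made up for illustration; a real library would build its patterns from the localized global strings.

```lua
-- Hypothetical library: parse each message once, then fan the result out.
local ParserSketch = { clients = {}, parseCount = 0 }

function ParserSketch.Register(callback)
  table.insert(ParserSketch.clients, callback)
end

-- A single English-only pattern, hard-coded purely for illustration.
local DAMAGE = "^(.+)'s (.+) hits (.+) for (%d+) (%a+) damage"

function ParserSketch.OnMessage(message)
  ParserSketch.parseCount = ParserSketch.parseCount + 1
  local found, _, source, skill, victim, amount, school =
    string.find(message, DAMAGE)
  if not found then return false end
  -- Every registered client receives the same already-parsed values.
  for _, callback in ipairs(ParserSketch.clients) do
    callback("damage", source, victim, skill, school, tonumber(amount))
  end
  return true
end

-- Two client mods; each keeps only the data it cares about.
local MyMod = { total = 0 }
function MyMod.CombatEvent(kind, source, victim, skill, school, amount)
  if kind == "damage" then MyMod.total = MyMod.total + amount end
end

local OtherMod = { schools = {} }
function OtherMod.CombatEvent(kind, source, victim, skill, school, amount)
  OtherMod.schools[school] = true
end

ParserSketch.Register(MyMod.CombatEvent)
ParserSketch.Register(OtherMod.CombatEvent)
ParserSketch.OnMessage("Bob's Smite hits Joe for 100 Holy damage (10 resisted).")
```

The library holds no combat data at all here, so resetting MyMod.total has no effect on OtherMod.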

On localization, I speak from months of experience and ulcers: I will never again attempt to do localized combat log parsing. The German possessives, the ambiguous French heals, etc. make me nauseous thinking back to it.

Good luck with it. It's an ambitious project.

Last edited by Gello : 05-10-06 at 08:08 PM.
05-11-06, 01:23 AM   #3
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
Originally Posted by Gello
I believe Tem was working on a project like this. Maybe she'll have some input. But it was months ago so maybe it's been set aside for a bit. It really is a huge project.

The documentation would be invaluable to a lot of people so that in itself is worth doing, even if the library isn't created.

The library would be useful but I think less as a common parsing point than you may expect:
1. Most mods that watch the combat log only register for a handful of events. Only a few register for almost all of them.
2. string.find is fast. While filing down through a bunch of filters takes time, it's an easy trap to believe that if two mods do the same string.find then it's a monumental waste. When one mod only registers for a few events, it's not, really. Performance is in no way related to perception. The CPU doesn't ask how a user feels when it processes a command. Different things take different amounts of time to process and some things are fast enough to be considered instant. Test and compare always.
3. It would depend on mods adopting the library. There is a natural hesitation to use dependencies. I suggest an embedded library. But even as an embedded library it would be unlikely to be used by established mods that parse combat. Some mods don't even use combat logs but use UNIT_COMBAT instead.
Yes, I have thought about all of these. I'm not that good at writing good code, but I will most likely "learn" from good code such as yours and from addons and libraries like Ace, FuBar etc.

1. Addons register with the lib for the specific events they want to listen to, and the lib only registers an event when someone actually wants to listen.
2. If there is a common library, then when we "optimize" something in the parsing code, all client addons benefit from it. If many addons use it, I personally think it should make an actual difference to performance. And by (1), since the lib only registers an event if at least one addon is interested in it, performance should only be better, not worse, IMO?
3. Yeah, most likely I'll just learn from existing libraries such as BabbleLib and CompostLib.
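
Roughly what I mean by (1), as a sketch that runs in plain Lua 5.1 or later. fakeFrame is only a stand-in for a real frame so the example works outside the game, and AddEventHandler / RemoveEventHandler here are simplified illustrations, not the actual library code.

```lua
-- Stand-in for a WoW frame so the sketch runs outside the game client.
local fakeFrame = { registered = {} }
function fakeFrame:RegisterEvent(event) self.registered[event] = true end
function fakeFrame:UnregisterEvent(event) self.registered[event] = nil end

local handlers = {}  -- event name -> list of client callbacks

local function AddEventHandler(event, callback)
  if not handlers[event] then
    handlers[event] = {}
    fakeFrame:RegisterEvent(event)    -- first listener: start listening
  end
  table.insert(handlers[event], callback)
end

local function RemoveEventHandler(event, callback)
  local list = handlers[event]
  if not list then return end
  -- Walk backwards so removals do not disturb the indices still to visit.
  for i = #list, 1, -1 do
    if list[i] == callback then table.remove(list, i) end
  end
  if #list == 0 then
    handlers[event] = nil
    fakeFrame:UnregisterEvent(event)  -- last listener gone: stop listening
  end
end

local function onHit() end
local function onHitToo() end
AddEventHandler("CHAT_MSG_COMBAT_SELF_HITS", onHit)
AddEventHandler("CHAT_MSG_COMBAT_SELF_HITS", onHitToo)
RemoveEventHandler("CHAT_MSG_COMBAT_SELF_HITS", onHit)
local stillListening = fakeFrame.registered["CHAT_MSG_COMBAT_SELF_HITS"]
RemoveEventHandler("CHAT_MSG_COMBAT_SELF_HITS", onHitToo)
local listeningAtEnd = fakeFrame.registered["CHAT_MSG_COMBAT_SELF_HITS"]
```

An event nobody asked for is never registered, so it costs nothing at runtime.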


Originally Posted by Gello
Its greater use would be lowering the barrier of entry for new mods. These are all likely things you've thought of, but I would keep in mind:

1. Mods often care when damage happened as much as what happened.
2. There will be huge demand for sync'ing the data.
3. You'll need to decide where the data lives. Do you store every conceivable effect, resist and combatant in the library's tables? Or do you let the separate mods handle that?
1. Yes, the damage part is the core, and it's why I wanted to do this. The lib can already parse all attacks, hits and misses. But if I have to make a common library, I should add all event messages which require parsing, such as spells starting to cast, making items, gaining rep, reaching another level of rep with a faction etc. If I have the knowledge to make an addon which knows ALL patterns, then at the same time I should be able to complete the documentation.

2. What kind of sync'ing do you mean?

3. For now the approach is to pass a table holding the parsed information to the client functions. I know creating a table for every event is wasteful, so this is going to change; my thought is to stay with passing tables but use CompostLib to recycle them. So that's an embedded library in an embedded library. Client addons are expected to copy out the information they need immediately when they receive the table, since the table is going to be recycled.
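
A small sketch of the recycling idea in plain Lua. This is NOT CompostLib's API; Acquire, Release and Dispatch are names invented for the example, and it only shows why clients have to copy what they need before returning.

```lua
local pool = {}

-- Reuse a spare table if one is available, otherwise create a new one.
local function Acquire()
  return table.remove(pool) or {}
end

-- Wipe the table and put it back for the next event.
local function Release(t)
  for k in pairs(t) do t[k] = nil end
  table.insert(pool, t)
end

local function Dispatch(callback, category, source, victim, amount)
  local info = Acquire()
  info.category, info.source, info.victim, info.amount =
    category, source, victim, amount
  callback(info)
  Release(info)   -- after this point the client must not rely on 'info'
end

local kept, copiedAmount
local function Client(info)
  kept = info                  -- wrong: this table is wiped after the call
  copiedAmount = info.amount   -- right: copy out the fields you need
end

Dispatch(Client, "damage", "Bob", "Joe", 100)
local first = kept
Dispatch(Client, "damage", "Joe", "Bob", 40)
```

Both events end up using the very same table, so no new table is created after the first event, and the reference stored in kept is empty by the time the client looks at it again.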


Originally Posted by Gello
I think a sane approach to a parsing library would be to let the other mods handle the data themselves.

MyMod registers with the library, telling it to send stuff to MyMod.CombatEvent()

Bob's Smite hits Joe for 100 Holy damage (10 resisted).
library calls MyMod.CombatEvent("damage","Bob","Joe","Smite","Holy",100)

The MyMod.CombatEvent can work from generic to specific for its particular needs, ignoring stuff it doesn't care about and stowing away the info it does.

Actually now that I think about it, I think this is the only possible type of library implementation. Otherwise if you reset the data in one mod you reset them all, unless you plan to keep a massive--MASSIVE--amount of information.
Yes, these are very close to my thoughts, although I currently create a table storing "damage", "Bob", "Joe" etc. and pass it to the client function on each event. I know this is not good behaviour.




Originally Posted by Gello
On localization, I speak from months of experience and ulcers: I will never again attempt to do localized combat log parsing. The German possessives, the ambiguous French heals, etc. make me nauseous thinking back to it.

Good luck with it. It's an ambitious project.

Localization is what is taking me the most time. I learned most of the code from your Recap; it's a really good addon, much better than what I can make. But when I tried to learn from your language-converting part, I found quite a few problems in each language.
Since then I have been trying to solve this problem, so I think if someone is doing the same thing, I can share my results.
The basic approaches are explained in the previous posts, but in addition I have made a "testing function" which simulates all patterns and warns when wrong info is parsed. Then I just copy the whole globalstring.lua into the addon (you can't replace globalstring.lua now).
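
The idea of the testing function, as a simplified sketch in plain Lua (this is not the actual code). It builds a sample message from each format with string.format, runs it through the patterns in list order, and reports when a different pattern claims the message first. FormatToPattern and FindConflicts are names made up here, the two English formats are typed in by hand, and only plain %s and %d are handled.

```lua
-- Turn a printf-style format string into an anchored Lua pattern.
local function FormatToPattern(fmt)
  local pattern = string.gsub(fmt, "([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1")
  pattern = string.gsub(pattern, "%%%%s", "(.-)")
  pattern = string.gsub(pattern, "%%%%d", "(%%d+)")
  return "^" .. pattern .. "$"
end

-- For every format, simulate a message and check which pattern wins.
local function FindConflicts(list)
  local conflicts = {}
  for _, origin in ipairs(list) do
    -- Extra arguments are ignored by formats with fewer specifiers.
    local message = string.format(origin.format, "Test Spell", "Test Target", 123)
    for _, candidate in ipairs(list) do
      if string.find(message, FormatToPattern(candidate.format)) then
        if candidate.name ~= origin.name then
          table.insert(conflicts, origin.name .. " parsed as " .. candidate.name)
        end
        break  -- only the first match counts, as in the real parse loop
      end
    end
  end
  return conflicts
end

-- Hypothetical names for two English heal messages.
local HEAL = { name = "HEAL", format = "Your %s heals %s for %d." }
local HEAL_CRIT = { name = "HEAL_CRIT", format = "Your %s critically heals %s for %d." }

local bad = FindConflicts({ HEAL, HEAL_CRIT })   -- generic pattern first
local good = FindConflicts({ HEAL_CRIT, HEAL })  -- specific pattern first
```

With the generic pattern first, the check reports one conflict; with the specific pattern first, it reports none. Running the same check against each language's strings is what shows whether one order can work everywhere.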

It's not a trivial project, which is why I hoped someone was already working on it, so that I wouldn't have to do all this on my own. But anyway, I have already made something for both the documentation and the addon.

For the documentation, I basically created a simple addon which stores ALL possible patterns and registers all pattern-related events. Then I just play the game normally and let it record which event produces which patterns. After that I only need to refer to the saved variables to create the documentation.

For the addon I have just made the bare bones: clients register with it, it registers the events that clients are interested in, and when an event fires it parses out the information and passes it to the client function as a table. The clients check the "category" variable and refer to the addon documentation to know which variables are passed for each category. Currently I believe the patterns related to damage should be complete (those which Recap, DamageMeters and CombatStats would be interested in).

Last edited by rophy : 05-11-06 at 01:26 AM.
05-23-06, 09:57 AM   #4
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
current progress

I submitted my current work to wowwiki. It's still far from complete, but I hope the power of the wiki can speed up the progress.

The Parser Library has most of the concepts implemented, I think, but the current problem is that, since there are so many patterns and events, it ends up with almost 2500 lines of pattern and event information.

I made the pattern and event tables start out empty and only load the necessary information when needed (by a very long if eventName == "XXXX" then return xxx elseif eventName == "XXXX" then return xxxxxx ... ). Some events can be combined to reduce the file size, but is there any better way to implement this idea than the if elseif elseif chain?


If any experienced coder could spend some time taking a look at my code and give some comments, I'd really, really appreciate it.

http://220.134.137.44/~rophy/ParserBench.zip

Last edited by rophy : 05-23-06 at 10:00 AM.
05-23-06, 11:34 AM   #5
Iriel
Super Moderator
WoWInterface Super Mod
Featured
Join Date: Jun 2005
Posts: 578
It would seem that GetPatternInfo would be much better implemented as a table

patternTable = {
  AURAADDEDOTHERHARMFUL = { type = lib.DEBUFF, victim = 1, skill = 2 };
}
etc

Ditto for GetPatternList, and in that case, you should sort the lists once at the start, rather than re-sorting every time you query them, a lot of wasted CPU time there.

Table driven code will run faster too, since your if/elseif will have O(N) performance (for an average lookup you have to go through half of the available entries), and the hash lookup will be closer to O(1).
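
A minimal sketch of the table-driven version, runnable in plain Lua. The DEBUFF/BUFF strings stand in for lib.DEBUFF and friends, "EXAMPLE_EVENT" is a placeholder event name, and alphabetical order is only a stand-in for whatever ordering your parser actually needs.

```lua
local DEBUFF, BUFF = "DEBUFF", "BUFF"   -- stand-ins for lib.DEBUFF / lib.BUFF

-- Pattern name -> description of its captures.
local patternTable = {
  AURAADDEDOTHERHARMFUL = { type = DEBUFF, victim = 1, skill = 2 },
  AURAADDEDOTHERHELPFUL = { type = BUFF, victim = 1, skill = 2 },
}

-- Event name -> list of pattern names to try for that event.
local eventTable = {
  EXAMPLE_EVENT = { "AURAADDEDOTHERHELPFUL", "AURAADDEDOTHERHARMFUL" },
}

-- Sort every list exactly once, at load time.
local sortCalls = 0
for _, list in pairs(eventTable) do
  table.sort(list)
  sortCalls = sortCalls + 1
end

-- Both lookups are now single hash accesses, with no if/elseif chain.
local function GetPatternInfo(name) return patternTable[name] end
local function GetPatternList(event) return eventTable[event] end

local info = GetPatternInfo("AURAADDEDOTHERHARMFUL")
local list = GetPatternList("EXAMPLE_EVENT")
local again = GetPatternList("EXAMPLE_EVENT")
```

Querying a list repeatedly hands back the same already-sorted table, so the sort cost is paid once per event instead of once per lookup.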
05-23-06, 03:48 PM   #6
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
Originally Posted by Iriel
It would seem that GetPatternInfo would be much better implemented as a table

patternTable = {
  AURAADDEDOTHERHARMFUL = { type = lib.DEBUFF, victim = 1, skill = 2 };
}
etc

Ditto for GetPatternList, and in that case, you should sort the lists once at the start, rather than re-sorting every time you query them, a lot of wasted CPU time there.

Table driven code will run faster too, since your if/elseif will have O(N) performance (for an average lookup you have to go through half of the available entries), and the hash lookup will be closer to O(1).

Actually both are tables, just that they start out empty:

patternTable = {}
infoTable = {}

AddEventHandler() : if not eventTable[event] then eventTable[event] = GetPatternList(event) end

FindPattern() : if not patternTable[pattern] then patternTable[pattern] = GetPatternInfo(pattern) end

... something like this. The pattern list for each event is sorted only once, when it's added to eventTable.


I did some tests on memory usage, comparing a fully loaded table against a long if elseif chain that loads the elements on request. The results were as follows (measured with Warmup):

Full Table
  • no EventTable, no PatternTable : 83 KB
  • no EventTable, full PatternTable : 226 KB
  • Full EventTable, no PatternTable : 472 KB
  • Full EventTable and PatternTable : 593 KB


Loading both tables' elements on request via the long, long if elseif chain:
  • No event registered : 148 KB (so the two functions GetPatternInfo() and GetPatternList() cost something like 148 - 83 = 65 KB for the raw code?)
  • All events registered, no patterns loaded : 368 KB
  • All events registered, all patterns loaded : 628 KB

So I personally think that loading the elements on request looks like the better approach?

I'm thinking about changing the string indexes to numeric indexes; maybe that can further reduce the required memory? But then it'll be much harder to add or remove an event / pattern, so probably not before I'm sure that I know all the required patterns.

Last edited by rophy : 05-23-06 at 04:09 PM.
05-23-06, 06:38 PM   #7
Iriel
Super Moderator
WoWInterface Super Mod
Featured
Join Date: Jun 2005
Posts: 578
Changing to numbers from strings will have negligible impact, beyond the length of the strings themselves. It's also worth asking why you're using numeric constants for things like HIT instead of just the string "HIT": Lua uses shared strings, so all instances of "HIT" are references to the same set of bytes. That might give you some performance advantages too, since you'd eliminate a lot of 'self' dereferences.

Just sanity checking the size numbers...

As a table, patternInfo would be a table with 241 entries:

32 + 80 * 256 = 20,512 bytes

Plus the subtables, of which there are 241 with a total of 1034 table entries. I'm going to guess about half have 3 or fewer keys and half have more, and all have fewer than 8:

241 * (32 + 80 * ((4 + 8) / 2)) = 123,392 bytes

Total of the two is 140KB ish, which is consistent with your observations.

If you were to encode that all as a big array with alternating key/value pairs, then it ends up much smaller:

32 + (241 * 2 + 1034 * 2) * 16 = 40,832 = 40KB ish

So if your goal is minimal memory use, you'd be better off keeping your if/elseif/else structure, or using a packed array format plus a generic decoder (And with metatables you can make both of those invisible and act like a simple table), but you'll incur some penalty at early runtime while the necessary entries are built.
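
For anyone who wants to replay the arithmetic, here it is as a small plain-Lua sketch. The 32, 80 and 16 byte figures are just the per-table and per-entry estimates used above, not measured values, and the function names are made up for the example.

```lua
-- Estimated sizes in bytes, taken from the figures in this post.
local HEADER, HASH_ENTRY, ARRAY_ENTRY = 32, 80, 16

local function HashTableBytes(slots) return HEADER + HASH_ENTRY * slots end
local function ArrayTableBytes(entries) return HEADER + ARRAY_ENTRY * entries end

-- Outer table: 241 entries, rounded up to 256 hash slots.
local outer = HashTableBytes(256)

-- Subtables: 241 of them, averaging 6 slots (midpoint of 4 and 8).
local averageSlots = 6
local subtables = 241 * HashTableBytes(averageSlots)

-- Packed array: one key/value pair per pattern name (241 * 2)
-- plus one pair per subtable entry (1034 * 2).
local packed = ArrayTableBytes(241 * 2 + 1034 * 2)

local asHashTables = outer + subtables
print(outer, subtables, asHashTables, packed)
```

On these estimates the nested hash tables come to 143,904 bytes against 40,832 bytes for the packed array, which is where the "140KB ish" and "40KB ish" figures come from.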

From a maintainability perspective, the pre-built tables have a certain appeal.
05-23-06, 09:34 PM   #8
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
I didn't know which approach is better (self.HIT or "HIT"), so I made it self.HIT first; that way I can replace them easily if I want to change it.

Sounds like for 'type' I should just use "HIT", but numeric values for 'victim' and 'source' have the advantage that they can't conflict with an actual parsed string. I mean, with source = "SELF", if a bored hunter names a pet "SELF" the two become ambiguous. If 'self' dereferences have a performance hit, how about just making them global constants like ParserLib_SELF etc.?

for the "big array with alternating key/value pairs", do you mean something like...

{ "AURAADDEDOTHERHARMFUL", "type", "DEBUFF", "victim", 1, "skill", 2, "NEXT", "AURAADDEDOTHERHELPFUL", "type", ...... } ?

Why would that result in a smaller size than regular key-value tables?

And how exactly do I build a metatable to make it act like a simple table? With a decoder function? I would guess the metatable is something like..

{ AURAADDEDOTHERHARMFUL = 1, AURAADDEDOTHERHELPFUL = 10, .... }

That 2nd table can be built in one iteration; then when I want some information I call a simple decoding function like.... GetField("AURAADDEDOTHERHELPFUL", "type") -> returns "DEBUFF"

Something like this?

Sorry for asking so many dumb questions. I want to create a library with minimal overhead in both performance and memory so that people will actually use it.

Current memory overhead with no event registered is 148 KB; I wish I could push it down to 100, maybe by combining similar events or whatever.

For fully loaded memory usage, I think there is an obvious trade-off between memory and speed. Most of the time addons won't register all of the events, so I think I'll consider speed more important than memory here.

Last edited by rophy : 05-23-06 at 10:27 PM.
05-24-06, 12:13 PM   #9
Iriel
Super Moderator
WoWInterface Super Mod
Featured
Join Date: Jun 2005
Posts: 578
By the array of key value pairs I did indeed mean something like:

patternArray = {
  0, "AURAADDEDOTHERHARMFUL",
    "type", "DEBUFF",
    "victim", 1,
    "skill", 2,
  0, "AURAADDEDOTHERHELPFUL",
    "type", "BUFF",
    "victim", 1,
    "skill", 2,
  ...
}

The reason for the compact memory use is the way in which Lua represents array-style table entries versus hash-style table entries. For array-style ones created as literals in compiled code, the size is:

32 + 16 * numEntries

(For those which are 'grown' at runtime in code, numEntries increases by doubling, so it is always a power of 2.)

Whereas for hash tables, the size ends up being:

32 + 80 * numEntries (and numEntries is generally a power of 2, though I'm not sure for literal tables)

This is because a hash entry needs a key, a value, chaining pointers, and some hash information, compared to the 'just a value' of the array-style ones.


As for metatables, for self-configuring tables you use the __index metamethod. It works something like this:

function magicIndex(t, k) -- t is the table, k is the requested key
  local v = functionToFindValueForKey(k);
  if (v ~= nil) then t[k] = v; end -- cache it so later lookups skip the metamethod
  return v;
end

magicTable = {};
setmetatable(magicTable, { __index = magicIndex; });

Table references that are already stored in the table happen the normal way. Accessing a key which is not present in the table causes the __index metamethod to be called with the table and key in question; the code above then calls a function to build a new value and, if one is found, inserts it into the table (subsequent requests for that key will be a direct hit and avoid the metamethod).
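
Here is a self-contained version you can run in a plain Lua interpreter to watch the caching happen. BuildInfo is a made-up stand-in for functionToFindValueForKey, and buildCount is only there to count how often the metamethod has to do real work.

```lua
local buildCount = 0

-- Hypothetical builder: knows one pattern and returns nil for anything else.
local function BuildInfo(key)
  buildCount = buildCount + 1
  if key == "AURAADDEDOTHERHARMFUL" then
    return { type = "DEBUFF", victim = 1, skill = 2 }
  end
  return nil
end

local patternInfo = setmetatable({}, {
  __index = function(t, key)
    local value = BuildInfo(key)
    if value ~= nil then t[key] = value end  -- store it for direct hits later
    return value
  end,
})

local first = patternInfo.AURAADDEDOTHERHARMFUL   -- miss: builder runs
local second = patternInfo.AURAADDEDOTHERHARMFUL  -- hit: no metamethod call
local buildsAfterHits = buildCount

local missing = patternInfo.NO_SUCH_PATTERN       -- miss, and nothing to cache
local missingAgain = patternInfo.NO_SUCH_PATTERN  -- so the builder runs again
local buildsAfterMisses = buildCount
```

Note that unknown keys are never cached, so every lookup of a key the builder cannot resolve pays the metamethod cost again.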
05-24-06, 12:45 PM   #10
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
Hmm... so the final result table, after accessing all elements once, will have both numeric and string indexes? Won't that make the table larger than a table with only string indexes?
05-25-06, 12:12 AM   #11
Iriel
Super Moderator
WoWInterface Super Mod
Featured
Join Date: Jun 2005
Posts: 578
I'd imagine you'd have two tables, the virtual one that fills itself in, and then the 'seed' array-table.

There's nothing to stop you removing entries from the seed table as they're made real (using table.remove). That allows Lua to resize the seed table and reclaim space if it feels like it, and if you do that, accessing ALL of the entries will give you an empty seed table and a populated 'virtual' one.

Of course, if you think that users really would be hitting ALL of your events then you may as well just start off with the full table, since you'd end up there anyway!
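
A sketch of the two-table arrangement in plain Lua (5.1 or later). The record layout follows the packed array from my earlier post, with 0 marking the start of each record; Extract is a made-up helper, and the linear scan plus table.remove is the simplest thing that works, not a tuned implementation.

```lua
-- Packed 'seed' array: 0 marks a record, then its name, then key/value pairs.
local seed = {
  0, "AURAADDEDOTHERHARMFUL", "type", "DEBUFF", "victim", 1, "skill", 2,
  0, "AURAADDEDOTHERHELPFUL", "type", "BUFF", "victim", 1, "skill", 2,
}

-- Find one record, turn it into a normal table, and delete it from the seed.
local function Extract(name)
  local i = 1
  while seed[i] ~= nil do
    -- 'stop' ends up on the next record marker, or one past the end.
    local stop = i + 2
    while seed[stop] ~= nil and seed[stop] ~= 0 do stop = stop + 2 end
    if seed[i + 1] == name then
      local info = {}
      for j = i + 2, stop - 2, 2 do info[seed[j]] = seed[j + 1] end
      -- Remove from the back so the remaining indices stay valid.
      for j = stop - 1, i, -1 do table.remove(seed, j) end
      return info
    end
    i = stop
  end
  return nil
end

-- The 'virtual' table that fills itself in on first access.
local patternInfo = setmetatable({}, {
  __index = function(t, name)
    local info = Extract(name)
    if info ~= nil then t[name] = info end
    return info
  end,
})

local helpfulType = patternInfo.AURAADDEDOTHERHELPFUL.type
local seedSizeAfterOne = #seed
local harmfulSkill = patternInfo.AURAADDEDOTHERHARMFUL.skill
local seedSizeAfterAll = #seed
```

Each record lives in exactly one of the two tables at any time, so once every entry has been touched the seed is empty and only the normal hash tables remain.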
05-27-06, 12:10 AM   #12
rophy
A Fallenroot Satyr
Join Date: Feb 2006
Posts: 24
Hmm. I think I'll stay with the if elseif chain then; it doesn't need to create another table or remap the indexes, it's just that the source code looks long.

Thank you so much for answering all my questions. It really helped me a lot in understanding how Lua works; I really appreciate it.