WoWInterface - Looking for someone to help code a Trade Chat Filter

I'd like to find someone to help implement a Trade chat filter addon idea I have. I'm not sure I can get the GUI part done in a reasonable amount of time.

I want to use a Bayesian algorithm to classify trade chat into multiple buckets. The SpamBayes addon uses a modified algorithm that supports only 3 buckets. I'd like to use the algorithm implemented in Perl by POPFile, which can be trained for a large number of categories.

Each category (including "unclassified") needs a GUI that displays the chat in that category, with the ability to reclassify each message into any other category. Each category also needs a toggle for whether it is shown in the regular chat window.

What will be unique about this addon is the classification itself: I envision sorting trade chat into categories such as Trade, LookingFor, GoldSpam, and GuildAds. Once trained, I would leave Trade and LookingFor turned on and turn off GoldSpam and "unclassified", which should contain almost all of the gossip, trolling, trash, etc. It would also be desirable to have a per-category option to report players, which could be turned on only for GoldSpam.
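To make the multi-category idea concrete, here is a minimal sketch assuming a plain multinomial naive Bayes classifier with add-one smoothing. It is not POPFile's or SpamBayes' actual algorithm, and the names Tokenize, NewClassifier, Train and Classify are hypothetical. It always picks the best-scoring trained category, so an "unclassified" bucket would need an extra confidence threshold that is not shown here.

```lua
-- Hypothetical sketch: multinomial naive Bayes over any number of categories.
-- Not POPFile's algorithm; all names here are made up for illustration.
local function Tokenize(msg)
    local words = {}
    for w in msg:lower():gmatch("%a+") do
        words[#words + 1] = w
    end
    return words
end

local function NewClassifier()
    return { wordCounts = {}, totals = {}, docs = {}, docTotal = 0, vocab = {}, vocabSize = 0 }
end

-- Record one training message for the given category.
local function Train(c, category, msg)
    c.wordCounts[category] = c.wordCounts[category] or {}
    c.totals[category] = c.totals[category] or 0
    c.docs[category] = (c.docs[category] or 0) + 1
    c.docTotal = c.docTotal + 1
    for _, w in ipairs(Tokenize(msg)) do
        c.wordCounts[category][w] = (c.wordCounts[category][w] or 0) + 1
        c.totals[category] = c.totals[category] + 1
        if not c.vocab[w] then
            c.vocab[w] = true
            c.vocabSize = c.vocabSize + 1
        end
    end
end

-- Return the category with the highest log-probability, or nil if untrained.
local function Classify(c, msg)
    local words = Tokenize(msg)
    local best, bestScore
    for category, docCount in pairs(c.docs) do
        local score = math.log(docCount / c.docTotal)
        for _, w in ipairs(words) do
            local count = c.wordCounts[category][w] or 0
            -- Add-one smoothing so unseen words do not zero out the score.
            score = score + math.log((count + 1) / (c.totals[category] + c.vocabSize))
        end
        if not bestScore or score > bestScore then
            best, bestScore = category, score
        end
    end
    return best
end
```

Reclassifying a message from the GUI would then amount to calling Train again with the corrected category (a fuller version would also subtract the counts from the wrong one).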

I have attempted to train the SpamBayes for WoW addon, but my "undesirable" trade chat covers too wide a range to train on successfully. On the other hand, I believe I can train on the "desirable" trade chat and turn off all the rest. While a category is turned off, its chat still needs to be captured (in a circular buffer) so that training can still occur.
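As a rough illustration of the circular buffer mentioned above, here is a small sketch of a fixed-size buffer that overwrites the oldest hidden message once it is full. The names NewRingBuffer, Push and Items are hypothetical, and the capacity is whatever the addon chooses.

```lua
-- Hypothetical sketch: fixed-size circular buffer for chat lines that are
-- hidden from the chat frame but kept around for later training.
local function NewRingBuffer(capacity)
    return { capacity = capacity, count = 0, head = 0, items = {} }
end

-- Store an item, overwriting the oldest one when the buffer is full.
local function Push(buf, item)
    buf.head = buf.head % buf.capacity + 1   -- next slot, wrapping 1..capacity
    buf.items[buf.head] = item
    if buf.count < buf.capacity then
        buf.count = buf.count + 1
    end
end

-- Return the stored items as a new array, oldest first.
local function Items(buf)
    local out = {}
    local start = buf.head - buf.count       -- may be zero or negative
    for i = 1, buf.count do
        out[i] = buf.items[(start + i - 1) % buf.capacity + 1]
    end
    return out
end
```

A training GUI could walk Items(buf) for each category and let the user reclassify entries before they are overwritten.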

If Blizzard increased the size of the ignore list by an order of magnitude, then I might be able to get some peace. I don't want to turn Trade off completely because I like to respond to trade/profession requests. I believe this addon could actually return "my" Trade chat to what I believe was originally intended.

Quote:

Originally Posted by bsmorgan (Post 245047)

If Blizzard increased the size of the ignore list by an order of magnitude, then I might be able to get some peace. I don't want to turn Trade off completely because I like to respond to trade/profession requests. I believe this addon could actually return "my" Trade chat to what I believe was originally intended.

I'm not experienced in the field of algorithms, but I know some addons exist that bypass Blizzard's built-in ignore list and keep their own list, which can be unlimited in size. The method I would use to achieve this is to keep a list of players you want to ignore and register a filter through ChatFrame_AddMessageEventFilter(event,function).

Here's an example of a public channel ignore filter.

Code:

local IgnoreList={
        "Bob";
        "Tom";
};

ChatFrame_AddMessageEventFilter("CHAT_MSG_CHANNEL",function(self,event,...)
        local msg,sender=...;
        for i,j in ipairs(IgnoreList) do
                if j==sender then return true; end--        Filter out matches
        end
        return false;--        Let the message through
end);

This will filter out messages from any player named Bob or Tom sent through any public chat channel (General, Trade, etc.).
See WoWPedia:Events/Communication for a list of events and the arguments they supply.
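The same filter hook could also drive the per-category show/hide toggle from the original request. The following is only a sketch under assumptions: MakeCategoryFilter, the classify callback and the shown table are hypothetical names, and the classifier itself is left to the caller. The registration is guarded so the snippet also loads outside the WoW client.

```lua
-- Hypothetical sketch: hide or show messages by category.
-- `classify` is any function that maps a message to a category name (or nil);
-- `shown` maps category names to true when that category should be displayed.
local function MakeCategoryFilter(classify, shown)
    return function(self, event, msg, sender, ...)
        local category = classify(msg) or "unclassified"
        if shown[category] then
            return false  -- let the message through
        end
        return true       -- hide it from the chat frame
    end
end

-- Only register inside the WoW client, where the API exists.
if ChatFrame_AddMessageEventFilter then
    local shown = { Trade = true, LookingFor = true }
    local function classify(msg)
        return nil        -- placeholder: a trained classifier would go here
    end
    ChatFrame_AddMessageEventFilter("CHAT_MSG_CHANNEL", MakeCategoryFilter(classify, shown))
end
```

Because CHAT_MSG_CHANNEL fires for every public channel, a real addon would probably also check which channel the message came from before classifying it.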

Slightly off-topic, what is more efficient:

1.

Code:

local t = {
  "Name1",
  "Name2",
  "Name3",
  -- ...
  "NameN"
}

-- Linear search: compare n against each entry until one matches.
function f(n)
  for k, v in pairs(t) do
    if n == v then
      return 1
    end
  end
end

2.

Code:

local t = {
  ["Name1"] = 1,
  ["Name2"] = 1,
  ["Name3"] = 1,
  -- ...
  ["NameN"] = 1,
}

-- Keyed lookup: a single table index, no loop.
function f(n)
  return t[n]
end

Thanks for the input. What I didn't express in the original post was that while a huge ignore list would certainly help, it isn't ideal, because some of these habitual motor-mouths occasionally say something trade-appropriate, and their gold is just as good as anyone else's.

So while implementing a larger ignore list is possible, I still prefer my original approach.

If you could post a link to source code that uses such algorithms, it would be helpful. I'll have to analyze it to get an idea of specifically how the algorithm works and exactly what it does.

Quote:

Originally Posted by Vladinator (Post 245058)

Slightly off-topic, what is more efficient:

Intuitively I'd choose the latter, thinking that Lua's built-in table lookup is faster than a search loop written in Lua. But if you have really large tables, it's probably better to benchmark it beforehand.

Quote:

Originally Posted by SaraFdS (Post 245096)

Intuitively I'd choose the latter, thinking that Lua's built-in table lookup is faster than a search loop written in Lua. But if you have really large tables, it's probably better to benchmark it beforehand.

The latter is faster, while the former would be simpler to maintain for someone who knows nothing of Lua, such as the end user.
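One way to get both, sketched here with hypothetical names (IgnoreSet, IsIgnored), is to keep the simple list for the user to edit and build the keyed table from it once at load time:

```lua
-- The user edits this plain list.
local IgnoreList = { "Bob", "Tom" }

-- Built once at load time: name -> true, for fast keyed lookups.
local IgnoreSet = {}
for _, name in ipairs(IgnoreList) do
    IgnoreSet[name] = true
end

-- Hypothetical helper: true if the sender is on the list.
local function IsIgnored(sender)
    return IgnoreSet[sender] == true
end
```

The conversion loop runs once, so every later lookup is a single table index regardless of how long the list gets.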

Thanks for the replies on the efficiency matter.

Just out of curiosity, how would one go about benchmarking in the WoW client? GetTime() is the only API I know of that returns milliseconds, so basically call it right before and after and subtract to get the time difference, right? :)

Quote:

Originally Posted by Vladinator (Post 245104)

Create an incredibly large table so that the function takes a long time to run. Makes it easier to measure. Print a GetTime before, and a GetTime after. :) Run each test multiple times to get a good sample size.

This is what I found; clearly the built-in key lookup handles it much faster.

The code I used: http://pastebin.com/Xfc2wgpF

1. There are over 10k names; the code picks a random tenth of them from the whole table and makes sure there are no duplicates.
2. The function runs using either method: (1) find by key or (2) traverse and match.
3. Each run returns the time spent finding the names, i.e. only the matching part that looks the names up in the huge table(s), not the whole run.
4. The times are compared and printed. The lines printed while the tests ran are rounded down, so they are not accurate; the final number, on the other hand, is.

Code:

[13:31:18] [1] 6810.301 to 6810.301 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.301 to 6810.301 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.301 to 6810.301 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.301 to 6810.317 (0.016 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] [1] 6810.317 to 6810.317 (0.000 ms) - 1024/1024 of 10239

[13:31:18] test1: 0.0015999999999622 ms

[13:31:26] [2] 6817.181 to 6817.883 (0.702 ms) - 1024/1024 of 10239

[13:31:26] [2] 6817.961 to 6818.663 (0.702 ms) - 1024/1024 of 10239

[13:31:27] [2] 6818.741 to 6819.427 (0.686 ms) - 1024/1024 of 10239

[13:31:28] [2] 6819.505 to 6820.207 (0.702 ms) - 1024/1024 of 10239

[13:31:29] [2] 6820.285 to 6820.972 (0.687 ms) - 1024/1024 of 10239

[13:31:29] [2] 6821.050 to 6821.752 (0.702 ms) - 1024/1024 of 10239

[13:31:30] [2] 6821.830 to 6822.532 (0.702 ms) - 1024/1024 of 10239

[13:31:31] [2] 6822.610 to 6823.296 (0.686 ms) - 1024/1024 of 10239

[13:31:32] [2] 6823.374 to 6824.060 (0.686 ms) - 1024/1024 of 10239

[13:31:33] [2] 6824.138 to 6824.856 (0.718 ms) - 1024/1024 of 10239

[13:31:33] test2: 0.69730000000009 ms

Traversing took 0.6973 s while the table key lookup took only 0.0016 s (GetTime() returns seconds, so the "ms" labels in the output are really seconds). So with a lot of data it's more efficient to look names up by key than to traverse and match, really. :)
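For readers who cannot reach the pastebin, the test setup described in the numbered steps can be approximated as follows. This is not the pastebin code, only a sketch of those steps: the names are generated placeholders, the sizes (10000 names, 1000 samples) are round-number assumptions, and SampleUnique, FindByKey and FindByTraversal are hypothetical helpers. Timing is left out here so the snippet stays focused on the sampling and the two lookup methods.

```lua
-- Hypothetical sketch of the test setup; not the code from the pastebin.
local names, nameSet = {}, {}
for i = 1, 10000 do
    names[i] = "Name" .. i          -- placeholder names
    nameSet[names[i]] = true
end

-- Pick `count` distinct entries using a partial Fisher-Yates shuffle on a copy.
local function SampleUnique(list, count)
    local copy = {}
    for i = 1, #list do copy[i] = list[i] end
    for i = 1, count do
        local j = math.random(i, #copy)
        copy[i], copy[j] = copy[j], copy[i]
    end
    local sample = {}
    for i = 1, count do sample[i] = copy[i] end
    return sample
end

-- Method 1: find by key.
local function FindByKey(name)
    return nameSet[name]
end

-- Method 2: traverse and match.
local function FindByTraversal(name)
    for _, v in ipairs(names) do
        if v == name then return true end
    end
end

local sample = SampleUnique(names, 1000)
local foundByKey, foundByTraversal = 0, 0
for _, name in ipairs(sample) do
    if FindByKey(name) then foundByKey = foundByKey + 1 end
    if FindByTraversal(name) then foundByTraversal = foundByTraversal + 1 end
end
```

Both counters should end up equal to the sample size; only the time taken to get there differs between the two methods.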

Quote:

Originally Posted by SDPhantom (Post 245077)

If you could post a link to source code that uses such algorithms, it would be helpful. I'll have to analyze it to get an idea of specifically how the algorithm works and exactly what it does.

The inspiration for this request comes from the POPFile open source project at http://getpopfile.org/. Their website includes downloads for Windows, Linux, OS X, and a cross-platform version. The sources are available, and the project is implemented in Perl.

It is possible to download and install POPFile without hooking it into your email client. You can then browse through the user interface. The online documentation is also excellent.

Quote:

Originally Posted by Vladinator (Post 245111)

This is what I found; clearly the built-in key lookup handles it much faster.

The code I used: http://pastebin.com/Xfc2wgpF

...

Traversing took 0.6973 s while the table key lookup took only 0.0016 s (GetTime() returns seconds, so the "ms" labels in the output are really seconds). So with a lot of data it's more efficient to look names up by key than to traverse and match, really. :)

This is because the internal storage for table keys uses hash indexing, so Lua doesn't need to traverse an array internally to find a value. Even if it did, the C code would run many times faster than Lua code running through its own scripting engine.

For future reference, I'd suggest using debugprofilestart() and debugprofilestop() to benchmark CPU time spent. debugprofilestop() returns the time elapsed, in milliseconds, since debugprofilestart() was last called, with much higher precision than comparing timestamps from GetTime(). This lack of precision is what's causing the 0 ms readings, and it happens a lot when trying to compare Lua C functions with internal Lua processing.
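A small timing helper along those lines might look like this. It is a sketch with hypothetical names (NowMs, TimeIt). It takes the difference between two debugprofilestop() readings instead of calling debugprofilestart(), so it does not reset a timer that other code may be relying on. The os.clock() fallback is an assumption added only so the snippet also runs in a standalone Lua interpreter.

```lua
-- Current time in milliseconds.
-- debugprofilestop() exists only inside the WoW client; the os.clock()
-- fallback is an assumption for running this outside the game.
local function NowMs()
    if debugprofilestop then
        return debugprofilestop()
    end
    return os.clock() * 1000
end

-- Hypothetical helper: run fn `repeats` times and return the average
-- number of milliseconds per call.
local function TimeIt(fn, repeats)
    local start = NowMs()
    for _ = 1, repeats do
        fn()
    end
    return (NowMs() - start) / repeats
end
```

For example, TimeIt(function() return t["Name1"] end, 100000) would give an average cost per keyed lookup, where t stands for the keyed table from earlier in the thread.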

The limit on GetTime() precision is because it's merely a Lua wrapper for the C function GetTickCount() from the system kernel, which returns an integer describing system uptime in milliseconds. GetTime() simply returns this integer divided by 1000.