Data Compression with Bitfields - WoWInterface
10-30-15, 12:12 PM   #1
Elstarin
A Murloc Raider
Join Date: Oct 2015
Posts: 5
Data Compression with Bitfields

I've been working on a large addon project for a long time. My addon gathers lots of data about the user's performance in combat and converts it into a simple format to quickly show the user which parts of combat they are doing well in and which they need to improve. A big part of how it displays data is with graphs. My function that draws the graphs is efficient; it can easily handle lots of graphs with thousands, or even tens of thousands, of data points each.

However, it's pretty important for my addon to be able to store lots of combat sessions in the saved variables for later viewing, like comparisons. I'm worried about how big the saved variables can get when storing all the graph points for lots of fights.

Each of my graphs has an array containing the X and Y plot points, but the X points can be safely discarded before saving, since they are taken at fixed intervals and can be rebuilt later. That leaves me with lots of arrays of Y points, like this:

Lua Code:
graph.data = {
  [1] = 87.93,
  [2] = 38.2,
  [3] = 54.76,
  [4] = 76.3,
  [5] = 65,
  [6] = 34.7,
  ...
}

These can be quite long, easily over 1,000 entries and sometimes over 10,000 per graph, and there are a lot of graphs per fight and a lot of fights stored. So, to finally get to the point: I'm wondering if I can compress this data down with bitfields before saving it, then decompress it (and rebuild the X points) when the graph is loaded.

Unfortunately, I'm finding bitfields to be incredibly confusing. I'd share my code for them, but I actually haven't managed to get anything even remotely close to working yet, so there's nothing to show.

Is this practical? Is it even possible? I've been unable to find any examples of this type of thing online, which makes me think that either it's really obvious and I'm being dumb, or it just isn't a good idea, and I'm being dumb. If this can work, can anyone point me in the right direction of how to get going with this?

Thanks!
10-30-15, 12:40 PM   #2
Dridzt
A Pyroguard Emberseer
 
Join Date: Nov 2005
Posts: 1,257
Have you found LibCompress, or run any tests on your data with it?

Discussion thread
10-31-15, 12:07 PM   #3
Elstarin
A Murloc Raider
Join Date: Oct 2015
Posts: 5
Hmm, somehow I managed to miss that. I've done some testing, and it does seem to compress it down quite a bit. Here's what I was doing for testing:

Lua Code:
local data = {}

for i = 1, 10000 do
  local y = random(1, 10000) / 100 -- Fake having decimals, just for testing
  data[i] = floor(y * 100)
end

local compressed = libC:Compress(table.concat(data, "~"))
local decompressed = libC:Decompress(compressed)

wipe(data)

local num = 0
for y in decompressed:gmatch("(%d+)~?") do
  num = num + 1
  data[num] = y / 100
end

print(#compressed, #decompressed)

This seems to compress it pretty well. Generally it was compressing a string of 10k table entries from a length of around 49k down to around 21k, which is about 42% of the original. If you're wondering what all the dividing and multiplying by 100 is for: I'm just removing the decimal point, because it wastes space.

I haven't done any profiling tests, but I'm guessing this method is pretty inefficient. Of course, in practice, these are two different functions, one for compressing and saving the data, and one for decompressing and loading it, so they don't happen at the same time, unlike my test. However, I would certainly like to be able to cut out the necessity of doing it in strings, if possible. I know very little about bitfields, but as far as I can tell, it should be possible to skip the string conversion.
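For the record, the closest thing to "bitfields" I've been able to come up with is packing each scaled value into two bytes with string.char instead of concatenating decimal strings. This is just an untested sketch (pack16/unpack16 are names I made up), and it assumes every value fits in the range 0 - 655.35 after rounding to two decimals:

```lua
local floor = math.floor

-- Pack each Y value as a 16-bit fixed-point number (two bytes per point)
local function pack16(values)
  local bytes = {}
  for i = 1, #values do
    local n = floor(values[i] * 100 + 0.5) -- two decimal places -> integer 0..65535
    bytes[#bytes + 1] = string.char(floor(n / 256), n % 256) -- high byte, low byte
  end
  return table.concat(bytes)
end

-- Reverse the packing: read two bytes at a time and restore the decimals
local function unpack16(packed)
  local values = {}
  for i = 1, #packed, 2 do
    local hi, lo = packed:byte(i, i + 1)
    values[#values + 1] = (hi * 256 + lo) / 100
  end
  return values
end
```

That's a flat 2 bytes per point before LibCompress even runs, versus roughly 6 for something like "87.93~". One caveat I'd have to check: saved variables are written out as escaped Lua strings, so non-printable bytes may get expanded on disk.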

Thanks for pointing out the library, Dridzt, this gives me much more saved variable space to work with. If anyone has any ways to improve my method with LibCompress, please let me know. Also if anyone knows of a relatively simple way to compress the data more directly, I would absolutely love to hear it, but this is still good.
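One more idea I might test on top of the current method (completely untested, just a sketch): delta-encoding the scaled values before concatenating. Consecutive points are usually close together, so the deltas should be shorter and more repetitive, which ought to suit an LZ-style compressor like LibCompress:

```lua
local floor = math.floor

-- Store each point as the difference from the previous point
local function toDeltas(data)
  local out, prev = {}, 0
  for i = 1, #data do
    local n = floor(data[i] * 100 + 0.5) -- same two-decimal scaling as before
    out[i] = n - prev
    prev = n
  end
  return table.concat(out, "~")
end

-- Accumulate the deltas back into absolute values
local function fromDeltas(str)
  local data, prev, i = {}, 0, 0
  for d in str:gmatch("(-?%d+)~?") do -- note: the pattern must allow a minus sign now
    prev = prev + d
    i = i + 1
    data[i] = prev / 100
  end
  return data
end
```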

EDIT: I did a few basic efficiency tests, and on my computer, which is moderately good, it takes about 1.1 seconds per 100k table entries to compress and decompress. The compression portion is about 70% of that time. I'll probably want to do the compression steadily in chunks to avoid causing any FPS drop.
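If I do go the chunked route, I think the usual pattern is a coroutine resumed from an OnUpdate script. A rough, untested sketch (CHUNK is arbitrary, and only the string-building step is sliced here; Compress itself would still run in one go):

```lua
local CHUNK = 1000 -- entries converted per frame; would need tuning

-- Convert the table to strings in slices, yielding between slices
local function buildString(data)
  local parts = {}
  for i = 1, #data, CHUNK do
    for j = i, math.min(i + CHUNK - 1, #data) do
      parts[j] = tostring(data[j])
    end
    coroutine.yield() -- hand control back until the next frame
  end
  return table.concat(parts, "~")
end

-- Driver: returns a function to call once per frame (e.g. from OnUpdate);
-- it reports true when the work is finished and onDone has been called
local function makeDriver(data, onDone)
  local co = coroutine.create(function() onDone(buildString(data)) end)
  return function()
    if coroutine.status(co) ~= "dead" then
      coroutine.resume(co)
      return false
    end
    return true
  end
end
```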

Last edited by Elstarin : 10-31-15 at 12:36 PM.
10-31-15, 12:47 PM   #4
semlar
A Pyroguard Emberseer
 
Join Date: Sep 2007
Posts: 1,059
I would personally just interpret the data and generate a summary of it, as it's unlikely the user needs to keep thousands of separate data points for a graph.

If you truly need to keep all of that information and there's no way to simplify it, I'm not sure compressing it is going to be worth the trade-off: you'd be slowing down access just to save space, and saved-variable space isn't particularly limited.
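Something as small as this might be all a user actually needs per fight (the fields here are arbitrary, just to illustrate the idea):

```lua
-- Reduce a session's raw points to a small fixed-size summary table
local function summarize(data)
  local n = #data
  local sum, minV, maxV = 0, math.huge, -math.huge
  for i = 1, n do
    local y = data[i]
    sum = sum + y
    if y < minV then minV = y end
    if y > maxV then maxV = y end
  end
  return { count = n, min = minV, max = maxV, mean = sum / n }
end
```

Four numbers stored per graph instead of thousands, regardless of fight length.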
10-31-15, 01:43 PM   #5
Elstarin
A Murloc Raider
Join Date: Oct 2015
Posts: 5
Obviously, this is a very contrived example, but were you thinking something more or less along these lines?

Lua Code:
local newData = {}
for i = 1, #data do
  -- Mathy stuff here to get the angle of this point relative to data[i - 1] and data[i + 1]
  -- Figure out if the current data entry can be safely discarded or not
  -- If the angle is basically the same, like it would be between the points 20, 30, and 40,
  -- then skip adding the one for 30, it isn't necessary

  if pointIsNecessary then
    newData[#newData + 1] = data[i]
  end
end

data = newData

I had thought about this before, and decided it wouldn't work in my system, but since then I have changed how the graph data gets read and I probably can do this with minimal rewriting. I just never came back to this idea...

Oh, and to be clear: I know nobody can help me actually write the calculations to determine whether a point is necessary, because I haven't shown any of my graph's code. I'm not worried about that part; I'm sure I can figure it out, and I don't like asking people to spend their time on my projects. I just want to make sure this is actually the type of thing you mean, Semlar.

Anyway, I guess I was completely on the wrong track when thinking about using a compression system. It seems kind of obvious now that Semlar pointed it out... Oh well. Thanks for the help!

Last edited by Elstarin : 10-31-15 at 01:51 PM.
10-31-15, 02:03 PM   #6
semlar
A Pyroguard Emberseer
 
Join Date: Sep 2007
Posts: 1,059
If you don't need to show frequent spikes in your graph you can use a smoothing algorithm to reduce the noise and show something closer to a moving average.

What kind of filtering you use depends on the type of data and how close your graph should be to the original values.

You can store the initial points the same way you're doing it now, and then apply your filter over the whole set once it's complete to reduce what you need to keep, rather than trying to decide point by point whether to add an entry based on the previous one; the data really needs to be processed as a whole.

You might take a look at something like the Ramer–Douglas–Peucker algorithm if you want to keep the same general shape of the graph while reducing the number of points required.
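A centered moving average is about the simplest filter of this kind. A quick sketch, where r is the window radius (endpoints just use a shrunken window so the output has the same length as the input):

```lua
-- Replace each point with the average of itself and its r neighbors on each side
local function movingAverage(data, r)
  local out = {}
  for i = 1, #data do
    local lo, hi = math.max(1, i - r), math.min(#data, i + r)
    local sum = 0
    for j = lo, hi do sum = sum + data[j] end
    out[i] = sum / (hi - lo + 1)
  end
  return out
end
```

Larger r means a smoother curve but more spikes averaged away; it doesn't reduce the point count by itself, but smoothed data tends to survive point reduction better.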

Last edited by semlar : 10-31-15 at 02:12 PM.
10-31-15, 04:13 PM   #7
Elstarin
A Murloc Raider
Join Date: Oct 2015
Posts: 5
I do use a type of smoothing when I actually draw the graph, so that each visible graph only uses a maximum of around 300-500 textures even when it's drawing from a table of thousands of points. However, my method of ignoring points is fairly arbitrary, and I'm not happy with it.

I had never heard of that algorithm, but the wiki article looks really interesting. I think I'll be able to apply it to my data table when saving a graph without too much trouble. Making it work with a graph that's currently active may be more complicated, but I'm pretty sure I can work that out.

I'm not familiar with the language used in the code example for the algorithm (it's Java, right?), but I found a Lua version of it, so I'll just try to mold that into my code.

Thanks for the help! If I can make this work well, it will make my graphs dramatically more efficient and the storage space won't be an issue at all.
11-01-15, 02:36 PM   #8
Elstarin
A Murloc Raider
Join Date: Oct 2015
Posts: 5
After playing around with it for a while, here's what I have come up with:

Lua Code:
local badPoints = {}
local newData = {}
local function smoothingAlgorithm(data, first, last, tolerance, isRecursive)
  if not isRecursive then -- Only true for the top-level call, not the recursive sub-calls
    wipe(badPoints)
  end

  -- Endpoints of the current segment
  local x2, y2 = data[first], data[-first]
  local x3, y3 = data[last], data[-last]
  local bottom = sqrt((x2 - x3) ^ 2 + (y2 - y3) ^ 2) -- Length of the segment between the endpoints
  -- (X values are taken at fixed intervals, so the endpoints never coincide and bottom is never 0)

  local maxD = 0
  local farthestIndex = 0

  for i = first + 1, last - 1 do -- Only interior points; the endpoints are always kept
    local x1, y1 = data[i], data[-i]

    local area = abs(0.5 * (x2 * y3 + x3 * y1 + x1 * y2 - x3 * y2 - x1 * y3 - x2 * y1)) -- Area of the triangle
    local distance = area / bottom -- The triangle's height, i.e. the point's distance from the segment

    if distance > maxD then
      maxD = distance
      farthestIndex = i
    end
  end

  if maxD > tolerance then
    -- The farthest point deviates too much to discard: keep it and refine both halves around it
    smoothingAlgorithm(data, first, farthestIndex, tolerance, true)
    smoothingAlgorithm(data, farthestIndex, last, tolerance, true)
  else
    -- Every interior point is within tolerance of the segment, so they can all go
    for i = first + 1, last - 1 do
      badPoints[i] = true -- Flagged for removal
    end
  end

  if not isRecursive then -- All the recursion has finished; rebuild the table
    wipe(newData)

    for i = first, last do
      if not badPoints[i] then -- The point was not flagged to be removed
        local num = #newData + 1
        newData[num] = data[i]
        newData[-num] = data[-i]
      end
    end

    for i = first, last do
      if newData[i] then -- Reconstruct the data table
        data[i] = newData[i]
        data[-i] = newData[-i]
      elseif data[i] then -- I assume this is cheaper than calling wipe(data) first, since I'm already doing the loop anyway
        data[i] = nil
        data[-i] = nil
      end
    end
  end
end

In my graph data tables I store the X values at positive indices and the Y values at the matching negative indices.

This function is moderately expensive to run, but I did a bunch of profiling tests and it wasn't terrible. It shouldn't cause any FPS loss, as long as I'm careful. I feel like there must be a much better way to handle the tables at the end, when it's done running. I hate looping through it twice, but at the moment I can't see a better way to do it while keeping my data table indexed, without making a throwaway table. A lot of my addon is currently built around the data table being indexed, and it would be a big pain to change that.

I haven't done a ton of practical testing of it yet, but from what I have seen, it seems to be doing a good job. My current problem is that I cannot find any way to call it gradually while a graph is active (new data points are being added), but for now I can keep using my old smoothing method on active graphs and use this when I save them. It isn't ideal, but I doubt any users would notice the difference anyway...
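One thing I might experiment with for active graphs (just a sketch, nothing tested): buffering incoming points and running the reduction over each window once it's full, so only a bounded chunk ever gets processed at a time. Here simplify stands in for whatever reduction routine is used; makeCollector and WINDOW are names I made up:

```lua
local WINDOW = 200 -- points per batch; would need tuning

-- Returns a collector that buffers incoming points and hands each full
-- window to `simplify` (a function taking an array and returning a reduced one)
local function makeCollector(simplify)
  local buffer, kept = {}, {}
  local function flush()
    local reduced = simplify(buffer)
    for i = 1, #reduced do kept[#kept + 1] = reduced[i] end
    buffer = {}
  end
  return {
    add = function(y) -- called for every new data point
      buffer[#buffer + 1] = y
      if #buffer >= WINDOW then flush() end
    end,
    finish = function() -- called when the graph is saved
      if #buffer > 0 then flush() end
      return kept
    end,
  }
end
```

The downside is that points near a window boundary never get compared against their neighbors in the next window, so it's a bit lossier than one pass over the whole table.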

I'm always happy to have the flaws in my code pointed out, so if anyone sees a problem with it or a way it could be improved, I'd love to hear it.

The source code that this is more or less based on can be found here: https://quangnle.wordpress.com/2012/...ts-of-a-curve/
