09-26-18, 10:56 AM | #1 |
oUF performance
Hello,
I'm finding that oUF has been a huge CPU hog this expansion compared to previous expansions. I think this is in large part due to the way Blizzard has changed its Lua implementation and how we need to fetch auras and combat log entries now. However, I'm wondering if there are plans to implement performance improvements in the oUF core? I'd be happy to submit merge requests on the official GitHub in this effort, but I'm not sure if that steps on toes or if those are largely ignored with such a large userbase. I'll put some performance notes below on things I see oUF having problems with.

Things oUF could do better for much faster function cycles:
* Localizing common functions in each script: things like UnitAura / UnitBuff / UnitDebuff / UnitReaction / UnitThreatSituation and so on. There are easily 100 functions that could be localized, and localized function references are a minimum of 30% faster (I've profiled some cases at up to 300% faster).
* Localizing variables outside of for loops, also a massive performance increase.
* Creating tables or table templates (key sizing) outside of loops and just updating their reference inside of loops.
* Result memoizing. I believe this is possible in WoW's implementation: storing some function results so that when the same function parameters are given, it simply returns the same result as last time rather than recalculating. This can be useful for common calls such as UnitName or internal functions that use unique string names as input. `unit` changing frequently might make this difficult to implement on things like nameplates or raid frames. Food for thought.
* I've noticed that OnShow can result in all of a frame's elements forcing an update, which seems likely unnecessary and a huge resource hog for frames that hide and display frequently.
* There are also OnUpdate scripts in a number of default elements which do a lot of calculation that I think should be revisited.
* Default Blizzard addons tend to continue to run even when hidden and their main driver has events unregistered. Once their frames are created they have subevents registered and still seem to be firing an unbelievably high amount. I think when we spawn raid frames, we should DisableAddOn on Blizzard_CompactRaidFrames, on nameplates DisableAddOn Blizzard_Nameplates, and so on. I've been profiling these frames and they are absolutely decimating performance when they should be disabled. I was finding that addons like WAs were using > 3s of CPU time over the course of a raid fight, but that just CompactRaidFrame_Unit1 was upwards of 80s. Same goes for nameplates.

BFA has been a poorly optimized expansion, Uldir has been a poorly optimized raid, and these days more than ever people are running crappy addons or WAs that eat up a ton of CPU. I think oUF can help fix some of that. I have 3 addons that all use oUF, and the 3 of them together are starting to get kinda CPU-heavy just from their oUF elements.
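For readers who want to see the first three suggestions in the list above in one place, here is a minimal sketch. It is not oUF code: `collectHealth`, `healthByIndex`, and the `colors` table are hypothetical names made up for illustration, and the `UnitHealth` fallback is a stub so the snippet also runs in plain Lua outside the game client.

```lua
-- Hypothetical stand-in so the sketch also runs outside the WoW client.
-- Declaring it as a local is itself the "localize the API" step.
local UnitHealth = UnitHealth or function(unit) return 100 end

-- Hypothetical color table, shaped like the one used later in this thread.
local colors = { reaction = { [4] = { 0.1, 0.2, 0.3, 1 } } }

-- Scratch table allocated once and reused on every call.
local healthByIndex = {}

local function collectHealth(units)
    -- Clear the reused table instead of allocating a fresh one.
    for k in pairs(healthByIndex) do
        healthByIndex[k] = nil
    end

    -- Hoist the nested table lookup out of the loop.
    local reactionColor = colors.reaction[4]

    for i = 1, #units do
        healthByIndex[i] = UnitHealth(units[i])
    end
    return healthByIndex, reactionColor
end
```

One caveat with the reused table: callers must not hold on to the returned table across calls, because the next call overwrites it. Whether any of this is a measurable win in a real layout still needs profiling.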
|
09-26-18, 02:33 PM | #2 |
I'm curious, have you tried other layouts to see if this is a problem with them also? I do not fully understand if the issues you are describing are problems with the core code or could it be with something in the layout. Again, I'm not the most knowledgeable, so I'm just asking.
|
|
09-26-18, 03:49 PM | #3 |
I haven't tried other layouts, but I've tried a few things with my own layouts to try and narrow this down. It seems especially bad with the nameplate implementation, even with my layout function returning on line 1 and the nameplate callback function returning on line 1.
I'm also able to see when it's oUF elements vs layout-specific elements because oUF is technically implemented as a separate addon from my 3 addons that use it, so CPU profiling shows it as its own CPU instance. I think part of the problem may be that the DisableBlizzard functions are not covering what they used to. I'll investigate more when I get home today.
|
09-26-18, 07:26 PM | #4 |
However, if it's something major, e.g., the 300% cases you've just mentioned, we, at least I personally, would like to hear about those.
As for Blizz compact raid frames, it's up to layout devs to disable them. I think Blizzard_ArenaUI is the only exception we make, which is shady enough, and we'll prob rework it. Last edited by lightspark : 09-27-18 at 06:14 AM. |
|
09-27-18, 04:02 AM | #5 |
Optimizations are always welcome, but they should be backed with proper profiling. You're listing a lot of micro optimizations that probably won't have a lot of impact on the bigger picture.
Putting locals outside of loops has an even smaller gain. The only place where that would have made sense is the aura element, and I'd rather take the cleaner code over a minor optimization there.
__________________
「貴方は1人じゃないよ」 |
|
09-27-18, 04:41 PM | #6 |
So I'll reference this document here, because it probably has better examples than what I'll list. What I've profiled so far is incomplete on its own, so I'll try and get more profiling done this weekend.
I tested the following calls in separate loops of 10,000 iterations each. Depending on the layout, frequent health updates, and the number of units on screen, these calls frequently hit and exceed 10k in a given fight, so I thought it would be a good test number. The first number is what each call clocked without localizing the API call first; the second is with the API call localized:

UnitIsConnected 1.775 -> 1.676
UnitExists 1.918 -> 1.826
UnitReaction 4.301 -> 4.254
UnitIsUnit 1.904 -> 1.874
UnitAura 1.925 -> 1.843
UnitIsPlayer 1.657 -> 1.589
UnitIsTapDenied 1.626 -> 1.607
UnitPlayerControlled 1.683 -> 1.596
UnitHealth 4.950 -> 4.872
UnitHealthMax 4.996 -> 4.913

total time: 26.735
total time optimized: 26.050
avg improvement: 2.62%

So granted, not large, but keep in mind this was to localize what is already a single reference, with no table lookups or anything involved. Just making local UnitHealthMax = UnitHealthMax. I think total time is a really important stat here, but I'll touch back on that.

When we make the call include a lookup on a multidimensional table, things look a lot different. Let's analyze the health element, since basically every layout uses it. I can't do a 1:1 comparison right now, but even just looking at the lookup to unpack the reaction color we see a large improvement. Before my profile I set this table:
Code:
local parent = {}
parent.colors = {}
parent.colors.reaction = {}
parent.colors.reaction[4] = {.1, .2, .3, 1}
Code:
profile("unitreaction_color", function()
    for i = 1, 100 do
        local unitreaction = UnitReaction('nameplate1', 'player')
        local color = unpack(parent.colors.reaction[4])
    end
end)
Code:
profile("optimized_unitreaction_color", function()
    local unpack, UnitReaction = unpack, UnitReaction
    local r_table = parent.colors.reaction
    for i = 1, 100 do
        local unitreaction = UnitReaction('nameplate1', 'player')
        local color = unpack(r_table[4])
    end
end)
improvement: 13%

That table is as simple as it gets. This difference gets more and more pronounced the bigger the table reference is and the more the function does. We unpack colors from the self element in these cases, and these self tables can often get really large, especially when layouts use many of the elements available in oUF. I tried unpacking color from my bdCore library table, which is really pretty lean, and that increased the difference to 21%. I'll try and get exact stats on oUF layouts when I get home; right now I don't have an easy way to test.

Again, with all of the above in mind, I think it's important to note just how often these functions are called. Maybe not from just player, target, tot, and pet, but when you have raid frame and nameplate layouts, all of these call counts go up drastically.
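The `profile` helper called in the snippets above is not shown in the thread. As a rough sketch only, such a helper might look like the following. The `results` table and the `os.clock` fallback are assumptions added so the snippet runs in plain Lua; in the game client the timer would be `debugprofilestop`, which returns milliseconds.

```lua
-- debugprofilestop exists only in the WoW client; outside of it, fall back
-- to os.clock (seconds of CPU time) converted to milliseconds.
local now = debugprofilestop or function() return os.clock() * 1000 end

-- Hypothetical store for the last measured time of each named test.
local results = {}

local function profile(name, fn)
    local start = now()
    fn()
    local elapsed = now() - start -- milliseconds
    results[name] = elapsed
    return elapsed
end
```

A single run is noisy, so it is probably worth calling each test several times and comparing the spread before reading much into small differences.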
Take the following code as an example:
Code:
for i = 1, 1000000 do
    local a = {}
    a[1] = 1; a[2] = 2; a[3] = 3
end
Code:
for i = 1, 1000000 do
    local a = {true, true, true}
    a[1] = 1; a[2] = 2; a[3] = 3
end
If we pass `UnitIsTapDenied(unit)`, `UnitIsPlayer(unit) and select(2, UnitClass(unit)) or false`, and `UnitReaction(unit, unit2)`, then we have a unique set of parameters that always returns the same colors, which we could cache and return the next time we call it. I've implemented this on my nameplates because UNIT_THREAT_LIST_UPDATE and UNIT_HEALTH fire so frequently. Memory is far cheaper than processing power, and that is especially true in the case of WoW. It is absolutely worth trading some off. We could further optimize this by storing self.class, self.reaction, and self.isplayer and updating those variables on the correct events, but that is definitely cumbersome. It can't be used often though, since the whole job of oUF is to take a bunch of variable data and make it easily usable. But in the case of memory here, we're talking about creating 100 KB of table caches to save hundreds if not thousands of CPU loops.
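To make the caching idea above concrete, here is a minimal memoization sketch. It is not the poster's nameplate code: `computeColor`, `getColor`, the color values, and the string cache key are all hypothetical, and the inputs are passed in as plain values so the snippet runs outside the game client.

```lua
-- Hypothetical stand-in for whatever logic picks a color from these inputs.
-- The counter only exists to show how often the real work is done.
local calls = 0
local function computeColor(tapDenied, class, reaction)
    calls = calls + 1
    if tapDenied then return 0.6, 0.6, 0.6 end
    if class == "MAGE" then return 0.25, 0.78, 0.92 end
    if reaction and reaction >= 5 then return 0.1, 0.8, 0.1 end
    return 0.8, 0.1, 0.1
end

-- Cache keyed by the three inputs; each entry is a small {r, g, b} table.
local cache = {}

local function getColor(tapDenied, class, reaction)
    -- tostring() keeps false and nil usable as parts of the key.
    local key = tostring(tapDenied) .. ":" .. tostring(class) .. ":" .. tostring(reaction)
    local color = cache[key]
    if not color then
        color = { computeColor(tapDenied, class, reaction) }
        cache[key] = color
    end
    return color[1], color[2], color[3]
end
```

Building the string key has its own cost, so this only pays off when the cached computation is more expensive than the concatenation. That is worth profiling rather than assuming.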
I'll try and get more profiling numbers this weekend and really dig into some of the FPS problems people are reporting to me. |
|
09-28-18, 02:11 AM | #7 |
I'm well aware of that document, I've read the whole book back in the day.
Moreover, debugprofilestop returns time in milliseconds. Lua Code:
This takes 581.90662911534ms or ~0.6s on my machine w/ i5-7500. Lua Code:
This takes 332.96345540881ms or ~0.3s. However, in oUF we mainly have this scenario: Lua Code:
This takes ONLY 60.121539920568ms or 0.06s, the lowest I've seen while benching was 0.05s. While this Lua Code:
Takes 59.989032864571ms or 0.06s, given that results' fluctuation is ~0.01s, I think you understand what I'm implying here... I'm still curious about this bit:
Last edited by lightspark : 09-28-18 at 02:21 AM. |
|
09-28-18, 02:54 AM | #8 |
Dunno if you're talking about compact raid frames themselves or the various CompactUnitFrame_* functions. If it's the latter, then their high usage numbers often come from Blizz nameplates, which reuse a lot of the compact unit frame code. Nameplates or their driver will clock high regardless because they're implemented in Lua now. We do disable nameplate health, cast, etc. bars, but we don't stop the Blizz nameplate driver from doing its job because that's a risky thing to do. I even left a comment in our code that explains the reason why we do it: Lua Code:
On a side note, I'll be adding a way to nuke compact raid frames w/o disabling Blizz raid addons. I'll also rework how we disable arena frames; as I said earlier, the way we do it now is kinda iffy. Last edited by lightspark : 09-28-18 at 03:09 AM.
|
09-28-18, 04:42 AM | #9 |
For me fully disabling the Blizzard addons Blizzard_CUFProfiles and Blizzard_CompactRaidFrames is the way to go since I have my own raid manager frame for world markers and such. They can be reenabled quite easily too. Why the hassle?
__________________
| Simple is beautiful. | WoWI AddOns | GitHub | Zork (WoW)
|
|
09-28-18, 05:58 AM | #10 |
Technically, disabling those two addons is enough, but only if you keep them enabled by default AND you provide an option to toggle them via in-game config, so your addon's users do it themselves. That's what Grid2 does. But in general, almost all major UF addons have abandoned this approach. For instance, SUF and ElvUI simply disable and hide frames on the fly w/o disabling those two addons. Some addons, e.g., VuhDo and Grid, do nothing at all; it's up to users to figure out how to disable Blizz raid frames via tutorials and whatnot. It's not that easy for your average addon user to reenable them. If it actually were, there wouldn't be numerous threads on this topic. Actually, the more I think about this issue, the less I want to add this raid disabler to oUF. But overall, only the SUF/ElvUI approach is good for oUF, because oUF shouldn't leave any traces or affect the UI after it's fully disabled. Last edited by lightspark : 09-28-18 at 06:10 AM.
|
WoWInterface » Featured Projects » oUF (Otravi Unit Frames) » oUF performance |