Saturday, April 13, 2019

Girls' Frontline statistics, parts of the missing chapter.

Alright, you primitive screwheads, listen up. See this? gf-db.github.io. This is your new GFDB.

Well, to be fair, it's somewhat unfinished, but better than nothing, right?

There is one more thing that complicates straightforward analysis of rates during various periods, which I didn't mention in the previous post. The "dev_time" parameter is the time you take the item (doll, equip or fairy) out of production. However, since it is known in advance what will be produced (most items are 1:1 mapped to their production times), the probability roll happens when you start the construction. Thus some rolls will leak: they are produced using the rates of one crafting period, but are taken out and appear in some subsequent period. You can exclude such crafts at the cost of slightly reducing the overall number of samples: for every user, for each of their build slots, discard the first craft in every production period. The effect we're talking about here is not that big: if you run a dumb query that doesn't account for it, you introduce a bias in estimated rates proportional to the difference between the true period rates divided by the average number of crafts per user during the second period.
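To make that concrete, here is a minimal sketch of the filtering step (record layout and key names are my invention, not the actual schema):

```python
def drop_leaky_crafts(crafts):
    """Discard the first craft for every (user, slot) pair in each
    production period; that craft's roll may have happened under the
    previous period's rates. `crafts` is a list of dicts with
    hypothetical keys: "user", "slot", "period", "dev_time"."""
    ordered = sorted(crafts, key=lambda c: c["dev_time"])
    seen = set()  # (user, slot, period) keys whose first craft was dropped
    kept = []
    for c in ordered:
        key = (c["user"], c["slot"], c["period"])
        if key in seen:
            kept.append(c)
        else:
            seen.add(key)  # first craft of this period on this slot: drop it
    return kept
```

The same thing is doable in SQL with a window function, but a post-processing pass like this is easier to reason about.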

Yet another effect to look out for is regression towards the mean. Suppose you took the list of recipes for any particular item, arranged it in descending order by the mean value of the item probability estimate, and noted the top ones. If you came back later, after more data was gathered, more likely than not you would notice that all the supposedly "good" recipes' mean values fell. This does not mean that fresh new recipes get an "originality boost" or something like that. Since results are by their very nature random, in any distribution you will get recipes that perform both better and worse than their true rate on a small sample size, and with more data all these fluctuations will smooth out.
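A toy simulation makes this concrete (all numbers invented; every recipe here shares the same true rate, yet the small-sample leader always looks inflated):

```python
import random

def small_vs_large_estimates(n_recipes=200, true_rate=0.03,
                             small=30, large=3000, seed=1):
    """All recipes share one true 5* rate. Estimate it per recipe from
    a small sample, then re-estimate the 'best' recipe from a large
    sample; the leader's estimate falls back toward the truth."""
    rng = random.Random(seed)
    small_estimates = [
        sum(rng.random() < true_rate for _ in range(small)) / small
        for _ in range(n_recipes)
    ]
    best_small = max(small_estimates)  # looks far above the true rate
    refreshed = sum(rng.random() < true_rate for _ in range(large)) / large
    return best_small, refreshed
```

No recipe is actually better than any other in this model; the "top recipe" is purely a lucky streak that more data averages away.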

A sidenote: there are times when one could begin to wonder if all the hate for microsoft products is really warranted and not mindless parroting of another camp's fanboys. And then usually something like this happens: I needed to add regular statistics updates for the site. Should be easy, right? Since I run the capture on my main machine, and it has to run win7 because games... open Task Scheduler, create a task, done? Well, I did that, and for some reason, call it intuition or curiosity, added timestamp logging to see how much time updates take. When launched manually, 1 hour of files is typically added in 2 seconds and the process to recreate statistics takes about 20 seconds, which is a lot and is a result of unoptimized queries/indices doing a full scan on a gigabyte-sized database plus the fact that I wrote it once and then used to run it every other month, but that can be easily fixed and is not the point now. So, testing showed that the data was added and statistics generated, and I left it alone. And then after about half a day I peeked at the log and did an "O_O" face when I saw update times of 20-40 minutes (?!?!?!1). After a quick... ducking? (like googling, but without the privacy invasion part) I found out that (in increasing order of retardation):

  • all processes launched from windows task scheduler run with lower CPU, I/O and memory priority
  • there is no setting in the GUI to change it
  • there is a way to change it that involves exporting the job to XML, editing "priority" variable, and importing it back, but
  • this variable affects both CPU and I/O priority together and there is no way to set memory priority at all
And even if we adjust CPU+I/O priority to that of an interactively launched process and leave memory alone, because it should only affect how soon our process' data gets evicted from the page cache under memory pressure, the updates still take 40-60 seconds. Why that happens on an idle machine is still a mystery to me. So yeah.
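For reference, the XML surgery looks something like the fragment below (abridged; the task name is a placeholder, and as far as I can tell `Priority` defaults to 7, which means below-normal CPU class and background I/O, while values 4-6 correspond to the normal priority class):

```xml
<!-- Export:   schtasks /Query /TN "GFDB update" /XML > task.xml
     Edit, then re-import:
               schtasks /Create /TN "GFDB update" /XML task.xml /F -->
<Task xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Settings>
    <!-- 0-10; default 7 (below normal). 4 gives normal CPU and I/O priority. -->
    <Priority>4</Priority>
  </Settings>
</Task>
```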

Suppose you have a nearly complete log of crafts with all the data I described in the previous post. What interesting/useful information can you gain from it? Let's see:

  • A long list of real user-supplied names; more than 175k entries as of this post.
  • User level as a function of time with 1-day or better resolution for each player (assuming they do daily crafting). Typical/fastest/slowest growth rate, average time it takes to level up at level x for the entire server population.
  • History of all name changes. For example, there are 3 users who changed their name twice and 236 users who changed their name once at the moment.
  • Ability to do username-to-id lookups for dorm-visiting or other purposes.
  • Item/class/total craft rates as a function of time. Popularity of various recipes.

[I was supposed to put cool-looking charts here, but then got lazy. I might revisit them if anyone is interested.]

Chinese-quality code found in game (I mean, other than UI "responsiveness"):

  • Client-side name validation. Your name cannot have any of these strings as a substring (case-insensitive comparison): "delete", "drop", "truncate", "set", "database", "table", "field", "alter", "select", "update", "insert". I don't know whether to laugh or cry here.
  • On the "Combat" page accessible from the start menu, if you rapidly switch between any 2 selector buttons (Combat mission, Logistic support, Combat simulation), the game will reliably crash.
  • Not joking, happened to me once: the game crashed when I opened the Formation window and started rapidly removing dolls from an echelon one by one. When I relaunched it, I found myself in the defense drill battle. Lost 5 extra energy to this.
  • When moving from dorm to dorm using the Next button, right after you press it everything starts looking pretty crappy, especially noticeable on the condenser and wall portraits. No wonder: they take a screenshot of the scene to slide it, and for some reason (nobody gave a shit, that's the reason) it is temporarily saved as a JPEG with high compression, so all small details are blurred and artifacts are scattered all over. If you're playing on a phone it might be less noticeable because the angular pixel size is smaller for typical viewing distances.
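The name check from the first bullet presumably boils down to something like this sketch (a guess at the logic, not the game's actual code); note that it also rejects harmless names that merely contain a banned word:

```python
# Hypothetical reconstruction of the client-side name filter; the
# banned-substring list is the one observed in the client.
BANNED_SUBSTRINGS = ["delete", "drop", "truncate", "set", "database",
                     "table", "field", "alter", "select", "update", "insert"]

def name_is_valid(name: str) -> bool:
    """Case-insensitive substring blacklist, as described above."""
    lowered = name.lower()
    return not any(word in lowered for word in BANNED_SUBSTRINGS)
```

"Settler" fails this check because it contains "set", which tells you everything about blacklist-based "SQL injection protection" on the client.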

Interesting observations:

  • There are no "Oath" method/variable names in the game code, only "Marry" / "Wedding" ones.
  • Christmas bean bag chair furniture piece can be used as a bed, but only by G11.
  • I went and plotted 5* SG rates from the most popular recipe over time. On 2019-02-26 rates effectively tripled.
  • On 2019-03-05 (after the update) development log update frequency was increased from once every 10 minutes to once every 5 minutes, except for the last 4 hours of each server day, where it was decreased to once every 40 minutes. On 2019-04-02 (after the update) they rolled it back.
  • On the chart image in the previous statistics post perceptive readers may have noticed a small bump during the maintenance. It is not an artifact. Looking at the database, these are crafts by a single user named "Hiden", apparently as tests on the production server. You may visit this tester-kun at UID 4422 and say hello (not so hiden now, eh?).

I trust you, reader, not to be a retard who ignores everything I've written and rushes to craft recipes with the highest average while ignoring a low epoch capture ratio and/or a high stdev due to a low number of crafts.
However, even if you are one, no significant harm is done, because the landscape looks pretty smooth to me, so the most impact you can have on your results is to simply do more crafts while keeping resource efficiency in mind.

6 comments:

  1. Thanks for the write-up, it's a very interesting read.
    Here's hoping the new client coming soon ends those lag issues already.

    ReplyDelete
    Replies
    1. Don't know how everyone else who was affected dealt with lags, but, since it forced me to discover Leapdroid, for me it was a temporary annoyance followed by permanent improvement (playing on PC).
      Just hoping that new client doesn't ruin everything again, I'm out of fallback options now.

      Delete
    2. I'm having mixed reaction to the new client.
      On one hand, there are definite improvements: multiturn planning, fairy reclassification, fixed crash in the menu, several annoying modal dialogs changed to sliding notifications, dorm size expansion.
      On the other, now entering repair bay takes 2 seconds even on emulator, battles jerk and lag just before the ending summary, dorms look worse with the black bars, first retreat on a map sometimes hangs for a second before displaying confirmation dialog, planning mode fails to continue if an enemy runs into you on SF turn and the client still leaks memory.
      It's almost like their quality control doesn't exist, so developers fix one thing and break another in the process.

      Delete
  2. Pretty interesting article! I was linked this by a friend in discord, and just wanted to mention a couple of things that might be of some relevance to you.

    In case you weren't already aware, production log data only returns a max of 11 results for any one specific result (was it 11? It may have been 12 or 13... it's been quite a few months since I've taken a look at it, so I forget the exact number). What this means is that once you hit 11 entries for any particular T-Doll, additional entries for that specific T-Doll get dropped from the list and it no longer shows up.

    The end result of this is extremely skewed data if you do not take this into consideration. This is actually the main reason the chinese GF statistics website is considered useless: because it does not account for this, the data is incredibly unreliable. Let's take, as an example, a theoretical situation where you have a craft pool of 4 dolls total, one of each rarity: one 2*, one 3*, one 4*, and one 5*.

    Per the official craft rates given by the developers, the correct rarity breakdown is 60% for a 2*, 27% for a 3*, 10% for a 4*, and 3% for a 5*. This means that given a slice of 50 craft results, you would expect ratios on the level of 30:13:5:2 for each rarity.

    However, because any entry past the 11th gets dropped from the data list, the actual results you will return from a slice of 50 are incredibly different. Each specific T-Doll is capped at 11 entries, meaning that even if one T-Doll is far more common than the others, you will only return 11 copies of it from one particular slice. Thus, an example of the data that you would get would be something like this:

    2*
    2*
    2*
    3*
    2*
    2*
    4*
    2*
    2*
    5*
    2*
    3*
    4*
    2*
    2*
    2* (this is the 11th entry - thus, 2*s will no longer show up after this entry)
    3*
    3*
    3*
    3*
    4*
    3*
    4*
    4*
    3*
    3*
    5*
    3*
    3* (this is the 11th entry - thus, 3*s will no longer show up after this entry)
    4*
    4*
    4*
    5*
    4*
    4*
    -- and so on


    As a result, a slice of 50 production results would most likely come back looking like 11:11:11:11, with the remaining 6 crafts dropped entirely, because after it runs out of 2*/3* rarities to show, it can only show 4*/5* rarities afterwards. If you ran the statistics on this slice without knowing of the issue, you would mistakenly conclude that the rates for higher-rarity T-Dolls are massively higher than they actually are.
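The skew described above is easy to reproduce in a quick simulation (rates from the official breakdown, cap of 11 from the example; everything else invented):

```python
import random

def simulate_log(n_crafts=500, cap=11, seed=0):
    """Roll crafts at the official rates, but keep at most `cap`
    log entries per rarity, as the production log does within one
    window. Returns (true counts, logged counts)."""
    rng = random.Random(seed)
    rates = [("2*", 0.60), ("3*", 0.27), ("4*", 0.10), ("5*", 0.03)]
    true_counts = {r: 0 for r, _ in rates}
    logged = {r: 0 for r, _ in rates}
    for _ in range(n_crafts):
        x = rng.random()
        acc = 0.0
        for rarity, p in rates:
            acc += p
            if x < acc:
                true_counts[rarity] += 1
                if logged[rarity] < cap:
                    logged[rarity] += 1
                break
    return true_counts, logged
```

With enough crafts in the window, every rarity saturates its cap, so the log shows near-identical counts for all four rarities even though 2* actually outnumber 5* twenty to one.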

    This is the reason why the chinese GF statistics database shows ridiculous 5* rates for specific recipes: since it pulls production log data without accounting for this, the ratio of high-rarity dolls is skewed and you end up with recipes that claim to have a 15-20% 5* rate, even though this is demonstrably false to anyone actually using the recipe.

    This issue is also exacerbated by so-called "voodoo" recipes, where one person will spam the same unique recipe a couple hundred times in a row. When doing so, that person quickly racks up more than 11 copies of specific T-Dolls in a short amount of time, and thus all of the common T-Doll results are dropped from the data. When the data scraper then proceeds to pull production log data, it will detect this user and conclude that the "voodoo" recipe must have a disproportionately high success rate, when in actuality it's simply skewed data from overloading the production log with high craft amounts in a short duration.

    Anyway, this is fairly simple to account for when you are data collecting: either confirm that no results have been dropped from the data set by ensuring that each data update contains no more than 11 new entries of any specific T-Doll (very difficult given the update frequency of the refresh log, as high-activity periods will overload the production log), or only collect data up until it contains 11 copies of any specific T-Doll (though this has the side effect of not collecting much data at a time, making large data sets harder to get and data collection slower).
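The second strategy can be sketched like this (assuming log entries arrive as plain item ids; the real cap is whatever the server actually uses):

```python
from collections import Counter

def epoch_is_trustworthy(new_entries, cap=11):
    """Accept an update window only if no item could have been
    truncated, i.e. every item appears fewer than `cap` times among
    the new entries; at `cap` or above, later copies may be missing."""
    counts = Counter(new_entries)
    return all(n < cap for n in counts.values())
```

Windows that fail the check get discarded (or flagged with a low capture ratio) instead of polluting the statistics.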

    ReplyDelete
    Replies
    1. Whoops, a quick check ingame shows that it's actually 10 results now, not 11! (I may either have been remembering incorrectly, or they may have changed it at some point)

      Delete
    2. >In case you weren't already aware
      I wasn't aware. In fact, it never occurred to me to check the number of result ids in one log, that'd be instantly noticeable. And indeed, this is how the length of each log is decided: they return at most the 10 most recent results for every unit (doll/equip/fairy) type except 3-5* equips, which get 20 [since the fairy update]. Sometimes there are fewer, maybe due to an additional temporal cutoff.
      As I noted in the previous post, most of the time it's not a problem; craft rates aren't that high outside of rateups. After the update on 2019-03-05 the situation got slightly worse: they disconnect the cache from the live database at around Kalina daily bonus time (04:00 in the server's UTC-8 timezone), likely to do some scheduled automatic DB maintenance. It is visible (for me) through log updates which stop refreshing and return the same results for about 45 minutes. Even then I still get CR [capture ratio] > 0.98 for dolls. During the rateups... yeah, it's pretty bad (again, see the previous post's chart). I haven't gotten around to doing something about it yet, but at least each epoch has CR values attached, so you can see the degree to which results can be trusted.

      Delete