Saturday, November 24, 2018

Girls' Frontline Statistics

Putting a bit of "everything else" in the blog. This post is about a mobile game and applied statistics.
There will be absolutely no mentions of touhou (other than this one) in the post. Estimated time to read: 15 minutes.

Part 1, introduction.
Traditional gacha mobile games have a simple core mechanic and a number of units that participate in it. Several units are given to the player as starting units, some more can be farmed from maps, but the majority of the most desirable, stronger units come from the gacha - an RNG-powered slot machine that eats premium game currency (real money disguised as another game parameter, a pretty successful psychological trick to mask the true monetary cost of player addiction). As the player progresses further in the game, gacha units become effectively mandatory - only the most hardcore players can do the "no gacha" challenge. Then there is also the collection aspect...

Girls' Frontline is what I'd call a second generation gacha mobile game. The trend started with Kancolle, if I had to guess. Other than that, the list includes Azur Lane, Cuisine Dimension (no English version yet) and maybe some other smaller titles. Their (our?) gacha is different - all unit and equipment crafts utilize resources that can be freely farmed in-game. Premium currency is still there, the game is not a charity after all, but it can be spent either to expand your infrastructure (free handouts can cover that fairly well), to roll the cosmetics gacha with various costumes and decorations that are not necessary "to win", or to buy resources at a ridiculously high implied time-to-money conversion rate[1]. There are still whales, as gacha rates for rare cosmetic items are rather low (they wouldn't be rare otherwise), and the game seems to be doing fine profit-wise. Overall such games feel much more "fair" (for whatever subconscious definition of fairness) for both f2p and paying players; that's one of the reasons I picked it up.

The game has 4 types of basic resources and a crafting system which can be fed user-defined quantities of each resource and then produces units (dolls, in game vernacular) or equipment. Results are random, but there are certain consistent patterns that players can notice if they experiment a lot with various resource amounts and ratios, or just read and confirm the experimental results of earlier players. Game designers/admins give no information about the effects recipes (resource combinations) have on construction results, which makes figuring out the best recipes another interesting part of the game experience. Here "best" means the ones that give the highest rates either for dolls/equipment of the highest rarity or for specific rare dolls, because in the process of hunting for rares you will naturally produce more than enough of the common ones.

The typical consistent patterns mentioned earlier are hard cutoffs, which guarantee that you will exclude certain categories of stuff from crafting if you put in amounts of certain resources below or above some threshold. For more details, you may visit a calculator, such as this one. A more interesting question is what kind of effect varying the amounts of resources within these bounds has on individual item appearance rates. Theoretically game designers could have made the rules extraordinarily complicated, but I suspect that realistically there are only 3 possibilities: we need to find out whether the crafting probability landscape[2] is spiky, smooth or flat.

The number of crafts a single player can do with their limited resources is woefully inadequate to obtain reasonably precise estimates. Manually collecting results from several players is a logistics nightmare and a futile exercise in filtering out deliberate misinformation and accidental logging errors. If that was all we had I'd give up on day one. However, fortunately for the curious types, developers left a small hole to slither through. The game features a PRODUCTION LOG, which shows recent recipes used by players and what they obtained. Intercept the game's network connection, figure out the API and results format, then repeatedly query the server and obtain a full and complete log of all crafts for you to analyze - the concept is obvious and trivial. The devil is, as always, in the details. It's a small project that can take some time to get right. It requires you to sit down and actually do stuff. Instead of that, how about some instant gratification in the game?

That's how I thought, putting off the interesting in favor of the easy, until one day, 16 October 2018. In the course of yet another game maintenance the game client was updated to the new and improved version. These improvements consisted of additional list filtering options (could live without, but fairly useful), a slight interface reshuffle (the newer one looks like a hack-job by someone with zero experience in UI/UX, but the changes are relatively minor, so oh well), some new game elements (recovery of the last copy of retired dolls - rather useful... for those who do things first and maybe think later) and absolutely unacceptable and completely horrible interface delays in reaction to user input. The old interface in this regard was somewhat crappy, but bearable, especially as delays were masked by the network requests to the game server that are inevitable for an online game. Having to wait half a second because the game needs to roundtrip to AWS and sync state with the central DB is one thing. Having to wait three seconds while the game struggles to show you an empty list menu with no network activity in the background is something else entirely. Maybe in the age of websites which spike usage of multi-GHz multicore CPUs when scrolling a fucking page of text in the browser, and even then sometimes cannot keep up with display refresh, this is considered acceptable performance by the general public. For someone with at least half a brain, however, the ridiculous amount of bloat happening behind the scenes that leads to this is shocking, to say the least.

In short, incompetent monkey developers fucked up the client and then incompetent testers, if they exist at all, failed to catch this performance regression. The client was shipped and now everyone including me is stuck with it[3]. Public relations guys tried to "address the issue" and even organized some questionnaires to check if only some part of the userbase was affected. I'm still not sure whether it is so, given how many active players there are (more than 50000) and how few complained (50-100 or so). One month later it looks like they didn't even start fixing the bug. The end result for me was that the leveling efficiency of my dolls dropped 30% because every screen transition was taking so much longer, enjoyment dropped even more because nobody likes being fed horseshit, and I stopped actively playing the game as intended and started playing with the game. I poked it a little, found it was written in C#/Unity with hotpatches in Lua, avoided cracking the protection on the encrypted main dll by finding someone's deobfuscated version, decompiled it, located the relevant parts and made a scraper. After fixing some stupid URL copy-paste errors it even worked. Then I found out the server was sharded and spent a couple more evenings rewriting the scraper into a multithreaded version. Half a day before 01 Nov 2018 I started the full capture and it has been running ever since on my PC. If I had another host I'd add fault-tolerance, but so far I only lost an hour of data due to a power outage.

Part 2, gathering data.
It's funny, but the developers who wrote the backend part were actually sane. Results are sent to the client as JSON, so I didn't need to bother with pulling a custom parser out of the game's insides. The result is an array of crafts with various information: user name, user level, item id (which the game converts into an item name and picture), mp/ammo/mre/part (4 resources) input, "crafting level" (relevant for heavy construction) and the amount of crafting contracts and cores spent (which is strictly derived from crafting level, so I don't know why they put it here). All of that is good already, but the developers were super generous and added some extras which don't appear in the game UI, but help us immensely (I'll show just a bit later why): crafting time as unix epoch, item unique id in the shard DB, user UID, dev_type (again, a parameter derived from crafting level) and build slot (we have 2 by default and can add more, up to 8).
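To make the shape of that data concrete, here is a minimal sketch of how one log entry could be modeled. Every field name below is a placeholder invented for illustration (the real JSON keys are not listed in this post), the sample payload is made up, and the derived fields (contracts, cores, dev_type) are left out.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Craft:
    """One production-log entry. All field names are hypothetical."""
    user_name: str
    user_level: int
    user_uid: int
    item_id: int       # which doll/equipment came out
    item_uid: int      # unique id of this particular item in the shard DB
    mp: int
    ammo: int
    mre: int
    part: int
    craft_level: int   # relevant for heavy construction
    dev_time: int      # crafting time, unix epoch seconds
    build_slot: int


def parse_log(payload: str) -> List[Craft]:
    """Turn a JSON array of craft objects into Craft records."""
    return [Craft(**entry) for entry in json.loads(payload)]


# Made-up sample, only to show the structure.
SAMPLE = json.dumps([
    {"user_name": "a", "user_level": 50, "user_uid": 11, "item_id": 5,
     "item_uid": 1002, "mp": 430, "ammo": 430, "mre": 430, "part": 230,
     "craft_level": 0, "dev_time": 1541030401, "build_slot": 1},
    {"user_name": "b", "user_level": 73, "user_uid": 12, "item_id": 9,
     "item_uid": 1001, "mp": 400, "ammo": 400, "mre": 91, "part": 30,
     "craft_level": 0, "dev_time": 1541030400, "build_slot": 2},
])

crafts = parse_log(SAMPLE)
print(crafts[0].item_uid, crafts[0].dev_time)
```

Keeping item_uid and dev_time as first-class fields matters, because every sanity check later in this post hangs on them.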

It is not enough to capture some data, then throw it at statistics and expect informative results. As the saying goes, Garbage In = Garbage Out. You need to understand what you're doing to not do something silly without realizing. My capture idea was not new. The game's English server opened in May 2018, while the Chinese one had been running for more than two years already; then there are also Taiwanese, Hong Kong and Korean versions (and Japanese, but it's fairly new). Chinese users also made a scraper for their game version, then captured results and published aggregate statistics. Their website shows that the capture process was stopped for some reason on 2018-09-22, but results up to that time are still available. Whether they are usable and to what extent is another question.

The first, completely obvious problem with them, noticed by several posters in various places, is that new dolls are regularly added to the production pool, but their pages display only one set of aggregate results that shows no signs of being reset. Since all craft probabilities must add up to 1, as you always get something and always get only one thing each craft, something is happening to the probabilities of old dolls as new dolls dilute the pool. If you just concatenate old and new results you'll end up with blurred garbage. Then there's an issue with rate-ups. Sometimes events happen which boost craft rates of particular rarities or particular items. That site also doesn't explain anything about it. Maybe internally they did the good thing and threw it all out (instead of doing the best thing of capturing and presenting it separately), but seeing how they dealt with doll addition... does not instill confidence. And mixing rate-up with normal rate is not a trivial uniform low-intensity noise: if, during rate-up, some obscure recipe is crafted disproportionately many times, it will continue to appear alongside more popular recipes, with a perceived [non-existent during normal times] rate that can approach the boosted rate during rate-up, whatever it is.
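A toy calculation shows how bad the mixing can get. The numbers are invented purely for illustration: an obscure recipe crafted 1000 times at a normal 1% rate, then 9000 times during a rate-up at a 3% rate.

```python
def pooled_rate(periods):
    """periods: list of (crafts, hits) tuples. Returns total hits / total crafts."""
    total_crafts = sum(crafts for crafts, _ in periods)
    total_hits = sum(hits for _, hits in periods)
    return total_hits / total_crafts


# Invented numbers, not measured data.
normal = (1000, 10)     # 1% outside of rate-up
rate_up = (9000, 270)   # 3% during rate-up, and crafted far more often

print(pooled_rate([normal]))             # 0.01
print(pooled_rate([normal, rate_up]))    # 0.028
```

The pooled figure lands almost on top of the boosted rate, even though nobody will ever see it during normal times. Keeping the periods in separate buckets avoids this.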

Back to the raw data. If you look at one capture result you will notice something interesting (since you don't actually have it you'll have to trust me or make your own scraper). Item unique ids are continuously decreasing, well, almost (dev_time resolution is 1 second, and if several results happen in the same second they are in reverse order instead - either an oversight, or queries executed faster with this sort order, or nobody cared). At first. If you go down the list, holes in item unique ids start to appear. I could have missed this effect if the developers hadn't included item unique ids and dev_time's and I hadn't done sanity checks on the results. "Who cares about holes?", you might say. "We just got less data from the server, we'll compensate with more time spent gathering". Not so fast. If you're curious, try to guess what could be wrong with this data and I'll carry on typing this sentence for a while, so that speed readers have enough time to stop reading, look away from the answer on the next line, collect their thoughts and think for themselves.

Ok, here's the explanation. As you go further and further back in time according to a single log file, there are more and more results missing. Don't know about you, but my first thought was that the server was returning the last N results from its current table of users' items. In other words, to save space and, much more importantly, cpu time of the DB server(s), once you retire something, it is erased from the table. Then it no longer appears in the last craft list. Oh, could it be that common items get retired relatively more often than rares? I looked at the equipment list to confirm - rows at the bottom of the list were almost exclusively of the top 5* rarity. If you used only one such list for your estimates, you can guess how skewed the results could become.
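The sanity check that exposes the holes can be sketched in a few lines. This is an illustration, not the actual scraper code, and it assumes every unique id inside the observed range belongs to a crafted item. The sample ids are made up and listed newest first, like the log.

```python
def find_holes(item_uids):
    """Return inclusive (start, end) ranges of ids missing between min and max."""
    ids = sorted(set(item_uids))
    holes = []
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            holes.append((prev + 1, cur - 1))
    return holes


def visible_fraction(item_uids):
    """Share of the id range [min, max] that actually shows up in the list."""
    ids = set(item_uids)
    return len(ids) / (max(ids) - min(ids) + 1)


# Made-up single snapshot, newest first.
uids = [1020, 1019, 1018, 1017, 1016, 1015, 1012, 1008, 1003, 1001]

print(find_holes(uids))
print(visible_fraction(uids[:5]))   # newest half of the list
print(visible_fraction(uids[5:]))   # oldest half of the list
```

In this toy list the newest half is complete while the oldest half shows only a third of its id range, which is the pattern described above: the older the entry, the more of its neighbours have been retired and erased.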

In normal conditions that's actually not a very big deal. Inside the client there is a switch that allows it to resend the request for a fresh "last craft" list once more than 1 hour has elapsed since the last request, otherwise it uses the local cache (although you can always force a refresh by restarting the game). Outside, the API endpoint caches the list result on the server side for about 10 minutes (I should probably test that it's really a per-shard cache, not per-user, but common sense suggests it is). So, just ask for results every 3-5 minutes, get 100 or so new dolls (typical craft rate for the shard) in the list out of roughly 1000 (typical list length, there are no switches to ask for more or less) once it refreshes, append to your local list, wait, repeat. Now you're capturing everything you can without insider access. In doing so I achieved a >0.99 capture ratio, and it's possible to tell that only thanks to the provided item unique ids.
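The "append to your local list" step and the capture ratio can be sketched as follows. The network fetch is deliberately left out, since the endpoint isn't described here. The entry layout and the `item_uid` key are placeholders, and the ratio again assumes that every id in the observed range was a craft.

```python
def merge_snapshot(store, snapshot):
    """Add entries from one fetched list to the store, keyed by unique item id.

    Returns how many entries were new. 'item_uid' is a hypothetical key name.
    """
    new = 0
    for entry in snapshot:
        uid = entry["item_uid"]
        if uid not in store:
            store[uid] = entry
            new += 1
    return new


def capture_ratio(store):
    """Captured entries divided by the size of the id range they span."""
    return len(store) / (max(store) - min(store) + 1)


# Two made-up overlapping snapshots; id 106 was retired before we saw it.
store = {}
first = [{"item_uid": u} for u in range(100, 105)]
second = [{"item_uid": u} for u in (103, 104, 105, 107, 108)]

print(merge_snapshot(store, first))    # 5
print(merge_snapshot(store, second))   # 3
print(capture_ratio(store))            # 8 of 9 ids captured
```

A real loop would call merge_snapshot every 3-5 minutes per shard. Polling more often than the server cache refreshes only returns the same list again, which the dictionary deduplicates for free.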

Wait, "in normal conditions"? Yeah, there has been one exception so far. Care to guess?
These are your crafts [left part]. This is rate-up [maintenance pause in the middle]. These are your crafts during rate-up [right part].

During rate-up, quite predictably, the craft rate goes up too. For example, during the last rate-up of 13-15 Nov 2018 (which included Contender, Spitfire, Zas M21, Ribeyrolles, AEK-999) the craft rate was 2 times higher than average during the second day and up to 30 times higher than average during the first hours of the first day. Many of those rapid-crafters run out of free space quickly, as they consume tens or hundreds of crafting contracts, so they have to retire the majority of obtained dolls immediately after getting them. On the chart, blue is the visible craft rate averaged over 1-minute intervals based on the count of reported crafts, and red is the true craft rate averaged over 1-minute intervals based on the delta in unique doll ids. The true rate should be the upper envelope of the visible rate, but as you can see, in the beginning server reports can't keep up even during the very recent (relative to cache refresh) parts. All of this means that, if you look at the zoomed-in graph once again, at best less than 1 minute out of 10 from each request is usable, and the first several hours should be discarded completely. Fortunately, this only affects estimates of things during the rate-up boost. Individual recipe/item rates outside of rate-up can still be derived from long periods of calm normal crafting.
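The two curves can be approximated along these lines. This is a simplified sketch rather than the code behind the chart: the "true" count here is just the span of unique ids seen within each minute, so it is a lower bound and ignores ids issued between the last entry of one minute and the first entry of the next. The sample data is invented.

```python
from collections import defaultdict


def rates_per_minute(crafts):
    """crafts: iterable of (dev_time, item_uid) pairs.

    Returns {minute_start: (visible, true)} where visible is the number of
    reported crafts and true is the span of unique ids within that minute.
    """
    buckets = defaultdict(list)
    for dev_time, item_uid in crafts:
        buckets[dev_time - dev_time % 60].append(item_uid)
    result = {}
    for minute, uids in sorted(buckets.items()):
        result[minute] = (len(uids), max(uids) - min(uids) + 1)
    return result


BASE = 1542067200  # arbitrary epoch value that falls on a minute boundary

# Invented data: a calm minute, then a minute where most crafts were retired.
sample = [
    (BASE + 1, 1), (BASE + 10, 2), (BASE + 30, 3), (BASE + 59, 4),
    (BASE + 61, 5), (BASE + 90, 9), (BASE + 119, 14),
]

for minute, (visible, true) in rates_per_minute(sample).items():
    print(minute, visible, true)
```

In the second minute only 3 crafts are visible while at least 10 ids were issued, which is the kind of gap between the blue and red curves described above.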

Part 3, analyzing data.
To be added later.

Part 4, EN GFDB.
I'm still not certain if I want to release all these results as a public, continuously updated website, similar to the Chinese GFDB. On one hand, letting the effort rot locally would certainly be a waste. On the other, all user-driven activity about and around the game is ultimately a free gift to the company, as it increases user engagement (therefore revenue). Given their shitty treatment of users (see the part about the lag bug), right now I don't feel like gifting them anything. If they release an update that COMPLETELY eliminates the lag introduced by the 2.0221_268 client, I might change my mind, provided it happens before I lose interest in the game.

[1] Kids' grade dimensional analysis tells us that [USD/time] = [USD/gem] * [gem/resource] * [resource/time]. Resetting logistics takes, say, 20 seconds per logistic, and gives ~350-500 weighted resources (with up to a 1.3 multiplier for great success). The current special ("discounted") offer is 9k mp/ammo/mre or 3k parts for 480 gems. Gems go at 80 gems/USD (without the first purchase bonus, since we're talking about whales). All of this translates into an equivalent rate of about 40-80 USD/hour, which is surprisingly less than I expected. If you earn more than that, it would be cheaper to buy resources directly rather than resend logistics, under the assumption that the purchase happens instantaneously (it doesn't). One advantage of buying resources is that you can do a lot of it, while you run out of logistics to send very quickly.
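The arithmetic behind the 40-80 USD/hour figure, spelled out with the numbers from this footnote. The only assumption is that "weighted" means parts count 3x, which is what the 9k vs 3k offer implies.

```python
SECONDS_PER_LOGISTIC = 20
GEMS_PER_USD = 80
OFFER_GEMS = 480
OFFER_RESOURCE = 9000   # mp/ammo/mre; parts are weighted 3x (3k per offer)


def usd_per_hour(resource_per_logistic, multiplier=1.0):
    """[USD/hour] = [USD/gem] * [gem/resource] * [resource/hour]."""
    usd_per_resource = OFFER_GEMS / GEMS_PER_USD / OFFER_RESOURCE
    resource_per_hour = (resource_per_logistic * multiplier
                         * 3600 / SECONDS_PER_LOGISTIC)
    return usd_per_resource * resource_per_hour


print(round(usd_per_hour(350), 1))        # 42.0
print(round(usd_per_hour(500), 1))        # 60.0
print(round(usd_per_hour(500, 1.3), 1))   # 78.0
```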

[2] Four input resources may be visualized as coordinates in 4-dimensional space, and true item probabilities (you aren't seriously expecting anything more than Bernoulli process here, are you?), which we attempt to estimate, as scalar fields inside the hypercube.
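Under that Bernoulli assumption, the precision of a rate estimate at a single point of the hypercube follows the usual binomial standard error. A small sketch with invented counts shows why one player's crafts are not enough and a full server log is:

```python
import math


def rate_with_error(hits, crafts):
    """Estimated rate and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = hits / crafts
    return p, math.sqrt(p * (1 - p) / crafts)


# Invented counts: the same 2% rate seen in 100 crafts and in 100000 crafts.
p_small, se_small = rate_with_error(2, 100)
p_big, se_big = rate_with_error(2000, 100000)

print(f"{p_small:.3f} +- {se_small:.4f}")
print(f"{p_big:.3f} +- {se_big:.5f}")
```

With 100 crafts the error is about 0.014, comparable to the 2% rate itself; with 100000 crafts it shrinks to roughly 0.0004.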

[3] Technically I could install old client and MITM the connection, pretending to the client that it was talking to the older server version and pretending to the server that it was talking to the newer client version. In practice I'd risk being banned, or worse, open a whole new can of my own personal bugs because of that.

1 comment:

  1. That was an incredibly interesting read, thanks for the post. Looking forward for the data analysis.
    And yeah, ever since the update playing on my phone is a chore, although I haven't noticed problems on Nox.