We normally hear there’s a problem with a game, wait for it to be fixed, learn it’s been fixed, then get on with our lives, none of us the wiser as to what actually just happened. So it’s cool to see Bungie taking a step back and filling everyone in on how a bug gets fixed.
When Destiny 2 had to be taken offline this week (the second time this has happened in a month), with some players having lost valuable currency and items, it was a big deal. One that we now know can be traced back to a previous fix that affected the game’s inventory system.
From Bungie’s report of the issue:
Several months ago, players reported that quest log sorting wasn’t working properly, and we wanted to fix that. The team investigated and found that the clean-up process was resetting the timestamp on a subset of quests, which was breaking chronological sorting. We decided to fix this by disabling the timestamp-resetting behaviour for quests. That fix was conceptually reasonable but, through subtle side effects, it ended up disabling too much of the clean-up process. The net result was that the game calculated the wrong cap quantity for stacked items (such as currencies and materials), which caused items above the cap to be lost. We knew this code was critical and, per our typical process, we had two domain experts provide code reviews for the change – but sadly, we didn’t spot the bug.
A few days later, our internal test teams caught this issue. However, we incorrectly concluded that it was caused by a tooling failure with debug workflows we use for testing, and not an actual bug within the game. Having dodged all our diligence, the issue went live in 2.7.1.
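To get a feel for why a wrong cap is so destructive, here is a minimal sketch of clamping a stack to a cap. To be clear, this is not Bungie's code: the function name `clamp_stack` and every number below are made up for illustration, and the report doesn't say how the real calculation works.

```python
from typing import Tuple


def clamp_stack(quantity: int, cap: int) -> Tuple[int, int]:
    """Clamp a stacked item to its cap and return (kept, lost)."""
    kept = min(quantity, cap)
    return kept, quantity - kept


# Hypothetical numbers: a player holds 800 of some material.
held = 800
correct_cap = 999  # assumed "right" cap for this item
wrong_cap = 500    # assumed miscalculated cap

kept_ok, lost_ok = clamp_stack(held, correct_cap)
kept_bad, lost_bad = clamp_stack(held, wrong_cap)

print(f"correct cap: kept {kept_ok}, lost {lost_ok}")
print(f"wrong cap:   kept {kept_bad}, lost {lost_bad}")
```

The clamping itself is doing exactly what it was told in both cases. Only the cap it was handed is wrong, which is how a problem like this can slip through review.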
As for why the issue reappeared this week, Bungie say that’s because some servers crashed on startup and came back up without the earlier fix applied:
Fast forward again to today, February 11th, when we rolled out the 2.7.1.1 update coinciding with the launch of Crimson Days. After launch, some of the WorldServers once again crashed on startup because of a high volume of servers starting simultaneously. Once again we manually restarted those servers and thought everything was fine. We were wrong.
Unbeknownst to us, this crash resulted in those WorldServers not applying the previous character data corruption fix. This meant that a small percentage of WorldServers were running the old code and the bug that was corrupting character data.
They also go into detail on how they’re going to try to avoid mistakes like this happening again, like adding “further safeguards to our process for ‘hot-patching’ our servers to ensure that they cannot start with an unexpected version”.
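Bungie doesn't spell out what those safeguards look like, but the general idea of refusing to start on an unexpected version can be sketched in a few lines. Everything here is an assumption for illustration (the names `ensure_expected_version` and `start_server`, and the build labels); it is not how Bungie's WorldServers actually work.

```python
class VersionMismatchError(RuntimeError):
    """Raised when a server is about to start with an unexpected build."""


def ensure_expected_version(running: str, expected: str) -> None:
    # Fail loudly instead of serving players on stale code.
    if running != expected:
        raise VersionMismatchError(
            f"refusing to start: running {running!r}, expected {expected!r}"
        )


def start_server(running: str, expected: str) -> str:
    # Hypothetical startup path: the check runs before any player data is touched.
    ensure_expected_version(running, expected)
    return "serving"


# Hypothetical build labels, not real Destiny 2 version strings.
print(start_server("patched-build", "patched-build"))

try:
    start_server("old-build", "patched-build")
except VersionMismatchError as err:
    print(err)
```

The point of a check like this is that a server which crashed and restarted without the patch would stop at the door, rather than quietly coming back up with the old bug.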
It’s a really interesting read, for both technical and PR reasons, and anyone who’s played Destiny, or just wondered what goes into this process beyond “bug magically appears and is then magically fixed”, can check it out here.