3.4 million concurrent players.
Last weekend, Fortnite passed a major milestone: it reached 3.4 million concurrent players, surpassing PUBG’s 3.2 million. But no victory comes without a cost.
That weekend, the sheer number of players gave Fortnite’s servers a beating. From Epic’s postmortem (here, if you want to read it in full):
The extreme load caused 6 different incidents between Saturday and Sunday, with a mix of partial and total service disruptions to Fortnite.
MCP database latency
Fortnite has a service called MCP (remember the Tron nemesis?) which players contact in order to retrieve game profiles, statistics, items, matchmaking info and more. It’s backed by several sets of databases used to persistently store this data. The Fortnite game service is our largest database to date.
[…]At peak we see an issue where the matchmaking shard begins queuing writes waiting on available writer resources. This can cause db update times to spike in the 40k+ ms range per operation causing MCP threads to block. Players experience unusually long wait times not just attempting to matchmake, but with all operations. We have investigated this in detail and it is currently unclear to us and support why our writes are being queued in this way but we are working towards a root cause.
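To see why blocked threads slow down every operation and not just matchmaking, here is a rough back-of-the-envelope sketch. It assumes each MCP operation holds one worker thread for the full duration of its database update. The pool size of 200 threads is a made-up number for illustration only, since the postmortem does not say how many threads MCP runs.

```python
def max_ops_per_second(worker_threads: int, op_latency_ms: float) -> float:
    """Upper bound on throughput when each operation pins one thread
    for op_latency_ms milliseconds (a simplifying assumption)."""
    return worker_threads * 1000.0 / op_latency_ms


# Hypothetical pool size; the postmortem does not state the real value.
POOL_SIZE = 200

if __name__ == "__main__":
    # 40 ms is an assumed "healthy" latency; 40,000 ms is the spike
    # reported in the postmortem.
    for latency_ms in (40, 400, 40_000):
        ops = max_ops_per_second(POOL_SIZE, latency_ms)
        print(f"{latency_ms:>6} ms per update -> at most {ops:,.0f} ops/s")
```

Under these assumptions, a thousandfold jump in update latency cuts the ceiling on throughput by the same factor, so any request that needs an MCP thread ends up waiting, whatever it is trying to do.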
In addition to the MCP problems, Epic had a slew of other outages, including one with XMPP:
Here is a quick summary of the incident:
- Timeline:
- 2018-02-04 22:00 UTC – 2018-02-05 00:15 UTC
- Root cause:
- Friends Service internal load balancer – the one on critical path for XMPP – got overloaded and pushed into an error state.
- ELB could not quickly recover due to specifics of failover process and outdated network configuration – ELB subnet was short on free IPs to provision replacement.
- Incident Details:
- Due to a recently introduced memory leak, XMPP was on a monitored path to falling into unstable state.
- We planned to replace it with another ready and stand-by cluster with the leak already fixed.
- We expected cluster to survive through weekend, so that we could schedule a proper maintenance during working week days.
- Unfortunately, Game Services and Account Service instability significantly increased the effect of the leak. And at 22:00 UTC on Sunday we started losing cluster nodes and disconnecting players.
- A decision was made to immediately failover to a stand-by cluster via green/blue deployment strategy, when we instantly flip all the traffic to another set of endpoints.
- Unfortunately, landrush of reconnecting people at the time has effectively killed one of the Friends Service load balancers paralyzing our ability to setup presence flow on new connections.
- Impact:
- As a result, though people did actually connect to XMPP, the UI showed everyone as offline due to missing presence flow.
- Effectively, a “dark room” situation.
- Next steps:
- We’re in a process of upgrading load-balancer solution for Friends Service and other platform services to address issues like above.
- We’re fixing our VPC configuration to ensure subnet capacity.
- There are also longer-term problems team is actively working on. For example with current architecture XMPP cluster represents a full mesh. Each cluster node is connected to each other. With 10 connections between each node and 101 nodes in cluster it effectively spends 1k sockets per node just on cluster connections.
- Each XMPP node can hold only up to N connections with current solution. Hence there is a theoretical limit on optimal number of cluster nodes (and hence CCU capacity) we can maintain without solution redesign.
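The full-mesh arithmetic in the last two bullets can be sketched as follows. The first function reproduces the 1k figure from the postmortem. The capacity model is an assumption on my part: it treats cluster links and player connections as drawing from the same per-node budget N, and the value used for N (50,000) is purely hypothetical, since the postmortem leaves it unspecified.

```python
def cluster_sockets_per_node(nodes: int, conns_per_pair: int) -> int:
    """Sockets one node spends on links to every other node in a full mesh."""
    return (nodes - 1) * conns_per_pair


def client_capacity(nodes: int, conns_per_pair: int, max_conns_per_node: int) -> int:
    """Player connections the whole cluster can hold, assuming mesh links
    and player connections share the same per-node limit."""
    per_node = max_conns_per_node - cluster_sockets_per_node(nodes, conns_per_pair)
    return nodes * max(per_node, 0)


def best_cluster_size(conns_per_pair: int, max_conns_per_node: int) -> int:
    """Brute-force the node count that maximizes client_capacity."""
    # Beyond this size, mesh links alone use up the whole per-node budget.
    upper = max_conns_per_node // conns_per_pair + 1
    return max(
        range(1, upper + 1),
        key=lambda n: client_capacity(n, conns_per_pair, max_conns_per_node),
    )


if __name__ == "__main__":
    # Figures from the postmortem: 101 nodes, 10 connections per node pair.
    print(cluster_sockets_per_node(101, 10))  # 1000, the "1k sockets per node"

    # Hypothetical per-node limit; the postmortem only calls it "N".
    N = 50_000
    best = best_cluster_size(10, N)
    print(best, client_capacity(best, 10, N))
```

Because the mesh overhead grows with every node added, total capacity in this model rises, peaks, and then falls again, which is one way to read the "theoretical limit on optimal number of cluster nodes" mentioned above.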
Despite the outages, Epic sounds upbeat about the growth:
It’s been an amazing and exhilarating experience to grow Fortnite from our previous peak of 60K concurrent players to 3.4M in just a few months, making it perhaps the biggest PC/console game in the world! All of this has been accomplished in just a few months by a small team of veteran online developers — and we’d love to welcome a few more folks like yourself to join Epic Games on this journey!
Fortnite Battle Royale is currently free to play. Hopefully, Epic will iron out all of these problems soon.