Guest Column: On the Overwatch Matchmaker as a Trolley Problem

07.09.2016

It’s sometime before nine AM. I’m hoping to get one quick game of Overwatch in before starting work. I launch the game, get on Quick Play, and the game has me backfill a team. I choose the best class for the team composition, and the game ends in defeat for my team before character models have finished loading.

It’s sometime in the mid-afternoon. I’m due to leave in twenty minutes, but I figure I have time to play some Overwatch before that. I get into a Quick Play match, pick Lúcio, get on the payload, and stay there. We never stop moving until the match ends a handful of minutes later. I quit Overwatch to do something else while I wait, since I don’t have time for a second game.

Games, as design objects, have failure modes. You can sit down to play a game and have a bad experience that the designers never intended: motion sickness in first-person games, unbalanced multiplayer matches, and so on.

Bad matches don’t require a matchmaker. I’ve played enough Team Fortress 2 to encounter numerous matches that were grossly unbalanced. But, of course, there’s something particularly infuriating about the matchmaker’s role in this. You didn’t choose a server with bad team balance; that server was chosen for you. At some point, you start to feel like what you do in a match is beside the point, because the algorithm decided the outcome ahead of time.

Blizzard has grappled with matchmaking for a long time now, starting with PvP gameplay in World of Warcraft. If Overwatch sometimes feels a little clumsy, with its too-complex level design and its too-ambitious class structure, it is tempting to write this off as the work of a studio that has vast resources and expertise but is new to the genre. Matchmaking, though, is something one would think Blizzard has down; it is certainly something they have done before.

And yet, Overwatch matchmaking is a mess at times. The truth of it is that matchmaking is a hard problem, even more so in a game like Overwatch. Optimal results are never going to be guaranteed. But Overwatch is also the latest in a long run of incremental improvements to matchmaking.

In the beginning, we had self-selection; players would pick their own opponents. In Chess, the grandfather of competitive gaming, this was mostly a reputational process. In the server lobbies of StarCraft, it was performed through some combination of trash-talking ability and pure randomness. In games with permanent dedicated servers, like Team Fortress 2, players would not so much pick opponents as pick a server to play on, with individual servers eventually finding some equilibrium of regulars who knew one another.

Arpad Elo, creator of the Elo rating system.

And then, we had Elo. Invented in the mid-20th century as a way of rating Chess players, Elo is a simple statistical model that can be calculated by hand. The key assumption is that if you beat a higher-rated player, your rating should go up by more than a win against a weaker player would earn, because your “true” rating is probably higher, while your opponent’s rating should go down, since they are probably overrated. This sounds sensible, but it rests on the assumption that the winning player performed at a higher level than the losing player. That holds for Chess, but it doesn’t translate to most digital games, which have hidden information, variance, asymmetry, and nonlinear player skill.
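To make “can be calculated by hand” concrete, here is a minimal sketch of the textbook Elo update for a single 1v1 game. The expected score uses the standard logistic curve (base 10, scale 400); the K-factor of 32 is an illustrative assumption, since real rating bodies use a variety of values. This is not how Overwatch rates players.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k (the "K-factor") sets how far one result moves the ratings;
    32 is an illustrative choice, not a universal standard.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# A 1400 player upsets a 1600 player, versus the favorite winning instead.
upset = update(1400, 1600, 1.0)
expected_win = update(1600, 1400, 1.0)
```

With K = 32, the 1400-rated underdog gains about 24 points for the upset, while the 1600-rated favorite would gain only about 8 for winning the same pairing. That asymmetry is the “your true rating is probably higher” assumption expressed as arithmetic.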

Elo was statistically sound enough to tell us that two players with similar ratings should have an uncertain outcome; it told us who the underdog was, and it let us arrange good matches. Elo also had many problems: it was a transparent system that could be gamed by savvy players who turned its assumptions on their head. It encouraged not playing the game to “protect one’s rating.” It was widely misapplied to rate players in games that are not at all like Chess, and in those games it often produced quite bad results.
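The “arrange good matches” part can be sketched as well. Assuming a simple 1v1 queue (a hypothetical setup; Overwatch’s real matchmaker is proprietary and team-based, and this does not describe it), one naive approach is to sort waiting players by rating and pair neighbors, so that each pairing’s expected score stays close to 0.5. The function name `pair_by_rating` and the sample queue are illustrative only.

```python
from typing import List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def pair_by_rating(players: List[Tuple[str, float]]) -> List[Tuple[str, str, float]]:
    """Hypothetical greedy 1v1 pairing: sort by rating and pair neighbors.

    Returns (higher-rated name, lower-rated name, favorite's expected score).
    With an odd number of players, the lowest-rated one is left in the queue.
    """
    ordered = sorted(players, key=lambda p: p[1], reverse=True)
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        (name_hi, r_hi), (name_lo, r_lo) = ordered[i], ordered[i + 1]
        pairs.append((name_hi, name_lo, expected_score(r_hi, r_lo)))
    return pairs


queue = [("ana", 1510), ("bo", 1900), ("cy", 1490), ("di", 1880)]
pairs = pair_by_rating(queue)
```

Here the two 1900-ish players face each other and the two 1500-ish players face each other, and in both games the favorite’s expected score is only about 0.53: close to a coin flip, which is what a “doubtful outcome” looks like in Elo terms.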

So, developers of multiplayer games made a bit of an evil pact. Elo, flawed though it is, is a product of a time when computer programming involved physically switching cables around. Anyone with a calculator and some know-how can calculate Elo ratings. But its inadequacy has driven video game developers towards using increasingly opaque methods that are usually proprietary, complex, and subject to constant adjustment. Where before we had a simple equation, we now have algorithms.

Here’s the thing, though: Matchmaking systems, strangely enough, are an ethical issue. When they fail, they create a bad experience for someone, sure. But more than that, matchmaking systems inevitably end up prioritizing one player’s experience over another’s. Someone is always going to be on the bottom end of the skill gradient of a match.

At times, it’s hard not to feel like the matchmaker is press-ganging you into being a jobber for someone else’s fantasy of being a hero. And how do you, as a game designer, manage this? You can wash your hands of the issue and live with the fact that some players will experience frustrating runs of bad matches. Or you can try to compensate somehow and make sure players are not placed as the underdog too often; but when does that cross over into deciding matches yourself or micromanaging players’ win rates?

Because of a system designed to weed out problem players, one of the top Widowmaker players ended up having a great deal of trouble finding matches.

Ultimately, the Overwatch matchmaker is tasked with deciding whose fun is more important, and which standards of fun should apply. Back when the “avoid this player” feature was in place, Blizzard found that one of the world’s top Widowmaker players was finding it hard to get a match; people were misusing this feature to steer clear of people who were too close to the skill ceiling.

But who is to say that someone who is so good at the game that they frustrate the players around them is entitled to a match in a timely fashion? I’m not saying they aren’t, but the opposing idea, that players can reasonably expect not to encounter pro-level play in pub games, doesn’t seem totally unreasonable either. Those two preferences can’t be satisfied simultaneously, so someone at Blizzard has to decide which one is more important. Do we care more that high-skill players can find matches in Quick Play than we care about sparing other players the frustrating experience of being on the other side of that rifle?

We are not used to thinking of game design as zero-sum. We generally assume that we can only make the game better, that we can only add enjoyment to it. But in multiplayer games, situations arise where one player’s enjoyment is another player’s frustration. We don’t have an ethical calculus that tells us whether making one player wait one extra minute is worse than making six players experience a game skewed by a single player operating near the skill ceiling of their class. And yet, an answer has to be provided one way or another.

Those are complex questions without direct answers. They’re also not terribly important. Ultimately, the consequence of failure when designing a matchmaking system is that someone has a bad, frustrating time with your game. That is pretty benign compared to the consequences of failure when designing, say, a car or a skyscraper. Nobody is going to die because Overwatch’s matchmaker is sometimes stupid. It’s a toy problem, literally, one where the stakes are defined in notional units of fun. But the wonderful thing about toy problems is that they’re clarifying: it’s easier to talk them through when there’s no pressure of life or death hanging over the conversation.

Important or not, someone at Blizzard has to answer those questions. In the case of “avoid this player”, they decided that individual high-skill players deserve to get a match quickly even if it’s not ideal for the enjoyment of lower-rated players. But the important thing here is that decisions like this then get encoded into an algorithm. They become a policy that is implemented as software.

And this is why this is a conversation worth having: Because we live in a world where, increasingly, policy is implemented in the form of software. Someone at Uber makes decisions about how their drivers get compensated and how their users are charged, but that decision exists in the form of an algorithm. There is an entire strain of Silicon Valley salesmanship built around using algorithms to replace human decision-making.

Algorithmic decision-making has become a huge part of day-to-day life.

Except at no point are human decision-makers ever replaced. Instead, they’re put behind a curtain made out of software. Where before you had bosses cutting pay and jacking up prices, now you have an algorithm telling you the value of something. Algorithms are useful in many ways, but Silicon Valley has made an art form out of using them to sublimate responsibility. When an algorithm targets you with a sales pitch in a moment of vulnerability that it detected by trawling your data; when an algorithm slashes your compensation; when an algorithm decides that you are not worth prioritizing relative to some other user, to some other actor in the system, or simply relative to sheer profit: Who do you blame? Who do you get angry at? Where do you picket, how do you strike? The buck must eventually stop at a human; algorithms are not laws of nature, they are the product of programmers working under the direction of management. But Silicon Valley PR has given them an aura of infallibility, fairness, and impersonality, as though being hurt by a mathematical abstraction is supposed to hurt less.

And yet. And yet, when the Overwatch matchmaker wastes my time or frustrates me, somehow I am more, not less, angry than I would be if I had just joined a TF2 server with bad team balance. Sometimes it takes something that doesn’t really matter to crystallize the import of something that actually does. There’s an immediacy to it that doesn’t come across most of the time; the software systems that increasingly bind and direct our lives are soft, slow, invisible. They work very incrementally. The effect of all the advertising that gets served to our eyeballs, all the ways we are deprioritized, all the surge pricing, is aggregated over a matter of years. Often, we don’t see it at all; if a company uses algorithms to filter job candidates, you don’t really know that, and you don’t really know that an algorithm sent your CV to the bit bucket.

But the matchmaker? The matchmaker expresses its preferences in a timescale of minutes. Sometimes seconds. Sometimes you know the algorithm fucked you before the character models load.

And so: If we can resent Blizzard for how they tune their matchmaker, then maybe we can resent Uber for how they pay their not-quite-employees. Maybe we can resent Facebook for their abuse of our personal lives to serve us ads. Whenever games act as a microcosm of broader changes in society, they act as an immediate barometer for our own feelings about those changes; and so, as online multiplayer moves more towards a matchmaker-based approach, examining what that means ethically becomes very important. Not so much because of how it affects your skill rating, or your play experience, or your feelings towards the game itself, but because of what it reveals about how we relate to algorithms and the control they exert.
