Git Gud: StarCraft II and CS:GO’s Matchmaking Systems

Hey folks,

Today I’ll be comparing the matchmaking systems in Counter-Strike: Global Offensive and StarCraft II. These games feature very different types of competitive play – 1v1 for StarCraft and team-based 5v5 for Counter-Strike. Despite this structural difference, both games feature ranked ladders with ostensibly similar goals. Let’s take a closer look.

[Image: StarCraft II Grandmaster ladder]

The Ranked Ladder

The existence of a competitive mode – i.e. a fixed set of rules considered to be ideal for competitive play – is itself an interesting thing to think about. Our understanding of the optimal competitive rules has changed dramatically over the years. For instance, many people consider 1v1 to be the default competitive experience in traditional real-time strategy games. But as recently as the early 2000s, the World Cyber Games included team competition in Brood War and Age of Empires II. Similarly, modern Counter-Strike’s convergence on MR15 in competitive play stands in contrast to the popularity of Chargers Only back in the beta days.

(For more on that, I recommend this wonderful video on the Chargers rule set and its history.)

The focus today is on each game’s ranked ladder, so I’ll emphasize that my point isn’t related to professional play. The reason I cite tournaments is only as evidence that rule sets have evolved and continue to do so.

What do these rule sets – in other words, competitive modes – accomplish? At a high level, I’d argue the following: to provide competitive players with good games. I’d characterize a competitive player as someone who values winning in an environment that they find compelling. People have a tendency to interpret this very negatively, but this is misguided. It’s not just about winning – it’s about winning in a particularly compelling environment.

For instance, I believe that human beings intrinsically enjoy personal growth and improvement. Competitive games offer that experience in a predictable and enjoyable way – by striving to win in a title that gives you the opportunity to visibly grow and advance, you can satisfy a fundamental human need. In addition, the lessons that you learn while doing so are often applicable to things outside of the game itself, like the realization that hard work is quietly satisfying and enjoyable, not painful.

Incidentally – and not coincidentally – traditional sports offer similar experiences. In both cases, a game’s success hinges on whether it establishes the right environment – if it feels fair, if it’s enjoyable enough to continue playing after a string of losses, if it accurately reflects differences in work and commitment to personal improvement, and so on.

Example: Fairness in Practice

The path from the theory of competition to the nuts and bolts of how matchmaking systems are implemented is the crux of this article, so let’s sink our teeth into a concrete example. One way that we could define a matchmaker as fair is to say that it always creates matches in which each team has a 50% chance of winning. As much as possible, neither side is favored to win. This allows small mismatches in skill to become interesting points of contention. For everyone involved, the game feels winnable but not trivial; challenging but not overwhelming.
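To make that definition concrete, here’s a minimal sketch. It assumes an Elo-style logistic curve with the conventional 400-point scale; neither game documents its real formula, so the function names and the constant are my own illustrative choices. Each candidate opponent is scored by how far the predicted result sits from a coin flip, and the closest one wins.

```python
def win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style chance that A beats B. The 400-point scale is an assumption."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def fairest_opponent(player: float, candidates: list[float]) -> float:
    """Pick the candidate whose predicted result is closest to 50/50."""
    return min(candidates, key=lambda c: abs(win_probability(player, c) - 0.5))


if __name__ == "__main__":
    queue = [1350.0, 1480.0, 1525.0, 1700.0]  # hypothetical ratings in the queue
    print(fairest_opponent(1500.0, queue))
```

A matchmaker with a little deliberate sloppiness would simply stop insisting on the minimum and accept anything within some tolerance of 50%.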

Is this ideal? Not necessarily. Here’s Rob Pardo, the lead designer on Brood War and Warcraft III, discussing the idea in an interview prior to the release of StarCraft II:

“If your matchmaking is really good, it means that for every single game, you’re kind of [on] the edge of your seat… After you play an hour or two of games like that, you’re kind of exhausted. So we’re actually talking about, ‘Is that the right matchmaking approach?’ You might want to add a little sloppiness to the matchmaking. Maybe that means sometimes you get stomped, but sometimes you have easier games. And sometimes you have the really competitive games. It’s got better pacing.”

Whether this is actually how the ladder was implemented in StarCraft II is anyone’s guess. A 2011 balance snapshot mentions skewing effects in the matchmaker but doesn’t go into detail. In any case, my larger point is that even this “sacred cow” of the system – whether or not it should strive to create games that are as even in outcome as possible – isn’t a known optimum.

Skill and Outcome

Predicting outcomes is something that matchmaking systems do very well. This is how they fairly allocate points to winners and take them away from losers. Valve even trusts their system enough to enable promotions on tie games:

[Image: promotion from a tie game]

But can we go a step further? Can we claim that equality in outcome – two teams having exactly a 50% chance of winning a game – signifies equality in skill?

Philosophically, you could make a practical and utilitarian argument that skill and outcome are identical – what is skill if not the ability to win? Do marine micro or proper smoke setups have a context outside of their respective games?

Counter-Strike provides an interesting test case for this question. One of the first things I noticed as I ranked up the ladder was the immense skill difference between players of the same rank. At first I assumed it was smurfing, but the pattern persisted long after my entrance into Prime Matchmaking. At times I would observe a teammate and witness play similar to or better than mine – other times, I’d be shocked by misplays, misjudgments and horrific aim.

Some of this is attributable to gaming the system or having a bad day – but not all, and the problem lay in my definition of skill. I focused almost exclusively on basic moving and shooting. Over time, I started to realize that this was inaccurate, and that there were lots of different ways to contribute to a win.

For instance, I play at the Gold Nova Master level, where a lot of players don’t know any smoke setups. I’ve won more than a few games thanks to someone who knew how to land important smokes without getting picked. What surprised me was that their individual performance, good or bad, was not always relevant. It was easily dwarfed by the immense contribution they made to the team’s ability to safely enter and secure a bombsite.

I’ve had similar experiences with strong in-game leaders, particularly friendly teammates, and players who focused on support (at my level, that means baiting, giving up weapons to top scorers and taking a pistol for themselves, etc). On the flip side, I’ve seen players with relatively stronger aim and game understanding fail to translate it into wins, often due to character flaws (e.g. alienating teammates to the point where no one will drop them a gun).

It’s undeniable that long-term outcomes for these different skill sets will diverge. A great fragger who alienates their entire team can still carry a match by themselves, whereas a good team player depends heavily on having strong teammates. Does that difference in outcome also imply a difference in skill?

Theoretically, no. Outcome is contrived – the relative value of fragging is determined solely by the ruleset. We can independently change outcomes without changing anyone’s skill set. Imagine a hypothetical scenario where landing a specific smoke setup would automatically win a round.

Practically speaking, however, the difference in skill is very real. The outcomes defined by the designer ultimately determine what’s considered skillful on the competitive ladder – a player in Global Elite or Grandmaster is better than a player in Silver, period. That said, a properly designed matchmaking system within a properly designed game will ideally align the two concepts – in other words, a player’s theoretical notion of skill (things like aim and teamwork in Counter-Strike, micro and macro in StarCraft II) will be fully encapsulated by the outcomes in the game (i.e. becoming more skillful will lead to winning more and climbing the ladder).

How can we achieve that?

Skill-Outcome Convergence

StarCraft and Counter-Strike take different approaches to this question. StarCraft arguably has an easier problem to solve because it’s 1v1. Every skill gap will inevitably be punished because there are no teammates to compensate for it. This forces players to develop genuine breadth – good outcomes must, eventually, pair with real skill. As I argued in my video on Brood War and StarCraft II, misunderstanding this can be genuinely frustrating – cheesing your way up the ladder will only cause you to lose every macro game you play, which will make the game feel arbitrary and coin-flippy rather than fun.

While StarCraft benefits from being a 1v1 game, that isn’t enough. For instance, if it were badly balanced, it would drive players to exploit the current meta rather than deeply study the game in its entirety. Balancing properly is a separate and very hard problem, but it needs to be done right in order to achieve skill-outcome convergence. I won’t go into that topic in detail in this article, but I’ve written about it at length.

The details of Counter-Strike’s system are not as transparent as StarCraft’s, but we can still learn things from external observation. There’s evidence indicating that individual performance matters in a player’s ranking. A couple of posters on Reddit discovered it actually matters quite a lot, at least in placement games. They bought brand new copies of Global Offensive and played all of their placements together, with one player playing normally and the other merely acting as support. By the end, the normal player ranked a full two ranks higher than his supporting counterpart.

How Counter-Strike measures individual performance is anyone’s guess, but it’s a good jumping-off point to discuss the game’s points system. Points are obtained by getting kills, getting assists, planting and defusing bombs, and a few other things. That might not seem like a lot, but it’s substantially more granular than simply winning or losing. Rewarding players for playing well is a good thing for the player experience, and it makes losing a lot less painful. That all being said, it would be pure speculation to say that points are related to rating – furthermore, there’s no evidence that a player can get demoted off of a win or promoted off a loss, meaning there’s a limit to how much individual performance matters to begin with.

I mention points because of their potential to create a spectrum of outcomes beyond the fixed win-lose paradigm. Now don’t get me wrong, a competitive game benefits from the concept of winning. But what I appreciate about Counter-Strike is the way it makes every game enjoyable, not just the victories – no matter how tough the match, you always get in a lot of great shots, a few good rounds, maybe even a clutch or an ace. The rule set is designed such that losing doesn’t necessarily make playing feel like a waste of time.

I think you could theoretically expand this to the matchmaking system by enabling points-based victories. You could imagine a real-time strategy game decided by the winner of the most key encounters over the course of a game rather than just the player who won in the end. This would be such a massive shift in design philosophy that it would likely need to be built into a game from the ground up – I mention it simply because it’s interesting to think about.

Convergence in Team-Based Play

If Counter-Strike limits how much influence individual performance has on a player’s rating, how can it have any hope of generating accurate ratings?

There are a couple of interesting factors here. One is that individual performance has a surprisingly substantial impact on the overall team’s outcome. About a month or two ago I began recording the results of my Global Offensive matches and my rank within each game. Here were the results:

My Score Ranking (End of Game)    Proportion of Games Won
1st                               57%
2nd                               60%
3rd                               64%
4th                               36%
5th                               41%

If I was at the top of the score board – i.e. contributing significantly – my team tended to win. If I was at the bottom – i.e. not pulling my weight – my team tended to lose. It’s true that the sample size is small (~100 games) and rank is not necessarily the same as contribution, but the point nonetheless stands. I was rewarded for playing well and punished for playing poorly.

A larger conclusion I reached from this experience is that for every game lost due to bad teammates, there was a game that was won due to good teammates. I did not deserve to win many of the games where I ranked 5th, much the same way I carried worse players when I was the top scorer (both based on my own subjective evaluation, of course). Teammates’ impact on your rating evens out. The only variable that changes in the long-run is you.

(Here’s another nice video on the overemphasis that’s placed on bad teammates.)

Is this simply the nature of team games? I don’t think so, much the same way it’s not the nature of 1v1 games. For one, Counter-Strike’s design equalizes the five players on a team far more than many other games. A defensive lineman in American football simply isn’t going to score as many points as a wide receiver – and if that’s what his team needs, his ability to change the outcome of a game will be limited.

For another, Counter-Strike empowers players to perform individually by facilitating creativity. I previously wrote about this at length.

Hidden Ratings

The opacity of the Global Offensive system is itself interesting to think about. Counter-Strike doesn’t provide players with their exact rating, choosing a league-based system instead. It also doesn’t reveal many details of how these leagues are calculated. By contrast, StarCraft II revamped its ladder in 2016 and cited increased transparency as one of its primary goals. This resulted in publicly visible matchmaking ratings, more fine-grained leagues and greater transparency around Grandmaster placement.

Which way is better? Speaking personally, I always considered StarCraft’s hidden ratings to be a mistake. Skill development and healthy competition are huge driving factors in competitive play. Hiding skill ratings detracts from that and makes it hard to know how you’re doing and what your trajectory is. It may be anxiety-inducing to see a number change after every game, but that’s not better than losing a game and misunderstanding your likelihood to have won in the first place. In any case, folks who stress over lost ladder points will stress regardless of whether or not they knew how many points there were to begin with.

I still believe in those ideas when it comes to StarCraft, but whether they can be applied to Counter-Strike depends on how much the game factors in individual performance. If it does, then hiding ratings prevents players from gaming the system. It’d be difficult to make a team game immune to this problem, so opacity is a simple but effective band-aid to avoid it altogether.

I think in the future we’ll see advancement in this area as computing power becomes cheaper and easier to leverage. For instance, Valve could use machine learning to identify common situations – say, a CT attempting a retake of the B bombsite on Inferno after a rotation through CT spawn. A player who loses could then be shown a replay of a professional player executing the exact same retake to identify where the player went wrong. A sufficiently advanced implementation could even identify quote-unquote “perfect” moves from professionals, and reward ladder points depending on how close players get to the ideal.

This is all speculation. The implementation of such a system would not be trivial. Furthermore, we’d never want to place emphasis on process over outcome. If a player retakes a site, then they retake a site; it doesn’t matter whether they theoretically should have gone about it differently. I mention this mostly because I think it’s cool.

Leagues and Demotions

Even though Counter-Strike hides its ratings, it doesn’t spare its players’ feelings – it demotes them in real time. StarCraft does this on a season-by-season basis.

The existence of promotions and demotions assumes a league system, which both games have. We should first convince ourselves that this is a good thing; for instance, does StarCraft benefit from a league system now that matchmaking ratings are public?

I think the answer is a clear yes. Back when I played competitive Age of Empires, the only thing you had was a public Elo rating. That didn’t prevent the creation of leagues. Rather than the official ladder providing them, they were informally specified by the community. Ratings ranged from 1400 and 1500 all the way to the mid-2000s, and players tended to discuss skill level as a function of the hundreds digit – for instance, by stating they were a 1900+ player.

The problem is that Elo ratings are difficult to compare across different eras. When a game is popular and lots of new players are laddering, they feed their points upward and inflate the ratings of the best players. When a game is declining and new players are improving faster than other new players are feeding, they take points away from the top and deflate the ratings of the best. Ratings at any given time become more a function of the ladder population than a true measurement of player skill.
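The mechanism is easier to see in code. Here’s a sketch of a textbook Elo update, assuming a K-factor of 32 and the usual 400-point scale (I don’t know the exact constants any of these ladders used). The winner gains exactly what the loser gives up, so the total number of points on the ladder only changes when players join or leave.

```python
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return (new_winner, new_loser) after one game. K and the scale are assumptions."""
    expected = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))  # winner's predicted score
    delta = k * (1.0 - expected)  # an upset moves more points than an expected win
    return winner + delta, loser - delta


if __name__ == "__main__":
    veteran, newcomer = elo_update(1900.0, 1500.0)
    # The pair's combined rating is unchanged; points only moved upward.
    print(veteran, newcomer, veteran + newcomer)
```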

(For a more in-depth look at rating inflation and deflation, I recommend the Wikipedia entry.)

This happened most clearly in Age of Mythology, whose competitive scene dwindled after the release of its expansion, The Titans, and a multi-year hiatus from the World Cyber Games. The best AoM players had a rating of 2400, whereas by the end of Titans the top of the ladder was 2100. It would be inaccurate to compare these numbers directly, especially since there were players who were at the top of both games. They didn’t get 300 points worse – their rating just deflated along with everyone else’s.

Leagues fix this problem by formalizing skill ranges. StarCraft explicitly uses a percentile-based system. Counter-Strike appears to leverage a bell curve based on publicly available data, although how it’s actually implemented is once again anyone’s guess. In both cases, ranks are directly comparable across eras and thereby provide a more accurate notion of player performance (modulo changes to the ladder system itself). You can implicitly understand what someone means when they say they were a Diamond player in Heart of the Swarm – you simply couldn’t do the same by comparing MMR.
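Here’s a small sketch of why percentiles survive deflation. The cutoffs and league names below are made up for illustration and are not either game’s real boundaries. A player is placed by the share of the ladder they sit above, so shifting every rating down by 300 points changes nobody’s league.

```python
from bisect import bisect_right

# Hypothetical percentile cutoffs and league names, for illustration only.
CUTOFFS = [0.20, 0.50, 0.80, 0.96]
LEAGUES = ["Bronze", "Silver", "Gold", "Diamond", "Master"]


def league_for(rating: float, ladder: list[float]) -> str:
    """Place a rating by the share of the ladder that sits strictly below it."""
    share_below = sum(1 for r in ladder if r < rating) / len(ladder)
    return LEAGUES[bisect_right(CUTOFFS, share_below)]


if __name__ == "__main__":
    ladder = [float(r) for r in range(1000, 2000, 10)]
    deflated = [r - 300 for r in ladder]  # every rating drops 300 points
    print(league_for(1500.0, ladder), league_for(1200.0, deflated))
```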

Leagues have lots of other benefits, too. They’re a visible and easy-to-remember categorization of player skill. If players take pride in working hard and climbing up the ladder – as they well should – leagues are a bit like a badge of honor. It feels a lot better to stake membership in a league than it does to claim a raw rating, or even a percentile.

If we assume that leagues are a good thing for a matchmaking system, then we concede that we have to demote players eventually. Otherwise, the league system becomes meaningless. When’s the right time?

At the very least, there needs to be some sort of buffer. It would be jarring to sit on the border of two leagues and get promoted or demoted after every game. I think this would eliminate one of the core benefits of the league system, the visible and easy-to-remember categorization.

On the flip side, I think the season-by-season approach of StarCraft is too delayed. This is likely just personal preference, but it’s a bummer to know that a derank is coming. I’d rather just have it happen mid-season than wait a month or two. If it were up to me, I’d like to see StarCraft demote faster and promote slower – perhaps wait until players are two tiers away from their current rank, in either direction, before changing their league status.
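As a sketch, the two-tier buffer could look like the function below. The tier numbering and the choice to jump straight to the earned tier once the buffer is crossed are my own assumptions; a real ladder might step one tier at a time instead.

```python
def displayed_tier(current_tier: int, earned_tier: int, buffer: int = 2) -> int:
    """Only change the visible tier once the earned tier is `buffer` or more away."""
    if abs(earned_tier - current_tier) >= buffer:
        return earned_tier
    return current_tier  # inside the buffer: the badge stays put


if __name__ == "__main__":
    # A player shown at tier 5 who bounces between tiers 4 and 6 never sees a change.
    print([displayed_tier(5, earned) for earned in (4, 6, 4, 6)])
```

The point of the buffer is that a player hovering on a boundary keeps a stable league, while a genuine change in skill still shows up mid-season.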

That said, there are plenty of merits to a season-based system – I just don’t think alignment with demotions is one of them. The lack of a season system in Counter-Strike is noticeable. Seasons enable developers to organize gameplay updates around predictable dates, which can make engaging with the game feel more stable. It avoids things like this:

[Image: dust2 removed from the ladder]
The removal of dust2 from the competitive ladder hit some players hard.

League Distribution and Top-Tier Players

If we’re going to build a league system, how many leagues should it have? What’s the ideal division of players into leagues?

To start, it makes sense for the league system to reflect how players would organically self-categorize – after all, this is partially why it exists. In this light, the original StarCraft II implementation featured too few leagues, and it’s unsurprising that the number of leagues has risen over time. The existence of concepts like “High Diamond” was an indication that the league system wasn’t accurately labeling players. The ladder revamp fixed this by splitting each league into three tiers, creating 19 distinct leagues per region.

On the surface, this seems similar to Counter-Strike’s system, which features 18 leagues. There is, however, a crucial difference in how players are distributed. StarCraft allocates four leagues for the top 4% of its player base – Master’s 3, Master’s 2, Master’s 1, and Grandmaster. You could also conceivably argue that the Contender Ladder is its own league, a sort of high Master’s 1.

According to publicly available data, the top 4% of Counter-Strike’s matchmaking playerbase is distributed across the top three leagues – one (arguably two) fewer than StarCraft.

This might seem like a minor point, but it’s actually a big deal. The reason why relates to how skill distribution works in these games. Let’s examine the very highest tier in StarCraft II, Grandmaster. This league contains only the top 0.3% of players. To keep things simple, we’ll only look at the Korean region, although this analysis can also be applied to EU and NA.

The best player on the Korean ladder as of this writing is a Korean professional, INnoVation. He has a matchmaking rating of 6994. The player at the very bottom of the Grandmaster ladder (#200) is an anonymous barcode. He has a matchmaking rating of 5473.

For reference, a 315 MMR difference translates to a 75% win rate for the higher-rated player. By contrast, what we have here is a 1521 point difference.
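For the curious, here’s a back-of-the-envelope sketch of what that gap implies. It assumes the win-probability curve is logistic and calibrates it using only the 315-point, 75% figure above. I don’t know the curve Blizzard actually uses, and the function name and parameters are my own.

```python
def win_probability(mmr_gap: float, ref_gap: float = 315.0, ref_win: float = 0.75) -> float:
    """Logistic curve calibrated so that a `ref_gap` lead wins `ref_win` of the time."""
    ref_odds = ref_win / (1.0 - ref_win)    # 3:1 odds for a 75% favorite
    odds = ref_odds ** (mmr_gap / ref_gap)  # odds compound with every extra 315 points
    return odds / (1.0 + odds)


if __name__ == "__main__":
    gap = 6994 - 5473  # top of the Korean ladder vs. the #200 Grandmaster
    print(gap, win_probability(gap))
```

Under that assumption, the top player would be expected to beat the last Grandmaster more than 99% of the time, and both of them carry the same league badge.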

That’s massive.

Skill distribution just works differently at the very top than it does for everyone else. The difference between a below-average and an above-average player – say, between Gold 1 and Diamond 3 in StarCraft, or Gold Nova I and Distinguished Master Guardian in CS:GO – isn’t all that much. It’d be reasonable to practice every day and expect oneself to improve quickly and climb up the leagues in a matter of a few months.

(That said, even though the difference isn’t a lot, that doesn’t mean it’s easy to overcome. If you’re a casual competitive player, it’s probably because you’re a more serious doer-of-something-else, meaning you can’t just add an extra hour of daily esports practice to your schedule willy-nilly.)

By contrast, the difference between the very best casual competitive players and the very best professionals is a yawning chasm. A low Grandmaster player might get off work every day and grind out a few hours of StarCraft, whereas a professional will play the game for six or eight or ten hours a day, every day. Solar claimed to have played 70 games a day in preparation for the SSL finals he later won – at ten minutes a game (a conservative estimate), that’s more than eleven hours.

StarCraft’s matchmaking system illustrates these differences by ranking the very best players by exact rating. It also features a fine-grained breakdown of the pretty good players – three leagues of Master’s in addition to Contender, if you count that.

Counter-Strike, by comparison, provides almost no visibility whatsoever. Its highest league, Global Elite, comprises the top 1% of players – roughly three times the share of StarCraft II’s Grandmaster. If you assume that skill disparities in Counter-Strike follow the same pattern as StarCraft – and there’s little reason not to, given that the skill ceiling is unreachable and lots of players grind the game every day – then the Global Elite label becomes almost meaningless. For a “casual competitive player”, reaching Global Elite is the beginning, not the end, of a journey on Counter-Strike’s competitive ladder.

I’d argue that an ideal matchmaking system would feature just as many leagues for its very best players as it does for the other 99%. Theoretically, StarCraft already achieves this through precise Grandmaster rankings.

Counter-Strike is way behind in this regard. Some players may argue against this by claiming that once players reach Global Elite, they’ve effectively outgrown the competitive ladder – it’s time for them to move to ESEA and start grinding toward A+ and Rank S (if not outright join a team and participate in full-fledged leagues). But this is a failure on the part of the matchmaking system. It’s one thing for a third-party service to focus on solving a problem that’s not important to the core gameplay experience. It’s another to essentially replace the core gameplay experience because the existing system doesn’t do the job at higher levels.

Alternatively, readers might argue that the matchmaking system shouldn’t try to solve this problem in the first place. After all, Counter-Strike is a team game. Players at the highest levels should be practicing with their teams. But twelve years ago people would have said precisely the same thing for anyone above CAL-O – the notion of a solo queue would have been laughable (as anyone who spammed “looking for pug” in mIRC can remember). Before we assume that matchmaking is unworkable for the top 1% or that a third party system is inherently required, we should at least try to make the default system work.

Placements

We’ve discussed the top of the ladder at length – but how about the bottom? Both Counter-Strike and StarCraft require players to play a minimum number of ranked games before providing them with a league placement (and, in StarCraft’s case, their exact rating).

Placements reflect the matchmaker’s uncertainty about a new player’s rating. This uncertainty lets the matchmaker make larger rating adjustments during those early games and place players faster and more accurately. This is a huge step forward from previous systems. In Age of Empires, the original Dawn of War and the early days of Warcraft III, top players on new accounts would play dozens upon dozens of games to climb up to their expected rating.
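One simple way to picture this is a rating step that starts large and shrinks as the system sees more games. The sketch below is not TrueSkill or either game’s real algorithm, and every constant in it is an assumption I picked for illustration.

```python
def k_factor(games_played: int, k_new: float = 80.0, k_settled: float = 16.0,
             placement_games: int = 5) -> float:
    """Size of a rating step: big while the player is unknown, small once settled."""
    if games_played >= placement_games:
        return k_settled
    step = (k_new - k_settled) / placement_games
    return k_new - step * games_played  # shrink linearly across the placements


if __name__ == "__main__":
    print([k_factor(n) for n in range(7)])
```

A strong player on a fresh account moves most of the way to their real rating in a handful of games instead of dozens.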

StarCraft has practically perfected this. Its modified TrueSkill system requires only five placement games, and it’s extremely accurate. As of today, the top account in Korea’s Contender League has played only 17 games.

Counter-Strike, by contrast, requires ten competitive wins before placement. Not merely ten games – ten wins. Assuming a 50-50 win rate, that’s twenty competitive matches. At forty-five minutes a match, that’s 15 hours of competitive play before your rating is visible.
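The arithmetic generalizes: the expected number of matches needed to collect n wins at win rate p is n / p. Here’s a quick sketch that ignores ties and treats the 45-minute match length as the same rough estimate used above.

```python
def expected_placement_hours(wins_needed: int = 10, win_rate: float = 0.5,
                             minutes_per_match: float = 45.0) -> float:
    """Expected hours of play before placement, ignoring tie games."""
    expected_matches = wins_needed / win_rate  # mean matches needed to reach n wins
    return expected_matches * minutes_per_match / 60.0


if __name__ == "__main__":
    print(expected_placement_hours())              # the 50-50 case from the text
    print(expected_placement_hours(win_rate=0.4))  # a player losing a bit more often
```

Note that the wait gets longer for exactly the players who are struggling, which is the group most likely to get discouraged.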

That requires some serious dedication. I don’t have any data on this, but my belief is that the more placements a player needs to complete, the more likely it is they’ll get discouraged by the process. Having a visible rank and knowing what you’re working toward are powerful motivators to keep grinding the ladder. It’d be better to place players much faster than today.

Folks might argue that Counter-Strike is solving a harder problem because it’s a team game. But I’d argue that’s just an excuse. Like I mentioned above, the whole notion of a solo queue would have been considered impossible more than a decade ago and sparked replies of “only a paid service like ESEA can pull that off”. Yet Global Offensive managed to solve that problem.

In fact, Global Offensive has pushed the medium forward in a lot of ways. An experience system, the in-game economy, automated queuing, bots out-of-the-box – all features that build player engagement and add to the Counter-Strike experience. Like I mentioned before, there are lots of ideas with potential merit for more accurately evaluating skill, even some science-fiction level solutions. Imagine, say, a “time-to-kill” metric that was calculated using replays and computer vision that determined, on average, how quickly a player killed an opponent once they appeared on their screen.

I’m not arguing that’s the best idea – or even a good idea. My point is that there’s still so much that we could be doing better. We shouldn’t write off problems as impossible until we’re fully convinced of it.

Final Thoughts

There are a few other minor things I wanted to touch on, but we’ve already gone pretty long today. I think this is a good time to wrap up.

The most surprising conclusion from this exercise was how similar these two games’ matchmaking systems are. Despite their structural differences, the goal of enabling competitive play remains the same. There’s a lot that each game can learn from the other, like making losses less painful in StarCraft or fixing the higher-level leagues and placements in Counter-Strike. Furthermore, there’s a lot that other games can learn from studying these two masterpieces.

Thanks for reading! If you enjoyed this piece, please consider following me on Twitter and Facebook and checking out my game-design focused YouTube and Twitch channels. All the best and see you next time.
