Enough with the teasing. If you are unfamiliar with this series, see the earlier articles on white and blue. You can scroll down to the appendix at the end of this article if you want to refresh yourself on this study’s methodology. Now let’s get to the results. What card had the best results in black? Crypt Ripper.
Crypt Ripper: 68%***
Hideous End: 64%***
Hagra Crocodile: 61%*
Nimana Sell-Sword: 58%*
Soul Stair Expedition: 58%
Heartstabber Mosquito: 58%
Disfigure: 55%
Guul Draz Vampire: 55%
Vampire Lacerator: 53%
Blood Seeker: 53%
Bog Tatters: 52%
Surrakar Marauder: 52%
Giant Scorpion: 50%
Vampire’s Bite: 35%
Desecrated Earth: Ineligible
Mindless Null: Ineligible
Mire Blight: Ineligible
*Statistic is significant at 90%.
**Statistic is significant at 95%.
***Statistic is significant at 99%.
A card is ineligible when there are fewer than 20 observations.
As you can see, black did very well in my study. There were only three cards that had sub-.500 performances, and only one of those had enough observations to be eligible.
Let’s go straight to the analysis:
Crypt Ripper: 68%***
The best card in black in this study was not Hideous End, Disfigure, Surrakar Marauder, or Giant Scorpion. Instead, it was Crypt Ripper. The 2/2 for four mana finished four percentage points ahead of Hideous End and more than ten ahead of the other alleged contenders for the top slot. Whenever Crypt Ripper hit play, one of two things happened in almost every case: either the opponent had a removal spell, or that opponent died shortly thereafter. As it turns out, the Ripper is hard to remove with commons in the format. As long as you are willing to hold back one black mana (and most players were clever enough to do so), your opponent will need either a kicked Burst Lightning or Journey to Nowhere to take care of it. Few commons have that kind of game-breaking appeal, and it only gets better in the late game. Consequently, Crypt Ripper finished with the second-best record out of all the commons, and the best out of all the non-land cards that are statistically significant at the 99% confidence level.
Hideous End: 64%***
Even if Hideous End comes in second place, its 64% should not come as a shock to anyone. It is truly the premier removal spell in the format, and its 99% confidence level backs up that statement. This is in contrast to most of the rest of the removal spells in the format, which came up empty on statistical significance. Some have argued this might be a selection bias: since everyone knows to play removal spells, they are more likely to come out as average. Meanwhile, some non-removal spells are only known to the “good” players as being playable cards, hence their averages are artificially pushed higher. While I do not have enough data to figure out whether that is the case, I think Hideous End got such a great result because it combines removal with direct damage to the dome, which really handicaps the opponent.
Note: Just because Hideous End appears lower on this list than Crypt Ripper does not mean Hideous End is a worse card. The statistical tests only check to see whether we can say with certain degrees of confidence that there was a correlation between playing that card and winning or losing the game. To infer whether one card had a greater or lesser correlation, I would have to run a difference in means test between those two cards. Given that this requires an A versus B set up and that I tracked more than 100 cards in this study, it would be impossible to test every card against every other card. Thus, you will not see any difference in means tests in these articles unless otherwise noted.
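For readers who want to see what such a comparison involves, here is a minimal Python sketch of a difference in means test, written as a pooled two-proportion z-test. The game counts are hypothetical placeholders (this article reports only percentages), and the choice of a z-test is an assumption made for illustration, not necessarily the test this study would use. The last line counts how many pairings 100 tracked cards would produce.

```python
import math

def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """Return (z, two-sided p-value) for the gap between two win rates."""
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    # Pool both samples to estimate the shared win rate under the null.
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, chosen only so the rates match 68% and 64%.
z, p = two_proportion_z(wins_a=68, games_a=100, wins_b=64, games_b=100)
print(f"z = {z:.2f}, p = {p:.2f}")

# Distinct card-versus-card pairings among 100 tracked cards.
print(math.comb(100, 2))  # 4950
```

With made-up samples of this size, a four-point gap is nowhere near significant, which is the reason for the caution in the note above.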
Hagra Crocodile: 61%*
This study continues to be extremely friendly to the landfall creatures, as Hagra Crocodile joins Steppe Lynx as statistically significant. A lot of people will be surprised by this result, but I have been running Hagra Crocodile with great success ever since the prerelease. Unlike Dross Crocodile, which would only get one chance to swing into the red zone if there were any blockers around, Hagra Crocodile usually has multiple opportunities and can do some serious damage. So what if it can’t block? Zendikar limited was not meant to turn into a battle of creature versus creature combat, and this result only goes further to demonstrate that.
Nimana Sell-Sword: 58%*
Nimana Sell-Sword continues the same old story for allies: any that is good on its own is great overall. Although the Sell-Sword is frequently the last ally played for the game, the extra counter it contributes to Oran-Rief Survivalist or Umara Raptor is enough to send the opponent to the ropes. And, of course, if you continue to play more creatures, it is all gravy from there.
Soul Stair Expedition: 58%
This will surprise people, but Soul Stair Expedition is a pretty solid card in sealed. The landfall should trigger often enough, seeing as you should be playing it on turn one. Once it gets online, it essentially draws your best creatures that have already been removed or destroyed. Think about how extremely disheartening that is for your opponent: after he or she finally took care of Vampire Nighthawk and Crypt Ripper, Soul Stair Expedition brings them back for round two. At that point, I don’t care how good your sealed deck is. You are in deep, deep trouble.
Whether this result translates into draft is another question. While sealed and draft share a number of qualities, draft tends to be a bit faster than sealed. As such, this Expedition might not be as strong there, though I would certainly still include it in my decks. But, regardless, Soul Stair Expedition is certainly the best of the cycle we have seen so far.
Heartstabber Mosquito: 58%
The cliché is that removal is good, and players will pay any amount of mana to get to it. To be a worthwhile spell, Heartstabber Mosquito costs seven mana, but its 58% was still respectable regardless. That being said, I think players have the wrong impression of the Mosquito overall. You should not think it is spot removal—the mana cost makes it too inefficient to be that. Instead, consider Heartstabber Mosquito a simple two-for-one spell, or a mere 2/2 flyer for four if you desperately need blockers.
Disfigure: 55%
A number of people on the message boards felt that Disfigure would come out on top, so they are going to be shocked to see that it landed at a mere 55%. However, after watching more than a thousand players sling cards, I can say that it does not actually do very much. Sure, it is spot removal. Sure, you have the opportunity for card advantage. Sure, it is cheap. Sure, it makes blocking a headache for your opponent. But all that added up does not equal an amazing card that can single-handedly turn a game around (like Crypt Ripper) or turn a potential game-ending threat into grave fodder (like Hideous End). Consequently, Disfigure only gets this average rating. That is not to say it is bad or that it does not belong in your deck. But are you running black just so you can get access to Disfigure? Maybe not so fast.
Guul Draz Vampire: 55%
Guul Draz Vampire should be relevant in the early game (even if briefly) and eventually comes back around in the late game as a legitimate threat to your opponent. Consequently, I think that it flies under the radar of a lot of players, especially those who would include Vampire Lacerator over it…
Vampire Lacerator: 53%
According to some draft data, players are taking Vampire Lacerator higher than Crypt Ripper and Nimana Sell-Sword. Why?! All you get here is a Grizzly Bear for one mana. Sure, in the early game, you get to go on the beatdown. Great. But you are losing a bunch of life in the process as well. Not so great. In the late game, you get a 2/2. Wonderful. At that point, the fact that Vampire Lacerator only costs one mana is moot—you need something more swingy even if you have to pay more to produce it. Although these percentages come from play, I do not see why there is some sort of fascination with this card. And it certainly isn’t doing anything special in sealed.
Blood Seeker: 53%
Long ago, a far better player than I taught me that Suntail Hawk was bad: even in the best-case scenario (you play it on the first turn and your opponent never plays a flyer for the entire game), it still is not a compelling card. The moral of the story is that dealing one damage per turn to your opponent’s dome is not a very good strategy. I more or less believe that Blood Seeker is just that, and that we see a 53% here only because it rides on the backs of all the actual good cards in black. Feel free to disagree.
Bog Tatters: 52%
Black finished just a hair below red for most frequently played color in limited, so you will frequently get an unblockable creature out of Bog Tatters even if you maindeck it. However, three damage a turn starting on turn six has less of an impact on a game than you might hope it would. The result is a very average 52%.
Surrakar Marauder: 52%
Don’t let the 52% fool you—Surrakar Marauder is still probably better than the last four creatures on this list. Whereas Bog Tatters gets going late, Surrakar Marauder starts pounding evasive damage away beginning on turn three. Even if it is only two damage per turn, that still chips away fairly quickly and helps get to a point where Vampire Lacerator goes offline and Guul Draz Vampire goes online. And that makes for a pretty good card.
Giant Scorpion: 50%
Like Disfigure, some speculated that Giant Scorpion would be at the top of this list. Again, it is worth reiterating that some cards don’t single-handedly affect games. Giant Scorpion is such a card. Think about what Giant Scorpion does in an average game. Most of the time, your opponent will attack with a reasonably sized creature (that is, three or more power), and Giant Scorpion will block. Both creatures will go into the graveyard. How game breaking is that? Answer: hardly. Moreover, this Scorpion will hardly do much in the red zone. Perhaps it is a solid card to have in your deck, but it is going to do a very boring job.
Vampire’s Bite: 35%
It is beyond me why people seem to be so high on Vampire’s Bite. Other than ridiculous cards like Overrun, combat tricks tend to be best when they simultaneously save your creatures while destroying your opponent’s. Both Vines of Vastwood and Slaughter Cry have this quality, and both came in higher than 50%. In contrast, Vampire’s Bite will only help you with the latter, which means you are probably going to lose the spell and a creature just to remove something of your opponent’s. That’s an automatic two-for-one against you, and we aren’t even considering when your opponent removes the Bite’s target in response. Spending four mana total to kick the spell hardly seems worth the life, either. From that perspective, a 35% should not be surprising.
Desecrated Earth: Ineligible (1-2)
Would you want to play a poor land destruction spell in a limited format? Would you want to play any land destruction at all in Zendikar limited, outside of Mold Shambler (which is a land destruction spell only by technicality)? The only thing more surprising than the fact that someone won a game while playing Desecrated Earth is that three people were willing to play it at all.
Mindless Null: Ineligible (7-11)
I have heard that some people are willing to play Mindless Null just because they need another black card to justify playing all of the other great ones they have available in their sealed pool. However, with a 39% mark, even if it is in only eighteen games, you have to wonder whether there is another card—any card—in the pool that could go into the deck other than this one.
Mire Blight: Ineligible (No Observations)
An entirely unsurprising result.
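The arithmetic behind the ineligible entries is easy to check. The short Python sketch below turns a "wins-losses" record into a game count, a win rate, and an eligibility flag using the 20-observation cutoff stated above. The function name and the record format are illustrative assumptions, not part of the study's actual tooling.

```python
MIN_OBSERVATIONS = 20  # eligibility cutoff stated in the article

def summarize(record):
    """Turn a 'wins-losses' string into (games, win_rate, eligible)."""
    wins, losses = (int(part) for part in record.split("-"))
    games = wins + losses
    # A card with no observations has no win rate at all.
    win_rate = wins / games if games else None
    return games, win_rate, games >= MIN_OBSERVATIONS

for card, record in [("Desecrated Earth", "1-2"), ("Mindless Null", "7-11")]:
    games, rate, eligible = summarize(record)
    print(f"{card}: {games} games, {rate:.0%}, eligible={eligible}")
```

Mindless Null's 7-11 record works out to 18 games and roughly 39%, two games short of the cutoff.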
Overall, black’s cards are extremely strong—only one eligible card clocked in below 50%. This should come as no surprise considering it packages good creatures (Crypt Ripper and Heartstabber Mosquito) with a variety of removal in various forms (Hideous End, Disfigure, and Heartstabber Mosquito) and card advantage (Soul Stair Expedition and…you guessed it…Heartstabber Mosquito).
Join me next time when we get to red. Is Burst Lightning the top common? Does Plated Geopede live up to its reputation? And is Zektar Shrine Expedition a playable card? I will answer all those questions and more in a few days.
William Spaniel
williamspaniel@gmail.com
Appendix: Methodology
I assume that skill, luck, and the quality of a player's deck determine who wins any particular confrontation. While undoubtedly skill matters, this study is focused on the luck and card quality factors. Players actually have a great deal of control over both of these, as a poorly-constructed deck will win less often than a well-constructed one. From this, we can conclude that some cards contribute to wins more frequently than others. If an average card ever reached play in a game, we would expect its controller to have only won that game around 50% of the time. But if a truly exceptional card reached play, we would expect its controller to have won upwards of 70% of the time.
Watching replays of Pro Tour San Diego qualifiers on Magic Online (and carefully avoiding the qualifier in which the system malfunctioned and everyone played a 140-card deck; yes, this actually happened), I recorded the results of more than a thousand players. Every time a card hit play, I would record it as either a win or a loss, depending on what ultimately happened in that game. If the card reached play multiple times (perhaps because of a Grim Discovery), it only counted once. But if a player cast multiples of a single card, I counted that card multiple times.
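The counting rule can be sketched in a few lines of Python. This is only an illustration of the rule as described: the log format, the `copy_id` field that distinguishes physical copies of a card, and the function name are all hypothetical, since the actual recording was done by hand from replays.

```python
from collections import defaultdict

def tally(games):
    """Count wins and losses per card, once per physical copy per game."""
    results = defaultdict(lambda: {"wins": 0, "losses": 0})
    for game in games:
        seen = set()
        for card, copy_id in game["plays"]:
            if (card, copy_id) in seen:
                continue  # same copy reached play again, e.g. after recursion
            seen.add((card, copy_id))
            key = "wins" if game["won"] else "losses"
            results[card][key] += 1
    return dict(results)

# Hypothetical log of one player's plays across two games.
games = [
    # The same Crypt Ripper is replayed after being returned: counts once.
    {"won": True, "plays": [("Crypt Ripper", 1), ("Crypt Ripper", 1), ("Disfigure", 1)]},
    # Two different copies of Disfigure: counts twice.
    {"won": False, "plays": [("Disfigure", 1), ("Disfigure", 2)]},
]
print(tally(games))
```

In this made-up log, Crypt Ripper ends up 1-0 and Disfigure 1-2.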
Such a large number of observations was necessary to remove the play skill bias that would have shown up in a smaller-n study. It also shrinks the margins of error, allowing for better hypothesis testing, which I ran at 90%, 95%, and 99% confidence.
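To give a feel for how the margin of error shrinks, here is a small Python sketch using the standard normal approximation for a proportion. The sample sizes are arbitrary examples, and the approximation itself is an assumption for illustration rather than the exact calculation used in this study.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a win rate observed over n games."""
    return z * math.sqrt(p * (1 - p) / n)

# Arbitrary example sample sizes, starting at the eligibility cutoff.
for n in (20, 100, 400):
    print(n, round(margin_of_error(n), 3))
```

Quadrupling the number of observations halves the margin, which is why a card seen only 20 times says much less than one seen a few hundred times.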
For those of you unfamiliar with hypothesis testing, here is a brief explanation for what each of those means:
90% Confidence: When I say we can be 90% confident that a card positively contributes to victories, it means that if the card had no impact, there would be only a 10% chance of the data coming back this skewed based on pure luck. While the odds of a false alarm here are only 1/10, we should be very skeptical of these results as statisticians. Generally, it is only a good idea to accept these results if we have a good theory behind them. For example, I would accept the result for Burst Lightning, a quality removal spell, as being true, but I would cast doubt on whether Blood Seeker was actually affecting things.
95% Confidence: This is the gold standard of statistics. When a card meets 95% confidence, the likelihood that a merely average card would give us data this extreme is only 1/20. At this point, it is a good idea to start thinking of theories to justify the results if you do not have one already.
99% Confidence: While rare (there were only four in this study), a 99% confidence virtually guarantees that a given card has an impact on the match: a card with no impact would produce a result this extreme only 1 time in 100. You should pay careful attention to these.
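As a rough illustration of how a record maps onto these three levels, the Python sketch below runs an exact two-sided binomial test against a 50% win rate and assigns the same star markers used in the results list. The binomial test is a stand-in chosen for simplicity and is not necessarily the test used in this study; the 68-32 record is hypothetical.

```python
from math import comb

def binomial_p_value(wins, games):
    """Two-sided exact p-value against the null of a 50% win rate."""
    upper = sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games
    lower = sum(comb(games, k) for k in range(0, wins + 1)) / 2 ** games
    # Double the smaller tail (valid because the null is symmetric), cap at 1.
    return min(1.0, 2 * min(upper, lower))

def stars(p_value):
    """Map a p-value to the significance markers used in the results list."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""

# Hypothetical record: 68 wins in 100 games.
p = binomial_p_value(68, 100)
print(f"p = {p:.4f} {stars(p)}")
```

Under this sketch, the same percentage can earn a star or not depending on how many games sit behind it, which is presumably why two cards at 58% in the list carry different markers.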
Just because a card does not show up as significant does not mean you should not care about the results. But you should not treat them as gospel, either. The best analogy I can draw is to that of a baseball team. It is possible that your star hitter goes through a minor slump at the beginning of the season and an average player goes on a torrid streak at the same time. That does not mean the average player is better than the star; it just means he was better during that period of observation. So don’t be surprised if the study ranks an average card lower than one you perceive as a top-pick. My card-by-card commentary will help qualitatively decide whether this was just statistical coincidence or if it might be part of a larger trend.
Additionally, just because a card is sub-50% does not mean you should automatically stop playing it in all of your decks. Going back to the baseball analogy, a team cannot field nine players with batting averages all over .300. But it can maximize its performance by putting its best players in the lineup. So if you need to play Vampire’s Bite to have enough black cards to justify running Sorin Markov, go right ahead; but if Vampire Lacerator is floating around in your sideboard, the data indicate you should swap out the Vampire’s Bite.
18 Comments
Once again I think the results of this study are murky at best. In an early sealed event, 6/8 of the top 8 decks were playing black. Black is just a better color than the others in the set, so what comes out on top here? The black commons that cannot be splashed well, meaning that the players who played them played mid to heavy black and therefore probably opened a good black pool and so had a better chance of succeeding in sealed. Hagra Croc is a filler black card, once again something that you only play because you have a lot of other good black, so once again it is not so surprising to see it high on the list. Soul Stair is certainly great in sealed, but in draft much less so since games tend to be a lot shorter; the same thing goes for the Mosquito. Marauder is artificially low because people in sealed scrounge for playables, play their Scrabblers and Pumas more than in draft, and play black if they can. Scorpion is better in draft because of the high number of tuned aggro decks in draft vs sealed where it shines holding off a pile of offenders, though not amazingly better; I still think people overvalue it. I'm not surprised to see Blood Seeker so low, as he really isn't that good (though in that draft where he stopped me from recovering with my Conq Pledge he was pretty dang annoying). Still great to see these articles, even if I'm even less convinced of the data's usefulness.
Even if you test Crypt Ripper against only black cards, the result still comes out as significant to 95%. I agree that black cards are getting a coattail effect from other good black cards. The question, of course, is which of the black cards is making all of the other ones better. And, either way you test it, Crypt Ripper seems to be doing favorable things.
well, you also have to remember that in sealed, you are probably going up against a lot of other black and red decks. Against those decks, Crypt Ripper is really good, because it doesn't die to Hideous End and it can pump up out of burn range of a lot of the red removal spells. It kills nearly every creature in any of those colors. So of course it is good there. Disfigure is probably lower because it gets played in so many more games. If your mana is bad, then you aren't going to be able to play a Hideous End, but you are going to be able to play Disfigure in every game that you play. Surrakar Marauder is going to be bad because everyone else is playing black, so he is really just a 2/1 for 2. Here is another example, Bog Tatters still clocks in at 52%, even though it is a terrible card. But if everyone is playing black, then it is decent. Not as good as a Crypt Ripper because it dies to every removal spell in the format, but decent nonetheless.
That's a really good explanation for Surrakar Marauder. I will be sure to mention it in the red article.
Hiya
I found your articles very interesting.
You mention some discussions going on about what people thought would do on black. Which message boards are you using? Do you have a link to the thread?
I was thinking of doing something very similar, analyzing which cards were played in order to see what cards might be best.
I don't have the patience you have though and could never go through the replays by hand.
So being a programmer, I wrote some code that would save replay information for every game and match for every round of a tournament, including the full game play log of every card played and attack made, etc.
I've gotten that code working. MTGO crashes once in a while when playing back so many replays, but with a little human intervention I'm able to gather a complete event.
Now I just have to get off my lazy butt and analyze the data and present it.
I even thought about making this automated and providing a website that would have all the data for all the events with running statistics, but again, my lazy butt hasn't been motivated to do so yet. Also, I'm not really a statistician so I'd probably mess up the stats somehow.
If you're interested perhaps we can talk more about what techniques you've used to analyze what was played and any ideas you might have for fun stats and how to measure them. We could do it in a message board thread so that others could chime in.
Nice articles by the way. Congrats :)
Email me. williamspaniel@gmail.com
I forgot to log in, so here's my rating :)
Again I'm left wondering how we can take this data and apply it to our approach to Zendikar limited, and I'm curious to hear your thoughts on that. Having done this research, how is it going to impact *your* approach to Zendikar draft or Zendikar sealed?
Will it impact your approach to draft at all? The data doesn't translate directly into draft picks, where you would be quite incorrect to take the mana-intensive Crypt Ripper over Hideous End or even Disfigure early.
Does it help build a sealed pool? Maybe it can contribute to some final card decisions, but once you have chosen your colors, the percentage differences between Hideous End, Disfigure, Crypt Ripper, and Sell-Sword don't matter, you are going to play them all if you play black.
If the only actionable takeaway is to inform (loosely) some bottom-end slot choices in sealed, is this kind of data mining and processing worth the effort? (I say "loosely" because even in those cases, the data is still looking at win percentage when resolved, not which card is "better.")
As it is, this little snippet from one of the WotC newsletters still tells me more about how to approach Zen drafting than any conclusions drawn from your series (I think this was already quoted in one of the other article comments, but it bears repeating):
Magic Online Factoid
Top 10 first picks of 8-4 ZEN draft winners:
1. Hideous End
2. Burst Lightning
3. Vampire Nighthawk
4. Journey to Nowhere
5. Marsh Casualties
6. Disfigure
7. Plated Geopede
8. Trusty Machete
9. Malakir Bloodwitch
10. Kor Skyfisher
The first four common picks are the top removal spells in the colors with removal. Crypt Ripper is great in heavy black, but the recipe for drafting success is not to pick your high-ranking cards first, but to pick removal, bombs, and efficiently-costed creatures first.
I absolutely respect and admire the work you have put into this series, don't get me wrong. I agree that Wizards should release the mounds of data they have on Limited so that players like yourself who are quite willing to work the data into articles have something meaty to work with. In the meantime, is there anything we can actually do with this info besides find it interesting? Which I do, and perhaps "that's interesting" makes this worth the effort, but of course, the Spike in me wants to *do* something with it!
I'm using this data when building decks, drafting, and making in-play decisions. The building decks part should be obvious, as I gathered this data from sealed pools. Using this during drafting is a bit more controversial, but I don't think that the differences between sealed and draft are that great except in the places where they obviously are (Soul Stair Expedition, for example). I avoid discussing this in articles because I understand that would incite a riot. Eventually, I would like to compare the results of sealed events with those of draft events and see whether there actually is much of a difference at all. Finally, it's good to know what is a threat to me and what isn't when I'm making in-play decisions, especially when it comes to removal.
Example: Vines of Vastwood (and, as my data for Magic 2010 increases, Giant Growth) greatly improves a player's chances of winning. From what I'm seeing, I explain this based on the fact that it often steals wins at the end of games and also helps remove extremely large creatures. Consequently, when people attack at suspicious times, I am more than willing to find out if they have Vines. If they don't, then I have successfully called their bluff. And if they do, I'm still better off having removed that Vines from their hand than having to face it later on in the game.
I have been playing Soul Stair in almost all of my black decks and it has been excellent. I think it functions the way we wish Ior Ruin Expedition would function in this format. Black has several excellent targets for return, obviously Crypt Ripper and Nighthawk but also Gatekeeper and Scorpion. Having it on the table will even change how opponents play against you. Nobody is blocking your Gatekeeper with Soul Stair sitting there to return him, and swinging into the Scorpion also becomes undesirable. And let's face it, returning your bomb creatures and making your opponent deal with them again is almost an auto win.
I am not saying it is a first pick card but I usually pick them up tenth or later and it is a great form of card advantage for a cheap price. Sure it is a terrible topdeck late and sometimes it just gives your opponent a target for their sanctifiers but more often than not it is great. Try them out, you won't be disappointed.
ok i can kind of understand...Crypt Ripper in a vacuum, would be good. Probably the best common black creature as far as actually getting the win...but its by far the best black card. I refuse to believe anything other than hideous end or the more splashable disfigure is the best black common.
From the article:
"Note: Just because Hideous End appears lower on this list than Crypt Ripper does not mean Hideous End is a worse card. The statistical tests only check to see whether we can say with certain degrees of confidence that there was a correlation between playing that card and winning or losing the game. To infer whether one card had a greater or lesser correlation, I would have to run a difference in means test between those two cards. Given that this requires an A versus B set up and that I tracked more than 100 cards in this study, it would be impossible to test every card against every other card. Thus, you will not see any difference in means tests in these articles unless otherwise noted."
So, no, I'm not saying that Crypt Ripper is inherently better than Hideous End. It could be, but I don't know--I would need to gather a ton more data, maybe ten times more than what I have here. The results I present are only from a sample of 1000+ players. Some cards are bound to rank higher than others even if they aren't better in the long run.
Good article. As for Godot's question, I think it is up to us to determine how to correctly interpret this data. That doesn't make it worthless, but instead debatable as to its implications. He's not saying go and pick Crypt Ripper higher than Hideous or Disfigure. It's not as simple as a pick order. The data is especially complex given that each color is different and drafted/valued differently. Given the popularity of black, I understand Crypt Ripper topping this list to mean that the person who is deepest in black (ie someone who can use Crypt Ripper well) is most likely to win the draft. This confirms what we already know about black. How can that help you win more? Figure it out :)
I have enjoyed reading these articles so far and think this data is quite helpful even if some of the other commenters do not. I agree that it indicates what is more important in games and while it does have some flaws I think this gives some great information. I would love some draft information and hope that wizards will let you do it at PT: San Diego. Thank you very much.
This is a fun read. I have a couple questions and suggestions for improvement. From your methodology section I infer the probabilities you are providing are correlation coefficients between the incidence a card is played and the game outcome. You provide significance levels of tested hypotheses, but are you using a t-test with a null that says the correlation between play incidence and game outcome is zero? It would be nice to see 95% confidence intervals for the correlation coefficients to give the reader a better idea of population correlation parameter vs. the sample statistics.
I am--though exactly what having "no correlation" means is a little dicey. I think I am going to switch things up when I do the next study. I will discuss this in further depth during my wrap-up article.
I don't believe your results indicate that crypt ripper is the 'best' card to play.
Your results favor the more powerful cards in a vacuum, regardless of casting cost. If there was a card that cost 10 black mana to play and had the text "*this* is uncounterable, you win the game", your results would point out that it was the best card in the format, since every time it was played, the player that played it won.
But the fact that it costs 10 mana would actually make it a pretty crappy card, because 99% of the time it shows up in someone's deck, it will not be cast.
Obviously, the more powerful cards are the ones you want to cast, as your study shows.
But I think it would be a better study if it showed the cards you want to include in your winning deck. (Just because casting lorthos the tidemaker will win you most games is not the same thing as saying you will be able to cast him most games.)
I would be interested in seeing the results of a study that counted the cards present in the winning decks rather than the cards cast.
That said, good on you for doing some math:)
McFish makes a good point.
But William makes a note to say that cards that appear less than 20 times are not counted.
However if a card that is played 140 times is 'weighed' as the same as a card that is only played 30 times, it's possible some of McFish's concerns are relevant.
The problem with counting the cards present in a deck as McFish suggests is that MTGO replays don't show you what is in their deck, just what was played.