If you read my article last time (found here), you know the drill. If you need to brush up on the methodology of this study, I included all that information at the bottom.
That being said, before we get to the results, I wanted to use this space to address the extremely positive feedback I received on the white article. After all, if one person asked a question, I’m sure a hundred people had it cross their minds.
This data is from sealed tournaments. What’s up with that?
Sealed is the easiest format for me to collect data on. Thus, I am doing analysis of sealed here. Although there are some notable differences between sealed and draft (sealed is slower, tends to involve more splashes, and can’t abuse archetype constructions like allies as well), I’d speculate that a lot of cards perform similarly across limited formats. As I just said, we already have theories about what makes cards better in draft than sealed and vice versa, so we should be able to spot them. I will make that effort whenever I can in the blurbs.
Okay. But I really want draft data. Can you make that happen?
I probably could, but it would take an awfully long time. Right now, Magic Online gives players nearly unlimited access only to the premier event rooms. Consequently, I can call up the replays to a bunch of games from a Zendikar sealed tournament and start harvesting data like crazy. I cannot do the same with a Zendikar draft queue. Thus, my only access to drafts is during the top eight matches.
Yeah, but that’s still at least 14 games per top eight. So you should be able to do that, right?
Indeed I could, but that would systematically bias my results. Imagine I took your advice and did just that. I might find a draft where some guy lucks into two Sorin Markovs and drafts 43 Mindless Nulls with the rest of the picks. This is going to fundamentally screw with my tracking of both cards: Mindless Null will end up higher than it should be simply because Sorin Markov is so damn good, and conversely Sorin Markov will finish lower than it should be because Mindless Null is so damn bad.
As such, I only tracked a single game from every match I looked at, and only from one round of every tournament. You are right in that I could look at just one game from each quarterfinal match, and I’d eventually finish. But that would take forever. And my results would be systematically biased again because I’d only be looking at good mages playing Magic.
But I really want draft data. What can be done?
I have two solutions. First, contact Wizards and tell them you want them to allow open access to draft queue replays. (This will not work.) Alternatively, contact Wizards and tell them you would like them to commission a study at an upcoming Pro Tour. I have a system that would work well for paper Magic, and I live near San Diego. However, it is up to all of you to make Wizards aware of this study and allow me to run the data at a Pro Tour.
Shouldn’t I be wary of these results? The number of observations is pretty low.
Yes and no. Some of the observation counts are low. Others are fairly high. Regardless, I already took a couple of steps to address your concern. First, I disqualified all cards with fewer than twenty observations. Second, I marked all cards that are statistically significant. So even if you want to be super skeptical (and you are welcome to be; as a statistician, I even encourage it), you can at least trust the cards with the stars next to them.
Is this tracking what is actually winning games, or what good players are using to win games?
Before publishing the last article, I did not consider that some cards might just be doing well because bad players aren’t intelligent enough to play them. This would partially explain why Steppe Lynx beat Journey to Nowhere: everyone plays Journey, but only good players know the power of Steppe Lynx. (Or so goes the argument.) On some level, I believe this is true. But this argument relies on the fact that Steppe Lynx is good, so it probably goes both ways.
What about the fact that some cards are only cast when a player wins?
Do you ever see Bold Defense cast in a loss? In an ironic twist, Bold “Defense” seems only to be cast during alpha strikes and rarely gets played when someone is losing. Or at least that is what I observed in the nine matches I watched, in which it went 8-1. Yes, this is a problem. I wish I could see what is in everyone’s hand at any given time, but MTGO does not allow me to do that. Again, I will point out whenever I think the “Bold Defense problem” is influencing the results.
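To put a rough number on the "Bold Defense problem," here is a minimal Python sketch. It is not part of the study itself; the function name and all three probabilities are hypothetical and chosen only for illustration. It shows how a card that has no effect at all on the outcome can still post a high observed win rate if it gets cast far more often in games its controller was already winning.

```python
def observed_win_rate(true_win_rate, cast_if_winning, cast_if_losing):
    """Win rate seen among games where the card was actually cast.

    All three inputs are hypothetical probabilities, not measured values:
      true_win_rate   - chance the card's owner wins the game
      cast_if_winning - chance the card gets cast in a game the owner wins
      cast_if_losing  - chance the card gets cast in a game the owner loses
    """
    cast_and_won = true_win_rate * cast_if_winning
    cast_and_lost = (1 - true_win_rate) * cast_if_losing
    return cast_and_won / (cast_and_won + cast_and_lost)


# A card that changes nothing (owner wins 50% of the time) but is cast in
# 90% of wins and only 30% of losses looks far better than it really is.
print(f"{observed_win_rate(0.5, 0.9, 0.3):.0%}")  # prints 75%
```

Because only cards that reach play get recorded, the games where the card rotted in a losing player's hand never show up in the denominator. That is the whole bias in one line of arithmetic.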
I miss the old, curmudgeonly William Spaniel. Are we ever going to see him again?
Probably not. I’m not as active as I once was in Magic, so I don’t really have much of an opinion on most issues right now. And it is unlikely I will ever have the time to do so again—I just graduated from college (guess what my degree is in—no, really, guess), and I have the responsibilities of an adult now. But this data is interesting and evidently it has some value to the readers out there, so you’ll at least get this stuff for the time being.
With that out of the way, on to blue:
Sky Ruin Drake: 67%**
Umara Raptor: 62%**
Windrider Eel: 57%
Welkin Tern: 53%
Reckless Scholar: 47%
Whiplash Trap: 45%
Into the Roil: 42%
Ior Ruin Expedition: 41%
Kraken Hatchling: 35%
Paralyzing Grasp: 23%**
Caller of Gales: Ineligible
Cancel: Ineligible
Lethargy Trap: Ineligible
Shoal Serpent: Ineligible
Spell Pierce: Ineligible
Spreading Seas: Ineligible
Tempest Owl: Ineligible
Trapfinder’s Trick: Ineligible
*Statistic is significant at 90%.
**Statistic is significant at 95%.
***Statistic is significant at 99%.
A card is ineligible when there are fewer than 20 observations.
You will notice that a lot of blue’s cards are ineligible because blue was such an unpopular color, and with only four cards above 50%, it is no wonder why. In any case, here is a rundown of the cards:
Sky Ruin Drake: 67%**
If I had to guess ahead of time, I would have put Sky Ruin Drake below Umara Raptor and Windrider Eel, but it finished on top nevertheless. Unlike virtually all other creatures in Zendikar, the Drake excels at blocking, not attacking, and it pretty much rules the skies once it enters play. And because its result is statistically significant at 95%, there should be little doubt about whether it belongs in your deck.
On a different note entirely, I am curious whether there is a statistical bias for higher mana spells. Some may question whether Sky Ruin Drake is so high on this list because the very fact that a player cast it indicates that they had a relatively smooth mana draw and survived at least until the fifth turn. After writing the remaining three articles, I might investigate this further. However, for now, I would guess that whatever possible bias exists in that regard is canceled out by the fact that the player may have been mana flooded as well, and that greatly hurts his or her chances of winning.
Umara Raptor: 62%**
Umara Raptor benefits from being a very solid card on its own and an amazing card when paired with other allies, and its 62% mark is statistically significant at 95%. I don’t think much analysis here is necessary—if Wind Drake has always been viewed as a playable creature in limited, a strictly superior version of it should be as well.
Windrider Eel: 57%
Continuing with the trend of landfall creatures being good (Steppe Lynx was the top white common last time), Windrider Eel pulls through with a 57% clip. As you can imagine, swinging for two damage in the air is decent, but it is another ballgame entirely when it is four or even six damage for only four mana. I think that Windrider Eel is superior to Sky Ruin Drake, but I’m unsure whether it is actually better than Umara Raptor; that would require more data to say with certainty.
Welkin Tern: 53%
Is it a coincidence that all of blue’s commons over 50% have flying? Probably not. Blue wins games when it controls the air and evades its way to twenty damage. Welkin Tern starts the process off with an evasive two mana, two power flyer. But unlike the others, it can’t block very well (like Sky Ruin Drake), and it can’t grow ridiculously large (like Umara Raptor and Windrider Eel). Hence Welkin Tern’s modest 53% on this list.
Reckless Scholar: 47%
Looters are traditionally regarded as solid picks in limited formats, as they smooth over poor mana draws and trade unnecessary land for gas in the late game. However, many writers pointed out early on in the season that Zendikar is a different world altogether—you very often want to topdeck land in the late game. Thus, as you are filtering through your library, there isn’t much useless stuff to get rid of. These suspicions were dead-on, as Reckless Scholar went a sub-par 47%. Half the time, I seriously think it is no better than a vanilla 2/1 creature, and sometimes it is worse because people foolishly loot when they would be better off attacking. Regardless, at least Reckless Scholar is better than a lot of blue’s other offerings.
More generally, I am curious whether this type of card really is that great in limited at all. Merfolk Looter is only at 50% in the few Magic 2010 games I’ve looked through. Unless that number moves up as I increase the number of games I track, we might need to start reconsidering what we think of this ability in general.
Whiplash Trap: 45%
Undo used to set my opponents way back on tempo, so I was surprised to see Whiplash Trap fall this low. However, the five mana investment seems to be the problem. Whereas Undo only cost you three mana and maybe forced your opponent to spend seven to get back to their original game state, here Whiplash Trap costs five. Consequently, the relative tempo you gain is much smaller than before, and that often does not quite make up for the eventual card disadvantage you feel. (Unfortunately, the “trap” part of this card is rarely ever triggered.) The bright side is that it occasionally clears the way for an alpha strike, which is probably why it ranked higher than Into the Roil…
Into the Roil: 42%
I know a lot of people favorably compared Into the Roil to Invasion’s Repulse, but that may be pushing things. Bounce spells provide a lot of utility, and can occasionally provide their casters opportunities for two-for-ones. Hell, in this format, it makes Paralyzing Grasp look awful. (Wait, it is; see below.) However, at least in the games I observed, it did not do much to actually win games. Into the Roil is probably a fine card to have in your deck, as some card will always have to occupy the position of “worst card in your deck,” but it is definitely not something to base your 40 cards around. Doing so requires double blue mana, and given the low percentages of all the blue cards here, that is not a very good idea.
Ior Ruin Expedition: 41%
In Zendikar, world of landfall and attacking, waiting a few turns to draw extra cards apparently does not cut it. Divination provides much needed card advantage in the slower environment of Magic 2010 limited. Here, Ior Ruin Expedition often goes untriggered before the end of the game, costing its caster two mana and a card. And this was supposed to provide card advantage! Alas, this Expedition is only worth 41%. While that’s an improvement over Sunspring Expedition (20%), it certainly is no Soul Stair Expedition; more on that next time.
Kraken Hatchling: 35%
In core set limited formats, Horned Turtle is a staple in blue decks. The theory goes that blue is supposed to win its matches in the air; give a blue mage enough time, and he will win with his Wind Drakes, Snapping Drakes, and whatever other blue fliers exist for that year. To buy time, blue needs big butts to hold the ground, lest green fatties run roughshod all over them. Horned Turtle fills that role nicely.
However, Kraken Hatchling appears to fall flat, even though it has the same toughness for two mana less than its comparable blocker. Two things account for this. One, Kraken Hatchling cannot contribute anything in the red zone without some equipment backing it up. Although attacking certainly is a lesser priority when you are playing cards like this, having that extra versatility is a nice bonus. The other issue is that Kraken Hatchling does not discourage a player from attacking with a one toughness creature. Horned Turtle can hold off one, two, three, or even four creatures, provided that all but one of them only have one toughness. Meanwhile, your opponent is free to crash away when the zero power Hatchling is no threat. Combined, these two issues lead to such a poor win percentage.
Paralyzing Grasp: 23%**
How bad is Paralyzing Grasp? Well, it’s statistically significantly bad for starters. In limited formats, where players usually value removal spells so highly, it is easy to throw Paralyzing Grasp in a deck. However, it is particularly bad in Zendikar. As you probably already know, Kor Skyfisher is one of the top commons in this format. It also is particularly strong against Paralyzing Grasp: you simply pick up the creature that Grasp was trapping, and the removal spell is gone. But it does not stop there, as Grasp also gives your opponents great targets for their Kor Sanctifiers, Mold Shamblers, and Narrow Escapes.
Beyond removal issues, you also have to find a suitable target at a suitable time. In other words, you have to wait for the target creature to attack you first, and then you must cast Paralyzing Grasp the very next turn. I saw way too many instances of blue mages tossing a Grasp on an untapped creature. While this certainly deters attacking, it usually ensures the opponent a very solid wall through the remainder of the game. All these factors combine to make me extremely wary of ever placing Paralyzing Grasp into my forty cards again, even if it does tease me as a removal spell.
Caller of Gales: Ineligible (0-7)
No card had more losses without producing a win than Caller of Gales. Thus, I am very happy whenever my opponent starts off with Island, Caller of Gales. (And only slightly more happy when he or she begins with Island, Kraken Hatchling instead.) I needn’t analyze why: it’s a 1/1 with a rather bland ability!
Cancel: Ineligible (9-2)
Cancel has me really curious, particularly because my model has it listed as good with 95% confidence (though this is not officially listed because it does not meet my requirement of a minimum of twenty observations). Moreover, in preliminary research for Magic 2010, Cancel is getting a solid score as well. So one of two things is going on here: either countermagic is underappreciated in limited, or Cancel is benefiting from non-play bias. In other words, Cancel could rank highly because it just sits in a player’s hand during a loss (thus the loss is never recorded) but is frequently played during wins. Or it really is good. Either way, it is worth noting that Cancel usually counters a game-winning spell from the opponent, which leads me to believe that this result is more than just some methodological problem.
Lethargy Trap: Ineligible (0-3)
Fogs don’t win games—they merely lose the game next turn. Lethargy Trap would be a lot more useful if you could cast it on your own turn when your opponent is blocking, but you don’t get that utility here. Thus, Lethargy Trap unsurprisingly did not pick up a win in very limited play.
Shoal Serpent: Ineligible (1-7)
It is difficult to keep hitting your landfalls after the sixth turn. You would think having a 5/5 blocker would do better than produce one win in eight attempts, but nonetheless that is the result. And, moreover, this is statistically significant at 95% despite such a small amount of play.
Spell Pierce: Ineligible (No Observations)
People were brave enough to try out all but two of Zendikar’s commons. Spell Pierce was one of those that people stayed away from, and it’s easy to see why—such a narrow card rarely has a use in limited. It is possible that Spell Pierce was, in fact, lingering around in someone’s hand, but that player never even had the chance to cast it. That being the case, this card is actually worse than the non-existent data shows, as even a Paralyzing Grasp would have done something. If I ever cull more data, don’t be the first person I see playing Spell Pierce.
Spreading Seas: Ineligible (3-4)
You could certainly do worse than Spreading Seas. At the very least, it will instantly replace itself, unlike Ior Ruin Expedition. At best, you can randomly mana screw an opponent—and keep in mind this data all comes from sealed tournaments, where three colors are a lot more frequent than in drafts. If you need a 22nd card desperately, I’d be comfortable with this one.
Tempest Owl: Ineligible (1-7)
I never saw Tempest Owl kicked once. On the other hand, I saw a 1/2 flyer for two mana hit play on several occasions. Surprise, surprise, Tempest Owl is significantly worse than Welkin Tern—and statistically significant at 95%! So we can safely pass on this one here as well.
Trapfinder’s Trick: Ineligible (2-1)
All three of the Trapfinder’s Tricks that were cast drew blanks. I think that just about sums up how useful it is in this format, even if its controller managed a miracle and pulled out a victory twice.
That’s blue for you. Next time is black. Hideous End ranks number two. Any guesses what is number one?
William Spaniel
williamspaniel@gmail.com
Appendix: Methodology
I assume that skill, luck, and the quality of a player's deck determine who wins any particular confrontation. While undoubtedly skill matters, this study is focused on the luck and card quality factors. Players actually have a great deal of control over both of these, as a poorly-constructed deck will win less often than a well-constructed one. From this, we can conclude that some cards contribute to wins more frequently than others. If an average card ever reached play in a game, we would expect its controller to have only won that game around 50% of the time. But if a truly exceptional card reached play, we would expect its controller to have won upwards of 70% of the time.
Watching replays of Pro Tour San Diego qualifiers on Magic Online (and carefully avoiding the qualifier in which the system malfunctioned and everyone played a 140 card deck; yes, this actually happened), I recorded the results of more than a thousand players. Every time a card hit play, I would record it as either a win or a loss, depending on what ultimately happened in that game. If the card reached play multiple times (perhaps because of a Grim Discovery), it only counted once. But if a player cast multiples of a single card, I counted that card multiple times.
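For readers who want to see the counting rules spelled out, here is a minimal Python sketch. It is only an illustration, not the spreadsheet or tool actually used for the study: the tuple layout, the `copy_id` field (a label for each physical copy of a card in a player's deck), and the sample plays are all hypothetical.

```python
from collections import defaultdict


def tally(plays):
    """Turn play events into per-card (wins, losses) records.

    plays: iterable of (game_id, player, card, copy_id, won) tuples.
    This layout is an assumption made for the sketch; copy_id tells
    apart two different copies of the same card in one deck.
    """
    seen = set()
    record = defaultdict(lambda: [0, 0])  # card -> [wins, losses]
    for game_id, player, card, copy_id, won in plays:
        key = (game_id, player, card, copy_id)
        if key in seen:
            # Same copy reaching play again (say, after being returned
            # from the graveyard): it only counts once per game.
            continue
        seen.add(key)
        record[card][0 if won else 1] += 1
    return {card: tuple(wl) for card, wl in record.items()}


# Made-up plays, purely to exercise the rules.
plays = [
    (1, "A", "Umara Raptor", 1, True),
    (1, "A", "Umara Raptor", 2, True),       # second copy: counted again
    (1, "B", "Kraken Hatchling", 1, False),
    (1, "B", "Kraken Hatchling", 1, False),  # same copy replayed: ignored
    (2, "C", "Umara Raptor", 1, False),
]
print(tally(plays))
```

With the sample data above, Umara Raptor ends up 2-1 and Kraken Hatchling 0-1. Dividing wins by total observations gives the percentages shown in the rankings.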
Such a large number of observations was necessary to remove the play skill bias that would have shown up in a smaller-n study. It also shrinks the margins of error, allowing for better hypothesis testing, which I ran at 90%, 95%, and 99% confidence.
For those of you unfamiliar with hypothesis testing, here is a brief explanation for what each of those means:
90% Confidence: When I say we can be 90% confident that a card positively contributes to victories, it means that if the card truly had no impact, there would be only a 10% chance of the data coming back this skewed through pure luck. While those odds are only 1/10, we should be very skeptical of these results as statisticians. Generally, it is only a good idea to accept these results if we have a good theory behind them. For example, I would accept the result for Burst Lightning, a quality removal spell, as being true, but I would cast doubt on whether Blood Seeker was actually affecting things.
95% Confidence: This is the gold standard of statistics. When a card meets 95% confidence, a merely average card would produce data this extreme only 1 time in 20. At this point, it is a good idea to start thinking of theories to justify the results if you do not have one already.
99% Confidence: While rare (there were only four in this study), 99% confidence is strong evidence that a given card has an impact on the match: a card with no impact would produce a result this lopsided only 1 time in 100. You should pay careful attention to these.
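The three confidence levels can be sketched in a few lines of Python. One loud caveat: the article does not say which test produced the stars, so the one-sided exact binomial test below is an assumption, and the function names are made up for this example. Only the twenty-observation cutoff and the star thresholds come from the article.

```python
from math import comb


def one_sided_p(wins, losses):
    """Chance that a truly average (50%) card posts a record at least
    this lopsided, in the direction the record actually leans."""
    n = wins + losses
    k = max(wins, losses)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def stars(wins, losses, min_obs=20):
    """Label a record the way the rankings do: ineligible below
    min_obs observations, otherwise zero to three stars."""
    if wins + losses < min_obs:
        return "Ineligible"
    p = one_sided_p(wins, losses)
    if p <= 0.01:
        return "***"  # significant at 99%
    if p <= 0.05:
        return "**"   # significant at 95%
    if p <= 0.10:
        return "*"    # significant at 90%
    return ""


print(one_sided_p(9, 2))  # Cancel's 9-2 record
print(one_sided_p(1, 7))  # Shoal Serpent's 1-7 record
print(stars(9, 2))        # too few observations to be listed
```

Under this assumed test, both the 9-2 and the 1-7 records come in below 0.05, which lines up with the 95% claims made for Cancel and Shoal Serpent in the rundown, even though neither card has enough observations to earn an official listing.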
Just because a card does not show up as significant does not mean you should not care about the results. But you should not treat them as gospel, either. The best analogy I can draw is to that of a baseball team. It is possible that your star hitter goes through a minor slump at the beginning of the season and an average player goes on a torrid streak at the same time. That does not mean the average player is better than the star; it just means he was better during that period of observation. So don’t be surprised if the study ranks an average card lower than one you perceive as a top-pick. My card-by-card commentary will help qualitatively decide whether this was just statistical coincidence or if it might be part of a larger trend.
Additionally, just because a card is sub-50% does not mean you should automatically stop playing it in all of your decks. Going back to the baseball analogy, a team cannot field nine players with batting averages all over .300. But it can maximize its performance by putting its best players in the lineup. So if you need to play Vampire’s Bite to have enough black cards to justify running Sorin Markov, go right ahead; but if Vampire Lacerator is floating around in your sideboard, the data indicate you should swap out the Vampire’s Bite.
32 Comments
i like how ur honest about ur results. you note that things might be higher (e.g., Sky Ruin Drake) because of a smooth mana draw and what not. Hurts me though that Kraken Hatchling is so low :( he's so good. especially in multiples of 3-4 with a coupla flyers doing the dirty work.
I've read both of your articles and found them both very interesting as well as enlightening. However the same question keeps bouncing around in my head. Why are you doing this instead of Wizards? I can think of a hundred different business reasons for Wizards to track that data, and hundred more ways in which I could use it to improve the business. I'm not just talking about the money side of the equation either. If I was part of Wizard's R & D, I could use this information to create better cards, or at least create fewer bad ones. I think you stated in both articles that Wizards doesn't currently track this information, but I'm curious as to whether they track any game/card related statistics at all? For instance it's easy to tell that Jund is winning most of the tournaments, but is it winning because the Bloodbraid Elf cascades into a Blightening or because it cascades into a Putrid Leech? That kind of information would be easy to gather on MTGO because every play is recorded. That would sure give them a good idea of why Jund wins, not just that it wins, and from a player's perspective why it loses and how to avoid that. At the very least they could aggregate the raw data and make it available to people, like you who are interested in doing something with it. Thanks for your efforts and I'm very interested in reading about the other three colors.
As I alluded to in the article, you'd have to ask Wizards why they aren't doing this themselves. They have the ability to collect so much data that you would be able to filter out luck and skill for every card. And, personally, I feel that is the only way to see what is actually good.
I've wanted to do this since the Top Eight decklists went online, but never took the time to figure it out. Great read, and I look forward to the rest of the series.
Another very cool article.
A few things that I had a big reaction to. You don't know which cards are in people's hands that they didnt play by the end of the game? (That's a huge hole in the methodology) A cancel that sits in someones hand but never finds a spell it wants to counter yet altered the way the holder played, usually by delaying actions and reducing their tempo position is pretty horrible, worse than not having it in those cases. So yea the whole issue makes me very skeptical about the meaningfulness of these numbers(Not to mention the whole smooth land draw curve thing, maybe it indicates that drawing alot of land is alot better than drawing just 4 land or 3 but still being able to cast something every turn since your landfall triggers aren't going off.
hatchling is bad because theres alot of evasion and ways to negate his presence for a turn (shortcutter hookmaster adv gear (too big to block profitably) trusty machete, windborne charge, the ally aerialist and the white uncommon aerialist, creature enchants ect. and yeah as you said even when he is blocking he's not stoping scope triggers or the horde of 2/1's.
One thing that sealed tends to do is reveal the strong colors of teh format from randomly opened product, as we saw at one event 6/8 of the top 8 decks played black, does this mean that if you dont play black in draft you'll lose? No. Drafters adjust to what's better and take it more often leaving opporunities for people to prosper using less popular colors with the extra supply of them.
Yeah like I tell people paralyzing grasp is horrible, at one mana it woudn't be near contructed playable, in limited it would be solid at three mana it's a joke.
I wont believe cancel is any good in zen, its double blue and easily to spot being palmed, only good if youre already ahead and in that case if you just kept dropping threats youd probably stay ahead more often.
While I applaud your efforts, I think you're spending a lot of time and energy trying to answer a very simple question that is ultimately not that useful in draft or sealed deck situations.
Most of your analysis comes down to "Is this card any good?" However in draft and sealed decks, we are never analyzing cards in a vaccuum.
In sealed we have to look at all the cards in our pool and make decisions like whether or not playing black is worth it for a bomb like Ob Nixilis when the rest of the black cards are underwhelming. We have to look at whether or not the cards in our deck make us want to play a specific below average card. Example: I had a sealed deck with 2 Trusty Machetes and a bunch of evasion creatures. I ended up playing a Caravan Hurda over a Pillarfield Ox in that deck because I felt the Hurda worked better in that deck even if it is a worse card in general.
In draft, you need to be keenly aware of what cards you already drafted and draft what you need over the best card in the draft. Example: If I have already picked 2 Hideous Ends, do I really want to take a third one over a decent black creature? Or, do I really need another 3 drop when most of the cards I've take so far cost 3 mana. When do I start taking worse cards that fit my mana curve over better cards that will make my deck clunky?
These are the interesting questions when it comes to limited deck-building analysis. If you want to look at the power level of individual cards, look at constructed.
In constructed you have the option to play whatever cards you want. In constructed questions like the following are actually non-trivial. "Is Blade of the Bloodchief any good in a Zendikar Block Vampire deck?" "Is Harrow worth it in a G/U Lotus Cobra deck?
By it's very nature, constructed tournaments are much more concerned about which cards help you win games and which cards don't because you can constantly tweak your decks and make small changes. In limited tournaments, you entire deck changes each time. Your analysis technique would be great for constructed while it is kind of trivial for limited.
I actually think the exact opposite is true. One of the flaws this system has is the inability to pick up on synergy--that is, Umara Raptor is okay by itself but great when two more allies hit play. Thus, half the time it is biased towards being okay, and the other half of the time it looks really amazing. Thus, it lands somewhere in between the two, even though it should have a bimodal distribution. Constructed allows you to exploit this kind of synergy, which is bad news for this kind of modeling.
Meanwhile, I think you are vastly underestimating this system's importance. While you are certainly stuck with the pool you get, you have a lot of options in choosing which colors to play. If you have a lot of cards at the top of one of these lists in one color, then you should probably be running that color. The "blue is bad" philosophy isn't universal--you should play it when your pool had a ton of flyers.
The other notable thing is that players do not have a great grasp over what is good, at least if you buy into my methods. Although sealed and draft are different formats, TCGplayer's pick rankings (which come from how the public drafts) place Steppe Lynx as number eight in the white commons--ahead of Kor Hookmaster, Kor Sanctifiers, Kor Cartographer, Cliff Threader, and Narrow Escape. I think that is a little silly.
I'm not sure how I'm vastly underestimating your results.
You've used a very detailed time-consuming statistical analysis to come up with a card ranking system for Zendikar. You have then written in detail how there could be serious flaws in your rankings based on the limits of your analysis techniques.
How is that more useful than a reasonably good player writing up a set review based on their personal experiences and opinions on cards? Sure, those kind of reviews get things wrong, but you've admitted that your analysis may get things wrong as well.
As to the idea that players don't know which cards are good. Well, that mostly depends on where you go. Most of the players that are interested enough in the game to find online articles about drafting (ie, your audience) are probably knowledgeable enough to put together a reasonably good drafting pick order. (Especially if they've read a decent amount of drafting articles from this website)
Basically the point of my original article is that the complexity of your methods don't match the complexity of your goal.
In sealed deck there are just too many extraneous factors to consider, and boiling it all down to pick ranking is not all that informative.
Standard and Block Constructed are different stories. There you have a lot of decks in which 80% to 90% of the cards are the same. Thus, you have a much more controlled environment to look at certain cards that make splash appearances in certain decks. It's still far from a perfect situation, but it's certainly a better place to analyze the importance of playing specific cards in Magic.
You are right in that both forms of analysis (regular pick orders and quantitative studies) have flaws in them. However, my methodology is crystal clear here--I don't make any hidden assumptions here that frequently find their way into qualitative approaches.
Pick orders fail for three reasons. One, they are often written before the person has had any more than a weekend or two to get adjusted to the format, which makes them pure speculation (helpful, but not exactly what we ideally want). Second, even after the player has gotten in ten or more events, they are still dealing with a very small sample size. If I only played Steppe Lynx in three of my decks, and I only drew it in 40% of the games I played, then I might not be able to detect that it is, in fact, a great card. Finally, even after getting a ton of experience with the format, you are going to get an extremely biased sample simply because the same player played in all of the games. A statistician would have an absolute fit over something like that.
"You've used a very detailed time-consuming statistical analysis"
Whoa there, in what way is this detailed?
Overall this is very useful information, but if your utilizing the data for draft I think you have to know the differences of the formats. I'd probably avoid a hatchling in my sealed deck if I could because it won't even trade with anything, but it does buy you time against some of the best archtypes in draft that your likely to run into. Like ArchGenius pointed out, we don't analyze these cards in a vacuum when we pick them over another card. Experience with the format teaches us we need answers to devastating two drops in the format, and cards like hatchling can effectively earn you 8+ life. That's not really a bad deal for one blue mana in ZZZ draft.
I'd also likely pick a tern or raptor over the drake early on in the draft, as it's more important to be geared towards the early game in ZZZ. Cards like raptor also get a lot better when you pick up some other decent allies, so synergy is a significant factor when evaluating cards in draft.
I have to disagree with ArchGenius on one thing: I don't feel the analysis is trivial. I see this as a tool for limited players, but certainly not a bible. This data is no replacement for hard-earned experience, but I think it is a reasonable supplement to it and a reasonable experiment. I question how useful it is for drafters, but the information is excellent for sealed play. Drafting experience doesn't equate to sealed experience, and I draft far more often than I play sealed, so I find this info very useful. I've given a lot of thought to playing in the sealed queues because of the better payout but normally stick with draft because it's what I know. You've sparked my interest enough to probably send me in that direction more often than I normally would, so I can honestly say your article has had an impact on me. I look forward to seeing the rest of your data.
If Hideous End is number two... I really, really want to say Vampire Nighthawk is number 1... but since it's commons only, I'm going to say Surrakar Marauder or Giant Scorpion.
I'm pretty sure he was trying to give us clues that the top black card is Soul Stair Expedition.
You're probably right, but I've lost more games to Surrakar Marauder than Soul Stair Expedition... and won more with Giant Scorpion.
Not to give away too much, but Giant Scorpion ranked #14 with only a 50% mark.
That I am slightly shocked by, though. I could understand Hideous End and Disfigure being top 5, maybe Surrakar Marauder, and, from all the hints, Soul Stair Expedition, so there are at least 4 of black's top commons. Now if some of the really, really bad vampires got in there, like Bloodseeker, Guul Draz, and Lacerator (decent, not great), then even after that Giant Scorpion should at least be in the top 10 black commons. If not, I'm guessing people just don't play it. But it's an insane card and I have pro statements to back it up.
I am 99% sure he's hinting at Soul Stair Expedition.
His "hint" regarding Soul Stair Expedition, by my read, is that it performed above what one might expect, and well above the crappy Ior Ruin and Sunspring expeditions.
If this system rates Soul Stair Expedition as the "surprise" #1 black common in Zendikar limited, all that will tell me is that the system and its data are fatally flawed for generating any kind of pick/desirability order.
It is not Soul Stair Expedition.
Guess your degree? Archaeology?
Honestly I just don't see this being useful, the numbers are more misleading than anything. Thanks for the summaries but I think traditional pick orders are far more useful.
Did you really just tell us all to lobby Wizards on your behalf, so you could collect draft data at Pro Tour whatever?
Maybe you should ask them yourself. Or maybe you already did and they told you no. If so, aren't you just asking us to annoy them about it? Seems kinda presumptuous, don't ya think?
Yes, I did just ask you to lobby Wizards on my behalf, and no I have not already asked them about it. But if you want to get better data--and another person has already commented that he does--then that would be the best way to do it.
Spell Pierce works in UB control builds and protects your evaders from most removal. People tend to kill those fliers ASAP... hitting Harrow with Pierce on turn three is usually gg... Kraken isn't too bad either... Scorpion will be top black common.
I echo both ArchGenius's appreciation for the effort that has gone into this series, and his concern over the usefulness/accuracy of the data for impacting the way I approach a draft or sealed event given the acknowledged gaps and problems with the methodology.
"On a different note entirely, I am curious whether there is a statistical bias for higher mana spells. Some may question whether Sky Ruin Drake is so high on this list because the very fact that a player cast it indicates that they had a relatively smooth mana draw and survived at least until the fifth turn."
This relates to my main issue...the fact that the system has no way of measuring the negative impact of a card stuck in hand--either from mana issues or context issues--during a loss that might have been a win had that stuck card been something castable. Perhaps the useful aspect of this kind of data is, "Sky Ruin Drake is better than you think," which is fine (that's what I've been saying all along anyway), but knowing that players won 67% of the time they were able to get down their Sky Ruin Drake doesn't compel me to take it higher in a draft or consider it more of a reason to play blue in sealed than any of blue's other common two-power flyers.
That being said, I expect Disfigure to be the #1 black common. Its cost and ability to hit black creatures are amazing, and it supports aggro black strategies extremely well. The ability to kill an opposing threat with Disfigure while having the mana left to cast your own threat on turns 2-4 creates devastating tempo advantages that Hideous End at three times the cost just can't pull off.
I just tested this and found that we can be more than 90% sure that Hideous End is better than Disfigure.
Well, no. You can be more than 90% sure that sealed deck games in which Hideous End is cast are more likely to be won than sealed deck games in which Disfigure is cast -- and the distinction between the two statements, of course, is where (as others have been saying) this methodology comes up short.
Right, that's why I don't know what to do with this data. I guessed Disfigure as #1 since Hideous End is #2, but if something other than Hideous End or Disfigure is #1, then what exactly am I supposed to do with this data? Because it's wrong to pick another black common over H.E. or Disfigure.
I realize, like Shaterri says, that the data reflects "games won when cast" and isn't intended to convert directly into a pick order, but then what exactly should we be doing with it?
Store it away as trivia?
I kind of agree. There are no cards I would pick over Hideous End and Disfigure, and I can honestly see Disfigure being slightly worse, but not 90%... that's just craziness. So I still fail to see exactly what I'm supposed to walk away from this with. If someone has smooth mana draws and casts a 5 mana flyer they have a good chance to win? Well, that was kind of a given already.
When I say I am 90% sure X is better than Y, I mean there is only a 10% chance that Y is actually better than X and the data just coincidentally formed itself in the manner it did. It does NOT mean that X is 90% better than Y.
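For anyone who wants to see how a figure like "90% sure" can be produced, here is a minimal sketch using a one-sided two-proportion z-test. This is an assumption about the kind of test involved, since the thread does not say which test was actually run. The function names and the counts are hypothetical and are not the study's numbers.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def confidence_a_beats_b(wins_a: int, games_a: int, wins_b: int, games_b: int) -> float:
    """One minus the one-sided p-value of a pooled two-proportion z-test.

    Loosely, "how sure" we are that card A's win rate when cast is truly
    higher than card B's, rather than the gap being a coincidence.
    """
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    return normal_cdf((p_a - p_b) / std_err)

# Hypothetical counts (NOT from the article): A won 130 of 200 games in
# which it was cast, B won 115 of 200.
sure = confidence_a_beats_b(130, 200, 115, 200)
print(f"confidence that A beats B: {sure:.1%}")
```

Note that the output describes how likely it is that the gap is real. It says nothing about how large the gap is, and it still only compares games in which each card was cast.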
Also, I tested this theory that high mana costs distort win percentage, and I found this: for every one mana extra a card costs, its win percentage increases by 1.1% on average. Thus, it should not be taken as a given that I should be in good position to win just because I paid five mana to cast a spell. The quality of the spell is also of great importance here.
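A figure like "1.1% per extra mana" is the slope of a straight-line fit of win percentage against mana cost. Below is a minimal sketch of that fit using ordinary least squares, assuming that is the kind of fit intended. The helper name `ols_slope_intercept` and the five data points are invented purely for illustration, and the numbers were chosen so the slope lands on a similar size.

```python
def ols_slope_intercept(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up (mana cost, win % when cast) pairs, NOT the article's data.
mana_costs = [1, 2, 3, 4, 5]
win_pcts = [50.0, 52.0, 51.0, 55.0, 54.0]

slope, intercept = ols_slope_intercept(mana_costs, win_pcts)
print(f"each extra mana adds about {slope:.1f} points of win percentage")
```

With a slope that small, a five-mana card would be expected to sit only a few points above a one-mana card on cost alone, so a much larger gap has to come from the card itself.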