By: wspaniel, William Spaniel
Dec 29 2009 1:06am


Click here for Statistically Speaking: White!
Click here for Statistically Speaking: Blue!
Click here for Statistically Speaking: Black!

Before we get to red’s results, I wanted to spotlight an anonymous reader’s response to the black article. Surrakar Marauder performed way worse than one would have suspected—a pedestrian 52%—and I was unsure why. The reader responded that black is a very popular color in sealed (true—Swamps were the second most frequently played card, just narrowly edged by Mountains), and thus nearly everyone has creatures to block the Marauder. Hence, for two mana, you get a 2/1 creature a lot of the time. We shouldn’t expect such a card to do very well, and sure enough that is what we saw.

Note, however, that Bladetusk Boar didn’t suffer from the same effect. This could possibly be attributed to its larger size. Keep that in mind as we go through the cards.

As always, you can check the methodology of the study by viewing the appendix at the end of the article. Regardless, here are the results:

Tuktuk Grunts: 65%**
Torch Slinger: 62%**
Bladetusk Boar: 60%*
Plated Geopede: 60%**
Burst Lightning: 57%
Goblin War Paint: 53%
Slaughter Cry: 52%
Magma Rift: 52%
Spire Barrage: 52%
Goblin Shortcutter: 48%
Highland Berserker: 47%
Ruinous Minotaur: 45%
Shatterskull Giant: 44%
Molten Ravager: 42%
Zektar Shrine Expedition: 34%***
Demolish: Not Eligible
Seismic Shudder: Not Eligible

*Statistic is significant at 90%.
**Statistic is significant at 95%.
***Statistic is significant at 99%.
A card is ineligible when there are fewer than 20 observations.

Despite being a highly played color, red drops off quite a bit after its top five commons. Let’s find out why:

Tuktuk Grunts: 65%**

I am surprised to see Tuktuk Grunts on top as well. Unlike Crypt Ripper, the Grunts do not single-handedly win the game; perhaps that is why the Ripper finished three percentage points higher and at the 99% significance level. However, they do contribute their fair share to the game, surprising opponents unprepared for a hasty creature while beefing up the rest of the allies you have in play. That is enough to get you by in most limited formats, Zendikar sealed not being an exception.

I certainly would not play Tuktuk Grunts over Torch Slinger, Bladetusk Boar, Plated Geopede, or Burst Lightning. But Tuktuk Grunts is comparable to Shatterskull Giant. Tuktuk finished 21 percentage points higher than Shatterskull, and a difference in means test revealed that Tuktuk is more than 99% likely to have a better finish in this study than Shatterskull. So unless you have no other allies in your pool, you really ought to be running Tuktuk Grunts instead.
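For readers who want to see what a difference in means test looks like, here is a minimal sketch of one common form, the pooled two-proportion z-test. The article does not say which variant was run, and it does not report the number of observations for either card, so the function name and the counts below (100 games each) are hypothetical placeholders rather than the study's data.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Return (z, one_sided_p) for H0: both cards share the same win rate."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Pool the two samples to estimate the common win rate under H0.
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of a standard normal distribution.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided

# HYPOTHETICAL counts chosen only to match the reported 65% and 44%.
z, p = two_proportion_z(65, 100, 44, 100)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

With smaller samples the same 21-point gap would produce a smaller z, which is why the real observation counts matter for this comparison.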

Torch Slinger: 62%**

Obviously Torch Slinger received a 62% because it destroys creatures and leaves a body on the battlefield to contribute to your offense. That being said, I frequently fall into the trap of avoiding casting Torch Slinger without the kicker. Do not do this. You will often be better served by simply having an additional attacker, especially if you are waiting to draw a second Mountain off the top of your deck. Walk the fine line, and Torch Slinger should do better for you than this statistic shows.

Bladetusk Boar: 60%*

It is getting tiring to keep discussing how good evasion is in Zendikar, but Bladetusk Boar forces me to readdress this issue. Red creatures are pretty dorky for the most part, so if anything ever does block the Boar, you’ll almost always trade straight up. Failing to find any blockers, your opponent will simply eat three damage repeatedly. At that rate, he will not be able to survive very long, especially if you got off to a fast start with the next card…

Plated Geopede: 60%**

Once again, landfall is good. Plated Geopede will normally go uncontested for three or four more turns provided that you keep maintaining your land drops. Like Bladetusk Boar, that puts quite a clock on your opponent, especially when you can back it up with burn to selected blockers or just straight to your opponent’s dome. The Geopede can also screw with your opponent’s blocking plans with instant speed land search in the form of Harrow.

Finally, this guy demonstrates the importance of holding back land in hand when playing Zendikar. Although normally you do this to bluff combat tricks, failure to do so makes landfall creatures weaker in the late game. Imagine if you just kept playing topdecked lands into the late game. Then you draw Plated Geopede. Congratulations, you now have a 1/1 creature with first strike. Unfortunately, that is virtually a dead card in the late game. Contrast that with what would happen if you had held on to those lands. Now you will get at least a couple swings in with a 3/3 first striker if not more.

Burst Lightning: 57%

Yes, Burst Lightning does better than average, but again a removal spell falls short of statistical significance. Theorists in the article feedback have suggested that this is because everyone plays removal spells, so Burst Lightning has a disadvantage when compared to cards that only good players know to use.

I understand the logic of the theory, but it is time to start questioning the concept. Even if everyone plays a certain card, if that card is truly spectacular, then it should be boosting its caster’s win percentage regardless. And it seems like everyone is playing other cards as well. I observed 116 instances of Burst Lightning. I also saw 118 instances of Goblin Shortcutter, but logically that fell below average. But most damning is the fact that I recorded 147 instances of Plated Geopede. Truly everyone must be playing that creature as well, except Plated Geopede came out as statistically significant. Burst Lightning did not.
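As a rough check on the comparison above, here is a sketch of a one-proportion z-test against a 50% baseline. The instance counts (147, 116, 118) come from the article, but the win counts are back-calculated from the rounded percentages, so they are approximations. The choice of a two-sided z-test is also an assumption, since the exact test used in the study is not stated.

```python
import math

def one_proportion_z(wins, n, p0=0.5):
    """Two-sided z-test of an observed win rate against a baseline p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (wins / n - p0) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Win counts are APPROXIMATE: reconstructed from rounded percentages.
cards = {
    "Plated Geopede": (round(0.60 * 147), 147),
    "Burst Lightning": (round(0.57 * 116), 116),
    "Goblin Shortcutter": (round(0.48 * 118), 118),
}
results = {name: one_proportion_z(w, n) for name, (w, n) in cards.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:+.2f}, two-sided p = {p:.3f}")
```

Under these assumptions Plated Geopede clears the 95% bar while Burst Lightning falls short of 90%, which lines up with the stars in the results table.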

I’m not prepared to say that removal is overrated yet. However, the conventional wisdom has always been that removal is the top dog, and you should be splashing to get access to it. Some of this data at least suggests a contradiction to this theory, so I’m going to keep this in mind as I gather data on Magic 2010.

Goblin War Paint: 53%

Creature enchantments run the danger of being a two-for-one target, but careful application of them (that is, not casting them with your opponent having mana open) reduces that risk. Moreover, a card like Goblin War Paint that boosts toughness makes the creature harder to remove, reducing the risk that a random Burst Lightning or Magma Rift will finish it off later. And that led to Goblin War Paint coming out as average.

Slaughter Cry: 52%

Combat boosters work when they can simultaneously deal more damage and increase the likelihood that your creature will survive the fray. Slaughter Cry fulfills the first part of this by pumping up the creature’s power by three; giving it first strike takes care of the second. What keeps Slaughter Cry lower than comparable cards such as Vines of Vastwood is its mana cost: at three mana, Slaughter Cry takes up the better part of your turn’s allotment. While not the end of the world, this drawback limits its power.

Magma Rift: 52%

Magma Rift appears to be about as “fair” of a removal spell as you are going to find. Red decks have a difficult time removing big creatures without the support of other colors’ removal spells. Rift’s five damage nails most creatures in the format. However, it is a complete tempo drag—casting it any time before turn six always feels like an awful play. You also can’t play Magma Rift at instant speed, so you lose much of your ability to go two-for-one with opposing spells.  

Spire Barrage: 52%

You need a lot of Mountains to make Spire Barrage a real threat, and it is difficult to have that kind of concentration in a single color in a sealed format. As such, Spire Barrage joins Magma Rift as being a “fair” removal spell. At five mana, good luck casting something else with it on the same turn.

Goblin Shortcutter: 48%

I think Goblin Shortcutter gets a lot more credit than it deserves. A 2/1 creature doesn’t go very far in the late game, and the Falter ability makes only a meager impact on the board. Sealed tends to be slower than draft, so you would need a remarkably aggressive deck to make full use out of Goblin Shortcutter. If you get that kind of pool, run with it; if not, try to find another card.

Highland Berserker: 47%

Now we get to some less than stellar allies. Why do people value Highland Berserker as highly as they do? Even with other allies, the Berserker isn’t very good. If you curve right, he comes down on turn two. From then, your allies only get first strike every time another one falls into play. However, blocking isn’t a popular thing in Zendikar anyway. And if you draw it later, you get a bad creature and whatever bonuses come out of the other allies you have in play. If he’s your only ally, then you have a really terrible deal. Highland Berserker is permissible in a deck with a lot of allies, but don’t include it anywhere else.

Ruinous Minotaur: 45%

Five power creatures are tempting at only three mana, but blowing up land usually comes at too high a cost. If you do play it, consider it a wall up until the late game. As a blocker, it will trade with just about whatever your opponent attacks with, which does not happen frequently in this format. Swing, however, and crafty opponents will let it through to disrupt your tempo in the early game. And as nice as five damage is, it usually isn’t worth the price, especially when you need to blow up a land for Magma Rift.

Shatterskull Giant: 44%

Shatterskull Giant gets more respect than it ought to considering it is merely a 4/3 vanilla creature for four mana. I think you could do worse (see below), but I would still want to put something else in my deck. On the other hand, I wouldn’t be completely embarrassed to play it either, so maybe I am being too hard on it.

Molten Ravager: 42%

If you want to drain a lot of mana into a card, look no further than Molten Ravager. Unlike Crypt Ripper, you need to continuously tap Mountains to make the Ravager do anything other than stand there as a wall. This detracts from your board production, which ultimately comes back to bite you. Things get a little bit better in the late, late, late game: once you run out of spells to play, you can start pumping with reckless abandon. However, it is getting there that is the difficult thing. If you are looking to race, Molten Ravager and its 42% just isn’t your card. And if you are looking for a really good card in the late, late, late game, you are caring far too much over a single contingency.

Zektar Shrine Expedition: 34%***

Dare I label Zektar Shrine Expedition the single worst common in Zendikar? Why, in fact, I will! (I never bring controversy upon myself.) Well, I should at least clarify. Some cards are obviously worse than Zektar Shrine Expedition—Caller of Gales, Lethargy Trap, and Mire Blight come to mind—but at least those cards don’t sucker you into playing them. Meanwhile, a full 67 brilliant young minds spent their two mana in hopes of getting a Ball Lightning. 44 of them came back sad. That’s such a large degree of failure that it comes out as significant to 99%, the only bad card to come out of the study.
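The arithmetic behind that record can be checked directly. The sketch below takes the 67 plays and 44 losses reported above and computes the exact binomial probability of a truly average (50%) card winning 23 or fewer of 67 games. The helper name is hypothetical, and using an exact one-sided tail is an assumption; the article does not say which test produced its significance level or whether it was one- or two-sided.

```python
from math import comb

def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

plays = 67                  # from the article
losses = 44                 # from the article
wins = plays - losses       # 23
win_rate = wins / plays     # roughly 0.34, matching the reported 34%

tail = binom_lower_tail(wins, plays)
print(f"{wins}/{plays} = {win_rate:.0%}, P(X <= {wins}) = {tail:.4f}")
```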

Honestly, if you end up playing one of these (and you shouldn’t), consider my advice for Ruinous Minotaur: Zektar Shrine tokens are really big blockers. Yes, there will be times that you can swing for a ton of damage and finish the game. However, your opponent will just let it through some of the time, preferring to pay seven life to set you back a card. Other times, they will spend a removal spell to preserve their life total. Sometimes it will get chump blocked. Finally, it just sits there when you draw it in the late game and never roll through enough lands to activate it.

But, then again, all that discussion on Zektar Shrine Expedition’s strategy is for nothing—you aren’t going to play it any more, now are you?

Demolish: Not Eligible (3-2)

Example of good land destruction in limited: Mold Shambler. Example of bad land destruction in limited: Demolish. My opponent would have to have a bunch of ridiculous artifacts in his or her deck before I would even begin to think about sideboarding in Demolish, so I doubt I will be casting one before Zendikar rotates out.

Seismic Shudder: Not Eligible (3-5)

Unlike with Demolish, I can see the occasional use for Seismic Shudder from out of the sideboard. But such a scenario is rare, and even then I probably wouldn’t be too enthusiastic about putting it in the deck.

Of a little more than 1,000 players, 585 played Mountains, the largest total for any of the basic lands. And with four cards coming back as statistically significant (and another being Burst Lightning), it should not come as much of a surprise. However, red drops precipitously afterward, so keep that in mind before you blindly center your deck around it just because everyone else is playing it.

William Spaniel
williamspaniel@gmail.com

Appendix: Methodology

I assume that skill, luck, and the quality of a player's deck determine who wins any particular confrontation. While undoubtedly skill matters, this study is focused on the luck and card quality factors. Players actually have a great deal of control over both of these, as a poorly-constructed deck will win less often than a well-constructed one. From this, we can conclude that some cards contribute to wins more frequently than others. If an average card ever reached play in a game, we would expect its controller to have only won that game around 50% of the time. But if a truly exceptional card reached play, we would expect its controller to have won upwards of 70% of the time.

Watching replays of Pro Tour San Diego qualifiers on Magic Online (and carefully avoiding the qualifier in which the system malfunctioned and everyone played a 140 card deck; yes, this actually happened), I recorded the results of more than a thousand players. Every time a card hit play, I would record it as either a win or a loss, depending on what ultimately happened in that game. If the card reached play multiple times (perhaps because of a Grim Discovery), it only counted once. But if a player cast multiples of a single card, I counted that card multiple times.
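The counting rule described above can be sketched in a few lines. This is only an illustration: the data layout, the function name, and the two sample games are hypothetical, and the real tallying was done by hand while watching replays. Each game lists one entry per physical copy that reached play, so a card replayed from the graveyard appears once while two separate copies appear twice.

```python
from collections import defaultdict

def tally(games):
    """games: iterable of (copies_that_reached_play, won) pairs,
    one pair per player per game. Returns card -> [wins, losses]."""
    record = defaultdict(lambda: [0, 0])
    for copies_played, won in games:
        for card in copies_played:
            # Index 0 counts wins, index 1 counts losses.
            record[card][0 if won else 1] += 1
    return dict(record)

# HYPOTHETICAL log: two copies of Plated Geopede in a win,
# one Burst Lightning in a win and one in a loss.
games = [
    (["Plated Geopede", "Plated Geopede", "Burst Lightning"], True),
    (["Burst Lightning"], False),
]
print(tally(games))
```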

Such a large number of observations was necessary to remove the play skill bias that would have shown up in a smaller-n study. It also shrinks the margins of error, allowing for better hypothesis testing, which I ran at 90%, 95%, and 99% confidence.
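To make the point about margins of error concrete, here is a small sketch using the standard normal-approximation formula for a proportion, z * sqrt(p(1 - p) / n). The sample sizes are ones mentioned in the article (the 20-observation eligibility cutoff, 67 plays of Zektar Shrine Expedition, 147 plays of Plated Geopede); the formula itself is a textbook approximation and an assumption about how the margins would be computed.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a win rate observed over n games."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (20, 67, 147):
    print(f"n = {n:3d}: +/- {margin_of_error(n):.1%}")
```

The margin falls with the square root of n, so quadrupling the observations only halves the uncertainty.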

For those of you unfamiliar with hypothesis testing, here is a brief explanation for what each of those means:

90% Confidence: When I say we can be 90% confident that a card positively contributes to victories, it means that if the card had no impact, there would be only a 10% chance of the data coming back so skewed based on pure luck. While the odds of being wrong here are only 1/10, we should be very skeptical of these results as statisticians. Generally, it is only a good idea to accept these results if we have a good theory behind them. For example, I would accept the result for Burst Lightning, a quality removal spell, as being true, but I would cast doubt on whether Blood Seeker was actually affecting things.

95% Confidence: This is the gold standard of statistics. When a card meets 95% confidence, the likelihood the card is merely average but we got this extreme of data back is only 1/20. At this point, it is a good idea to start thinking of theories to justify the results if you do not have one already.

99% Confidence: While rare (there were only four in this study), a 99% confidence virtually guarantees that a given card has an impact on the match—there is only a 1/100 chance that this result is wrong. You should pay careful attention to these.
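The three levels above correspond to p-value cutoffs of 0.10, 0.05, and 0.01. As a minimal sketch, the helper below maps a p-value to the star notation used in the results list. The function name and the sample p-values are hypothetical; the article does not publish per-card p-values.

```python
def significance_stars(p_value):
    """Map a p-value to the article's star notation."""
    if p_value < 0.01:
        return "***"   # significant at 99%
    if p_value < 0.05:
        return "**"    # significant at 95%
    if p_value < 0.10:
        return "*"     # significant at 90%
    return ""          # not significant at any reported level

# HYPOTHETICAL p-values, for illustration only.
for p in (0.004, 0.03, 0.08, 0.30):
    print(f"p = {p:.3f} -> '{significance_stars(p)}'")
```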

Just because a card does not show up as significant does not mean you should not care about the results. But you should not treat them as gospel, either. The best analogy I can draw is to that of a baseball team. It is possible that your star hitter goes through a minor slump at the beginning of the season and an average player goes on a torrid streak at the same time. That does not mean the average player is better than the star; it just means he was better during that period of observation. So don’t be surprised if the study ranks an average card lower than one you perceive as a top-pick. My card-by-card commentary will help qualitatively decide whether this was just statistical coincidence or if it might be part of a larger trend.

Additionally, just because a card is sub-50% does not mean you should automatically stop playing it in all of your decks. Going back to the baseball analogy, a team cannot field nine players with batting averages all over .300. But it can maximize its performance by putting its best players in the lineup. So if you need to play Vampire’s Bite to have enough black cards to justify running Sorin Markov, go right ahead; but if Vampire Lacerator is floating around in your sideboard, the data indicate you should swap out the Vampire’s Bite.

37 Comments

Again, I think that tuktuk by Anonymous (not verified) at Tue, 12/29/2009 - 03:54

Again, I think that Tuktuk Grunts being at the top of your list means you have to reevaluate your methods. Grunts is a bad card that, unlike Crypt Ripper or Steppe Lynx, is bad in almost every deck. It's slow and weak, competing with much better red 5 drops. It's not even great in allies. I think these statistics are more misleading than helpful.

Of course these are sealed results, so the low placement of Zektar Shrine and Shortcutter has merit, but the data is still unhelpful.

Again, just because a card by wspaniel at Tue, 12/29/2009 - 12:48

Again, just because a card appears at the top of a list does not make it the "best" card in the color. That requires a different test, which I only do where noted. It takes a lot more than a coincidence for Tuktuk Grunts to do that well but actually have a value worth less than 50%. So maybe you should reevaluate your opinion of the card. ;)

Then you admit that this test by Anonymous (not verified) at Tue, 12/29/2009 - 20:45

Then you admit that this test shows nothing, it doesn't show how you should draft or build your deck at all.

It's not coincidence that makes Tuktuk Grunts so good, it's your methodology. By your statistics Iona would undoubtedly be the best card ever when it's not even playable. Makindi Shieldmate may look like a bomb when it's a mediocre blocker. To have meaningful results you need to look at how many times you win when you draw a certain card and in which archetypes/matchups. You can't do that, so please refrain from making any analysis before you can. It looks like a lot of wasted effort.

I don't suggest you continue this series until you tell us what conclusions we can actually draw from the data. I have learned virtually nothing from these results, whereas other conventional limited articles on this site have valuable insight.

I most definitely do not by wspaniel at Tue, 12/29/2009 - 23:37

I most definitely do not admit that. No methodology is perfect. The big difference here is that we can rigorously find errors because the assumptions are all laid out. You can't do that with informal analysis.

And, by the way, Makindi Shieldmate got a 50%--so much for it looking like a bomb.

You're dodging the by Anonymous (not verified) at Wed, 12/30/2009 - 00:34

You're dodging the question.

What can we actually learn from this analysis? You have gone 4 articles without actually answering this, without drawing any kind of valid conclusion backed up by the facts.

Informal analysis allows us to say "ok card A is better than card B because of C D E". You yourself said statistical analysis doesn't tell us which card is best. So what does it tell us? It tells us what card might be better than we'd expect? We guess at what the numbers actually mean? How is that better than informal analysis? I'm a mathematics major, so I like to let the numbers talk, but without context, without understanding, mere numbers are useless.

What errors have we actually found? The 'assumptions laid out' have completely detached the analysis from the game itself. "If all cards have the same chance of getting played", "if all cards belong in all decks", "if sideboard cards get brought in every matchup", etc. These are invalid assumptions with huge sampling bias that make the results utterly meaningless.

So what does tuktuk grunt's 63% actually tell us about the card, how it should be included and played? If it's not the best card, or even a good card compared to other red commons, why do we even bother giving it a number?

Don't look at makindi shieldmate (even though 50% is way too high for shieldmate), look at ondu cleric. You yourself noted that the statistic is misleading. The moment you discovered that result you should've stopped and reevaluated your entire project.

Some comments: 1) Have you by wspaniel at Wed, 12/30/2009 - 00:45

Some comments:

1) Have you considered that the sample of games you have played is very small, and thus YOUR results might not be accurate? That's one of the take home points of these articles. While I don't want to discount your games as being irrelevant (they are not), you need a bigger picture, and you simply can't get that out of the number of games one individual plays.

2) Ondu Cleric did not come back as statistically significant, so its result does not deter me in the least. But it does go to show you that if you have a lot of allies in your deck, it's not the end of the world to run it.

3) A correction (CAPS are mine): "Informal analysis allows us to say 'ok I THINK card A is better than card B because of C D E.'" There's a big difference there. Statistics won't tell us this kind of stuff without an even larger study, but I can say with a lot of certainty when a card is good or not. And, as we have seen, not all of these cards are obvious.

Good comments but you still by Anonymous (not verified) at Wed, 12/30/2009 - 06:27

Good comments but you still haven't said what the point of this series is! I still don't know how good tuktuk grunts actually is because there are too many confounding factors to reach any conclusions based on the data.

1) I'm not just relying on my experiences here, I'm relying on general discussion around the cards, which is admittedly based on anecdotal evidence but the experiences of hundreds of players, many of whom are very good. The consensus of this discussion is that tuktuk grunts is not a very good card, and certainly not a game winner over geopede or lightning. I've played with the cards enough to conclude that the consensus is not too far from reality. I'm not going to argue that my results are 100% accurate, I'm saying that you have no basis to claim that your results resemble reality or the 'bigger picture'.

2) Well yeah, ondu cleric is a very important card in an allies deck. How does "55% of games where player played ondu cleric resulted in win" tell you that though? It doesn't, because 55% is just a number and not evidence. It doesn't account for the archetype or matchup at all.

3) You don't need a larger sample size. That's not your problem. Your problem is you've yet to make a conclusion, or even a meaningful analysis. Are you trying to say tuktuk grunts is a good card, a must-include or a high pick? I don't think you've shown that at all, like I said, by your analysis iona would be the best card ever printed. That doesn't even conform to common sense. The fact that your results are not even adjusted to something as simple as mana cost shows that they are no better than mere opinion. You're merely extrapolating from meaningless numbers.

In summary, various posters have pointed out the critical flaws of your methodology, yet you insist that your results have merit. I'm finding it very difficult to draw anything from these numbers to improve my play. Perhaps you can build a pick order out of these statistics and tell me how that goes.

William, You rock. Love this by Kris Rhodes (not verified) at Thu, 12/31/2009 - 00:00

William,

You rock.

Love this series.

Like any statistical analysis, you need to understand what your analysis actually is measuring. I understand it, but in light of this guy's and others' comments, it would be better if you stated your H0 and HA a little more clearly.

It would be very beneficial if you would post your full data... I'd love to redo your analysis and can think of a few more that might be valid.

You should post the P value for every card, as well as the N for each card. Most people won't care, but us statistics gurus will appreciate. :)

Dear anonymous:

The results are obviously helpful if you understand them. These results are better than opinion; they mean something very specific, in fact. And yeah, the conclusions you can draw from this analysis aren't a pick order. You find that data set that can reach the conclusion you want and I'll analyze it, though.

^^^^^ by moerutora (not verified) at Tue, 12/29/2009 - 07:06

Instead of complaining, why don't you offer a way he should do his statistics? And did you know that 83% of statistics are made up? =)

Based on the nature of magic, by Anonymous (not verified) at Tue, 12/29/2009 - 11:56

Based on the nature of magic, you've built sample bias into these "studies".

Higher cost cards (and cards with double colored mana in their cost) have a built in advantage in your ranking, because their use implies you've avoided color/mana screw.

I have discussed this in the by wspaniel at Tue, 12/29/2009 - 12:49

I have discussed this in the other comments, and you are right. In the future, I hope to correct for this. However, it will take a lot of work.

You've only addressed it by by Anonymous (not verified) at Wed, 12/30/2009 - 00:31

You've only addressed it by equating "mana screw" with "mana flood".

Take the text output of the games, pipe through awk, and then do a histogram of losses by number of lands on the table.

As an experiment, I tried to figure out the card that would do the best in your rankings to illustrate the flaws we've been discussing.

Realm Razor:
* Requires you to have 6 lands
* Requires you to have 3 colors of mana available
* Is rarely played when you're losing

I think these articles are incredibly interesting and there's a lot to be said for thinking about and discussing the methodology. I appreciate not only the time you've put into this, but also the way you've discussed people's comments. Very good stuff.

Plus the nature of conceding by Anonymous (not verified) at Tue, 12/29/2009 - 12:03

Plus the nature of conceding makes this methodology even worse.

My opponent has 2 2/2 fliers and I am at 4 life. No cards in hand, etc.

If I draw a disfigure, I will kill one of them and probably lose. Disfigure goes down in your rankings.

If I draw a crypt ripper, I will concede without playing it. Crypt Ripper's ranking is unchanged.

Also I sincerely hope you did this by copy/pasting the game log into a text file and using grep.

Concessions are potential by wspaniel at Tue, 12/29/2009 - 12:51

Concessions are potential problems, but I don't see why someone in that situation would concede after playing Disfigure but before playing Crypt Ripper.

I think the original poster's by ArchGenius at Tue, 12/29/2009 - 12:59

I think the original poster's point is that with Disfigure, you buy yourself an extra turn, but will still most likely lose unless you get another top deck. With the Crypt Ripper, you can't avoid or postpone a loss, so you concede without playing the Ripper.

Oh okay. That's a fair by wspaniel at Tue, 12/29/2009 - 13:01

Oh okay. That's a fair point--though at least Disfigure in this case isn't a guaranteed loss.

It occurred to me that if two by wspaniel at Tue, 12/29/2009 - 14:18

It occurred to me that if two non-flyers are pressuring you, then you would much rather have the Crypt Ripper than the Disfigure. So this problem (at least in the Crypt Ripper/Disfigure dyad) isn't as pervasive as it might have seemed.

While I argue that your by Anonymous (not verified) at Wed, 12/30/2009 - 00:20

While I argue that your counterexample doesn't actually correct the sample bias, I think we can agree that "Card was drawn" is a much more relevant way to score the matches than "Card was played". This is especially relevant to the issue we're discussing.

It certainly does not correct by wspaniel at Wed, 12/30/2009 - 00:46

It certainly does not correct things, and "cards drawn" or "cards played in deck" would be a lot better--though the latter would need a huge n to pick up on things.

Red in draft vs. sealed by Felorin at Tue, 12/29/2009 - 14:03

I think the values of the red cards are a lot different in draft, where you can try for & sometimes achieve a mono-red deck, as opposed to sealed where you almost never will (and even if you do, the card quality will be lower than in a mono-red draft).

In particular, Spire Barrage is a lot more valuable when you can go mono-red. Also you have better chances of picking them up in multiples in a draft, where anybody not mono-red will probably pass them. Zektar Shrine Expedition becomes more valuable in draft also, as you're more likely to have enough critical mass of burn spells to make burning someone out viable. Molten Ravager probably goes up a little as well, since when you can afford to pump you'll be pumping it higher. Also since more of the damage in draft is dealt by 1-3 drops, having a 0/4 wall on turn three may be more relevant in draft than in sealed. Goblin Shortcutter and Plated Geopede might go up a bit in draft vs. sealed too.

I am really enjoying the series, but I'd love to see similar analysis done for draft someday as well!

I just cant see how a card by ShardFenix at Wed, 12/30/2009 - 11:13

I just can't see how a card that is really only mediocre/alright in one matchup is the best red card over the much more important and splashable cards that red offers. "Statistically" it makes no sense, because there has never been a single moment of my life when I drew a Burst Lightning and thought, "Man, I really wish this was a Tuktuk Grunts."

Removal, like Burst by ArchGenius at Wed, 12/30/2009 - 13:02

Removal, like Burst Lightning, is also one of the most misplayed types of cards. Many inexperienced and new players use removal right away and often on insignificant targets. Creatures, on the other hand, are easier to play. Knowing when to attack or block with a creature is significantly easier than knowing when and what to target with removal.

If large numbers of players misplay Burst Lightning, does that make Burst Lightning a bad card? No...

If large numbers of players misplay Burst Lightning, does that make its percentage in this study go down? I'm guessing yes...

Do you have anything of value by Anonymous (not verified) at Wed, 12/30/2009 - 13:55

Do you have anything of value to add regarding the statistical discussion?

Because, flawed as it is, at least it's unique. Your post, not so much.

Tuktuk Grunts is not better by wspaniel at Wed, 12/30/2009 - 14:54

Tuktuk Grunts is not better than Burst Lightning at a statistically significant level. As I have said in past articles (and I think in this one as well), that requires a different test altogether, and there are so many combinations of cards that it isn't worth doing on the whole.

And it truly is amazing how bad players are with removal spells.

true...though i can also by ShardFenix at Wed, 12/30/2009 - 13:25

True... though I can also think of many creatures I would rather have than the Grunts. Unless I'm Allies all the way, I don't think I would even push him as a 23rd.

What is the point of these by Anonymous (not verified) at Wed, 12/30/2009 - 19:56

What is the point of these articles?

not showing that burst by Anonymous (not verified) at Wed, 12/30/2009 - 21:35

Not showing that Burst Lightning is better than Tuktuk Grunts to a statistically significant level shows that the percentages being output cannot be taken at face value. You would have a much better time comparing cards of the same mana cost. "Tuktuk Grunts is generally better than Spire Barrage in sealed" might be a conclusion (in draft, where mono-red is possible, or green-red with a couple of Harrows, this changes of course). Shatterskull Giant's double red makes the comparison to Bladetusk Boar murky. Plated Geopede is the best 2-drop in red: OK, we already knew that. Burst Lightning is the best 1cc spell: OK, we already knew that. "Goblin War Paint is better than a number of the other 2-drops" might be a conclusion. I can see that, since in sealed few people are playing blue; it's also a pretty solid card in general, and the other 2-drops aren't that exciting. Slaughter Cry and Magma Rift are the best 3-drops in sealed (though this is misleading: you often won't play either till much later, and you won't reveal that you have the combat trick if you know you're losing, meaning it'll see play more often when it actually helps you win the game; and Rift won't be played as much when you're low on lands, sometimes remaining uncast because you're hoping for a 4th or 5th land for a 4- or 5-drop). Are they better than Ruinous Minotaur or Molten Ravager in sealed? I think most people would agree with that.

The main thing here is that you need to adjust your conclusions to take into account what you know, or at least stick to situations where you know more rather than less. Comparing Tuktuk Grunts to Burst Lightning is just silly with these raw numbers. People who have been drafting for a long time know a lot about why cards are good; if they smell a rat, chances are there's a reason.

I still like the series. I think some people are becoming too venomous in their responses, and you have become too defensive in your articles and replies. You seem to be lending more and more weight to the efficacy of your raw results the further along you go, which makes sense; people like to think their efforts are meaningful and important. I would simply urge you to strongly consider the large number of factors that impede clear understanding coming out of such data.

suggestion by soru (not verified) at Fri, 01/01/2010 - 09:17

If you get to having 5 lands on the table and cast a 5 mana offensive spell without losing or conceding, your baseline chance of winning is not 50%, but something like 70%.

You could try and model that effect, and compensate for it, but that sounds pretty tricky, especially when you consider good players usually concede earlier than bad ones.

Simpler would be to punt and split the cards up by mana cost and maybe role, instead of colour. Comparing 2 mana and 5 mana spells is just not like-for-like. Certainly, in sealed, which is what the data set is for, the real choice you make is 'what 2 mana spells should I play?', not 'what red spells should I pick?'. Choice of colour follows that evaluation of cards, when you select a mana curve, and decide which colours have the best set of cards fitting into that curve.
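As a rough sketch of that split, the snippet below regroups the red commons from the article by mana cost and compares each card with the average of its own cost bucket. The percentages are the ones reported in the article; the mana costs are filled in from the printed cards and are worth double-checking. The bucket average is an unweighted mean of the red commons only, so it is a crude stand-in for a real per-cost baseline (which would need game counts and the other colours), and the function name is hypothetical.

```python
from collections import defaultdict

# (card, converted mana cost, win percentage reported in the article)
RED_COMMONS = [
    ("Tuktuk Grunts", 5, 65), ("Torch Slinger", 3, 62),
    ("Bladetusk Boar", 4, 60), ("Plated Geopede", 2, 60),
    ("Burst Lightning", 1, 57), ("Goblin War Paint", 2, 53),
    ("Slaughter Cry", 3, 52), ("Magma Rift", 3, 52),
    ("Spire Barrage", 5, 52), ("Goblin Shortcutter", 2, 48),
    ("Highland Berserker", 2, 47), ("Ruinous Minotaur", 3, 45),
    ("Shatterskull Giant", 4, 44), ("Molten Ravager", 3, 42),
    ("Zektar Shrine Expedition", 2, 34),
]

def rank_within_cost(cards):
    """Group cards by mana cost and rank each against its bucket's mean."""
    by_cost = defaultdict(list)
    for name, cost, pct in cards:
        by_cost[cost].append((name, pct))
    ranked = {}
    for cost, group in sorted(by_cost.items()):
        # Unweighted mean of the bucket: an assumption, not a true baseline.
        baseline = sum(pct for _, pct in group) / len(group)
        ordered = sorted(group, key=lambda item: item[1], reverse=True)
        ranked[cost] = [(name, pct, pct - baseline) for name, pct in ordered]
    return ranked

for cost, rows in rank_within_cost(RED_COMMONS).items():
    print(f"{cost} mana:")
    for name, pct, diff in rows:
        print(f"  {name}: {pct}% ({diff:+.1f} vs. bucket average)")
```

Read this way, each line answers a question closer to 'which 2 mana spell should I play?' than to 'which red card is best?'.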

where is goblin bushwacker ? by Anonymous (not verified) at Fri, 01/01/2010 - 12:28

Where is Goblin Bushwhacker?

I'm not sure how I missed by wspaniel at Fri, 01/01/2010 - 13:46

I'm not sure how I missed that, but it was 42% and just below meeting the threshold for 90% statistical significance. I will include it in the next article.

Ok this argument basically by Anonymous (not verified) at Fri, 01/01/2010 - 13:10

Ok this argument basically seems to come down to:

'These statistics aren't perfect so they're useless'

'You haven't done the analysis for us, so these statistics are useless'

Yes, there will be some major flaws with the results, but that doesn't mean they're useless; it just means that you need to do some work and think about why some cards came out the way they did. Also, arguments about cards not being played because of concessions are reasonably minor and would become even less significant as the sample size increases.

And yes, you'll need to do your own analysis, he gives some explanations, but you need to think about why cards have been so successful for yourself.

One way to look at this is by Anonymous (not verified) at Fri, 01/01/2010 - 17:21

One way to look at this is that the author is having a lot of fun doing analysis that is not very meaningful when the project could be much more informative. The article's approach is interesting, but the methodology and interpretation are incredibly rough.

From a statistical point of view the author needs to tighten up his methodology. He needs to report the hypotheses that are being tested. He needs to report the statistical test he is using. He needs to report his sample size and standard errors for the statistics he reports. As a reader I see his table of ranked statistics and wonder if other readers also see that although Plated Geopede and Tuktuk Grunts are statistically significant (using an unknown test and unknown hypothesis), at face value there is no evidence that probabilities of 60% and 65% are different from one another. The magnitudes of the probabilities convey nothing if the author does not test the statistics against each other.
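To illustrate why the sample size and standard error matter, here is a small sketch that reports them for a single card, together with a normal-approximation test against a 50% win rate. The null hypothesis of 50%, the choice of test, and the game counts are all assumptions made for the example; the article does not state them here, and the function name is hypothetical.

```python
import math

def summarize_win_rate(wins, games, null_rate=0.5):
    """Win rate with standard error, 95% interval, and z-test against null_rate."""
    p_hat = wins / games
    # Standard error of the observed rate, used for the confidence interval.
    se = math.sqrt(p_hat * (1 - p_hat) / games)
    ci_95 = (p_hat - 1.96 * se, p_hat + 1.96 * se)
    # Standard error under the null hypothesis, used for the test statistic.
    se_null = math.sqrt(null_rate * (1 - null_rate) / games)
    z = (p_hat - null_rate) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return {"rate": p_hat, "se": se, "ci_95": ci_95, "z": z, "p_value": p_value}

# The same 65% win rate at two hypothetical sample sizes.
for wins, games in [(26, 40), (130, 200)]:
    s = summarize_win_rate(wins, games)
    print(f"n={games}: rate={s['rate']:.2f}, se={s['se']:.3f}, "
          f"95% CI=({s['ci_95'][0]:.2f}, {s['ci_95'][1]:.2f}), p={s['p_value']:.4f}")
```

The same 65% clears only the 90% threshold at 40 games but is far past 99% at 200 games, so a percentage without its sample size says little about how much to trust it.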

I am not sure about the issues of sample bias brought up by other readers, but if I were in the author's shoes I'd try a different model. Maybe a logit or probit with indicator controls for mono vs. splash decks or concede / straight win could address some of the criticisms.

I get why I don't by Paul Leicht at Fri, 01/01/2010 - 18:45

I get why I don't particularly care about the statistics here but I don't get why that makes his article invalid. Just because you or I have little use for the information clearly does not mean that is so for anyone else. Plenty of people here have found a use for his analysis and some have expressed enjoyment for the article itself. I don't see how he 'needs' to do anything other than what he has been doing.

I think the fact that the by Anonymous (not verified) at Sun, 01/03/2010 - 00:32

I think the fact that the author is getting a lot of contentious feedback demonstrates the interest the community has in the subject. He certainly doesn't need to do any more than he's been doing, but there are many of us who would love to see it if he did, which should be appreciated as support for his efforts when the person expressing it keeps their civility. I find it an interesting logical exercise to think about the data he's presenting, so that's already a winner; if I agreed with everything he was concluding in his analysis, it probably wouldn't be as interesting. So keep it up, mr. sandyman; this kind of writing is sorely underpopulated in the MTG community.

See that is what I meant. by Paul Leicht at Sun, 01/03/2010 - 00:51

See that is what I meant. Surely there is a place for this sort of thing even if it isn't MY cup of tea. I fully appreciate that some people dig this stuff. Well said.

Usefulness by Mathu (not verified) at Mon, 01/04/2010 - 13:57

I think these articles are readable and enjoyable. Because I don't draft that often, I can always appreciate extra information about what is going on in Limited. I may not base my ratings of cards or build decks based on the results of these studies, but I do appreciate being given the chance to evaluate these cards in this light. Thank you.

A different way of looking at this by qwyrxian (not verified) at Tue, 01/05/2010 - 03:04

I was thinking about this article last night; I have all the same objections as the Anonymous above (the very first thing I thought of when I read the methodology and saw the Grunts first was that this was clearly an artifact of mana cost). But rather than just discounting the information entirely (as I was wont to do yesterday afternoon), what if we take this approach: instead of thinking that these are good cards, perhaps we should say that "decks which are capable of playing these cards are more successful." That is, instead of saying that the Grunts is a good or bad card based on the data, we can accept at face value that players who got into a position where they were able to play the Grunts won. Thus, our goal is to create decks (building in sealed and, to a lesser degree, picking in drafts) that enable us to get to the point where we can cast the Grunts. That is, perhaps these lists point us to thoughts about the overall shape of the format and of various archetypes more than about the individual cards. For instance, this might lead us to think that saying "Zendikar is an extremely fast format" is less useful than saying "Zendikar is a format that rewards players for having enough control over the early game that a hasted 3/3 on turn five may be enough to lock the game in your favor." Somehow I feel like I'm not articulating this as clearly as I want to... does anyone else get what I'm saying well enough to tell me whether it makes sense?