Wednesday, May 2, 2012


Parity and disparity: what’s the reality?

“On any given Sunday, any team in our league can beat any other team.”

-Bert Bell, 5th NFL Commissioner, 1946-1959


All the time I hear talking heads praise how paritious (I know it’s not a word, but the word “parity” needs an adjective, damn it!) the NFL is with, at times, awe and amazement. The idea certainly is appealing and clearly makes the NFL different than MLB. But is it true? Sure. But when and how is it true?

The quote at the top is a standard most fans of pro football have heard, and if you look closely at the games played in a season, teams do win when they shouldn’t and lose when they shouldn’t. In the 2011 season the Giants lost both divisional games to the Redskins and just looked like a comedian failing in front of an audience. I just had to look away. But they won the friggin’ Super Bowl as a 9-7 team. A more pedestrian example is the Saints destroying the Colts in Week 7, 62-7, and losing to the Rams the next week, 31-21. Both the Rams and the Colts were 2-14 in 2011 and the Saints, well, were the 40-burger kids.

In 2011 the Saints won and lost back-to-back games to two teams that ultimately were 2-14 on the season. The Saints were 13-3 by season’s end. They destroyed the Colts in Week 7, 62-7 and then in Week 8 lost to the Rams, 21-31. WTF?
 
All this aside, I remember when southern teams lost to northern teams in the winter (San Diego at Cincinnati in the 1981 playoffs ring a bell?), and teams that went 15-1 were a lock to kill whoever they met in the Super Bowl, let alone a lock to get to the Super Bowl. Now it’s all, “the teams that get hot late in the regular season are the teams that have the best chance to make a run deep into the playoffs,” according to the faces, voices, and words around the World Wide Web and elsewhere. I’m sure this is true, as the Packers once and the Giants twice have proven recently, and I’m also sure most, if not all, of the credit belongs to parity and the measures the NFL takes to ensure it. Still, I think this is a big picture idea and a season here or there can’t tell the whole story. Therefore, I decided to put my education to work in order to answer the questions: is there parity in the NFL, and if so, has there always been?


***This next section is full of technical jargon. If you’d like to skip over it and get to the results and the heart of what I found because life is too short and you’re a busy person, please feel free to do so. However, be warned. I may have already addressed, in the next section, issues you’ll raise later on down the line. Consider this a preemptive bifocal adjustment.***

A test of parity

To test for parity in a season, I made two assumptions: 1) if any team can beat any other team on Sunday, then this implies that the pattern of winning and losing league-wide by season’s end is random; 2) if the pattern of winning and losing league-wide by season’s end is random, then any single random sample of a season’s league standings will suffice; there’s no need for averaging 30 random samples. Once sampled, I compared the random samplings to actual single-season league-wide standings going all the way back to 1936. The real standings data came from nfl.com and pro-football-reference.com. The parity tests were done on the variance of simulated seasons against the variance of these real seasons. Variance is chosen to represent the gap between the biggest losers and the biggest winners. More specifically, variance, in this case, shows the spread, or distribution, of outcomes in a season. Overall, a league in the throes of parity ought to have more 9-7, 8-8, and 7-9 teams than a league of disparate winners and losers. Variance ought to be smallish and more tightly bound to its “middle-class.” (That’s as close to politics as I hope this blog ever gets.)
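To make the variance idea concrete, here is a minimal Python sketch. The two eight-team leagues are made-up win totals for illustration, not real standings, and using the sample variance (`statistics.variance`) is my assumption here; the original analysis may have computed it differently.

```python
from statistics import variance

# Hypothetical 16-game win totals for two made-up 8-team leagues.
# Both leagues total 64 wins, so each averages 8 wins per team.
parity_league = [9, 9, 8, 8, 8, 8, 7, 7]          # everyone near 8-8
disparate_league = [14, 13, 11, 9, 7, 5, 3, 2]    # juggernauts and doormats

# Sample variance of win totals: small when teams cluster around .500,
# large when the standings are stretched between big winners and big losers.
parity_var = variance(parity_league)
disparate_var = variance(disparate_league)

print(f"parity league variance:    {parity_var:.2f}")
print(f"disparate league variance: {disparate_var:.2f}")
```

The tightly bunched league comes out with a much smaller variance than the stretched-out one, which is the property the parity test leans on.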

If you’ve ever taken a course where you concentrated on hypothesis testing you may have run across a test known as an “F-test.” It compares the variances of two different samples. It’s very simple; you divide the larger of the two variance values by the smaller and compare the quotient to a table of values based on sample sizes. If the calculated quotient is larger than the table value, then, in this case, the real season is significantly different than the any-given-Sunday, parity-world season. That means disparity ruled the day. What’s significant is based on a certain level. The tester chooses the level of significance; the smaller the level, the harder it is to break that threshold. The standard level is 0.05, or a 1 in 20 chance your test is in error. I also tested at the 0.01 level (a 1 in 100 chance the test is in error). Basically, how much are you willing to risk that you are wrong? For the purposes of this post, it can be explained as: how much am I willing to risk claiming a season is disparate when it is not? That’s the quick and dirty explanation anyway. Additionally, you do the F-test for samples whose distributions are “normal.” I tested for this using the Anderson-Darling method and, let me tell ya, people, the samples are normally distributed. I think the jargon’s almost over.
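The divide-and-compare step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `f_ratio` and `is_disparate` are mine, the win totals are made up, and the critical value passed in is a placeholder standing in for whatever number the F table gives for your sample sizes and significance level.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Larger sample variance divided by the smaller one."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    if smaller == 0:
        raise ValueError("one sample has zero variance; ratio is undefined")
    return larger / smaller

def is_disparate(real_wins, simulated_wins, critical_value):
    """True when the variance ratio exceeds the table (critical) value."""
    return f_ratio(real_wins, simulated_wins) > critical_value

# Made-up win totals for illustration only.
real_season = [14, 13, 11, 9, 7, 5, 3, 2]
simulated_season = [10, 9, 9, 8, 8, 7, 7, 6]

# PLACEHOLDER critical value: look up the real one in an F table for
# your own sample sizes and chosen level (0.05 or 0.01).
placeholder_critical = 3.0

ratio = f_ratio(real_season, simulated_season)
print(f"F ratio: {ratio:.2f}")
print("disparate?", is_disparate(real_season, simulated_season, placeholder_critical))
```

Because the larger variance always goes on top, the order in which the two seasons are passed in does not matter.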


Because the number of games and teams has changed quite a bit over the years, I took both into account. I sampled by comparing two randomly generated numbers (“teams”) from a uniform distribution, with the larger value being the winner, noted as a 1. Losing was a 0. I assembled the wins by summing across the number of games per season and sorted largest to smallest down the teams to get a single-season league-wide standing. In other words, for the league having 14-game seasons played by 28 teams, I simulated 14 games, randomly won and lost, once for each of the 28 teams, then sorted. One thing worth mentioning is I did not make the “teams” relative to each other; each simulated team’s games are drawn on their own rather than against the other simulated teams. So, in a sense, in the computer world I created, there is no model league I am comparing to the real one.
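For anyone who wants to play along at home, the sampling described above might look something like this in Python. It is a sketch of my reading of the procedure, not the original code: the function name `simulate_season` and the seed are assumptions made for illustration.

```python
import random

def simulate_season(num_teams, num_games, rng=None):
    """Simulate one parity-world season and return sorted win totals."""
    rng = rng or random.Random()
    standings = []
    for _ in range(num_teams):
        wins = 0
        for _ in range(num_games):
            # Two uniform draws stand in for the two "teams";
            # the game counts as a win (1) when the first draw is larger.
            if rng.random() > rng.random():
                wins += 1
        standings.append(wins)
    # Sort largest to smallest to get a league-wide standing.
    return sorted(standings, reverse=True)

# 28 teams playing 14 games each; the seed is arbitrary and only
# there so the run can be repeated.
season = simulate_season(num_teams=28, num_games=14, rng=random.Random(2011))
print(season)
```

Since each team's games are drawn independently, the league's total wins will not necessarily equal its total losses, which is the sense in which the simulated teams are not relative to each other.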

Results

I took two views to explain the “whens and hows” of parity in the NFL. The first is the overall amount of disparity since 1936 and the second is the percentage of disparity per decade since 1936. Including the AFL (1960-1969), the results do not bode well for parity if I am willing to risk a 1 in 20 chance I am wrong on each test described in the previous section. There is 54.7% disparity in this case from 1936-2011. More conservatively, risking a 1 in 100 chance I am wrong, parity fares much better. There is 22.1% disparity from 1936-2011. For the record, the AFL was 100% and 40% disparate at the 0.05 and 0.01 levels of significance, respectively, during its run from 1960-1969. So it depends on how you look at it. It was either moderately disparate (40%) or completely disparate (100%).

As for Commissioner Bell, it looks like he was telling the truth about any given Sunday in the NFL. During his time as commissioner (1946-1959) the league was at its least disparate (see the graph below for a look at % disparity over the decades). By the 1990s, however, nothing could have been further from the truth. Parity was pretty weak. This brings us to a moment of irony. The movie “Any Given Sunday,” by Oliver Stone and starring Al Pacino, was filmed and released in 1998 and 1999, respectively. This stretch of time fell within the NFL’s most disparate period (1996-2001). Oh, well.

Percentage of disparity in the league began low through Commissioner Bell's time and increased until peaking in the 1990s. The dotted line indicates results of my parity tests for which I am willing to be wrong 1 in 20 times. The solid line is where I'm only willing to be wrong 1 in 100.

The take-home message is that yes, there is parity in the NFL, but it must be cared for through the salary cap, free agency, and draft equity, to name a few measures. Otherwise, parity is not always going to be a phenomenon you see on any given Sunday. But you knew that already.

Some questions for the road

What role do the franchises play? Some ball clubs are more stable than others despite the fact that every team faces the same salary cap.

Is there parity in coaching and scouting talent? Do some coaching strategies or personnel philosophies work better more frequently than others?

How much parity is there in strength of schedules or strength of divisions?


2 comments:

  1. What about skill & talent? Could it be argued that teams have simply finessed their capabilities over the years, making them more precise and predictable and "disparitable"?

    Then there's the new Nike gear. Who knows where that will take teams...

  2. No soliciting. Let this serve as your only notice.
