Parity and disparity: what’s the reality?
“On any given Sunday, any team in our league can beat any other team.”
-Bert Bell, 5th NFL Commissioner, 1946-1959
All the time I hear talking heads praise how paritious (I know it’s not a word, but the word “parity” needs an adjective, damn it!) the NFL is, at times with awe and amazement. The idea certainly is appealing and clearly makes the NFL different from MLB. But is it true? Sure. But when and how is it true?
The quote at the top is a standard most fans of pro football have heard, and if you look closely at the games played in a season, teams do win and lose both when they should and when they shouldn’t. In the 2011 season the Giants lost both divisional games to the Redskins and just looked like a comedian failing in front of an audience. I just had to look away. But they won the friggin’ Super Bowl as a 9-7 team. A more pedestrian example is the Saints destroying the Colts in Week 7, 62-7, and losing to the Rams the next week, 31-21. Both the Rams and the Colts were 2-14 in 2011, and the Saints, well, were the 40-burger kids.
All this aside, I remember when southern teams lost to northern teams in the winter (San Diego at Cincinnati in the 1981 playoffs ring a bell?), and teams that went 15-1 were a lock to kill whoever they met in the Super Bowl, never mind a lock just to get there. Now it’s all, “the teams that get hot late in the regular season are the teams that have the best chance to make a deep run into the playoffs,” according to the faces, voices, and words around the World Wide Web and elsewhere. I’m sure this is true, as the Packers once and the Giants twice have proven recently, and I’m also sure most, if not all, of the credit belongs to parity and the measures the NFL takes to ensure it. Still, I think this is a big-picture idea, and a season here or there can’t tell the whole story. Therefore, I decided to put my education to work in order to answer the questions: is there parity in the NFL, and if so, has there always been?
***This next section is full of technical jargon. If you’d like to skip over it and get to the results and the heart of what I found because life is too short and you’re a busy person, please feel free to do so. However, be warned: I may have addressed issues in the next section that come up later on down the line. Consider this a preemptive bifocal adjustment.***
A test of parity
To test for parity in a season, I made two assumptions: 1) if any team can beat any other team on Sunday, then this implies that the pattern of winning and losing league-wide by season’s end is random; 2) if the pattern of winning and losing league-wide by season’s end is random, then any random sample of a season’s league standings will suffice; there’s no need to average 30 random samples. Once sampled, I compared the random samplings to actual single-season league-wide standings going all the way back to 1936. The real standings data came from nfl.com and pro-football-reference.com. The parity tests were done on the variance of simulated seasons against the variance of these real seasons. Variance was chosen to represent the gap between the biggest losers and the biggest winners. More specifically, variance, in this case, shows the spread, or distribution, of outcomes in a season. Overall, a league in the throes of parity ought to have more 9-7, 8-8, and 7-9 teams than a league of disparate winners and losers. Variance ought to be smallish and more tightly bound to its “middle-class.” (That’s as close to politics as I hope this blog ever gets.)
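As a quick illustration of that point, here is a minimal sketch with invented win totals (these are not data from my tests); a parity league’s wins bunch around .500 while a disparate league’s wins spread out:

```python
# A hypothetical comparison: variance of win totals in a "parity" league
# versus a "disparity" league. Both lists are invented for illustration.
import statistics

parity_wins = [9, 9, 8, 8, 8, 8, 7, 7]       # teams bunched around .500
disparity_wins = [14, 12, 11, 9, 5, 4, 4, 1]  # clear winners and losers

print(statistics.variance(parity_wins))      # small spread: about 0.57
print(statistics.variance(disparity_wins))   # large spread: about 21.4
```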
If you’ve ever taken a course where you concentrated on hypothesis testing, you may have run across a test known as an “F-test.” It tests the variances of two different samples. It’s very simple: you divide the larger of the two variance values by the smaller and compare the quotient to a table of values based on sample sizes. If the calculated quotient is larger than the table value, then, in this case, the real season is significantly different from the any-given-Sunday, parity-world season. That means disparity ruled the day. What counts as significant depends on a chosen level. The tester chooses the level of significance; the smaller the level, the harder it is to break that threshold. The standard level is 0.05, or a 1 in 20 chance your test is in error. I also tested at the 0.01 level (a 1 in 100 chance the test is in error). Basically, how much are you willing to risk being wrong? For the purposes of this post, it can be explained as: how much am I willing to claim a season is disparate when it is not? That’s the quick and dirty explanation anyway. Additionally, the F-test assumes the samples’ distributions are “normal.” I tested for this using the Anderson-Darling method and, let me tell ya, people, the samples are normally distributed. I think the jargon’s almost over.
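For the curious, here is a minimal sketch of that test in Python; this is not my actual code, the win totals are invented placeholders, and SciPy stands in for the printed F-table:

```python
# A sketch of the variance-ratio (F) test described above, plus the
# Anderson-Darling normality check. Both win-total arrays are invented;
# the real data came from nfl.com and pro-football-reference.com.
import numpy as np
from scipy import stats

real_wins = np.array([13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2])  # a "real" season
sim_wins = np.array([10, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 5])      # a parity-world season

# Anderson-Darling normality check, an assumption behind the F-test.
print(stats.anderson(real_wins, dist='norm'))

# F statistic: the larger sample variance over the smaller,
# each with n - 1 degrees of freedom.
v_real, v_sim = np.var(real_wins, ddof=1), np.var(sim_wins, ddof=1)
F = max(v_real, v_sim) / min(v_real, v_sim)
df = len(real_wins) - 1

# Tail probability of an F this large; compare it to the chosen
# significance level (0.05 or 0.01) instead of reading a table.
p = stats.f.sf(F, df, df)
print(F, p)  # call the season "disparate" if p falls below the level
```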
Because the number of games and teams has changed quite a bit over the years, I took that into account. I sampled by comparing two randomly generated numbers (“teams”) drawn from a uniform distribution, with the larger value being the winner, noted as a 1. Losing was a 0. I assembled the wins by summing across the number of games per season and sorted largest to smallest down the teams to get a single-season league-wide standing. In other words, for a league playing 14-game seasons with 28 teams, I simulated 14 games, randomly won and lost, once for each of the 28 teams, then sorted. One thing worth mentioning is that I did not make the “teams” relative to each other. So, in a sense, in the computer world I created, there is no model league I am comparing to the real one.
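In code, that procedure looks roughly like the sketch below; this is an illustration under the assumptions just described, with the team and game counts as parameters, not the script I actually ran:

```python
# A minimal sketch of the simulation: each game is decided by drawing two
# uniform random numbers; the larger draw wins (1), the smaller loses (0).
# Teams are independent of each other, so there is no "model league."
import numpy as np

def simulate_standings(n_teams=28, n_games=14, rng=None):
    """Return simulated league-wide win totals, sorted largest to smallest."""
    rng = rng or np.random.default_rng()
    team_draws = rng.random((n_teams, n_games))       # one draw per team per game
    opponent_draws = rng.random((n_teams, n_games))   # the "any other team" draws
    wins = (team_draws > opponent_draws).sum(axis=1)  # win if the team's draw is larger
    return np.sort(wins)[::-1]

print(simulate_standings())  # e.g. a 28-team, 14-game standings table, sorted
```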
Results
I took two views to explain the “whens and wheres” of parity in the NFL. The first is the overall amount of parity since 1936, and the second is the percentage of disparity per decade since 1936. Including the AFL (1960-1969) and based on the level of significance, the results do not bode well for parity if I am willing to risk a 1 in 20 chance I am wrong for each test described in the previous section. There is 54.7% disparity in this case from 1936-2011. More conservatively, risking a 1 in 100 chance I am wrong, parity fares much better. There is 22.1% disparity from 1936-2011. For the record, the AFL was 100% and 40% disparate at the 0.05 and 0.01 levels of significance, respectively, during its run from 1960-1969. So it depends on how you look at it. It was either completely disparate (100%) or moderately disparate (40%).
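To be explicit about how those percentages fall out of the tests, the bookkeeping is just the share of seasons whose test comes back significant; a tiny sketch with hypothetical p-values, not my actual results:

```python
# The disparity percentage is simply the fraction of seasons flagged as
# significantly different from the parity-world simulation.
def disparity_pct(p_values, alpha):
    flagged = sum(p < alpha for p in p_values)
    return 100.0 * flagged / len(p_values)

season_p_values = [0.003, 0.04, 0.20, 0.60, 0.008]  # invented examples
print(disparity_pct(season_p_values, 0.05))  # 60.0
print(disparity_pct(season_p_values, 0.01))  # 40.0
```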
As for Commissioner Bell, it looks like he was telling the truth about any given Sunday in the NFL. During his time as commissioner (1946-1959) the league was at its least disparate (see the graph below for a look at % disparity over the decades). By the 1990s, however, nothing could have been further from the truth. Parity was pretty weak. This brings us to a moment of irony. The movie “Any Given Sunday,” directed by Oliver Stone and starring Al Pacino, was filmed and released in 1998 and 1999, respectively. That stretch of time sat inside the NFL’s most disparate period (1996-2001). Oh, well.
The take-home message is that yes, there is parity in the NFL, but it must be cared for through the salary cap, free agency, and draft equity, to name a few measures. Otherwise, parity is not always going to be a phenomenon you see on any given Sunday. But you knew that already.
Some questions for the road
What role do the franchises play? Some ball clubs are more
stable than others despite the fact that every team faces the same salary cap.
Is there parity in coaching and scouting talent? Do some coaching
strategies or personnel philosophies work better more frequently than others?
How much parity is there in strength of schedules or
strength of divisions?

