# Can Sports Statistics Predict the Future?

For almost six decades, Leonard Koppett strode across the sport journalism landscape like a giant. His passion, innovation, and unique insights earned him accolades and acclaim, including inductions into the Baseball and Basketball Halls of Fame as a writer.

Many lauded his ability not just to communicate what had happened in the sports world, but also to explain why it had happened. Koppett often used uncommon sports statistics to support his explorations and explanations.

But Koppett had a love-hate relationship with statistics.

In the last of his 16 books, The Rise and Fall of the Press Box – completed a mere two weeks before his death in 2003 – Koppett dedicates an entire chapter to his thoughts on statistics.

He minces no words, opening the section by declaring that when it comes to sports writing, the “excessive use of statistics, if not checked, may turn out to be a fatal malady.”

Koppett’s least favorite statistic?

“Then there’s the silliest of all cliches, ‘on a pace for’. A player with 11 homers in his first 27 games is said to be ‘on a pace for 66 homers.’ Isn’t it obvious enough that home runs (and most other things) occur in irregular spurts? It’s a little less silly, but still sheer speculation, if you play the pace game after mid-season,” wrote Koppett. “Pace is a figment of the mathematician’s imagination.”

But how often does a fan encounter statements like “Steven Stamkos is on pace for 57 goals in 2011-12” or “the Detroit Red Wings are on pace for a million-and-two points this season” (editor’s note: it’s more like 112)?

These kinds of pace-based projections would seem to be just too tempting for the media to avoid.
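To make the arithmetic behind an "on pace for" claim concrete, here is a minimal sketch of the straight-line extrapolation Koppett is criticizing. The function name `on_pace_for` is hypothetical, and the 162-game season length is an assumption that matches the baseball example in his quote.

```python
def on_pace_for(count_so_far: float, games_played: int, season_games: int) -> float:
    """Linearly extrapolate a running total to a full season.

    This assumes the rate seen so far holds for every remaining game,
    which is exactly the assumption Koppett objects to.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return count_so_far * season_games / games_played


# Koppett's example: 11 homers in the first 27 games.
# Assumption: a 162-game baseball season.
projection = on_pace_for(11, 27, 162)
print(round(projection))  # 66
```

The projection is nothing more than a rate multiplied by a schedule length, so it says nothing about the irregular spurts Koppett describes.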

What’s an informed fan to do?

#### NHL Teams’ Point Pace

Here is a plot of the Tampa Bay Lightning’s point pace in 2011-12.

It quickly becomes apparent that, based on last year’s postseason point cutoffs, the Lightning have been well off the playoff pace for most of the season.

However, notice that as the season progresses the point pace becomes more stable because more games are played and the sample size increases.

Notice, too, that this stabilization doesn’t happen evenly throughout the season. There is a marked increase in stability after the 25-game mark.
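A rough way to see why the pace settles down is to recompute the full-season projection after every game. The sketch below uses made-up results, not the Lightning's actual record, and the helper `running_point_pace` is a hypothetical name. It assumes an 82-game NHL schedule.

```python
def running_point_pace(points_per_game, season_games=82):
    """Return the projected full-season point total after each game played."""
    paces = []
    total = 0
    for games_played, pts in enumerate(points_per_game, start=1):
        total += pts
        paces.append(total * season_games / games_played)
    return paces


# Hypothetical results (not real data):
# 2 = win, 1 = overtime/shootout loss, 0 = regulation loss.
results = [2, 0, 0, 2, 1, 0, 2, 2, 0, 1]
paces = running_point_pace(results)

for game, pace in enumerate(paces, start=1):
    print(f"after game {game:2d}: on pace for {pace:5.1f} points")
```

In this toy sequence the projection swings from 164 points after one game to 82 after two. Each new result is divided by a larger number of games played, so later games move the projection far less than early ones.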

This stabilization isn’t unique to the Bolts. NHL GMs and other hockey observers often cite the way point percentages settle around American Thanksgiving as a useful rule of thumb.

But the so-called Thanksgiving Rule isn’t foolproof.

The biggest problem is that using past performance to predict future performance is risky business, especially with a direct method such as projecting a team’s future record from its past record.

For example, take the Minnesota Wild’s performance this year. As examined by Kent Wilson in a recent Puck Daddy piece, this year’s edition of the Wild were atop the Western Conference after 31 games, and now trail all but the young Edmonton Oilers and horror-show Columbus Blue Jackets in the West.

Wilson referred to ‘regression towards the mean’ as an explanation for the Wild’s demise. Specifically, he explained that “the Wild were living off of sky-high save percentages that were unlikely to continue in perpetuity. Truly great teams, it was argued, tend to control puck possession and outshoot their opponents. As such, Minnesota’s success was likely a mirage. Regression was inevitable and with it, a fall from grace.”

Looking at the numbers behind the wins, the indirect statistics, allowed for a more sophisticated analysis, and in the case of the Wild, a more accurate one.

Were Koppett alive today, he too might have commented on the Wild’s vulnerability, using statistics that went beyond the Wild’s record and delved into the numbers that illustrated why he thought they were skating on thin ice.

#### Predicted Wins

A team’s scoring is one of the easiest sets of statistics to use when moving beyond direct, record-based predictions.

Borrowing the form of the grade-school Pythagorean formula, sporting statsmen have proposed that a team’s winning percentage will equal Goals For² / (Goals For² + Goals Against²).
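As a minimal sketch, the formula can be turned into a predicted win total by multiplying the expected winning percentage by the length of the schedule. The function names and the goal totals below are hypothetical, and the 82-game schedule is an assumption. The exponent of 2 is the one given in the text.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.0):
    """Expected winning percentage: GF^x / (GF^x + GA^x), with x = 2 by default."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        raise ValueError("need at least one goal for or against")
    return gf / (gf + ga)


def predicted_wins(goals_for, goals_against, season_games=82):
    """Scale the expected winning percentage to a full schedule."""
    return pythagorean_win_pct(goals_for, goals_against) * season_games


# Hypothetical goal totals (not real data).
print(f"{pythagorean_win_pct(120, 100):.3f}")  # 0.590
print(f"{predicted_wins(120, 100):.1f}")       # 48.4
```

A team that scores exactly as many goals as it allows comes out at .500, or 41 predicted wins over 82 games. Because the goal totals are squared, a modest edge in goals becomes a larger edge in expected wins.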

Here is the Tampa Bay Lightning’s predicted and actual wins in 2011-12, based on the formula.

Like the point-pace graph, the dark blue line showing predicted end-of-season wins stabilizes around the 25-game mark and begins to bounce between 30 and 35 wins. Faithful fans will no doubt notice that the Bolts have recently been outpacing the formula’s predicted wins (which suggests they are winning close games but losing by many goals when they lose).

Whether this trend regresses towards the mean that Bill James’s Pythagorean formula suggests is still up in the air, of course, but the educated fan can at least use the formula to judge whether a late-season push is likely to be sustainable or to end after only a short streak. That fan has a tool to hedge against the kind of performance the Wild put on earlier this year.

#### Predicted Points

Unfortunately, the single point the NHL awards for an overtime or shootout loss poses a problem: the Pythagorean formula does not easily account for it. Still, wins account for the bulk of an NHL team’s points, so we can cheat by combining a team’s direct single-point pace with the Pythagorean win prediction.
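One plausible reading of that "cheat" is sketched below: two points for every Pythagorean-predicted win, plus the single points extrapolated at the team's current rate. This is an assumption about how the two pieces are combined, not a confirmed description of the calculation behind the chart, and the function name and inputs are hypothetical.

```python
def predicted_points(goals_for, goals_against, single_points_so_far,
                     games_played, season_games=82):
    """Combine a Pythagorean win prediction with a straight-line single-point pace.

    Assumption: predicted points = 2 * predicted wins + extrapolated single points.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    win_pct = goals_for ** 2 / (goals_for ** 2 + goals_against ** 2)
    wins = win_pct * season_games
    single_points = single_points_so_far * season_games / games_played
    return 2 * wins + single_points


# Hypothetical mid-season team (not real data): even goal differential,
# 5 overtime/shootout losses through 41 games.
print(predicted_points(100, 100, 5, 41))  # 92.0
```

The single-point term is still a direct pace projection, so it carries the same weakness Koppett describes. It is also the smaller share of the total.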

Here is the Tampa Bay Lightning’s combined point prediction.

So while the predicted end-season point line is still adjusting itself as the campaign unfolds, it certainly looks like the Bolts will finish well outside the postseason cutoff – and it’s looked this way since November.

Only time will tell how accurate these predictions are, but at least the GF/GA formula offers even the casual fan a simple method to move beyond direct on-pace-for projections and move into the more sophisticated world of indirect statistical predictions.

Even Leonard Koppett might approve of that kind of statistical thinking.