Welcome to The Hockey Writers’ Advanced Analytics Primer. An introduction to what analytics and advanced statistics are and a short history of these statistics in hockey is followed by a glossary in which definitions for many common terms are provided. First, what is meant by calling a statistic “advanced?”
What are Advanced Statistics?
Simply put, the term “advanced statistics” refers to a number of metrics that go beyond traditional boxscore statistics, such as goals, assists, shots, hits, penalty minutes, and plus-minus differential. They offer a more detailed form of hockey analysis and reduce the amount of human error involved. As with any tool, they are far from perfect, and using a single statistic can lead to mistakes in judgment. All kinds of analysis can benefit from more diverse opinions, and the work being done in the hockey world is no different.
The History of Advanced Statistics in Hockey
Although the hockey industry’s obsession with analytics has exploded in the past decade, early pioneers, such as the Montreal Canadiens of the 1950s or Roger Neilson of the Vancouver Canucks, are recognized for creating forms of the common metrics still being used today, including plus-minus and scoring chances. Even as their usefulness has decreased as more complex statistics have been developed, these statistics started the current analytical movement.
Beginning in the early 2010s, public hockey bloggers (though that title sells their accomplishments outside of writing short) began to move away from what they believed to be outdated methods of evaluation. Although their entrance onto the hockey scene was met with hostility by NHL executives and professional hockey writers, the blogger-to-professional pipeline is now well-established. The most famous example may be the story of Eric Tulsky, the current assistant general manager of the Carolina Hurricanes, but several others have been hired by NHL franchises over the years. With every team having an analytics department in one form or another, these metrics are becoming an increasingly important tool for any well-run hockey organization.
The Foundational Metrics of Hockey Analytics
As with any subject matter, hockey analytics consists of an immense range of metrics, theories, and concepts, but several key statistics set the foundation for the modern analytical evolution. In this section, the metrics Corsi, Fenwick, and expected goals are explored in turn, giving readers a base from which to understand and meaningfully use such statistics in their own enjoyment of the sport.
Corsi can be understood as the raw sum of all shot attempts, including shots on goal, missed shots, and blocked shot attempts. The theory behind Corsi and Corsi-based metrics is that a team that directs a greater number of shot attempts toward the net will likely score more goals over time.
Fenwick, like Corsi, tallies shot attempts. The distinguishing feature between the two is that Fenwick does not include blocked shot attempts in its count. Some prefer Fenwick for this reason, as it’s believed that unblocked attempts are more valuable than those that are prevented from troubling the goaltender.
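To make the difference between the two counts concrete, here is a minimal Python sketch. The event labels are hypothetical names chosen for illustration (real play-by-play feeds use their own), and goals are assumed to be recorded as shots on goal.

```python
from collections import Counter

# Hypothetical event labels; actual data sources name these differently.
SHOT_ON_GOAL = "shot_on_goal"
MISSED_SHOT = "missed_shot"
BLOCKED_SHOT = "blocked_shot"


def corsi(events):
    """Count all shot attempts: on goal, missed, and blocked."""
    counts = Counter(events)
    return counts[SHOT_ON_GOAL] + counts[MISSED_SHOT] + counts[BLOCKED_SHOT]


def fenwick(events):
    """Count unblocked shot attempts only: on goal and missed."""
    counts = Counter(events)
    return counts[SHOT_ON_GOAL] + counts[MISSED_SHOT]


# Made-up sequence of five shot attempts for one team.
attempts = [SHOT_ON_GOAL, BLOCKED_SHOT, MISSED_SHOT, SHOT_ON_GOAL, BLOCKED_SHOT]
print(corsi(attempts))    # 5
print(fenwick(attempts))  # 3
```

The two blocked attempts count toward Corsi but drop out of Fenwick, which is the only difference between the tallies.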
Expected Goals (xGF/xGA)
Expected goals is generally a measure of how likely a given shot is to become a goal based on historical success rates. The expected value of a given shot is presented as a number between 0 and 1. The probability is generated based on a number of factors, such as shot angle, shot location, and distance from the net, among other variables. Note that each public or private statistical model can produce different expected goals values depending on its inputs and the weight placed on the selected key variables.
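As a rough illustration, per-shot values can be added up into an expected goals total. The sketch below assumes the probabilities have already been produced by some model; the numbers are made up, and the function name is hypothetical rather than taken from any public model.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot goal probabilities into an expected goals total."""
    for p in shot_probabilities:
        # Each shot's value is a probability, so it must sit between 0 and 1.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"shot probability out of range: {p}")
    return sum(shot_probabilities)


# Made-up values: one good look (0.30) and three low-percentage attempts.
shots = [0.05, 0.30, 0.10, 0.05]
print(round(expected_goals(shots), 2))  # 0.5
```

Four shots worth half a goal in total is a reminder that a high shot count does not necessarily mean a high expected goals total.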
Microstats attempt to provide a more specific analysis of familiar hockey events, including shots, passes, and zone entries and exits, among a wealth of other granular statistics. These metrics attempt to highlight how players perform in isolated facets of a hockey game, providing a more comprehensive evaluation of their overall impact on play. Many of these metrics are derived from the work of Corey Sznajder.
A pass is credited as a shot assist if it is the final completed pass that directly leads to a shot attempt on goal. Shot assists are seen as a clearer indicator of a player’s passing ability, as the awarding of a regular assist is dependent on the shooter’s finishing ability, rather than the inherent skill of the passer.
According to Corey Sznajder’s definition, a pass is credited as a high-danger pass if it originates from behind the net or travels across the slot to its intended target. These passes are more difficult to complete and often put the receivers in a better position to score, making them a more valuable type of pass.
Scoring Chances (SCF/SCA)
This is a metric primarily provided by Natural Stat Trick (NST), but definitions can vary based on the source. According to NST, a scoring chance is any shot attempt with a danger value of at least two based on several criteria. Essentially, each individual shot attempt is given a value based on where it was taken in the offensive zone. Shots from immediately in the crease area are given a danger value of three, those slightly farther out but still below the faceoff dots both vertically and horizontally are given a two, and those from within the rest of the offensive zone are given a one.
Any rush or rebound attempts are given an additional point, and any blocked shots carry a negative one value. For more information and a visual diagram of the values, read Natural Stat Trick’s glossary.
High-Danger Chances (HDCF/HDCA)
According to NST, a high-danger chance is a shot attempt with a cumulative danger value of at least three based on the explanation offered in the scoring chance definition. Generating a higher frequency of high-danger chances is thought to be a better predictor of future offensive success given the increased quality of these chances.
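The danger-value scheme described above can be sketched in a few lines of Python. This is an illustration of the description in this primer, not NST’s actual code: the zone labels are hypothetical, the sketch assumes a scoring chance is any attempt valued at two or more, and it assumes a single bonus point applies whether an attempt is a rush, a rebound, or both.

```python
# Base values by shot location; the zone names are hypothetical labels.
BASE_DANGER = {"crease": 3, "inner": 2, "outer": 1}


def danger_value(zone, rush=False, rebound=False, blocked=False):
    """Return the cumulative danger value of one shot attempt."""
    value = BASE_DANGER[zone]
    if rush or rebound:
        value += 1  # assumption: one bonus point, not stacked
    if blocked:
        value -= 1  # blocked attempts lose a point
    return value


def is_scoring_chance(value):
    """Assumed threshold: a danger value of two or more."""
    return value >= 2


def is_high_danger(value):
    """A cumulative danger value of at least three."""
    return value >= 3


rebound_from_distance = danger_value("outer", rebound=True)
print(rebound_from_distance, is_scoring_chance(rebound_from_distance))  # 2 True
rush_from_inner = danger_value("inner", rush=True)
print(rush_from_inner, is_high_danger(rush_from_inner))  # 3 True
```

Under these assumptions, a rebound from the outer area is promoted to a scoring chance, while a blocked attempt from the crease drops from high-danger to an ordinary scoring chance.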
Zone entries track how often a team or player establishes possession in their opponent’s defensive zone. Entries are considered either controlled (entering with possession of the puck) or uncontrolled (dump-ins), depending on how the puck enters the zone. Controlled entries allow a team to enter the offensive zone in a more organized fashion, making it easier to set up its attacking schemes with intent.
Zone exits, a fairly self-explanatory term, track how often a team or player exits their own defensive zone with the puck. Like entries, exits can be either controlled (exiting while maintaining possession of the puck) or uncontrolled (dump-outs). Controlled exits lead to more seamless transition play and are more desirable than simply dumping the puck out, which essentially hands possession back to the other team.
Included in this section are terms that may not belong to the previous section, but are still valuable tools in modern hockey analysis.
Goals Saved Above Average (GSAA)
Goals saved above average estimates how many goals a given goaltender prevents compared to a hypothetical league-average netminder (in terms of save percentage) facing the same number of shots.
Goals Saved Above Expected (GSAx)
This metric is thought to offer a more accurate depiction of a goaltender’s ability by accounting for the shot quality they face, rather than assuming each shot attempt is created equal. The GSAx value is derived by calculating the difference between the total expected goals a goaltender faces and how many actual goals they allow. For example, if Goalie A faces a total of 50 expected goals and allows 42, they have saved eight goals above expected. It is important to note that the expected total varies depending on the source of the expected goals model used, as each model involves different inputs.
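Both goaltending metrics come down to simple arithmetic, sketched below in Python. The function names are hypothetical, and the league-average save percentage of .900 is a made-up figure used only for illustration.

```python
def goals_saved_above_average(shots_against, goals_allowed, league_save_pct):
    """Goals a league-average goalie would allow on the same shots, minus actual goals."""
    average_goals_allowed = shots_against * (1.0 - league_save_pct)
    return average_goals_allowed - goals_allowed


def goals_saved_above_expected(expected_goals_against, goals_allowed):
    """Total expected goals faced minus actual goals allowed."""
    return expected_goals_against - goals_allowed


# Goalie A from the example above: 50 expected goals faced, 42 allowed.
print(goals_saved_above_expected(50, 42))  # 8

# Made-up GSAA case: 1000 shots, 90 goals allowed, assumed .900 league average.
print(round(goals_saved_above_average(1000, 90, 0.900), 1))  # 10.0
```

The only difference between the two is the baseline: GSAA treats every shot as equally dangerous, while GSAx relies on whichever expected goals model supplies the baseline.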
On-Ice Shooting Percentage (OiSH%)
A player’s on-ice shooting percentage refers to their team’s 5v5 shooting percentage (including their own) when they are on the ice. This can be used to gauge how fortunate or unfortunate a player and his linemates are in receiving favourable or unfavourable finishing luck.
On-Ice Save Percentage (OiSV%)
A player’s on-ice save percentage refers to their goaltender’s 5v5 save percentage when the player is on the ice. This is used to estimate how fortunate or unfortunate a player may be in receiving stronger or weaker goaltending.
PDO is a simple metric meant to approximate a team’s experience of luck. It is calculated by adding together a team’s save percentage and shooting percentage at 5v5. Generally, a PDO value of 1.00 or 1000 is considered the neutral or average value. A lower or higher value suggests that a team may soon experience a regression to the mean in their results. PDO can also be calculated at the individual level by combining a player’s OiSH% and OiSV%.
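A minimal Python sketch of the calculation follows, with percentages expressed as fractions. The team figures are made up for illustration.

```python
def pdo(shooting_pct, save_pct):
    """Add shooting and save percentage (both as fractions, e.g. 0.085)."""
    return shooting_pct + save_pct


# Made-up 5v5 figures: 8.5% shooting, .920 save percentage.
team_pdo = pdo(0.085, 0.920)
print(round(team_pdo, 3))      # 1.005
print(round(team_pdo * 1000))  # 1005
```

The same function works at the individual level if a player’s OiSH% and OiSV% are passed in instead of the team figures.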
Primary Assists (A1)
An assist is designated as “primary” if it is the last pass or controlled possession of the puck before a goal is scored by a given skater. Primary assists are thought to more accurately represent a player’s playmaking than secondary assists, as they are, by definition, more directly involved in the buildup of play.
Primary Points (P1)
Primary points include any primary assists combined with goals scored by a given skater and exclude secondary assists. For the same reason as primary assists, the accumulation of primary points suggests that a player is more directly responsible for offensive success.
Quality of Competition (QoC)
Quality of competition is often presented as the percentage of time a skater plays against another team’s most-used players. For forwards, the opposition’s top two defensemen (usually by ice time) are considered, while the top three forwards are used for defensemen. More advanced calculations of QoC are also made according to metrics, such as Corsi or expected goals, rather than ice time, but each source differs in its approach.
Quality of Teammates (QoT)
Similar to QoC, the quality of teammates is found by calculating the portion of time (or according to other metrics) a player plays with their own team’s best or most utilized forwards or defensemen depending on their own position.
Zone Starts (DZS/NZS/OZS)
It should be noted that although some players may start more shifts in either the offensive or defensive zone, every player begins a majority of their shifts on the fly (i.e., entering active play from the bench). This calls into question the value of zone starts as an analytical tool.
Contextualizing Advanced Statistics in Hockey Analytics
Each aforementioned metric can be further manipulated beyond a count of their raw totals, including adjusting for ice time, the score of the game, and one’s teammates. No skater plays within the same framework, and these situational adjustments are crucial for providing greater context for what is happening on the ice.
Advanced statistics are most often initially presented as a sum total, either for or against a given team or player, before being adjusted for greater context. Raw totals are best used within small sample sizes, as larger totals can be difficult to conceptualize. For example, if Team A takes 50 shot attempts and concedes 45, it has a plus-5 Corsi differential, which is relatively simple to digest and understand.
Game-state simply refers to the situation in which play is occurring. Statistics can be adjusted for even-strength (5v5, 4v4, 3v3), power play (5v4, 5v3, 4v3), or shorthanded (4v5, 3v5, 3v4) play as well as each specific numerical situation in turn. Adjusting for game-state is useful in trying to determine how effective a team or player is in certain situations. For example, it makes sense to evaluate players and teams based on 5v5 or even-strength play, given that a majority of the game takes place within this context.
Just as statistics, such as shots, goals, assists, and points, can be calculated at the team level, the same can be done at the individual level. This takes the form of isolating a player’s contribution in any number of metrics (e.g., Corsi, Fenwick, expected goals) and presenting it as a raw total or as a per-60-minute rate.
Rate Metrics (/60)
Rate metrics exist because not every player is given the same amount of ice time, and certain individuals play much more than others. Those who play more are given a greater number of opportunities to score, which leads to higher statistical totals. By presenting statistics as a per-60-minute rate, a standardized baseline is established, and it becomes easier to identify what skaters are accomplishing in their deployment.
Imagine Player A scores 45 points while playing a total of 1000 5v5 minutes in a season, and Player B scores 40 points while playing only 750 total 5v5 minutes. In comparing the total number of points, Player A appears to be the better, or more productive, player. Yet, totals fail to acknowledge the discrepancy in ice time. Using rate-adjusted numbers, Player A has scored 2.70 points per 60 minutes (P/60), while Player B has scored 3.20 P/60, which is a more productive rate than his peer.
To obtain a player’s per-60-minute rate, take their total number of minutes and divide that number by 60 to find how many distinct 60-minute blocks they have played. For example, Player A played roughly 16.67 distinct 60-minute blocks (1000/60). Next, divide their total number of points (or your chosen statistic) by the number of 60-minute blocks to find how many points, shots, goals, etc. a player records per 60 minutes. For Player A, this means dividing 45 by 16.67, giving us a 5v5 P/60 rate of 2.70.
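The two-step calculation collapses into a single expression, shown here as a short Python sketch with a hypothetical function name. Carrying the block count at full precision, rather than rounding it first, gives Player A a rate of 2.70.

```python
def per_60(total, minutes):
    """Convert a raw total into a per-60-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Equivalent to total / (minutes / 60), without rounding the block count.
    return total * 60 / minutes


print(round(per_60(45, 1000), 2))  # 2.7  (Player A)
print(round(per_60(40, 750), 2))   # 3.2  (Player B)
```

The same function applies to any counting statistic, such as shot attempts or expected goals, as long as the total and the minutes come from the same game-state.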
Relative Metrics: WOWY/Relative to Team/Relative to Teammates
Relative metrics attempt to isolate a player’s on-ice results from those of specific teammates or their team as a whole. The first, and simplest, relative metric is relative to team, which compares a player’s performance in one statistic to how their team performs in the same statistic while they are off the ice. For example, Team A generates 50 shot attempts per-60-minutes (CF/60) with Player A on the ice and 55 shot attempts per-60 when they are not. Thus, Player A’s Rel Team CF/60 differential is minus-5 or minus-9.1%, depending on how it is presented.
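The Rel Team example works out as follows in a brief Python sketch. The function name is hypothetical, and expressing the percentage relative to the off-ice rate is an assumption that matches the minus-9.1% figure above.

```python
def rel_team(on_ice_rate, off_ice_rate):
    """Return the on-ice minus off-ice difference, raw and as a percentage."""
    diff = on_ice_rate - off_ice_rate
    # Assumption: the percentage is taken relative to the off-ice rate.
    pct = diff / off_ice_rate * 100
    return diff, pct


# Player A: 50 CF/60 on the ice, 55 CF/60 with them off the ice.
diff, pct = rel_team(50, 55)
print(diff, round(pct, 1))  # -5 -9.1
```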
However, Rel Team fails to account for talent differences in one’s linemates. It makes sense that those playing on Connor McDavid’s wing perform better than those who do not, for example, but should those other players be penalized because they do not enjoy the same honour? This issue is partly resolved by the use of the relative to teammates metric, which will be explored shortly.
The second form of relative metric is with or without you, abbreviated as WOWY. WOWY presents and compares two players’ on-ice results when they play together and when they play apart. Two issues with this method are that their total time spent apart may constitute a very small sample that can be overly prone to variance and that the talent of their linemates or defensive partners can impact their independent results. As a result, one player may appear much worse than the other according to their WOWY comparison when, in reality, a third party is responsible for dragging down their “without” numbers.
The third and final form of relative metric is relative to teammates, which requires a more complicated calculation. While the nitty-gritty can be found here, this metric essentially combines a player’s results from each of their teammates individually and weights them based on the time they spent together on the ice.
The score-adjusted filter accounts for the impact of score effects, a phenomenon that is intuitively familiar to regular observers of hockey. When down a goal, the trailing team is more likely to emphasize attack and manufacturing shot attempts, while the leading team is more likely to sit back and collapse in an attempt to protect their lead, therefore conceding a greater number of shots in the process. As such, it becomes difficult to evaluate a team’s true ability on a neutral playing field (i.e., when there is no greater incentive to try and score).
To address this discrepancy, using score-adjusted data means that only events occurring while the score of the game is within a single goal are considered. While the score-adjusted filter may still be susceptible to score effects to some degree, much of the noise (two- or three-goal leads and deficits) has been removed from the sample.
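The filter described above amounts to keeping only events recorded while the game was tied or within a single goal. Here is a minimal Python sketch, assuming each event carries a hypothetical `score_diff` field holding the team’s goal differential at the moment of the event.

```python
def within_one_goal(events):
    """Keep events recorded while the score was tied or within a single goal."""
    return [e for e in events if abs(e["score_diff"]) <= 1]


# Made-up shot attempts with the goal differential at the time of each event.
events = [
    {"type": "shot_attempt", "score_diff": 0},
    {"type": "shot_attempt", "score_diff": -1},
    {"type": "shot_attempt", "score_diff": 3},
    {"type": "shot_attempt", "score_diff": -2},
    {"type": "shot_attempt", "score_diff": 1},
]

close_events = within_one_goal(events)
print(len(close_events))  # 3
```

The attempts taken with a three-goal lead and a two-goal deficit are dropped, leaving the three recorded while the game was close.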
After a certain point, using raw totals to present statistics becomes cumbersome. In this case, converting the numbers into percentages helps readers and other consumers of data more clearly picture how teams fare in a certain metric. For example, which is easier to understand: Team A is outshooting its opposition 245-204 at 5v5 or Team A controls 55% of all 5v5 shot attempts this season? Raw counts are useful in small samples, but after a certain point, they become more difficult to conceptualize and imagine.
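Converting raw counts into a share takes one division, as in this short Python sketch with hypothetical function names. The figures are the ones used in the example above.

```python
def differential(events_for, events_against):
    """Raw difference between events for and against."""
    return events_for - events_against


def share_pct(events_for, events_against):
    """Percentage of all events that belong to the team."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return events_for / total * 100


print(differential(245, 204))         # 41
print(round(share_pct(245, 204), 1))  # 54.6
```

A share of 54.6% rounds to the 55% quoted above, and unlike the raw plus-41 differential, it stays comparable between teams that have played different numbers of games.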
Additional Resources for Advanced Hockey Analytics
A comprehensive, but not exhaustive, list of resources includes Evolving Hockey, Hockey Viz, MoneyPuck, Natural Stat Trick, and their accompanying datasets, glossaries, and visualizations. They also offer more detailed explanations of several of the metrics listed here for those who are more mathematically inclined.
Marko is an aspiring sportswriter with a passion for crafting stories while using a combination of the eye-test and (shudder) analytics, which is complemented by an academic background in criminology and political science.
When not covering the Colorado Avalanche and Pittsburgh Penguins for The Hockey Writers, he can also be found pouring countless hours into various sports video games franchises, indulging in science fiction novels, and taking long runs around his neighbourhood.