We rate the accuracy of baseball projections by comparing each source’s player predictions to the actual statistical outcomes. Our accuracy results are based on the following steps:
Step 1: Collect the right data.
Our analysis aims to determine who provides the most accurate preseason projections. In other words, we look at projections that offer a full-season outlook. We grab these projections just prior to the first pitch on Opening Day to ensure we’re analyzing each source’s final set of predictions. In 2016, a total of 10 projection sources were evaluated for our study. Please note that our projections assessment is completely separate from our fantasy baseball rankings study.
Step 2: Determine the player pool.
Our assessment focuses on 398 players, as determined by preseason rankings and actual in-season performance. This ensures an apples-to-apples comparison by evaluating each projection source on the same set of players. It also means the assessment captures players who carried limited preseason expectations yet turned into fantasy difference-makers.
Step 3: Calculate each projection source’s category errors.
By “category”, we are referring to the standard 5×5 categories in a fantasy baseball league: HR, AVG, RBI, Runs, and SB for hitters, and W, K, SV, ERA, and WHIP for pitchers. A “category error” is the gap between a source’s projection for a player and that player’s actual output. We calculate these errors for each 5×5 category across the 398 players in our pool. For example, if Mike Trout hit 41 home runs and a projection source (e.g. ESPN) predicted 37 home runs, the error for that prediction is the absolute value of Trout’s actual HRs (41) minus his projected HRs (37), which equals 4. The same projected-versus-actual calculation is completed for Trout’s RBI, SB, Runs, and Batting Average.
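To make the mechanics concrete, here is a minimal sketch of that error calculation in Python. The DataFrame layout, column names, and helper name are illustrative assumptions, not our actual pipeline:

```python
# Minimal sketch of Step 3's per-category error calculation.
# Frame layout and column names are illustrative, not our production data format.
import pandas as pd

HITTER_CATEGORIES = ["HR", "AVG", "RBI", "R", "SB"]  # pitchers would use W, K, SV, ERA, WHIP

def category_errors(projected: pd.DataFrame, actual: pd.DataFrame) -> pd.DataFrame:
    """Absolute error per player and category: |actual - projected|.

    Both frames are assumed to be indexed by player and share the category columns.
    """
    return (actual[HITTER_CATEGORIES] - projected[HITTER_CATEGORIES]).abs()

# Example: a 37-HR projection for a 41-HR season yields an error of |41 - 37| = 4.
```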
Step 4: Calculate category scores for each projection source.
Before we can generate overall accuracy ratings, we first need to convert each set of category errors into a common metric. If a projection source misses a player’s home run total by 4 and also misses the player’s RBI total by 4, we cannot simply add the two together as 8 total errors. Predicting 37 home runs for someone who hits 41 is not the same size of miss as predicting 86 RBI for a player who drives in 90; the two categories are different metrics with their own distributions. To address this, we convert the projection errors from Step 3 into a common metric across categories using the following formula:
Category Score = (Average Error for the Category across all Sources – Individual Source’s Error for the Category) / Standard Deviation of the Category’s Errors across all Sources
As an example, let’s again say that ESPN predicted Mike Trout would hit 37 home runs. This would be an error of 4 since Trout hit 41 home runs. We compare this to the average error that all of the projection sources evaluated had for Trout’s HRs. Let’s say this value is 8 (i.e. the average HR projection was 33), which means ESPN had a better-than-average projection for Trout. We also calculate the Standard Deviation of those errors across the sources. Let’s say this value is 1; a smaller value here means there was less variance in the predictions for Trout’s home runs (i.e. the predictions were clustered around the average). That would make our inputs into the Category Score formula the following:
ESPN’s HR category score for Mike Trout = (8 – 4) / 1 = 4
For our purposes, a larger positive number is a good thing for the accuracy of the individual source (ESPN). The positive score reflects that ESPN had a smaller error on its Trout HR prediction than the average projection source. A small Standard Deviation results in a higher score for ESPN because it indicates that ESPN separated itself from a clustered pack of HR predictions. From here, we run this calculation for ESPN’s HR predictions across all 398 players analyzed. The sum of those per-player scores represents ESPN’s HR category score (again, a larger number is desirable for ESPN). We then run this calculation for the other 5×5 categories and repeat the process for the remaining projection sources.
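Here is a rough sketch of how that normalization and aggregation could be coded, assuming the per-category errors have been stacked into a single frame indexed by player and source (the frame layout, index level names, and helper name are hypothetical, not our actual implementation):

```python
# Sketch of Step 4's Category Score, building on the hypothetical errors frame.
import pandas as pd

def category_score(errors: pd.DataFrame, source: str, category: str) -> float:
    """Category Score for one source in one category.

    `errors` is assumed to hold absolute errors indexed by (player, source),
    with one column per category. For each player we take
    (average error across sources - this source's error) / std dev of the sources'
    errors, then sum those per-player scores across the player pool.
    """
    per_player = errors[category].unstack("source")  # rows: players, columns: sources
    mean_error = per_player.mean(axis=1)             # average error for each player
    spread = per_player.std(axis=1)                  # how clustered the sources were
    player_scores = (mean_error - per_player[source]) / spread
    return player_scores.sum()
```

With the Trout numbers above, the per-player term works out to (8 – 4) / 1 = 4, matching the hand calculation.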
Step 5: Sum the category scores to generate Overall scores.
We now have 5×5 category scores for each projection source on a common metric. To get to an Overall Score, we add each source’s values across the categories (HR, RBI, Runs, etc.). We can then rank each projection source from highest to lowest based on its score. A larger number is once again desirable, since it means the projection source had smaller errors across the stat categories relative to the other sources evaluated. In addition to the Overall Score, we can also break out Hitter and Pitcher accuracy standings by isolating the relevant category score sums for each position group.
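A short sketch of that final aggregation, reusing the hypothetical category_score helper from the previous step:

```python
# Sketch of Step 5: sum each source's category scores and rank from best to worst.
import pandas as pd

def overall_scores(errors: pd.DataFrame, sources: list, categories: list) -> pd.Series:
    """Overall Score per source: the sum of its category scores, ranked high to low."""
    totals = {
        source: sum(category_score(errors, source, category) for category in categories)
        for source in sources
    }
    return pd.Series(totals).sort_values(ascending=False)  # larger = smaller relative errors

# Hitter-only or pitcher-only standings come from passing just that group's categories.
```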
We hope this detailed overview was helpful. Thanks for taking an interest in reading through it!