I get the feeling that a lot of fantasy baseball managers tend to not treat each category equally. Everybody wants to lead the league in home runs and ERA, which leads to a relative neglect of some of the less sexy categories. Throughout a more than five-month regular season, that potentially leaves a lot of wins on the table. I believe that in a lot of leagues, people tend to not pay close attention to batting average despite it being one of the easier categories to control. This article will walk you through an analysis that may help you get a boost in batting average and on base percentage for very cheap on draft day, as well as throughout the season on the waiver wire.
If you happen to be in a league that uses both batting average and on base percentage, this is vitally important. We will use a correlation matrix to demonstrate this. A correlation coefficient gives you a number between -1 and +1 to show you the relationship between two variables. A positive coefficient (a number closer to +1) means that as one variable increases, so does the other. A negative coefficient (a number closer to -1) means that as one variable increases, the other decreases. The closer to either extreme the coefficient gets, the stronger the relationship between the variables. I have picked six offensive categories to display this, using the last three years of player data as my source (with a minimum of 300 PAs for a player to be used in the sample). Here is the matrix:
You can see that each variable is perfectly correlated with itself, which is obvious. There are three pretty strong correlations here: between runs and home runs, between runs batted in and home runs, and between batting average and on base percentage. For our purposes, this means that generally if you get a player that will help you in batting average, they will also help you in on base percentage. That is true for the home runs correlation as well, but it is easier to predict a player’s batting average over the course of the year than his home run total, which makes it pretty easy to build a team that is very strong in batting average and on base percentage without investing heavily in the draft to do it.
One place we can turn to in order to help us find some cheap batting average is swing rate and contact rate statistics. To say one more deadly obvious but important thing, you cannot get a hit if you do not put the ball in play. Players who strike out at a high rate have a really hard time putting up a high batting average. After the ball is put in play, the randomness begins. Of course, a well-struck ball could go for an out and a poorly struck ball could fall in for a hit, but by ignoring what happens after the ball is put in play we get a clearer picture of which players are more likely to hit for a higher batting average in the future.
Contact rate is simply the percentage of the time a player makes contact with the ball when he swings. Over the last three years, the league average contact rate (filtering out players that had fewer than 300 plate appearances) has been 77.7%. Here is a histogram to show the distribution of all players' contact rates:
Histograms are nice because they show much more than just the minimum, mean, and maximum of a list of numbers. We can see the total distribution. You can see how few hitters actually make it above 90% and how few have been below 65% with this graphic. Here is a scatter plot displaying the relationship between contact rate and batting average over these last three seasons:
You can see a general upward trend from left to right, which shows you the positive correlation these two metrics have. You see very few hitters with a contact rate above 80% hitting below .250, and you see very few hitting over .300 with a contact rate below 70%.
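If you want to tally a contact rate distribution like the histogram above without a plotting library, a rough sketch looks like this. The rates are invented and the 5-point bin width is my own choice, so treat it as an illustration rather than the code behind the graphic.

```python
from collections import Counter

def histogram(values, bin_width=5):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up contact rates (in percent), for illustration only.
rates = [81.2, 76.8, 90.5, 64.0, 77.9, 75.1]

bins = histogram(rates)
for lower, count in bins.items():
    # One '#' per hitter in the bin gives a quick text histogram.
    print(f"{lower}-{lower + 5}%: {'#' * count}")
```

Narrower bins show more detail but need more hitters per bin before the shape means anything.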
We see another relationship like this when we compare swing rate (the percentage of pitches a hitter swings at) to on base percentage:
You can see here that the more the hitter swings, the lower their on base percentage is. This makes intuitive sense, of course, because a walk requires a hitter to not swing at least four times. None of this is ground-breaking insight, but nonetheless it can be used to pick out players that are cheap to acquire that will help in these two vital categories of batting average and on base percentage.
Here are last year’s top 20 players (with 300+ PA’s) in contact rate, shown with their batting average:
And here are last year’s bottom 20 in swing rate with their on base percentages:
Not all of the names on those lists are going to be relevant for fantasy. You certainly do not want to draft Logan Forsythe this year, but guys like Luis Arraez, David Fletcher, Michael Brantley, Tommy La Stella, Nicky Lopez, Hanser Alberto, Daniel Vogelbach, Tommy Pham, and Rhys Hoskins stand out given their very affordable draft day prices.
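Lists like the two above are just a filter and a sort. Here is a small sketch of that step. The hitters are placeholders, and the field names (`contact_pct`, `swing_pct`) are hypothetical rather than the column names in any real export.

```python
def rank_hitters(rows, stat, n=20, lowest=False, min_pa=300):
    """Return the n qualified hitters with the highest (or lowest) value of stat."""
    qualified = [row for row in rows if row["PA"] >= min_pa]
    return sorted(qualified, key=lambda row: row[stat], reverse=not lowest)[:n]

# Placeholder hitters with invented numbers; not real players or stats.
hitters = [
    {"name": "Hitter A", "PA": 610, "contact_pct": 91.2, "swing_pct": 44.1, "AVG": 0.312, "OBP": 0.371},
    {"name": "Hitter B", "PA": 540, "contact_pct": 72.5, "swing_pct": 38.3, "AVG": 0.246, "OBP": 0.364},
    {"name": "Hitter C", "PA": 455, "contact_pct": 84.0, "swing_pct": 51.7, "AVG": 0.279, "OBP": 0.322},
    {"name": "Hitter D", "PA": 150, "contact_pct": 93.0, "swing_pct": 35.0, "AVG": 0.330, "OBP": 0.400},  # under 300 PA
]

# Highest contact rates, shown with batting average.
top_contact = rank_hitters(hitters, "contact_pct", n=2)
for row in top_contact:
    print(f"{row['name']}: contact {row['contact_pct']:.1f}%, AVG {row['AVG']:.3f}")

# Lowest swing rates, shown with on base percentage.
low_swing = rank_hitters(hitters, "swing_pct", n=2, lowest=True)
for row in low_swing:
    print(f"{row['name']}: swing {row['swing_pct']:.1f}%, OBP {row['OBP']:.3f}")
```

Applying the plate appearance minimum before sorting matters, because small samples tend to produce the most extreme rates.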
Another thing we can do when we have a large sample of data to learn from is some predictive modeling. To give you a very rough explanation of what this entails, we basically tell a computer to look at tons of data and make predictions on one variable based on what it has learned about that variable's relationship with other variables. For this example, I took all of the data from the 2017 and 2018 seasons and had the model study how the combination of swing rate and contact rate influences batting average. There are other factors, of course (like everything that happens after contact is made), but these two simple statistics are actually pretty good predictors of batting average, so it works for our purposes. I then used that “model” (what the computer learned by studying the 2017 and 2018 data) to make predictions on the 2019 data. So the algorithm looked at every qualified hitter’s swing and contact rates and then spit out what the predicted batting average would be. Then we can compare that prediction to the actual batting average they put up and see which hitters have the biggest disparities. This is far from perfect, but it is fair to say that the players with much higher batting average predictions than their actual batting averages are likely to hit for a higher batting average in 2020, since a lot of bad luck was likely involved in their 2019 output.
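To make the train-then-predict idea concrete, here is a minimal sketch that assumes a plain linear regression (ordinary least squares) on swing rate and contact rate. The article does not say which model was used, so the model choice is my assumption. Every number below is invented, and the helpers `fit_linear` and `predict` are hypothetical names. In practice a library such as NumPy or scikit-learn would handle the fitting.

```python
def fit_linear(features, targets):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    rows = [[1.0, *f] for f in features]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(coefs, feature):
    """Apply the fitted intercept and slopes to one (swing %, contact %) pair."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], feature))

# Made-up training rows standing in for earlier seasons: (swing %, contact %) -> AVG.
train_x = [(44.0, 82.0), (51.0, 71.0), (47.5, 77.0), (42.0, 88.0), (53.0, 68.0), (48.0, 80.0)]
train_y = [0.281, 0.238, 0.262, 0.301, 0.229, 0.270]
coefs = fit_linear(train_x, train_y)

# Made-up held-out season: name -> ((swing %, contact %), actual AVG).
holdout = {
    "Hitter A": ((45.0, 84.0), 0.251),
    "Hitter B": ((50.0, 72.0), 0.275),
    "Hitter C": ((46.0, 79.0), 0.268),
}

# Positive gap: the model expected a higher average than the hitter actually posted.
gaps = {name: predict(coefs, x) - actual for name, (x, actual) in holdout.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: predicted minus actual = {gap:+.3f}")
```

Sorting by the gap from largest to smallest puts the "least lucky" hitters at the top and the "most lucky" at the bottom, which is the ordering the two lists that follow are built on.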
Here are the top 20 fantasy relevant (I did take some guys out that should never be drafted) players that had much higher predictions than actual batting averages, the least lucky:
And here are your top 20 the other way, the most lucky:
I accomplished all of this with data from Fangraphs.com and Python coding. If you would like to learn more or see my data / full results, please reach out to me on Twitter @JonPgh! Thanks for reading, I hope you enjoyed it!
Jon Anderson is a featured writer at FantasyPros. For more from Jon, check out his archive and follow him @JonPgh.