The value of individual major league players has been heavily researched by the public and by major league clubs, producing a vast number of advanced statistics used to place a value on players. However, one avenue that has yet to be fully researched is where the monetary value lies positionally in Major League Baseball. One trend that has become increasingly evident in the majors is the practice of moving players to defensive positions other than their natural one.
Aside from the pitcher, baseball has eight defensive positions: catcher, first base, second base, shortstop, third base, and the three outfield positions (left field, center field, and right field). Players in today’s game frequently switch positions defensively so that the team can optimize the offensive production of its lineup. For example, let’s say a theoretical team sees its starting third baseman sidelined for a few games due to injury. The starting 3B is a star hitter and fielder. The team’s bench third baseman can provide above-average defense but struggles to hit the baseball. In the past, teams would be more inclined to put said bench 3B into the lineup to replace their starter, hoping that the bench player could capture just some of the offensive production of the starter. However, let’s expand on this theoretical situation. Say that this team has a second baseman on the bench who offers much more offensively than the bench 3B, but struggles on defense. Clubs today are far more likely to have the bench 2B play third, despite the defensive struggles, rather than replacing the starting 3B with the backup 3B. Offense is everything, and teams will gladly sacrifice traditional defense for even the slightest boost on the hitting side.
This raises the question: Why do Major League clubs pay more for certain positions than others? In other words, why pay more for a shortstop when you can get the same production out of a second baseman who can also play shortstop? Every guaranteed dollar counts in Major League Baseball, and handing out bad contracts comes with major consequences. Therefore, it’s important to find out which positions are ‘safer bets’ for major contracts. At what positions can teams get the most ‘bang for their buck’? I sought to find where the monetary value was positionally during the 2023 MLB season.
Method
To answer this question, there are several things to establish first. I needed a representative sample of each position in the major leagues with each player’s respective AAV, or Average Annual Value. AAV is the monetary amount that a player is making on average per year from their current contract. The sample from each position consists of the 10 players at that position with the highest AAV during the 2023 season. While it’s a valid criticism to point out the limited sample size per position, I wanted to evaluate the players that best encapsulated the market for each position as determined by the teams. As you go beyond the first 10 players at each position, you start to find players who have not signed extensions or free agent contracts. These players are likely in the arbitration stage of their first contract. After players spend three years in the majors, they go into arbitration, which is the practice of an independent arbitrator determining what a player is worth monetarily. Whichever value the arbitrator determines is fair for both the team and the player is the salary that the player receives for that year. While arbitration is an important part of the financial side of Major League Baseball, players in arbitration do not capture the value teams place on each position. Therefore, it is best to examine the top 10 players in terms of AAV per position because they either received an extension or a free agent contract. Such values were determined by teams rather than independent arbitrators.
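The AAV definition and the top-10 selection described above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary keys, and the contract figures are hypothetical placeholders, not data from the actual sample.

```python
def average_annual_value(total_value: float, years: int) -> float:
    """Return a contract's average annual value (AAV): total value / years."""
    if years <= 0:
        raise ValueError("contract length must be a positive number of years")
    return total_value / years


def top_n_by_aav(players: list[dict], position: str, n: int = 10) -> list[dict]:
    """Return the n highest-AAV players at the given position."""
    at_position = [p for p in players if p["position"] == position]
    return sorted(at_position, key=lambda p: p["aav"], reverse=True)[:n]


# Hypothetical contracts (placeholder names and figures, not real data).
players = [
    {"name": "Player A", "position": "2B", "aav": average_annual_value(100_000_000, 5)},
    {"name": "Player B", "position": "2B", "aav": average_annual_value(30_000_000, 2)},
    {"name": "Player C", "position": "SS", "aav": average_annual_value(280_000_000, 10)},
]

# List the (up to) 10 highest-paid second basemen in the placeholder data.
for p in top_n_by_aav(players, "2B", n=10):
    print(f'{p["name"]}: ${p["aav"]:,.0f}')
```

With a full contract list, the same call would be repeated once per position to build each sample.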
I sampled the top 10 catchers, first basemen, second basemen, third basemen, and shortstops with the highest AAV in 2023. Outfielders switch positions more than any other players. Therefore, I decided to consider outfielders as one position (OF), with double the sample size (20) to appropriately capture the value placed on outfielders. To find these AAV values broken down by position, I had to utilize the widely renowned sports contract database known as Spotrac. Unfortunately, Spotrac does not allow for the public to extract data directly from the website into Excel or CSV format, so I manually entered the players and their associated Average Annual Value into an Excel spreadsheet. As a matter of transparency, the players used in my sample(s) are shown below:
Once I determined my sample and the associated AAV values for each player, I had to decide what metrics from the 2023 season I would use to evaluate player performance. The first metric considered for this research is FanGraphs Wins Above Replacement (WAR). FanGraphs is a popular baseball database that stores traditional stats as well as advanced metrics for players. WAR is a common industry metric used to estimate the complete value a player provides to a team. Simply put, WAR tells you how many more wins a player provides to a team than a replacement-level player would. FanGraphs provides the following explanation on their website:
“Wins Above Replacement (WAR) is an attempt by the sabermetric baseball community to summarize a player’s total contributions to their team in one statistic. You should always use more than one metric at a time when evaluating players, but WAR is all-inclusive and provides a useful reference point for comparing players” (Slowinski, FanGraphs, 2010).
The second metric used in this project is OPS, or on-base percentage plus slugging percentage. This is another statistic that is commonly used to place value on a player’s offensive production. On-base percentage is simply the rate at which a player reaches base, whether by a hit, a walk, or a hit-by-pitch. Slugging percentage is the “total number of bases a player records per at-bat” (MLB.com). Unlike batting average, slugging percentage weights extra-base hits (doubles, triples, and home runs) more heavily than singles. A player’s OPS is generated simply by adding his on-base percentage to his slugging percentage. After determining the respective sample sizes and the first two metrics to be used in the analysis, I could begin to process the data and draw conclusions.
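As a rough illustration of how OPS is built from its two components, here is a minimal Python sketch. It uses the standard definitions of on-base percentage (which counts hit-by-pitches in the numerator and sacrifice flies in the denominator) and slugging percentage. The function names and the stat line are hypothetical and do not belong to any player in the sample.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = times on base / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """OPS is simply on-base percentage plus slugging percentage."""
    return obp + slg


# Hypothetical season line (placeholder numbers, not a real player).
singles, doubles, triples, home_runs = 100, 30, 5, 15
hits = singles + doubles + triples + home_runs  # 150 hits
at_bats, walks, hit_by_pitch, sac_flies = 500, 50, 5, 5

obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats)
print(f"OBP {obp:.3f} + SLG {slg:.3f} = OPS {ops(obp, slg):.3f}")
```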
Data
I wanted to use the popular data visualization tool Tableau to effectively show the correlation between offensive performance (WAR and OPS) and AAV (Average Annual Value), broken down by defensive position. It’s important to first establish how Major League spending was broken down by position in 2023, which is represented in the bar chart below.
The visualization supports the idea that the MLB market favors high-profile positions like shortstop (yellow) and outfield (green) over other positions. While spending big on shortstops and outfielders is the league trend, does the correlation to offensive production support this trend?
The best way to show correlations by position within Tableau is to create scatterplots for each position, utilizing the filter function to discard other positions. For each position’s scatterplot, AAV is located on the Y-axis and the considered metric (WAR or OPS) is located on the X-axis. I started by examining the correlation between AAV and WAR (Wins Above Replacement), as shown below:
As shown by the visualization, most positions have no clear correlation between AAV and WAR. When it comes to catchers, first basemen, shortstops, and outfielders, the amount of money teams are spending has little relationship to the players’ production (WAR in this case). In other words, more spending at these positions does not lead to clear results. However, it is a different story when you look at second and third basemen. As shown by the trend lines in blue, there is a correlation between AAV and WAR for those positions.
A deeper look shows that both positions (second and third base) have medium correlations between Average Annual Value and Wins Above Replacement. Second base has the stronger correlation of the two, given its higher R-squared value and its p-value of less than .05. However, because the sample sizes for each position are relatively small, I relaxed my significance threshold from .05 to .1 (10% significance level). Under that threshold, third basemen also have a medium, statistically significant correlation between AAV and WAR. Major League teams can therefore feel safer spending money on second and third basemen than on other positions. For the first performance metric considered (WAR), the value positionally lies in 2B and 3B.
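For readers who want to reproduce a trend line and its R-squared value outside Tableau, the sketch below fits an ordinary least-squares line with the metric on the X-axis and AAV on the Y-axis. It assumes the Tableau trend lines are simple linear fits with one predictor. The function name and the WAR and AAV numbers are hypothetical placeholders rather than the values from the actual sample.

```python
def fit_trend_line(xs, ys):
    """Fit y = slope * x + intercept by least squares; also return R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R-squared is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical sample of 10 players at one position (placeholder values).
war = [0.5, 1.2, 1.8, 2.4, 2.9, 3.3, 3.8, 4.1, 4.6, 5.2]
aav_millions = [8, 12, 10, 15, 14, 20, 18, 22, 25, 24]

slope, intercept, r_squared = fit_trend_line(war, aav_millions)
print(f"AAV ~ {slope:.2f} * WAR + {intercept:.2f} (R-squared = {r_squared:.2f})")
```

Running the same function once per position, with each position's 10 (or 20) players, mirrors the filtered scatterplots described above.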
The second performance metric taken into consideration for this research is OPS (on-base percentage plus slugging percentage). Using the same process used for WAR, I generated a scatterplot for each position (utilizing the filter function) with AAV located on the Y-axis and OPS located on the X-axis. Broken down by position, the findings for OPS speak to the same conclusions drawn from WAR:
As shown by the scatterplots above, similar to WAR, most positions show no clear correlation between Average Annual Value and OPS. Teams looking to spend on catchers, first basemen, shortstops, and outfielders have some cause for concern when it comes to return on investment offensively. However, just like with WAR, the trend lines in red show that there is a correlation between AAV and OPS for second and third basemen specifically.
After further examination of the specific correlations, it is appropriate to say that there is a strong correlation between offensive production (OPS) and AAV when it comes to third basemen. An R-squared value above .5 indicates a strong correlation, and a p-value of approximately .016 means the relationship is significant at the 2% level. Unlike the correlations for WAR, second basemen have a weaker correlation between OPS and AAV than third basemen. However, an R-squared value of approximately .33 and a p-value below the significance level set for this research (.1, 10% significance level) would be classified as a medium correlation. These correlations between AAV and OPS for second and third basemen support the findings from WAR: teams got more offensive value from spending on those two positions than on other positions.
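The link between an R-squared value, the sample size, and the significance thresholds used here can be checked by hand. The sketch below assumes a simple linear trend line fitted to all 10 players at a position, so the test has 8 degrees of freedom. It converts R-squared into a t-statistic and compares it with the standard two-sided critical values from a t table instead of computing an exact p-value. The function names are my own, and the only figure taken from the analysis is the approximate R-squared of .33.

```python
import math

# Two-sided critical t values for 8 degrees of freedom (n = 10 players),
# taken from a standard t-distribution table.
T_CRITICAL_DF8 = {0.05: 2.306, 0.10: 1.860}


def t_from_r_squared(r_squared, n):
    """t-statistic for the slope of a one-predictor linear fit."""
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))


def significant_at(r_squared, alpha, n=10):
    """True if the fit is significant at the given two-sided alpha."""
    if n != 10:
        raise ValueError("critical values here only cover n = 10 (df = 8)")
    return t_from_r_squared(r_squared, n) > T_CRITICAL_DF8[alpha]


# Approximate R-squared reported for second basemen (OPS vs. AAV).
t_stat = t_from_r_squared(0.33, 10)
print(f"t = {t_stat:.3f}")
print("significant at .10:", significant_at(0.33, 0.10))
print("significant at .05:", significant_at(0.33, 0.05))
```

Under these assumptions, an R-squared of about .33 clears the 10% threshold but not the 5% one, which matches the medium classification given above.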
Further Validation: Statcast Data
Drawing conclusions on where value was positionally in 2023 requires more support and validation than the evaluation of two metrics and their respective correlations to AAV. While WAR and OPS are very strong metrics to consider when it comes to production, I wanted to introduce a third level of data to further support my findings. This is why I incorporated Statcast data into the analysis. Introduced in 2015 and now powered by Google Cloud, Statcast is a revolutionary form of baseball data that utilizes tracking software and cameras precisely located inside stadiums to deliver important insights about player performance. An early study of Statcast, published in IEEE Computer Graphics and Applications in 2016, provided this explanation:
“[Statcast is] a system that uses player and ball location as well as semantically meaningful game events to capture games with unprecedented detail. The Statcast Metrics Engine is a key component of Statcast that uses discrete locations across time to reconstruct entire baseball games. This enables the computation of new player statistics, such as ‘route efficiency’ or ‘lead distance,’ which allow for more detailed and accurate analyses of player and team performances” (Lage, Ono, Cervone, et al., 2016).
In the eight years since its inception, Statcast has provided many metrics used by Major League teams to evaluate player performance. When it comes to offensive production, Statcast has three useful metrics I wanted to include in my analysis: barrel rate, exit velocity, and hard-hit rate. These three metrics can tell a lot about the quality of contact a player makes when they hit the baseball.
Another component of Statcast data is expected stats. Expected stats are derived by disregarding variables that a hitter can’t control, like defense. For example, let’s say a theoretical hitter hits a ball hard down the third-base line, and the opposing team’s third baseman makes a great defensive play to record an out. That play would normally result in a hit, but the third baseman made an unexpected catch on defense. Therefore, while that hitter’s actual batting average (BA) would go down, his expected batting average (xBA) would go up, because that play usually results in a hit. Expected stats are a great way to ‘read between the lines’ of actual numbers to account for players who are having a genuinely unlucky season.
To incorporate Statcast into my analysis, I extracted six metrics from the Statcast portal of FanGraphs to reflect offensive production in 2023: barrel rate, exit velocity, hard-hit rate, expected batting average (xBA), expected slugging percentage (xSLG), and expected weighted on-base average (xwOBA). After pulling these metrics from FanGraphs in CSV format, I joined that data set in Tableau with the data set for Average Annual Value (AAV). Next, I broke that data set down into two separate sheets filtered by position: one sheet with the two highest-paid positions (shortstops and outfielders) as the sample, and another with second and third basemen, the positions with strong correlations to WAR and OPS, as the sample. From there, I utilized the crosstab-to-Excel function within Tableau for each sheet to get my data into Excel format. Finally, to find all six correlations between each Statcast metric and Average Annual Value simultaneously, I placed each Excel file into RapidMiner Studio to utilize its correlation matrix function. Providing the complementary pairwise table related to AAV, here is what I found:
Shortstops and outfielders (top), who were paid significantly more on average in 2023 compared to other positions, have small to medium correlations between their AAV and their offensive production via Statcast. On the other hand, when it comes to second and third basemen (bottom), who are paid significantly less, the correlation matrix shows medium to strong correlations between AAV and their offensive production via Statcast. It is even more telling when you look at the expected stats for second and third basemen, which reflect the strongest correlations with AAV. Therefore, Statcast data also supports the conclusion that there was more value in spending on second and third basemen in 2023 as opposed to other, more expensive positions.
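The join-and-correlate workflow described above (Tableau for the join, RapidMiner for the correlation matrix) can also be approximated in plain Python. The sketch below is not the pipeline used for this analysis. The function names, dictionary keys, and every number are hypothetical placeholders, and it assumes the two data sets share an exactly matching player name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def join_on_player(aav_rows, statcast_rows):
    """Inner join: keep only players present in both data sets."""
    statcast_by_name = {row["player"]: row for row in statcast_rows}
    return [
        {**statcast_by_name[row["player"]], **row}
        for row in aav_rows
        if row["player"] in statcast_by_name
    ]


def correlations_with_aav(rows, metrics):
    """Correlate each metric column with AAV (one row of a correlation matrix)."""
    aav = [row["aav"] for row in rows]
    return {m: pearson_r([row[m] for row in rows], aav) for m in metrics}


# Placeholder data: AAV in millions and two Statcast-style metrics.
aav_rows = [
    {"player": "Player A", "aav": 10.0},
    {"player": "Player B", "aav": 15.0},
    {"player": "Player C", "aav": 20.0},
    {"player": "Player D", "aav": 25.0},  # no Statcast row, dropped by the join
]
statcast_rows = [
    {"player": "Player A", "barrel_rate": 6.0, "xwoba": 0.310},
    {"player": "Player B", "barrel_rate": 9.0, "xwoba": 0.330},
    {"player": "Player C", "barrel_rate": 8.0, "xwoba": 0.350},
]

joined = join_on_player(aav_rows, statcast_rows)
for metric, r in correlations_with_aav(joined, ["barrel_rate", "xwoba"]).items():
    print(f"{metric}: r = {r:.2f}")
```

Filtering the joined rows by position before calling `correlations_with_aav` would reproduce the two-sheet split (shortstops and outfielders versus second and third basemen).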
Conclusion
This research is especially relevant today. Major League Baseball is currently in the offseason, where players without contracts are free to sign with any team. Clubs are also more likely to hand out extensions to their current players during the offseason. Therefore, this time of the year is when the MLB spending market shifts. The insights provided by my research could indicate an incoming shift in the market for second and third basemen. In other words, in the coming weeks, we could see an uptick in spending for second and third basemen to appropriately reflect the correlation between Average Annual Value and offensive production. This concept is two-fold as well; because of the weak correlation between spending and offensive production for more expensive positions like shortstop and outfield, we could see a decrease in spending on those positions.
This research could certainly be expanded upon in future studies. While I specifically considered offensive production from 2023, the timeline could be expanded to reflect multiple years. For this project, however, I wanted to capture where the value was positionally in the MLB as it stands currently, so data from the most recent season (2023) was most relevant. Another component that could be implemented into a similar study would be pitcher data as opposed to hitter data. Because pitchers cannot be broken down positionally (outside of relief and starting), I thought breaking down hitters by their positions would be more insightful.
The analytical insights provided by Tableau and RapidMiner help to answer the research question: Where was the monetary value positionally in Major League Baseball during the 2023 season? According to the correlations between spending in the form of Average Annual Value (AAV) and offensive production (WAR, OPS, Statcast), the positions in which teams saw the most return on investment in 2023 were second and third basemen.
References
FanGraphs Baseball. (n.d.). https://www.fangraphs.com/
Lage, M., Ono, J. P., Cervone, D., Chiang, J., Dietrich, C., & Silva, C. T. (2016). Statcast Dashboard: Exploration of Spatiotemporal Baseball Data. IEEE Computer Graphics & Applications, 36(5), 28–37. https://doi.org/10.1109/MCG.2016.101
MLB active player contracts. Spotrac.com. (n.d.). https://www.spotrac.com/mlb/contracts
Slowinski, P. (2010, February 15). What is WAR? Sabermetrics Library. https://library.fangraphs.com/misc/war/
Slugging percentage (SLG): Glossary. MLB.com. (n.d.).
"Xander Bogaerts 7.9.2023" via Casey Aguinaldo licensed under CC BY-SA 4.0