Modeling Salary Arbitration: Part One — Offensive Players
(9/8/22) Exploring Major League Baseball's arbitration process.
Originally published on Medium on September 8, 2022.
Welcome to Part One of my multi-part series, in which I explore the Major League Baseball arbitration process and formulate a model that can be used to project salaries for arbitration-eligible players. In this article, I focus on offensive players and create two linear regression models in order to identify which metrics significantly contribute to a player’s arbitration value, and to build a preliminary arbitration model that can be used to project offensive players’ salaries.
Background:
Major League Baseball players who have between three and six years of service time, or who have two years of service time and are designated as a “Super Two”, are eligible for arbitration if they are unable to reach a contract agreement with their team before arbitration hearings begin at the end of the offseason. Both sides present a salary offer for the upcoming season to the arbitration panel, and the panel chooses which of the two proposals will become the player’s salary. One storyline that has developed in recent years is that arbitration panels are more inclined to award salaries based on a player’s traditional statistics than on advanced metrics, which are often more indicative of a player’s value to his team. Given this perceived discrepancy, it is imperative that teams identify which variables influence the panel’s decisions in order to understand how players are valued by the panel and to identify which players might remain undervalued or overvalued during their years of arbitration eligibility.
Data:
In this article, I am going to focus on modeling salary arbitration for offensive players. The dataset consists of all salary arbitration hearings held since the beginning of the 2015 Major League Baseball season, which total 34 hearings for offensive players. In an attempt to better project awarded salaries, I have broken this larger dataset into two smaller datasets. The “Rookie” dataset consists of players who had an arbitration hearing during their first year of eligibility, and the “Veteran” dataset consists of players who had an arbitration hearing after their first year of eligibility. The dependent variable in each linear regression is the player’s awarded raise over his salary from the previous season. Statistical variables in the datasets include HR, RBI, PA, BA, SB, FLD%, OPS, wRC+, WAR, Career WAR, and a dummy variable indicating whether the player was on his league’s All-Star team during the prior season.
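As a rough illustration of the setup described above, the split into the two datasets and the dependent variable could be sketched as follows. The `Hearing` record, its field names, and the two placeholder rows are assumptions made for this example; they are not rows from the actual dataset.

```python
from dataclasses import dataclass


@dataclass
class Hearing:
    player: str                # placeholder label, not a real player
    previous_salary: int       # salary in the season before the hearing
    awarded_salary: int        # salary chosen by the arbitration panel
    first_year_eligible: bool  # True -> "Rookie" dataset, False -> "Veteran"

    @property
    def raise_amount(self) -> int:
        # Dependent variable: awarded raise over the previous season's salary
        return self.awarded_salary - self.previous_salary


def split_hearings(hearings):
    """Split hearings into the "Rookie" and "Veteran" datasets."""
    rookies = [h for h in hearings if h.first_year_eligible]
    veterans = [h for h in hearings if not h.first_year_eligible]
    return rookies, veterans


# Placeholder rows for illustration only (made-up salaries).
hearings = [
    Hearing("Player A", 600_000, 2_000_000, True),
    Hearing("Player B", 3_000_000, 4_500_000, False),
]
rookies, veterans = split_hearings(hearings)
print([(h.player, h.raise_amount) for h in rookies])
print([(h.player, h.raise_amount) for h in veterans])
```

Modeling the raise rather than the awarded salary keeps the target focused on what the panel added for the most recent season.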
“Rookie” Regression Model:
In the linear regression model of the 17 players who had an arbitration hearing in their first year of eligibility, there is a statistically significant relationship between a player’s awarded raise and his Career WAR, his Home Runs and Plate Appearances from the previous season, and whether or not he was an All-Star during the preceding season. These four variables explain 96.6% of the variance in players’ arbitration raises during their first year of eligibility. Appearing in an All-Star Game in the preceding season has a particularly large effect, as the player can expect an additional $4,395,300 as a result. Each Home Run and Plate Appearance during the previous season is worth $48,695 and $2,905, respectively, and each 1.0 increase in Career WAR results in an expected increase of $100,057.
Mookie Betts’s $9,550,000 raise for the 2018 Major League Baseball season was the highest awarded raise during this time period for arbitration “Rookies”. Caleb Joseph’s $176,500 raise for the 2017 Major League Baseball season was the lowest. Betts was also the only “Rookie” All-Star during this period, and that selection accounts for about half of his salary increase according to the linear regression model.
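To make the “Rookie” coefficients concrete, here is a minimal sketch of how each variable contributes to a first-year raise. The function name and the sample stat line are hypothetical. The model’s intercept is not listed in this article, so the sum below is only the combined contribution of the four variables, not a full projection.

```python
# Coefficients reported for the "Rookie" model (dollars per unit).
ROOKIE_COEFFICIENTS = {
    "all_star": 4_395_300,        # All-Star in the preceding season (0 or 1)
    "home_runs": 48_695,          # per Home Run in the previous season
    "plate_appearances": 2_905,   # per Plate Appearance in the previous season
    "career_war": 100_057,        # per 1.0 of Career WAR
}


def rookie_contributions(home_runs, plate_appearances, career_war, all_star):
    """Dollar contribution of each variable; the intercept is NOT included."""
    inputs = {
        "all_star": 1 if all_star else 0,
        "home_runs": home_runs,
        "plate_appearances": plate_appearances,
        "career_war": career_war,
    }
    return {name: ROOKIE_COEFFICIENTS[name] * value for name, value in inputs.items()}


# Hypothetical stat line: 20 HR, 500 PA, 5.0 Career WAR, not an All-Star.
example = rookie_contributions(20, 500, 5.0, all_star=False)
example_total = sum(example.values())
print(example, example_total)

# Share of Betts's reported $9,550,000 raise covered by the All-Star coefficient.
betts_all_star_share = ROOKIE_COEFFICIENTS["all_star"] / 9_550_000
print(round(betts_all_star_share, 3))
```

The last line shows where the “about half” figure for Betts comes from: the All-Star coefficient alone is roughly 46% of his awarded raise.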
“Veteran” Regression Model:
In the linear regression model of the 12 players who had an arbitration hearing after their first year of eligibility, there is a statistically significant relationship between a player’s awarded raise and his Home Runs and Plate Appearances from the previous season, and whether or not he was an All-Star during the preceding season. These three variables explain 90% of the variance in players’ arbitration raises during these years of eligibility. Appearing in an All-Star Game in the preceding season also has a sizable effect on a “Veteran” player’s salary, as the player can expect an additional $1,006,745 as a result. Each Home Run and Plate Appearance during the previous season is worth $53,511 and $6,080, respectively.
Adam Duvall’s $4,275,000 raise for the 2022 Major League Baseball season was the highest awarded raise during this time period for arbitration “Veterans”. Michael A. Taylor’s $850,000 raise for the 2017 Major League Baseball season was the lowest. Nearly half of Duvall’s salary increase (38 × $53,511 ≈ $2.03 million) can be attributed to the 38 home runs he hit between Miami and Atlanta during the 2021 season, which is the highest single-season total of his career.
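As a rough check on how much of Duvall’s raise the home run coefficient accounts for, the arithmetic can be sketched using only figures reported in this article. The helper name is my own. The Plate Appearance term and the intercept are left out, because neither his 2021 plate appearances nor the intercept are listed here.

```python
# Coefficients reported for the "Veteran" model (dollars per unit).
VETERAN_COEFFICIENTS = {
    "all_star": 1_006_745,       # All-Star in the preceding season (0 or 1)
    "home_runs": 53_511,         # per Home Run in the previous season
    "plate_appearances": 6_080,  # per Plate Appearance in the previous season
}


def share_of_raise(contribution, awarded_raise):
    """Fraction of an awarded raise covered by one variable's contribution."""
    return contribution / awarded_raise


duvall_home_runs = 38        # 2021 total, as stated in the article
duvall_raise = 4_275_000     # awarded raise for 2022, as stated in the article

duvall_hr_contribution = VETERAN_COEFFICIENTS["home_runs"] * duvall_home_runs
duvall_hr_share = share_of_raise(duvall_hr_contribution, duvall_raise)

print(duvall_hr_contribution)
print(round(duvall_hr_share, 3))
```

On the coefficient alone, the home runs cover a little under half of the raise; the remainder comes from the Plate Appearance term and the intercept.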
Interpretations + Conclusion:
Overall, both of these regression models explain at least 90% of the variance in players’ raises awarded by the arbitration panel. From this initial exploration of player data, it appears that Home Runs, Plate Appearances, and whether or not a player was selected to the All-Star Game are the variables that most influence the arbitration panel, while Career WAR is important for players in their first year of eligibility. This indicates that platoon players can be vastly undervalued by the arbitration system, as they accumulate fewer plate appearances and are generally less likely to be selected for an All-Star team. Players who are able to produce offensively at a high level (wRC+/OPS+) without hitting a lot of Home Runs are also likely to be undervalued by the arbitration panel. Conversely, players who hit a lot of Home Runs but do not produce at a high level offensively in other areas are likely to be overvalued by the arbitration panel.
One player who is likely to be undervalued by the panel due to these factors is Yandy Díaz. Prior to the 2021 season, Díaz was used primarily as a utility player who provided great value to the Rays by drawing a lot of walks and producing a wRC+ of at least 115 in both 2019 and 2020. This season, Díaz has produced a wRC+ of 145 while hitting only 8 home runs and not being named an All-Star. Given the value that the arbitration panel places on Home Runs and All-Star appearances, it is likely that the panel will award Díaz only a moderate raise that does not reflect the overall offensive impact he has on the Tampa Bay Rays.
On the other hand, one player who has been overvalued by the arbitration panel in recent years is Adam Duvall. As mentioned earlier, Duvall hit 38 home runs in 2021, which is the highest single-season total of his career. For these efforts, Duvall was awarded a $4,275,000 raise for the 2022 season, which is the highest among arbitration cases in the “Veteran” dataset. Despite the high power totals, Duvall produced a wRC+ of 103 in 2021, which indicates that his total offensive production was only 3% above league average during that season. Given the value that the arbitration panel places on Home Runs, Duvall was awarded a raise that overvalued the offensive impact he had on the Miami Marlins and Atlanta Braves in 2021.
It should also be noted that I elected to remove the 2021 arbitration hearings (held after the 2020 Major League Baseball season) from the dataset because of the shortened 2020 season. During the model construction process, it became apparent that arbitration panels place value on cumulative totals rather than on rate metrics measured per plate appearance, so the reduced totals from 2020 would have distorted the final results.
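A quick sketch of why cumulative totals from 2020 are hard to compare with other seasons: the 2020 regular season was 60 games instead of 162, so the same production pace yields far smaller counting stats. The 30-home-run pace and the helper name are hypothetical; the per-home-run value is the “Veteran” coefficient reported earlier.

```python
FULL_SEASON_GAMES = 162
SHORT_SEASON_GAMES = 60     # length of the 2020 regular season
VETERAN_HR_VALUE = 53_511   # dollars per Home Run, "Veteran" model


def prorate(full_season_total, games=SHORT_SEASON_GAMES):
    """Scale a 162-game counting total down to a shorter season at the same pace."""
    return full_season_total * games / FULL_SEASON_GAMES


full_season_hr = 30                        # hypothetical 162-game pace
short_season_hr = prorate(full_season_hr)  # same pace over 60 games

full_value = full_season_hr * VETERAN_HR_VALUE
short_value = short_season_hr * VETERAN_HR_VALUE

print(round(short_season_hr, 1))
print(round(full_value), round(short_value))
```

A model fit on cumulative totals would treat the 60-game line as a much weaker season even though the underlying pace is identical, which is the distortion that removing the 2021 hearings avoids.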
In conclusion, I believe that these two regression models will serve as a strong foundation for the model I will construct and use to forecast arbitration raises this winter. In order to strengthen these models, I will next analyze the effects that some relatively unorthodox factors (such as team attendance and overall team performance) have on the arbitration panel’s decisions, and construct similar models to forecast arbitration results for pitchers. With these models in place, it will be possible to identify which variables influence the arbitration panel’s decisions, to better understand how players are valued by the panel, and to identify which players might remain undervalued or overvalued during their years of arbitration eligibility.
Follow @MLBDailyStats_ on X (Twitter) for more in-depth MLB analysis. Statistics provided by Baseball-Reference and FanGraphs.