aStuff+ v2 Preview
Incorporating arsenal effects into pitch modeling, along with other notable adjustments.
For those wondering, I have not fallen off the face of the Earth over the past month; instead, I have spent that time updating and further refining my pitch quality model, aStuff+. What was initially intended to be a month-long project has turned into an offseason-long one, and I am looking forward to unveiling aStuff+ v2, alongside the new metrics aLocation+, aPitching+, and arsenal-infused aPitching+, in the coming months.
In the meantime, I want to present a preview of the updates I have made to the aStuff+ model. As discussed in my previous articles, pitch quality models are powerful tools for evaluating the quality of a pitcher’s pitches in small samples. However, these models need to be updated continuously, both to align with the current state of public research and to capture season-to-season changes in league-wide trends. For example, Driveline’s Stuff+ model shows that the pitch quality of sweepers has decreased and the pitch quality of sinkers has increased in recent seasons; this shift would not have been captured had Driveline not updated their model at the beginning of last season.
While my personal goal for aStuff+ v1 was simply to have a model I could use for stuff-related analysis (such as how aStuff+ interacts with park factors, offensive performance, and handedness bias), my goal for aStuff+ v2 has been to maximize the model’s descriptive and predictive performance. Therefore, I took a more deliberate approach to selecting the model’s variables, which I discuss in further detail later in this article. In addition, I have built additional pitch quality models, such as aLocation+ and aPitching+, that can provide further value to analysts evaluating pitching production.
As a warning, this article may get technically dense at times and might seem incomplete in certain areas, as I am still fine-tuning elements of the various pitch models. If you are unfamiliar with pitch quality models, I highly recommend reading my introductory article on aStuff+ and/or FanGraphs’s Stuff+ primer for an explanation of the value of pitch quality modeling. With that said, onto the preview of aStuff+ v2!
aStuff+ v2:
The majority of machine learning-based pitch quality models in the public sphere are built as one of two types: a regression model (e.g., FanGraphs’s Stuff+, tjStuff+, and aStuff+ v1) or a classification model (e.g., PitchingBot and PLV). Both types use the movement, spin, and velocity characteristics of each pitch to predict outcomes. The main difference is that regression models directly predict the expected run value of each pitch, while classification models predict the likelihood of various events (such as whether a swing results in a whiff, or whether a batted ball is a hard-hit fly ball) and then map an expected run value onto those probabilities.
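To make the classification approach more concrete, here is a minimal sketch of its final step, assuming a model has already produced probabilities for a small set of mutually exclusive outcomes. The outcome names and run values below are hypothetical placeholders rather than league values, and real classification models often chain several conditional models (swing, whiff given swing, batted-ball type given contact) instead of using one flat distribution.

```python
from typing import Dict


def expected_run_value(probs: Dict[str, float], run_values: Dict[str, float]) -> float:
    """Weight each outcome's run value by its predicted probability.

    Assumes the outcomes are mutually exclusive and their probabilities sum to 1.
    """
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(p * run_values[event] for event, p in probs.items())


# Hypothetical classifier output for a single pitch.
probs = {"ball": 0.35, "called_strike": 0.15, "whiff": 0.12, "foul": 0.18, "in_play": 0.20}
# Hypothetical run values per outcome (illustrative only, NOT real league values).
run_values = {"ball": 0.06, "called_strike": -0.065, "whiff": -0.12, "foul": -0.04, "in_play": 0.02}

xrv = expected_run_value(probs, run_values)
```

A side benefit of this structure is that the intermediate probabilities (e.g., the whiff probability) are themselves useful outputs, which is what makes expected event metrics like xWhiff% possible.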
My initial approach was to retrain aStuff+ v2 as a classification model, similar to PitchingBot and the revamped, soon-to-be-released FanGraphs Stuff+ v2. I believe these models likely handle balls in play more effectively, since they directly estimate the probability of each event occurring, and expected event metrics such as xWhiff% and xGB% can be valuable tools for evaluating a pitcher’s arsenal and for designing pitches that achieve specific objectives. While I believe this approach holds promise for improving pitch quality models, the framework was too computationally intensive and time-consuming for me to incorporate into aStuff+ v2. I am, however, interested in taking this modeling approach with a future aStuff+ v3.
As mentioned earlier, I took a more deliberate approach to selecting the variables used in the updated aStuff+ model. In v1, I simply used metrics common to other pitch quality models, such as velocity, spin rate, induced vertical break, horizontal movement, and velocity and movement differences from a pitcher’s primary fastball. Because velocity affects the measured vertical and horizontal movement of certain pitches (e.g., slower four-seam fastballs show more induced vertical break, since they spend more time in flight), I replaced those movement variables with location-adjusted vertical and horizontal acceleration. I also included estimated spin efficiency to better represent the effect that seam-shifted wake has on pitch movement. Lastly, I included an altitude adjustment to partially account for environmental effects on pitch quality, as well as a batter-handedness variable to account for platoon splits.
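A back-of-the-envelope sketch shows why acceleration can be a cleaner input than raw movement. Assuming constant acceleration and constant speed (drag ignored), the break a pitch accumulates is roughly one half times the acceleration times the flight time squared, with flight time equal to distance over speed, so the same acceleration produces more break on a slower pitch. The 55 ft flight distance and the acceleration and velocity values are illustrative assumptions, and this is not the location adjustment used in aStuff+ v2.

```python
MPH_TO_FPS = 5280 / 3600  # 1 mph is about 1.4667 ft/s


def approx_break_inches(accel_fps2: float, velo_mph: float, distance_ft: float = 55.0) -> float:
    """Approximate break (inches) from a constant acceleration over the flight.

    Simplifications: constant speed (no drag), so flight time = distance / speed,
    and displacement = 0.5 * a * t**2. The default distance is an assumption.
    """
    flight_time_s = distance_ft / (velo_mph * MPH_TO_FPS)
    return 0.5 * accel_fps2 * flight_time_s ** 2 * 12.0


# Same hypothetical induced vertical acceleration, two different velocities.
fast = approx_break_inches(accel_fps2=16.0, velo_mph=97.0)  # about 14.3 in
slow = approx_break_inches(accel_fps2=16.0, velo_mph=91.0)  # about 16.3 in
```

Because break scales with one over velocity squared in this approximation, the 91 mph pitch shows (97/91)² times the break of the 97 mph pitch, about 14% more, despite identical acceleration.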
When testing models with various combinations of variables, I found that the variables measuring each pitch’s velocity and movement differences from the pitcher’s primary fastball often decreased the model’s descriptive and predictive performance. After running many models and carefully weighing the implications of removing these variables, I decided to keep only the horizontal acceleration difference from the primary fastball as a feature. Why make this change? First, including only this variable improves the model’s performance relative to any other combination of “difference from fastball” metrics (including dropping these variables entirely). Second, I believe this variable helps capture some pitchers’ ability to induce suboptimal contact, given that more horizontal movement than expected is typically a driver of weak contact. A more detailed explanation will be included in the aStuff+ v2 rollout; given the model’s performance with this variable included, I believe keeping it is the best approach moving forward.
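The single “difference from fastball” feature kept in v2 could be computed along these lines. This sketch assumes pitch-level records with hypothetical fields pitcher, pitch_type, and ax (horizontal acceleration), treats the primary fastball as the pitcher’s most-thrown four-seamer, sinker, or cutter (an assumption, not necessarily the definition aStuff+ uses), and does not mirror left-handers’ values onto a common sign convention.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Optional

# Assumption: pitch types eligible to be a pitcher's "primary fastball".
FASTBALL_TYPES = {"FF", "SI", "FC"}


def ax_diff_from_primary_fastball(pitches: List[dict]) -> List[Optional[float]]:
    """Each pitch's horizontal acceleration minus its pitcher's primary-fastball mean.

    The primary fastball is assumed to be the pitcher's most-thrown fastball type.
    Pitchers with no fastball get None, since the feature is undefined for them.
    """
    by_pitcher: Dict[str, List[dict]] = defaultdict(list)
    for p in pitches:
        by_pitcher[p["pitcher"]].append(p)

    primary_ax: Dict[str, float] = {}
    for pitcher, rows in by_pitcher.items():
        counts = Counter(r["pitch_type"] for r in rows if r["pitch_type"] in FASTBALL_TYPES)
        if not counts:
            continue
        primary_type = counts.most_common(1)[0][0]
        primary_ax[pitcher] = mean(r["ax"] for r in rows if r["pitch_type"] == primary_type)

    return [
        p["ax"] - primary_ax[p["pitcher"]] if p["pitcher"] in primary_ax else None
        for p in pitches
    ]


# Hypothetical pitches: pitcher A's primary fastball is the four-seamer (mean ax = -9.0).
pitches = [
    {"pitcher": "A", "pitch_type": "FF", "ax": -8.0},
    {"pitcher": "A", "pitch_type": "FF", "ax": -10.0},
    {"pitcher": "A", "pitch_type": "SI", "ax": -15.0},
    {"pitcher": "A", "pitch_type": "SL", "ax": 5.0},
    {"pitcher": "B", "pitch_type": "CU", "ax": 3.0},
]
diffs = ax_diff_from_primary_fastball(pitches)
```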
The table above depicts the 2024 aStuff+ v2 leaders, grouped into starting and relief pitchers, with a minimum of 100 pitches thrown. Mason Montgomery, Michael Kopech, and Griffin Jax lead all relief pitchers, while Framber Valdez, Jacob deGrom, and Tyler Glasnow lead all starting pitchers. The presence of Mason Miller on the relief leaderboard resolves an anecdotal concern I had that aStuff+ v1 (which graded Miller at a 106 aStuff+) underrated pitchers with exceptional velocity, and the inclusion of Kevin Kelly, Framber Valdez, and Cristopher Sánchez on the leaderboards provides optimism that this model accounts for certain pitchers’ ability to induce suboptimal contact. With a 108 aStuff+ that ranks 4th among starting pitchers, Joe Boyle could be a candidate for improved production next season if he can even marginally improve his command.
As with aStuff+ v1, I trained aStuff+ v2 on data from the 2021, 2022, and 2023 Major League Baseball seasons. I intentionally left the 2024 season out of the training set so that I could evaluate how well the model performs on an entire season of unseen data. Not evaluating the model on unseen data was, frankly, a mistake in the construction of aStuff+ v1, as it left the model vulnerable to overfitting (which, luckily, did not turn out to be an issue). To evaluate the model’s performance, I calculated descriptive and predictive correlations between aStuff+ and various pitching metrics among pitchers with at least 500 pitches thrown.
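The evaluation setup described above can be sketched in two pieces: a season-based holdout split, and a same-season (descriptive) correlation among qualified pitchers. The field names (season, pitches, astuff, era) are hypothetical, and this is an illustration of the idea rather than my actual pipeline.

```python
import math
from typing import List, Sequence, Tuple

TRAIN_SEASONS = {2021, 2022, 2023}
HOLDOUT_SEASON = 2024  # never seen during training


def split_by_season(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Temporal split: train on 2021-2023 rows and hold out all of 2024."""
    train = [r for r in rows if r["season"] in TRAIN_SEASONS]
    holdout = [r for r in rows if r["season"] == HOLDOUT_SEASON]
    return train, holdout


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def descriptive_correlation(pitcher_seasons: List[dict], metric: str, min_pitches: int = 500) -> float:
    """Same-season correlation between aStuff+ and `metric` among qualified pitchers."""
    qualified = [r for r in pitcher_seasons if r["pitches"] >= min_pitches]
    return pearson([r["astuff"] for r in qualified], [r[metric] for r in qualified])


# Hypothetical 2024 pitcher-seasons; the last one falls below the 500-pitch minimum.
seasons_2024 = [
    {"astuff": 110, "era": 3.0, "pitches": 600},
    {"astuff": 100, "era": 4.0, "pitches": 800},
    {"astuff": 90, "era": 5.0, "pitches": 700},
    {"astuff": 130, "era": 9.0, "pitches": 100},
]
r_era = descriptive_correlation(seasons_2024, "era")
```

Splitting by season rather than randomly by pitch keeps every 2024 pitch out of training, which is closer to how the model is actually used: grading a season it has never seen.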
As shown by the table above, aStuff+ v2 improves on both FGStuff+ and aStuff+ v1 in describing and predicting ERA and FIP, while FGStuff+ holds an advantage in describing and predicting a pitcher’s K-BB%. In addition, aStuff+ v2 predicts itself year-to-year better than FGStuff+ and aStuff+ v1. More analysis of aStuff+ v2’s descriptive and predictive performance at varying sample sizes and against different metrics, as well as the model’s stabilization point, will be included when the model is fully rolled out in a couple of months. Given these preliminary findings, I am confident that aStuff+ v2 is an improvement over aStuff+ v1 and will provide more valuable insight into the effect that pitch quality has on overall pitching performance.
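Predictive correlations and year-to-year “stickiness” both require pairing each pitcher’s season with his following season. Here is a minimal sketch, assuming a hypothetical dictionary keyed by (pitcher, season) with a 500-pitch minimum applied in both years; passing the same metric for both arguments gives the metric’s year-to-year correlation with itself.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_correlation(
    seasons: Dict[Tuple[str, int], dict], x_metric: str, y_metric: str, year: int, min_pitches: int = 500
) -> float:
    """Correlate x_metric in `year` with y_metric in `year + 1`, for pitchers qualified in both."""
    xs, ys = [], []
    for (pitcher, season), row in seasons.items():
        if season != year:
            continue
        nxt = seasons.get((pitcher, year + 1))
        if nxt is None or row["pitches"] < min_pitches or nxt["pitches"] < min_pitches:
            continue
        xs.append(row[x_metric])
        ys.append(nxt[y_metric])
    return pearson(xs, ys)


# Hypothetical data; pitcher "d" has no 2024 season and is skipped.
seasons = {
    ("a", 2023): {"astuff": 105, "pitches": 900},
    ("a", 2024): {"astuff": 107, "era": 3.0, "pitches": 900},
    ("b", 2023): {"astuff": 95, "pitches": 900},
    ("b", 2024): {"astuff": 97, "era": 5.0, "pitches": 900},
    ("c", 2023): {"astuff": 100, "pitches": 900},
    ("c", 2024): {"astuff": 102, "era": 4.0, "pitches": 900},
    ("d", 2023): {"astuff": 120, "pitches": 900},
}
stickiness = year_to_year_correlation(seasons, "astuff", "astuff", 2023)
predictive_era = year_to_year_correlation(seasons, "astuff", "era", 2023)
```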
The Challenges of Modeling Command:
Arguably the most difficult part of the model revamp was creating a model that could evaluate the quality of each pitcher’s command. While pitch location is the most important feature in a location-aware pitch quality model, it is also the least stable variable year-to-year. My attempts at creating a FanGraphs-style Location+ model produced year-to-year correlations between Location+ and metrics such as ERA and FIP that were near zero.
In recent years, there has been intriguing research on improving command modeling, from a pitch trajectory density estimation presentation by Scott Powers and Vicente Iglesias to an intended zone-based metric created by Driveline Baseball. While both methods show promise in improving the public’s understanding of how to evaluate command, they are either too statistically advanced for me to implement (in the case of Powers and Iglesias’s research) or rely on private data (in the case of Driveline’s research), which prevents me from using them in aLocation+. I am also very intrigued by Michael Rosen’s research on pitcher release angles at FanGraphs; however, including release angle variables in the model instead of pitch location coordinates results in a decline in model performance.
Instead, I have decided to take a bucketing approach to modeling pitch location, inspired by research from Alex Chamberlain at FanGraphs and Takeshi Kono on X. This approach groups pitches by their general location rather than using precise coordinates, which improves the predictive performance of both aLocation+ and aPitching+. While stuff is still much “stickier” year-to-year than location under this method, in my opinion the improvement in predictive performance justifies the new approach.
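As one illustration of bucketing (not the specific scheme used by Chamberlain, Kono, or aLocation+), the sketch below splits the strike zone into a 3×3 grid and clamps every out-of-zone pitch into a single surrounding ring, producing a categorical feature instead of raw coordinates. The plate half-width ignores the ball’s radius, and the default zone top and bottom are assumptions; Statcast data does include per-pitch sz_top and sz_bot values that could be passed in instead.

```python
import math
from typing import Tuple

# Half of the 17-inch plate, in feet (about 0.708 ft); ignores the ball's radius.
ZONE_HALF_WIDTH = 17 / 12 / 2


def location_bucket(
    plate_x: float, plate_z: float, sz_bot: float = 1.5, sz_top: float = 3.5, n: int = 3
) -> Tuple[int, int]:
    """Map a pitch's plate location (feet) to a coarse (row, col) bucket.

    The strike zone is split into an n x n grid indexed 0..n-1; anything outside
    the zone is clamped into a surrounding ring with index -1 or n.
    Row 0 is the bottom of the zone; col 0 is the left edge (catcher's view).
    """
    col_width = 2 * ZONE_HALF_WIDTH / n
    row_height = (sz_top - sz_bot) / n
    col = math.floor((plate_x + ZONE_HALF_WIDTH) / col_width)
    row = math.floor((plate_z - sz_bot) / row_height)
    # Clamp out-of-zone pitches into one outer ring of buckets.
    col = max(-1, min(n, col))
    row = max(-1, min(n, row))
    return row, col


middle_middle = location_bucket(0.0, 2.5)  # (1, 1)
low_and_away = location_bucket(-2.0, 0.5)  # (-1, -1), the outer ring
```

The trade-off is deliberate: coarser buckets throw away precision, but they are less sensitive to the pitch-to-pitch noise that makes exact location so unstable year-to-year.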
Arsenal Effects:
In my opinion, the next frontier of pitch modeling is the ability to model how well certain pitches interact with each other within a given pitcher’s arsenal. While existing pitch quality models, such as aStuff+, are incredibly valuable tools for evaluating a pitcher’s production, public “stuff” models have largely matured, with nearly all of them valuing similar players. There does appear to be room for growth in modeling command, but I believe an improved ability to model arsenal effects can give analysts a clearer understanding of why some pitchers over- or underperform the results expected by existing models.
The concept of modeling arsenal effects was first introduced to the public sphere this past August, in a Saberseminar presentation by Marek Ramillo and Jack Lambert of Driveline Baseball. In the presentation, Ramillo and Lambert unveiled two new metrics, Mix+ and Match+. Mix+ attempts to measure the breadth of a pitcher’s arsenal (how wide a band of velocities and movements the arsenal covers), while Match+ attempts to measure how similar a pitcher’s pitches look to the opposing hitter at the hitter’s estimated decision point. Each pitcher’s Mix+ and Match+ are then combined into an overall arsenal metric that grades how well the pitcher’s arsenal is constructed.
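Any Match+-style metric first needs each pitch’s location at the hitter’s decision point. PITCHf/x and Statcast trajectories are commonly summarized with a nine-parameter constant-acceleration fit, so a simple sketch solves for the time at which the ball reaches a chosen distance from the plate and evaluates the horizontal and vertical position there. The Trajectory fields, the coordinate convention (y measured from the plate toward the mound), and the decision-point distance are assumptions for illustration, not Driveline’s definitions.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Trajectory:
    """Constant-acceleration pitch trajectory in feet and seconds (y points from plate to mound)."""
    x0: float
    y0: float
    z0: float
    vx0: float
    vy0: float
    vz0: float
    ax: float
    ay: float
    az: float


def time_to_reach_y(traj: Trajectory, y_target: float) -> float:
    """Smallest non-negative t with y0 + vy0*t + 0.5*ay*t**2 == y_target."""
    a, b, c = 0.5 * traj.ay, traj.vy0, traj.y0 - y_target
    if abs(a) < 1e-12:
        return -c / b  # no y-acceleration: linear motion
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("trajectory never reaches y_target")
    roots = [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]
    valid = [t for t in roots if t >= 0]
    if not valid:
        raise ValueError("y_target is behind the starting point")
    return min(valid)


def position_at_y(traj: Trajectory, y_target: float) -> Tuple[float, float]:
    """(x, z) location of the pitch when it crosses y = y_target."""
    t = time_to_reach_y(traj, y_target)
    x = traj.x0 + traj.vx0 * t + 0.5 * traj.ax * t * t
    z = traj.z0 + traj.vz0 * t + 0.5 * traj.az * t * t
    return x, z


# Hypothetical pitch; 25 ft is an assumed decision-point distance, not a measured one.
pitch = Trajectory(-2.0, 50.0, 5.8, 6.0, -135.0, -6.0, -10.0, 28.0, -20.0)
decision_x, decision_z = position_at_y(pitch, 25.0)
```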
Since this presentation, related research has been conducted by Maxwell Resnick on pitch tunneling and by Stephen Sutton-Brown of Baseball Prospectus on a different approach to quantifying arsenal effects. In constructing my own arsenal metrics, aMix+ and aMatch+, I took an approach more in line with Sutton-Brown’s method: arsenal breadth is measured by the spread of each pitcher’s velocity and movement distributions, while each pitch’s tunneling ability is measured by the probability that, at the hitter’s decision point, the pitch looks like each of the pitch types in the pitcher’s arsenal.
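Both ideas can be sketched in a few lines, with the caveat that this is one possible implementation under simplifying assumptions, not the actual aMix+ or aMatch+ calculation. Breadth is taken as the population standard deviation of velocity and movement across all of a pitcher’s pitches. For tunneling, each pitch type’s decision-point location is modeled as independent normal distributions in x and z, and Bayes’ rule (with usage share as the prior) gives the probability that a pitch at a given location belongs to each type; one minus the probability of the pitch’s own type serves as a hypothetical “match” score. Field names (velo, hb, ivb, dp_x, dp_z) are illustrative.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Dist = Tuple[float, float, float, float, float]  # (usage share, mean x, sd x, mean z, sd z)


def arsenal_breadth(pitches: List[dict]) -> Dict[str, float]:
    """Breadth sketch: spread (population SD) of velocity and movement across all pitches."""
    return {key: pstdev([p[key] for p in pitches]) for key in ("velo", "hb", "ivb")}


def fit_type_distributions(pitches: List[dict], min_sd: float = 1e-3) -> Dict[str, Dist]:
    """Summarize each pitch type's decision-point location as independent normals."""
    by_type: Dict[str, List[dict]] = {}
    for p in pitches:
        by_type.setdefault(p["pitch_type"], []).append(p)
    dists: Dict[str, Dist] = {}
    for ptype, rows in by_type.items():
        xs = [r["dp_x"] for r in rows]
        zs = [r["dp_z"] for r in rows]
        # The min_sd floor avoids zero-width distributions for tiny samples.
        dists[ptype] = (len(rows) / len(pitches), mean(xs), max(pstdev(xs), min_sd),
                        mean(zs), max(pstdev(zs), min_sd))
    return dists


def normal_pdf(v: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def type_probabilities(dp_x: float, dp_z: float, dists: Dict[str, Dist]) -> Dict[str, float]:
    """P(pitch type | decision-point location) via Bayes' rule, usage share as the prior."""
    weights = {t: share * normal_pdf(dp_x, mx, sx) * normal_pdf(dp_z, mz, sz)
               for t, (share, mx, sx, mz, sz) in dists.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("location is implausible under every pitch type")
    return {t: w / total for t, w in weights.items()}


def match_score(own_type: str, probs: Dict[str, float]) -> float:
    """Hypothetical tunneling summary: probability the pitch looks like a different type."""
    return 1.0 - probs[own_type]


def make(ptype: str, velo: float, hb: float, ivb: float, locs: List[Tuple[float, float]]) -> List[dict]:
    return [{"pitch_type": ptype, "velo": velo, "hb": hb, "ivb": ivb, "dp_x": x, "dp_z": z}
            for x, z in locs]


# Hypothetical two-pitch arsenal.
pitches = (make("FF", 95.0, 8.0, 16.0, [(0.0, 4.0), (0.2, 4.2), (-0.2, 3.8)])
           + make("SL", 85.0, -4.0, 2.0, [(1.0, 3.0), (1.2, 3.2), (0.8, 2.8)]))
breadth = arsenal_breadth(pitches)
dists = fit_type_distributions(pitches)
at_fastball = type_probabilities(0.0, 4.0, dists)  # clearly reads as a fastball
midpoint = type_probabilities(0.5, 3.5, dists)     # roughly a coin flip
```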
More details on how these models are built will be included in the aStuff+ v2 introduction article; however, a preliminary look at the leaderboards above suggests that these models show promise in measuring arsenal breadth and pitch similarity. Pitchers such as Chris Bassitt, Bowden Francis, and Max Fried are known for having wide arsenals, which aMix+ captures, while pitchers such as Sonny Gray, Zack Wheeler, and Tobias Myers are known for pitches that look similar to the hitter at the decision point, which aMatch+ captures.
At present, incorporating these variables into aPitching+ improves its descriptive performance, particularly for starting pitchers, and I plan to release the tentatively named “arsenal-infused aPitching+” during the rollout of the new models. Modeling how pitches interact within a pitcher’s arsenal is an exciting development, and one that can help explain why some pitchers over- or underperform the results expected by existing pitch quality models.
Concluding Thoughts:
In conclusion, I am excited about the release of aStuff+ v2, as well as the new models aLocation+, aPitching+, and arsenal-infused aPitching+. Between now and the release date, I will be updating my existing stuff-based research with the new model and will hopefully (no guarantees) create a dashboard app for easily viewing each pitcher’s modeling metrics. As mentioned earlier, I expect to release the new pitch models in a “mega article” in March, no later than the domestic Opening Day on March 27th. I have also been exploring methods to quantify sequencing effects; however, including these variables in a Pitching+ model currently decreases both descriptive and predictive performance. I also plan to add 2024 pitch data to the models’ training sets around May or June. Once this data is included, I would expect Paul Skenes’s pitch quality grades to improve, along with the models’ evaluation of other unique movement profiles, such as Kumar Rocker’s “death ball”.
The public sphere has seen an explosion in pitch modeling analysis over the past couple of seasons, which has led to a greater understanding of how to evaluate pitching production, and I believe the rumored open-sourcing of the new FanGraphs Stuff+ model will only lead to more valuable research from public analysts across the internet. With the adjustments to aStuff+ described above, as well as the inclusion of arsenal metrics in pitch modeling, I believe these pitch models will remain valuable tools for analyzing pitching production for seasons to come.
Thanks for reading!
Follow @MLBDailyStats_ on X and Adam Salorio on Substack for more in-depth analysis. Photo credits to Ezra Shaw/Getty Images.