Methodology: aSwing+
Investigating my swing quality model.
Welcome to the article discussing the methodology behind my swing quality model, aSwing+. If you are reading this, I assume you have already read my release article, Which Player Has The Best Swing?; if not, I highly recommend reading it before proceeding. To make the release article more digestible for a wider audience, I decided to split the write-up into two parts, with this article covering the methodology I used to create the model, rather than condensing everything into one article as I did with my previous aStuff+ articles. This article is written in a more conversational tone, so please feel free to message me either here on Substack or (preferably) on X if you have any questions, comments, and/or suggestions on how the model is constructed and how it can be improved.
As mentioned in the release article, aSwing+ takes a “kitchen sink” approach to analyzing each player’s swing quality. It uses various bat tracking metrics (bat speed, swing path tilt, attack angle, attack direction, contact point, swing length), interaction variables between them, and the location, vertical approach angle, and horizontal approach angle of each pitch to predict the probability that each swing will result in an ideal contact outcome for the hitter, which I define as a Barrel. The model, a CatBoost classifier, is trained on all batted balls with bat tracking data from the 2023 and 2024 seasons, and predictions are made on all swings with the requisite data from the 2023, 2024, and 2025 seasons. The goal of aSwing+ is to describe and predict a player’s offensive performance better than bat speed alone. Bat speed by itself can tell an analyst a lot about a player’s offensive potential in a small sample, and my intention was to improve upon it by evaluating bat speed in tandem with swing path metrics to better understand how a player generates offensive production.
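The model outputs a Barrel probability for every swing, but the text does not spell out how those swing-level probabilities roll up into a player-level grade. A minimal sketch follows, assuming the common "plus" convention (a player's average probability divided by the league average, times 100, so 100 = league average). The function name `plus_scale` and the input format are hypothetical, not taken from aSwing+ itself.

```python
from collections import defaultdict


def plus_scale(swings):
    """Convert swing-level Barrel probabilities into player-level plus scores.

    swings: iterable of (player_id, barrel_prob) pairs.
    Assumption: 100 = the league-average swing, computed over all swings.
    """
    totals = defaultdict(float)  # sum of predicted probabilities per player
    counts = defaultdict(int)    # number of swings per player
    league_sum = 0.0
    league_n = 0
    for player, prob in swings:
        totals[player] += prob
        counts[player] += 1
        league_sum += prob
        league_n += 1
    league_mean = league_sum / league_n
    # Each player's mean probability relative to the league mean, scaled to 100.
    return {p: 100 * (totals[p] / counts[p]) / league_mean for p in totals}


scores = plus_scale([("A", 0.10), ("A", 0.20), ("B", 0.05), ("B", 0.05)])
```

Under this assumed scaling, player A's swings average a 0.15 Barrel probability against a league mean of 0.10, giving a score of 150, while player B lands at 50.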
Model Selection:
When I first sat down to plan how I would construct a swing quality model, I initially approached the process through the lens of how I built my pitch quality model, aStuff+. Most machine learning-based pitch quality models in the public sphere fall into one of two types: a run value-based regression model (e.g., FanGraphs’ original Stuff+, tjStuff+, and aStuff+) or an outcome-based classification model (e.g., FanGraphs’ new Stuff+, PitchingBot, and PLV). The main difference between these approaches is that regression models directly predict the expected run value of each pitch, while classification models predict the likelihood of various outcomes (such as whether a swing results in a whiff, or whether a batted ball is a hard-hit fly ball) and then map an expected run value onto those probabilities.
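The final step of the classification approach, mapping outcome probabilities onto an expected run value, is just a probability-weighted sum: each outcome's predicted probability times that outcome's average run value. A minimal sketch follows; the outcome names and run values are made-up placeholders from the hitter's perspective, not values used by any of the models named above.

```python
def expected_run_value(outcome_probs, run_values):
    """Map predicted outcome probabilities onto an expected run value.

    outcome_probs: dict of outcome -> predicted probability (must sum to 1).
    run_values: dict of outcome -> average run value of that outcome
                (placeholder numbers in the example below).
    """
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    # Probability-weighted sum over all outcomes.
    return sum(p * run_values[outcome] for outcome, p in outcome_probs.items())


probs = {"whiff": 0.3, "weak_contact": 0.5, "hard_hit_fb": 0.2}
values = {"whiff": -0.05, "weak_contact": -0.02, "hard_hit_fb": 0.30}
xrv = expected_run_value(probs, values)  # 0.035 with these placeholder values
```

A regression model skips this step and predicts the run value directly, which is simpler to build but, as discussed next, can absorb signal you did not intend it to.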
I initially attempted to construct aSwing+ as a run value-based regression model (since it is, frankly, easier to build than multiple classification models), predicting the expected run value of each swing from its swing characteristics and pitch location. However, this model turned out to be more of a proxy for a player’s swing decisions than their swing quality (this initial attempt is briefly discussed at the end of my article “A Closer Look at Swing Decision Metrics”). The likely cause is that most swings do not generate positive run value, and because I trained the model on all swings, it heavily penalized swings out of the zone and rewarded swings in the zone too strongly. Therefore, I decided it was best to train the model as a classifier on batted ball events instead.
I tested various approaches to building the classifier (such as predicting the probability of contact, then the probability of each batted ball outcome); however, I found that the best approach was to simply predict the probability that each swing would result in a Barrel. Barrels were chosen as the target variable because, as classified by Statcast, this subset of batted balls is the most productive contact a hitter can produce: so far this season, Barrels have produced a 1.202 wOBA, while non-Barrels have produced a .285 wOBA. The model is trained only on batted balls (to remove the swing decision noise introduced by training on all swings), while predictions are made on all swings.
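The split between what the model learns from (batted balls only) and what it scores (every swing) can be sketched as two filters over pitch-level rows. This sketch assumes Statcast-style keys as exported from Baseball Savant: `type` (where `"X"` marks a ball put in play), `launch_speed_angle` (where `6` marks a Barrel), `bat_speed`, and `game_year`. The function `split_rows` and the `is_barrel` column name are hypothetical.

```python
def split_rows(rows):
    """Split pitch-level rows into a Barrel training set and a prediction set.

    Assumes Statcast-style keys: 'game_year', 'bat_speed', 'type'
    ('X' = ball put in play) and 'launch_speed_angle' (6 = Barrel).
    """
    # Only swings with bat tracking data are usable at all.
    tracked = [r for r in rows if r.get("bat_speed") is not None]
    # Train on 2023-2024 batted balls only, with a binary Barrel target.
    train = [
        {**r, "is_barrel": int(r.get("launch_speed_angle") == 6)}
        for r in tracked
        if r["game_year"] in (2023, 2024) and r.get("type") == "X"
    ]
    # Score every tracked swing from 2023-2025, including whiffs and fouls.
    predict = [r for r in tracked if r["game_year"] in (2023, 2024, 2025)]
    return train, predict
```

Because whiffs and fouls never enter the training set, the model only learns which swing and pitch characteristics separate Barrels from other batted balls; it is not directly rewarded for a swing's outcome relative to taking the pitch.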
Feature Selection:
My initial intention was to use as few features as possible. While machine learning models are powerful tools for understanding relationships among multiple variables, I also value interpretability, since a model is useless if actionable insights cannot be obtained from its findings. Given their high year-to-year stability compared to the other bat tracking metrics, I initially built the model using only bat speed, swing path tilt, and the pitch location metrics as features. That model performed better than bat speed alone; however, I felt there was still room for improvement in generating insights into how Major League hitters produce offense.
Earlier this month, I was very impressed by Jake DeBoever’s release of his swing quality model, Swing+, and I strongly encourage every reader of this article to subscribe to his Substack, Smash Factor. While Jake trained his model on aggregated data rather than on pitch-by-pitch data (which is what aSwing+ is trained on), he used multiple bat tracking metrics, as well as interaction features between them, to better capture how these variables relate to one another. Many bat tracking metrics are intertwined (for example, attack angle increases as a player’s contact point moves closer to the pitcher), and it is important to control for these relationships when including these features in a model. Adding these features significantly improved the descriptive and predictive performance of aSwing+.
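One simple way to encode interactions between bat tracking metrics is to add the pairwise product of every pair of features. The sketch below does exactly that; the feature names are hypothetical stand-ins, and the interaction set actually used in aSwing+ (or in Swing+) is not published here, so treat this as an illustration of the idea rather than the model's feature list. Tree ensembles such as CatBoost can also learn interactions on their own, so explicit products are a design choice, not a requirement.

```python
from itertools import combinations

# Hypothetical feature names standing in for the bat tracking metrics in the text.
BAT_FEATURES = (
    "bat_speed",
    "swing_path_tilt",
    "attack_angle",
    "attack_direction",
    "contact_point_y",
    "swing_length",
)


def add_interactions(row, features=BAT_FEATURES):
    """Return a copy of `row` with the pairwise product of every feature pair added."""
    out = dict(row)  # leave the caller's row untouched
    for a, b in combinations(features, 2):
        out[f"{a}_x_{b}"] = row[a] * row[b]
    return out


swing = {"bat_speed": 72, "swing_path_tilt": 30, "attack_angle": 10,
         "attack_direction": -2, "contact_point_y": 5, "swing_length": 7}
expanded = add_interactions(swing)  # 6 raw features + 15 pairwise products
```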
Including the location, vertical approach angle, and horizontal approach angle of each pitch is important because a player’s “ideal swing path” varies depending on where in the zone a pitch is located. Flatter pitches at the top of the zone require flatter swings to generate flush contact, while steeper pitches at the bottom of the zone require steeper swings. Given the importance of pitch location in determining what a player’s ideal swing should look like, it seems unlikely that swing decision ability can ever be fully isolated from swing quality, as swings on pitches well out of the strike zone will always have a lower probability of resulting in a Barrel than swings on pitches down the heart of the zone. Training the model only on batted balls appears to have mitigated this issue: aSwing+ displays a .05 R-squared with my new swing decision model (which will be released in the near future), compared to the ~.20 R-squared that the aforementioned regression-based version of aSwing+ displayed with swing decision quality.
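The R-squared comparison above is between two player-level metrics. For a simple one-variable linear fit, R-squared equals the squared Pearson correlation, so it can be computed directly from the two lists. A minimal sketch, assuming the inputs are equal-length lists of player values (for example, each player's aSwing+ and swing decision grade); the function name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """R-squared of a simple linear fit of y on x (equal to Pearson r squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both inputs need nonzero variance")
    return sxy * sxy / (sxx * syy)
```

An R-squared of .05 means a linear fit on the swing decision grade explains about 5% of the player-to-player variance in aSwing+, versus roughly 20% for the earlier regression-based version.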
Model Construction:
As mentioned earlier, aSwing+ is a CatBoost classifier. I settled on CatBoost because the model includes a categorical platoon state variable, which CatBoost handles natively. To improve performance, Hyperopt was used for hyperparameter tuning, and the model was constructed using a 70/15/15 split with KFolds. The model was trained on data from the 2023 and 2024 seasons because I prefer to hold out a full season of unseen data (2025) for the model to make predictions on. Initially, I trained the model only on 2023 data; however, since the 2025 season is almost complete and can serve as that unseen season, I decided to add 2024 data to the training set, which improved the model’s performance. As demonstrated in the release article, aSwing+ has described and predicted a player’s offensive production better than simply analyzing a player’s average bat speed.
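The 70/15/15 split can be sketched as a seeded shuffle followed by slicing into train, validation, and test sets. The sketch also shows one hypothetical way to build a platoon state string from batter side and pitcher hand (Statcast's `stand` and `p_throws` columns hold `"L"`/`"R"`); the exact encoding used in aSwing+ is not stated, and the K-fold and Hyperopt steps mentioned above are not reproduced here. Both function names are hypothetical.

```python
import random


def platoon_state(stand, p_throws):
    """Hypothetical platoon encoding: batter side vs. pitcher hand, e.g. 'L_vs_R'."""
    return f"{stand}_vs_{p_throws}"


def train_val_test_split(rows, seed=0, train_frac=0.70, val_frac=0.15):
    """Shuffle rows with a fixed seed and split them 70/15/15 (by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle, input untouched
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (about 15%)
    return train, val, test
```

With CatBoost, a string column like the platoon state can be passed as-is by listing it in the `cat_features` argument of `CatBoostClassifier.fit`, which is why no one-hot encoding step appears here.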
Contact Path Model:
In addition to constructing a “raw power” swing quality grade (aSwing+), I also attempted to construct a “contact ability” swing quality grade to measure a player’s bat-to-ball skill. I built this model with the same features as aSwing+, but used whether or not each swing resulted in contact as the target variable. While the model correctly rated contact stalwarts such as Luis Arraez and Steven Kwan as having two of the best “contact-oriented” swings in Major League Baseball, it performed worse, both descriptively and predictively, across all metrics than simply analyzing a player’s Contact%. I still believe there is value in analyzing a player’s contact ability, as players with double-plus contact ability can generate above-average offensive production despite a low aSwing+ grade. Perhaps additional features such as hand position and/or spine angle would allow the model to better understand the precise path of the bat as it moves through space during the swing. While this version of a contact-focused swing model fell short, I believe additional bat tracking and/or biomechanical data would make this area ripe for future exploration.
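The contact model's target and its Contact% benchmark can both be derived from each pitch's outcome description. A minimal sketch, assuming Statcast-style `description` values; counting foul tips as contact and ignoring bunts are assumptions of this sketch, not documented choices of the model. The function names are hypothetical.

```python
# Assumed mapping of Statcast-style 'description' values to swing outcomes.
# Foul tips count as contact here; bunts are ignored (treated as non-swings).
CONTACT = {"foul", "foul_tip", "hit_into_play"}
WHIFF = {"swinging_strike", "swinging_strike_blocked"}


def contact_target(description):
    """Return 1 for a contact swing, 0 for a whiff, None if it was not a swing."""
    if description in CONTACT:
        return 1
    if description in WHIFF:
        return 0
    return None


def contact_pct(descriptions):
    """Contact% = contact swings / total swings; non-swings are skipped."""
    labels = [contact_target(d) for d in descriptions]
    swings = [y for y in labels if y is not None]
    return sum(swings) / len(swings) if swings else float("nan")


pa = ["hit_into_play", "foul", "swinging_strike", "ball", "called_strike"]
rate = contact_pct(pa)  # 2 contact swings out of 3 swings
```

The same 0/1 labels serve double duty: as the classifier's target on each swing, and, averaged per player, as the Contact% baseline the model failed to beat.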
Concluding Thoughts:
Overall, creating my swing quality model, aSwing+, has been a rewarding experience, and I hope this metric can provide more reliable insight into the factors that allow Major League hitters to generate offensive production. On a personal level, while creating my own version of Stuff+ taught me an enormous amount about how machine learning models work, creating a swing quality model has been especially exciting because swing path modeling is still an open-ended topic at this stage of the bat tracking era (as opposed to Stuff+ models, which largely coalesce around similar methods and value similar players). aSwing+ is only one piece of a suite of offensive metrics that I will be rolling out in the coming weeks, with additional metrics, such as aBatting+, slated for release in the immediate future. Bat tracking metrics are poised to give the public a deeper understanding of how offensive production is generated at the Major League level, and I am excited to see what findings models such as aSwing+ will provide over the next couple of seasons. As mentioned earlier, feel free to message me either here on Substack or (preferably) on X with any questions, comments, and/or suggestions on how the model is constructed and how it can be improved.
Thanks for reading!
Follow @MLBDailyStats_ on X and Adam Salorio on Substack for more in-depth MLB analysis. Photo credits to Jason Getz / Atlanta Journal-Constitution.


