Exploring ChatGPT: How AI Can Transform Major League Baseball Front Offices
(2/6/23) Use of AI language models, such as ChatGPT, has the potential to become the next MLB competitive advantage.
Originally published on Medium on February 6, 2023.
Artificial intelligence is quickly changing the world we live in, and seemingly every day brings new advances in what AI can do. ChatGPT is the most popular AI innovation at the moment, and other companies are racing to create their own AI language models to compete with this groundbreaking chatbot. Much like the public introduction of the internet decades ago, artificial intelligence has the potential to dramatically change the way businesses and individuals complete day-to-day tasks.
These new artificial intelligence systems could become the next competitive advantage for Major League Baseball organizations. Because they can write and interpret information much as a human does, AI language models such as ChatGPT could automate numerous tasks, such as writing scouting reports, freeing front offices to focus on work that cannot be automated. This article examines ChatGPT’s ability to produce relevant baseball information, providing insight into how Major League Baseball organizations might use artificial intelligence in the future.
Background:
According to the model itself, ChatGPT is an AI language model developed by OpenAI. It is a Generative Pre-trained Transformer (GPT) model trained on a large corpus of text data to generate human-like responses to various prompts. The model has the ability to answer questions, complete tasks, generate text and much more, with high accuracy and coherence, due to the massive amount of training data and computational resources it was exposed to.
To better understand what ChatGPT can do with Major League Baseball information, I asked it a few questions about its training set.
Me: Is FanGraphs a part of your training set?
ChatGPT: Yes, FanGraphs is part of the training set used to train me as an AI language model. OpenAI, the organization that trained me, uses a diverse and extensive collection of text from the internet to train me, which includes data from websites such as FanGraphs. As a result, I have a general understanding of the content and language used on the site, and can use that information to answer questions related to it.
Me: Is Baseball Savant a part of your training set?
ChatGPT: Yes, Baseball Savant is part of the training set used to train me as an AI language model. OpenAI, the organization that trained me, uses a diverse and extensive collection of text from the internet to train me, which includes data from websites such as Baseball Savant. As a result, I have a general understanding of the content and language used on the site, and can use that information to answer questions related to it.
Me: What are your limitations?
ChatGPT: As an AI language model created by OpenAI, I have several limitations including:
I am trained on vast amounts of text data and can only provide information and answer questions based on the data I have seen.
I do not have personal experiences or opinions, only the knowledge and information I was trained on.
I may not always provide up-to-date information or account for recent events and advancements.
I am not capable of performing certain tasks such as image recognition or physical actions.
I may struggle to understand the context and intent behind a question or request, leading to incorrect or unsatisfactory answers.
As described by the model, ChatGPT is trained on data from the FanGraphs and Baseball Savant websites, which is exciting! One drawback is that the publicly available research preview has limited knowledge of events that occurred after 2021, so it cannot provide up-to-date Major League Baseball information.
To demonstrate ChatGPT’s player-evaluation capabilities, I am going to ask the model to describe advanced baseball metrics, create scouting reports for an MLB hitter and an MLB pitcher, and create a scouting report on a player who, according to ChatGPT’s training set, has not reached the MLB level.
Advanced Metrics:
Me: What is O-Swing%?
ChatGPT: O-Swing% (Outside Swing Percentage) is a statistic that measures the percentage of pitches thrown outside the strike zone that a batter swings at. A higher O-Swing% indicates that the batter is more aggressive at swinging at pitches outside the strike zone, while a lower O-Swing% indicates that the batter is more selective and only swings at pitches within the strike zone.
O-Swing% can be used to evaluate a batter’s plate discipline, as well as the effectiveness of a pitcher’s ability to get hitters to chase pitches outside the strike zone. A high O-Swing% for a batter can indicate that the batter is having difficulty recognizing pitches and is more likely to swing and miss or make weak contact, while a high O-Swing% for a pitcher can indicate that the pitcher is successful at getting hitters to chase pitches outside the strike zone.
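To make the definition concrete, here is a minimal Python sketch of the calculation ChatGPT describes. The pitch records and the field names `in_zone` and `swung` are hypothetical, made up for illustration; they are not the schema used by FanGraphs or Baseball Savant.

```python
def o_swing_pct(pitches):
    """Return the percentage of out-of-zone pitches that the batter swung at."""
    outside = [p for p in pitches if not p["in_zone"]]
    if not outside:
        return None  # undefined when no pitches were outside the zone
    swings = sum(1 for p in outside if p["swung"])
    return 100.0 * swings / len(outside)


# Hypothetical sample: five pitches, four of them outside the strike zone.
sample = [
    {"in_zone": True, "swung": True},
    {"in_zone": False, "swung": True},
    {"in_zone": False, "swung": False},
    {"in_zone": False, "swung": False},
    {"in_zone": False, "swung": False},
]
print(o_swing_pct(sample))  # 25.0
```

The in-zone pitch is ignored entirely, which is the point of the metric: only swings at pitches outside the zone count as chases.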
Me: What is wRC+?
ChatGPT: wRC+ (Weighted Runs Created Plus) is a statistic used to evaluate a player’s offensive value relative to the league average and adjusted for park and league effects. It is expressed on a scale where 100 is average, and each point above or below 100 represents one percent better or worse than the league average.
wRC+ takes into account a player’s total offensive contributions, including batting average, on-base percentage, and power, as well as other factors like stolen bases and double plays. By using park and league adjustments, wRC+ provides a more comprehensive evaluation of a player’s offensive value compared to traditional statistics like batting average or RBIs.
For example, a player with a wRC+ of 120 means that their offensive contributions were 20% better than the league average, while a player with a wRC+ of 80 was 20% worse than the league average. A wRC+ of 150 would indicate a player’s offensive contributions were 50% better than the league average.
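The arithmetic behind that reading of the scale is simple enough to sketch in Python. This only illustrates how to interpret a wRC+ value that has already been computed; it does not calculate wRC+ itself, and the function name is my own.

```python
def pct_vs_league_average(wrc_plus):
    """Percent better (positive) or worse (negative) than league average."""
    return wrc_plus - 100


# The three values used in the example above.
for value in (120, 80, 150):
    diff = pct_vs_league_average(value)
    direction = "better" if diff >= 0 else "worse"
    print(f"wRC+ {value}: {abs(diff)}% {direction} than league average")
```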
Me: What is FIP?
ChatGPT: FIP (Fielding Independent Pitching) is a statistic used to evaluate a pitcher’s performance, independent of their team’s defense. It is designed to estimate what a pitcher’s ERA (Earned Run Average) would look like if their defense performed averagely behind them.
FIP measures a pitcher’s effectiveness based on three factors that they have the most control over: walks, strikeouts, and home runs allowed. It does not take into account factors like batted ball location, which are influenced by the defense, but instead focuses on the pitcher’s own ability to prevent runs from scoring.
The formula for FIP is (13 * HR + 3 * (BB + HBP) - 2 * K) / IP + a constant, where HR is the number of home runs allowed, BB is the number of walks, HBP is the number of hit by pitches, K is the number of strikeouts, IP is the number of innings pitched, and the constant is used to put FIP on the same scale as ERA.
FIP is a useful tool for evaluating pitcher performance, especially when comparing pitchers across different teams with varying defensive strengths. It can also be helpful in predicting future performance, as it focuses on the pitcher’s ability to control the outcome of their pitching performance.
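The formula in ChatGPT's answer translates directly into a short Python function. This is a sketch under a few assumptions: the default constant of 3.10 is only a placeholder (the real constant is recalculated each season so that league FIP matches league ERA), the stat line in the example is invented, and `ip` must be true decimal innings (6 1/3 innings is roughly 6.333, not the box-score shorthand 6.1).

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from a pitcher's season totals.

    constant: placeholder value; substitute the FIP constant for the season.
    ip: innings pitched as a true decimal number of innings.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Invented stat line: 20 HR, 50 BB, 5 HBP, 200 K over 200 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))  # 3.225
```

Because strikeouts carry a negative weight, a pitcher with enough strikeouts can push the fraction below zero, which leaves the FIP under the constant.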
Pitcher Scouting Report — Jacob deGrom:
Me: Can you give me a scouting report on Jacob deGrom? I would like a description of each pitch in his pitch arsenal, and can you tell me how often he uses each pitch along with the Whiff% and xwOBA for each pitch.
ChatGPT: Jacob deGrom is a starting pitcher for the New York Mets. He is a two-time National League Cy Young Award winner and is widely regarded as one of the top pitchers in Major League Baseball.
Pitch Arsenal:
Four-Seam Fastball: deGrom’s four-seam fastball sits in the mid-90s, with excellent movement and late life. He throws it about 35% of the time, with a 2021 Whiff% of 11.7% and a 2021 xwOBA of .291.
Slider: deGrom throws a hard slider in the mid-80s with sharp, late break. It’s considered one of the best pitches in the game and he throws it approximately 30% of the time, with a 2021 Whiff% of 33.3% and a 2021 xwOBA of .169.
Changeup: deGrom’s changeup is a third pitch that sits in the mid-80s with fading action, serving as an effective off-speed offering to complement his fastball and slider. He throws it approximately 20% of the time, with a 2021 Whiff% of 24.6% and a 2021 xwOBA of .265.
Curveball: deGrom also throws a curveball in the mid-70s with 12-to-6 break, which he uses sparingly, approximately 15% of the time, with a 2021 Whiff% of 35.3% and a 2021 xwOBA of .230.
deGrom is known for his dominant stuff, excellent control, and ability to miss bats. He has a career strikeout rate of over 10 per nine innings, and he consistently ranks among the league leaders in key pitching metrics such as ERA, WHIP, and strikeout-to-walk ratio.
Hitter Scouting Reports — Mike Trout and Spencer Torkelson:
Me: Can you give me a scouting report on Mike Trout? I am interested in his hitting and fielding abilities, as well as his career wRC+.
ChatGPT: Mike Trout is widely considered one of the best players in Major League Baseball.
Hitting: Trout has a career batting average of .307, with a .416 on-base percentage and a .573 slugging percentage. He is known for his plate discipline, power, and ability to get on base consistently. Trout has won multiple American League MVP awards and has led the league in several offensive categories multiple times.
Fielding: Trout is an excellent fielder, known for his range, arm strength, and instincts in center field. He has won several Gold Gloves and is considered one of the best outfielders in the game.
Career wRC+: Trout’s career weighted runs created plus (wRC+) is 170, which is significantly higher than the league average of 100. This statistic measures a player’s total offensive value, taking into account the run-scoring environment, and Trout’s 170 wRC+ indicates that he has been 70% more productive than a league-average player over the course of his career.
Overall, Trout is a complete player with outstanding hitting and fielding abilities, and he consistently ranks among the best players in the league in terms of overall offensive production.
Me: Can you give me a scouting report on Spencer Torkelson? I am interested in his abilities.
ChatGPT: Spencer Torkelson is a baseball player who was drafted first overall by the Detroit Tigers in the 2020 Major League Baseball draft. Torkelson is primarily a first baseman, and is known for his impressive power and overall hitting ability.
As a hitter, Torkelson has a very advanced approach at the plate, with a good eye for the strike zone and the ability to make consistent contact. He has a very strong and quick swing, with plenty of raw power that generates a lot of hard-hit balls. He is also known for his ability to hit for both power and average, with the potential to hit for a high batting average and plenty of home runs.
Defensively, Torkelson is considered to be an average first baseman, with good hands and a solid arm. He moves well around the bag and has the ability to make the routine plays, but may not have the lateral range or athleticism to make difficult plays.
Overall, Spencer Torkelson is considered to be one of the top prospects in baseball, with the potential to develop into a middle-of-the-order power hitter and solid everyday player.
Concluding Thoughts:
AI language models, such as ChatGPT, can be used to automate tasks in Major League Baseball front offices, such as writing scouting reports, and provide valuable insights into player evaluation. The ChatGPT model is trained on data found on websites such as FanGraphs and Baseball Savant, and it appears safe to assume that AI language models can produce even more valuable insights if trained on private metrics, such as Stuff+.
AI language models are already able to produce basic Major League Baseball scouting reports, and training these models on more data (both private and public) should allow many front office tasks to be automated. The use of artificial intelligence will be a game-changer for Major League Baseball front offices, and the first organizations to use this “new frontier” efficiently will hold a competitive advantage over the rest of the league.
Follow @MLBDailyStats_ on X (Twitter) for more in-depth MLB analysis. Statistics provided by FanGraphs and Baseball Savant, courtesy of ChatGPT.





