Within the sabermetrics community, there is substantial interest in the relationship between an assortment of pitch-characteristics and the ensuing outcome. To date, most pitch-related analyses have focused on a small subset of characteristics or on transformations of the outcome-space. We take a comprehensive approach to pitch analysis by estimating the distribution of the pitch-outcome (the set of basic events: Ball, Called Strike, Single, etc.) conditional on an expansive set of pitch-characteristics. By carefully decomposing the outcome-space, we develop a tractable method for estimating the pitch-outcome distribution conditional on a high-dimensional, mixed covariate vector. The conditional pitch-outcome (CPO) distribution estimator is a useful tool because it can sift through the noise of a typical baseball game. Due to the inherent randomness in the outcomes of a baseball game, it can be hard to assess a pitcher’s performance based on a limited number of innings. If a pitcher gives up three home runs in a start, one might wonder whether he was truly making bad pitches or was simply the victim of bad luck. The CPO distribution can provide an estimate of how many home runs the pitcher should have given up based on the characteristics of the pitches he threw. Perhaps the most important application of the CPO distribution is that it can be used to assign novel WAR estimates to pitchers. The three major WAR estimators are known to be highly correlated because they are predominantly determined by the outcomes of pitches. However, the pitch-based WAR estimator is often substantially different from the other major WAR estimators. Moreover, this estimator preserves a much higher autocorrelation from one year to the next, making it a valuable tool for projection.
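As a rough illustration of the "how many home runs should he have given up" idea, the sketch below sums per-pitch home run probabilities to get an expected count. The probabilities are hypothetical placeholders for whatever a fitted CPO estimator would return, and the spread calculation assumes pitches are independent, which the abstract does not claim.

```python
from math import sqrt

def expected_home_runs(hr_probs):
    """Return the mean and standard deviation of the home run count.

    Assumes each pitch is an independent trial with its own home run
    probability (an assumption made for this sketch only).
    """
    mean = sum(hr_probs)
    variance = sum(p * (1.0 - p) for p in hr_probs)
    return mean, sqrt(variance)

# Hypothetical per-pitch home run probabilities for one start.
hr_probs = [0.01, 0.02, 0.10, 0.05, 0.02]
mean_hr, sd_hr = expected_home_runs(hr_probs)
print(round(mean_hr, 2), round(sd_hr, 3))
```

Comparing the observed home run total against this mean and spread gives one way to judge whether a bad start looks like bad pitches or bad luck.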
The baseball industry’s current system of classifying pitches allows for so much variation within each pitch type that two very different pitches can be, and often are, identified as the same pitch type. Though this simplicity may be beneficial for the common fan, another level of specificity in pitch classification would provide benefits for scouting, player development, and player evaluation and comparison. Using TrackMan data and an unsupervised machine learning classification model, my research shows that classifying pitches based on their characteristics, and doing away with our current notions of pitch types like "curveballs" and "sliders", would have wide-reaching benefits for analysts, coaches, and players alike.
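The abstract does not name the unsupervised model used, so the sketch below uses plain k-means as a stand-in to show what grouping pitches by their characteristics can look like. The feature choice (velocity, horizontal break, vertical break), the sample values, and the starting centroids are all hypothetical; real features would normally be standardized first.

```python
from math import dist

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical pitches: (velocity mph, horizontal break in, vertical break in).
pitches = [(94.0, -6.0, 16.0), (95.0, -7.0, 15.0),
           (84.0, 5.0, -2.0), (85.0, 6.0, -1.0),
           (78.0, 8.0, -10.0), (79.0, 9.0, -11.0)]
labels, centroids = kmeans(pitches, [pitches[0], pitches[2], pitches[4]])
print(labels)
```

The resulting cluster labels carry no names such as "curveball" or "slider"; each group is defined only by the measured characteristics of its members.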
This paper examines the relationship between effective tunnel distance and whiff rate (%) for consecutive two-pitch sequences. ‘Pitch design’ has recently become one of the most popular topics in the baseball industry, but no one has yet identified the most effective distance for deceiving a batter on two consecutive pitches in an at-bat, known as a ‘pitch sequence’. Using a piecewise segmentation method, the author found the most effective distance for inducing batters’ swings and misses on pitch sequences.
Red Sox Assistant Hitting Coach Andy Barkett answers your questions about the Red Sox offense in 2019.
Who doesn’t remember Tom Hanks’s observation in the iconic movie A League of Their Own: “Of course it’s hard, the hard is what makes it great!!” Join us to hear from some of today’s women on the field: Perry Barber has been umpiring baseball for almost forty years. She is the first and only woman so far to umpire major league exhibitions in both the US and Japan and assembled the first four-woman crew to umpire a major league spring training game; Robin Wallace is the first professional female baseball scout hired by MLB in its scouting bureau since 1945 and has served as a technical commissioner and coach for the World Baseball Softball Confederation (WBSC); Donna Mills played in 3 Women’s Baseball World Cup championships with Gold Medal wins in two of them; Kelly Rodman is the first woman scout hired by the NY Yankees organization. Kids in many Latin countries still remember Kelly as someone who gave all of herself in helping them learn the game; Elizabeth (Liz) Benn is currently Coordinator, Labor, Diversity, and Baseball Development at MLB’s Office of the Commissioner. The discussion includes how documented and analyzed umpire performance can impact player performance and game results; how science, statistics, mental skills and performance are factored into scouting; the playing opportunities for women on the national and international scene; the goals of MLB to promote training and opportunities in umpiring, coaching, scouting, professional development and labor relations for women; and an answer to the time-worn question of “How does it feel to be one of the only women in the room?”
The new Statcast metric of Outfielder Jump tells the story of the first three seconds of a play, but how do those three seconds affect the rest? To quantify how jump influences a play, I built a catch probability model for distance and time left after jump, and will demonstrate the drastic effect jump can have on a play using illustrations from Jackie Bradley Jr.
Major League teams have begun using the four-outfielder alignment as a defensive strategy over the past few years. Baseball Info Solutions (BIS) started tracking such alignments in 2018 and has observed a significant increase in four-outfielder usage in 2019. This presentation will explore that data, looking at which teams have used the strategy (such as the Cincinnati Reds and Tampa Bay Rays), which batters have faced it (such as Joey Gallo and Matt Carpenter), and what the results have been to date. It will also explore a framework that BIS has devised to determine which batters are potential candidates for the strategy based on batted ball tendencies and the effects of opposing pitchers.
The defensive value of a first baseman has generally been difficult to measure. Even harder is determining the abilities that contribute most to good defense. Using FIELDf/x ball- and player-tracking data ([x, y, z] coordinates at 20 or 30Hz), we look at MiLB first-basemen and quantify receiving ability for first base put-outs. We rank players using a Receiving Ability Score (RAS), and, using 3-D visualizations, consider such skills as a player’s reach, ability to field hops etc. From this, we determine the attributes most strongly associated with receiving ability.
What information goes into a batter's decision to swing? Do batters only focus on the current pitch, or do previous events and game context tilt the scale in one direction or another? To study this with the 2018 Statcast dataset, we used batter/pitcher handedness, physical pitch characteristics, and game context to predict swing events. Using decision trees, we created models that were ~77% accurate in predicting a batter's swing decision, and identified the pitch and count variables that were most predictive. Additionally, we found that historical information within the at-bat was not predictive of the batter's current swing decision. These models can be refined to identify the tendencies of individual batters as a whole, or versus individual pitchers.
Bunting is down in Major League Baseball recently. This decline is generally attributed to analyses that simply look at the change in expected runs for typical bunting scenarios. However, run expectancy varies greatly from batter to batter and across various potential bunting scenarios. We aim to better understand the heterogeneous treatment effect of bunting. Heterogeneity arises from different game scenarios (runners on which bases, number of outs, inning, score difference, etc.) and types of hitters (OPS, speed, bunting ability, etc.). We estimate the effect of bunting among those who bunted using Bayesian Additive Regression Trees (BART) as well as propensity score methods (matching and inverse weighting). We show that there are certain scenarios where bunting is advantageous even if the overall change in run expectancy is negative.
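The "effect of bunting among those who bunted" is an average treatment effect on the treated (ATT). As a minimal sketch of the inverse-weighting idea only (not the authors' BART or matching models), the function below weights each non-bunting plate appearance by e / (1 - e), where e is its estimated propensity to bunt. The outcomes, bunt indicators, and propensity scores are hypothetical and would in practice come from a fitted propensity model.

```python
def att_ipw(outcomes, treated, propensity):
    """Estimate the average treatment effect on the treated (ATT).

    Treated units get weight 1; control units get weight e / (1 - e),
    where e is the estimated propensity to be treated.
    """
    treated_y = [y for y, t in zip(outcomes, treated) if t]
    control = [(y, e / (1.0 - e))
               for y, t, e in zip(outcomes, treated, propensity) if not t]
    treated_mean = sum(treated_y) / len(treated_y)
    weight_total = sum(w for _, w in control)
    control_mean = sum(y * w for y, w in control) / weight_total
    return treated_mean - control_mean

# Hypothetical data: runs scored in the rest of the inning, whether the
# batter bunted, and an assumed propensity to bunt in that situation.
runs = [1, 0, 2, 0, 1, 3]
bunted = [1, 1, 0, 0, 0, 0]
propensity = [0.6, 0.5, 0.5, 0.2, 0.5, 0.2]
effect = att_ipw(runs, bunted, propensity)
print(round(effect, 3))
```

Running the same estimator within subsets of game situations or hitter types is one simple way to look for the heterogeneity the abstract describes.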
This research builds on the concept of Barrel Zones, but focuses specifically on home runs for each ballpark and incorporates directional data from Statcast. In addition, this research looks into who would gain or lose the most home runs from playing in each ballpark to determine which player would be the best fit for each ballpark. I examine the exit velocities and launch angles that are needed to hit a home run in each launch direction of each MLB ballpark. I define the combination of exit velocity, launch angle, and launch direction necessary to hit a home run in a specific ballpark as the “Ballpark Specific Home Run Zone”. I visualize data for every home run hit in the 2017 and 2018 seasons using a cylindrical coordinate system. I superimpose ballpark dimensions onto the visualization to compare how they correspond to the Ballpark Specific Home Run Zones. I then generate a dataset of all hits with the potential to become a home run. I apply the Ballpark Specific Home Run Zone to the dataset to determine which players would have benefited the most from playing at a specific ballpark, and which players would have been at a disadvantage.
As an NCAA DIII softball player, I am immensely intrigued by my uncanny ability to swing and miss. I see the ball, am ready to swing, I swing, and then whiff. My intrigue here is replicated in Major League Baseball as strikeouts continue to climb at a historic pace. Baseball players are not DIII athletes. They have honed their skills for years and are paid to hit the ball, so why are they swinging and missing? Obviously it’s not on purpose. A player may forfeit plate discipline for power, and pitchers these days are throwing harder and harder. What I want to research, though, is what it is that a batter sees that convinces him to swing in the first place. Mike Trout and Joey Gallo are both professional players and, being in the same league, they face similar pitchers. Why is it that one of them rarely swings and misses and the other nearly makes a sport of it? Is it arm slots? Spin rates? Perceived velocity vs. actual velocity? What does the batter’s mindset have to be to not swing and miss? With this understanding, teams can better predict which pitchers certain batters should face, as well as inform the coaching staff of a potential reason a bad habit has formed and how to go about fixing it.
In 2017, we saw an unprecedented increase in home runs, ultimately setting a new season record of 6,105. It was shown that this Home Run Surge was caused by a change to the baseball—most likely the introduction of thicker laces, which inadvertently produced a ball that was more spherically-symmetric, and hence more aerodynamic. As a byproduct, increased lace thickness created “rougher” seams, leading to a massive spike in pitcher blister injuries. This year, the home run rate has soared again, and once again it appears related to a change in baseball construction. However, this time the culprit is not thicker laces. While the drag coefficient has decreased, blister rates have not increased, and pitchers league-wide have described this season’s ball as feeling “different.” In addition, both walk- and hit-by-pitch rates are up, suggesting that the new ball is affecting command and control. For this study, I disassemble a sample of 2019 baseballs, measuring properties of the ball and the construction materials; I then compare my findings with previous analyses of pre-2015 and 2016-2018 balls. My results show that aspects of the baseball are indeed different and that these statistically-significant differences could account for lower drag and a higher home run rate. I discover further changes that provide an explanation for the decrease in pitcher blisters and the increase in command-and-control issues. Evidence suggests that these changes are not unexpected, but are the result of improved quality control and a concerted effort to mitigate pitcher blisters.
Baseball depends on the feel of the ball, the way it travels through the air and its response from impact with a bat. While many players, coaches and fans have subjective opinions on the ball, laboratory tests are needed to quantify its properties. Some laboratory ball tests have been used for decades, largely unchanged; others have evolved and novel methods are being developed to measure new properties. This presentation will review new and old methods to test baseballs, with an emphasis on identifying changes to the ball, observed after the 2015 All Star game. The laboratory tests have shown good agreement with anecdotal observations (the ball is traveling further) and tracking measurements during play (the ball has less drag). This has led to the conclusion that the aerodynamic drag of the ball has changed. While small changes in the ball drag can have a large effect on offense, there is no aerodynamic specification for baseballs or even a standard test method to quantify it. Complicating matters further, baseballs are made from natural materials and hand sewn. The result is that the ball-to-ball variation in baseball drag is much larger than the small change in average drag needed to cause a significant change in offense. Large sample sizes, therefore, are needed to separate systematic changes in the ball from natural ball-to-ball variation. Novel methods to measure ball drag and potential causes for the change in drag will be discussed.
It is obvious to any observer of baseball that the aerodynamics of the ball are important, both for pitched and batted balls. Much has been written about the well-known Magnus Effect, or the force on a moving ball due to its rotation. Less is known about forces due to the wake of the ball. Baseballs are “bluff bodies,” meaning that the majority of their drag is due to the wake of the ball rather than friction on their skin. Baseball seams make baseballs very interesting when compared to other sports balls. They play two roles: As many have speculated, when located on the front of the ball, they can cause “laminar” flow to become turbulent, altering the wake of the ball. More surprisingly, when located on the back of the ball, they can also modify the location where the wake of the ball forms and can make the wake (and thus the force on the ball) asymmetric, leading to break. In this talk, we will discuss these effects and the possibility of the existence of the “laminar express” 2-seam fastball.
In this talk, I will discuss the latest developments in my efforts to develop a physics-based 3-dimensional model for the ball-bat collision, aided by laboratory experiments that help pin down the parameters of the model. One goal is to relate information about the ball-bat contact (e.g., the orientation, speed, and direction of the bat and the speed, direction, and spin of the pitch) to the launch parameters of the batted ball, including exit velocity, launch angle, direction, spin rate, and spin axis. Some examples will be presented that help elucidate the general features, particularly the factors that determine the launch angle.
Due to the dramatic change in the rate of home run hitting, Major League Baseball in 2017 commissioned a group of scientists to explore and suggest reasons for this phenomenon. We revisit the statistical work from the committee report and discuss changes in home run hitting using Statcast data from the 2018 and 2019 seasons.
In this presentation we will examine a mathematical approach to positioning defensive players. In a process that used ideas from geometry, probability, and integer programming, we worked on an undergraduate research project to model particular baseball fields and optimize positioning for particular batters on those fields. More specifically, we discretized locations on the field and developed an optimization model that considers hitter tendencies, fielding ability at various positions, and desirability of preventing big plays. In our talk we plan to discuss our challenges, our basic assumptions, our current results, and the potential uses of our ideas.
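To make the discretized-field idea concrete, the sketch below picks fielder cells by exhaustive search rather than integer programming, which is only practical for a tiny grid. The cell layout, hit probabilities, hit values (a stand-in for the desirability of preventing big plays), and the fixed reach radius are all hypothetical.

```python
from itertools import combinations
from math import dist

def best_positions(cells, hit_prob, hit_value, n_fielders, reach):
    """Exhaustively pick fielder cells that maximize expected damage prevented.

    A ball landing in a cell counts as fielded if any chosen fielder
    stands within `reach` of that cell.
    """
    def prevented(chosen):
        return sum(hit_prob[c] * hit_value[c]
                   for c in cells
                   if any(dist(c, f) <= reach for f in chosen))
    chosen = max(combinations(cells, n_fielders), key=prevented)
    return chosen, prevented(chosen)

# Hypothetical one-row field: five cells, with the last cell worth double
# because a ball landing there would go for extra bases.
cells = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
hit_prob = dict(zip(cells, [0.30, 0.10, 0.05, 0.15, 0.40]))
hit_value = dict(zip(cells, [1.0, 1.0, 1.0, 1.0, 2.0]))
chosen, value = best_positions(cells, hit_prob, hit_value, n_fielders=2, reach=1.0)
print(chosen, round(value, 2))
```

An integer program expresses the same objective with binary variables for "fielder placed in cell" and scales to realistic grids where enumeration does not.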
It is widely known that players have increasingly been hitting fly balls to beat the infield shift, but little has been done to show whether or not players have tried to hit balls to the opposite field more against the shift on the ground. My talk will focus on an analysis of the horizontal angle for every ground ball put in play against the shift from 2015-2018, which statistically proves that affected batters have improved at hitting to the opposite field. This research also nullifies the possibility that the change in spray chart patterns is due to teams shifting on more than just strictly pull hitters. This fall, I will begin a year-long thesis investigating how players have adapted to the shift and which types of players have had more or less success doing so. Before starting my thesis, however, I had to show that a counter-effect to the shift was taking place. Thus, this talk will focus on exploratory data analysis and I hope to be back on this stage next summer to discuss my extensive findings.
Slumps? The Yips? A nose-dive from All-Star performance to minor league demotion? The issue may be problems at home, pressure, expectations, and the ever-present fear of failure. In this talk on the mental side of player performance, you’ll hear the story of former pro pitcher Dan Blewett’s biggest implosion on the mound, the Black Swan that explained his epic meltdown, along with factors to keep in mind when analyzing player performance.
With data making its way into every decision in baseball, one area it hasn’t gotten to is player makeup. How does a player respond to adversity? How does a pitcher respond from a 3-0 count? How does a hitter respond after getting down 0-2 in the count? What does a player do in his next at-bat after making an error on defense? Looking at data from the past 10 seasons, I will focus on a few different scenarios where players face adversity. I will then identify all the possible outcomes for each scenario to determine if you can predict how a player will handle adversity.
This talk presents a student’s experience as a data analytics intern with the Central College Baseball team. Central College is located in Pella, IA. The baseball team competes at the DIII level in the American Rivers Conference. This presentation covers how the intern used data to help player development, decision making, and in-game performance for the program. Examples of data include information from baseball technology like Rapsodo, advanced metrics like wOBA, OPS, and runs created, as well as pitch sequencing. This presentation also covers how a small college or high school program can still benefit from analytics without having the resources of a D1 or MLB team.
A model was created in 2017 to generate similarity scores for comparing Major League players. Using a similar model adapted to available college data, I am developing a method to compare college players based on their career performance/tendencies (ISO, OBP, SB, K%, BB%, etc.). My goal is to see if certain collegiate play styles align as they progress to the Major League level. Taking similar players based on scores, I will match up their trajectories to see how they perform relative to each other at various stages of their development. Ideally, this could be a useful tool to help evaluate the future value of a college player.
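The abstract does not describe how the 2017 model computes its scores, so the sketch below shows one generic approach as an assumption: standardize each stat across players, then measure Euclidean distance between players, with smaller distances meaning more similar profiles. The player names and stat lines (ISO, OBP, K%, BB%) are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def similarity_table(players):
    """Return pairwise distances between players on z-scored stats."""
    names = list(players)
    columns = list(zip(*players.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    z = {name: [(v - m) / s for v, m, s in zip(players[name], mus, sigmas)]
         for name in names}
    return {(a, b): sqrt(sum((x - y) ** 2 for x, y in zip(z[a], z[b])))
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical college stat lines: (ISO, OBP, K%, BB%).
players = {
    "Player A": (0.200, 0.380, 0.18, 0.12),
    "Player B": (0.210, 0.375, 0.19, 0.11),
    "Player C": (0.090, 0.310, 0.28, 0.05),
}
distances = similarity_table(players)
closest = min(distances, key=distances.get)
print(closest)
```

Standardizing first keeps a stat with a large numeric range from dominating the distance; weighting individual stats differently would be a natural refinement.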
Making optimal pitching changes throughout a game, series, or season can heavily influence winning and losing. We want to examine how changing pitcher profiles significantly throughout various stretches of play has affected hitter performance and game outcomes. In a hypothetical scenario (one of many), we will examine how starting a hard-throwing righty and following him up with a soft-throwing, deceptive lefty has affected both individual hitters and opposing lineups as a whole. We theorize that we can contribute meaningful insights toward in-game strategy, roster construction, professional and amateur scouting, and player development. Our analysis will statistically prove or disprove our theory. We will provide case studies to illustrate our findings.