Saturday, August 5th

9:00 - 9:10

Conference Open

Chuck Korb

Welcome to Sabermetrics, Scouting, and the Science of Baseball, our seventh annual charity baseball research conference.

This year, we are proud to support the Angioma Alliance, an organization by and for those affected by cavernous angiomas and their loved ones, health professionals, and researchers.

We hope you have a fun time, learn some new things, think differently about a problem, and meet some great people.

9:10 - 9:25

Pitch Tracking
Labor Markets

The Strike Zone Expansion: Which Players Were Affected Most?

Brian M. Mills

Recent work has shown that relatively precise ball tracking technology used to monitor MLB umpires has resulted in considerable improvement in ball-strike call accuracy at the expense of total offense. Given that players tend to have heterogeneous skillsets, these changes may impact players differentially, potentially resulting in lower salaries for some. If so, union officials have good reason to demand a role in negotiations on enforcement of policy largely external to players themselves. This work therefore uses pitch-level data on umpire ball-strike calls to estimate changes in batter productivity stemming from changes in the called strike zone. I begin by estimating pitch location-specific differences in run expectancies with the 2008 MLB season using a generalized additive model. Pitch-level error terms are then aggregated at the player level to identify deviation in performance from the league-level expectation across pitch locations. Finally, these aggregations are applied in the context of the changes in the strike zone and shifts in pitch location choices in subsequent seasons to arrive at expected performance and salary impacts for individual players. This process will be presented visually and with concrete examples of specific players most affected by these changes for the broad audience.

9:25 - 9:35

Visual Tracking
Batting Performance
Manual Control Skills

Visual Tracking: The First Step in Batting

Dorion Liston (neuroFit, Inc.), Raine Chen (Hong Kong University), and Li Li (Hong Kong University & NYU Shanghai)

The batting process requires tight coordination between visual and motor capabilities. First, the visual system must encode the motion of the baseball, using retinal image motion combined with high-level cues to construct a percept of pitch trajectory. Second, a high-level decision must be made about whether to swing. Third, given an affirmative decision, the motor system must drive a smooth batting stroke to contact the ball. Although these capabilities blend together seamlessly in a well-practiced batter, recently-developed assessment technologies can quantify the component capabilities that support this process. Here, we address three questions: 1) whether professional baseball players have superior visual tracking and manual control capabilities as compared to non-athletes, 2) whether experience contributes to visual tracking ability, manual control performance, and batting performance, and 3) whether visual tracking and manual control capabilities correlate with batting performance.

Methods. We collected data from professional baseball players from Hong Kong leagues (n=44, 27 females, 3-30 years’ experience) and demographically-matched non-athletes (n=47, 27 females). First, we used a visual tracking task (Liston & Stone, 2014, Journal of Vision 14:1-17) requiring smooth eye movements to pursue motion that varied from trial to trial (speed: 16°/s-24°/s; direction: 0°-360°). Next, we used a manual control task (Li et al., IEEE Trans on Systems, Man, and Cybernetics A 36:1124-1134) in which participants used a joystick to center a randomly-moving (sum-of-sines motion: 0.1–2.19Hz) target. Last, to test whether visual tracking and manual control performance predict batting performance, we measured infield batting for a subset of players (n=23, all females, 3-18 years’ experience).

Results. Our sample of professional baseball players showed better visual tracking and manual control capabilities than non-athletes, as well as tighter coordination between visual and manual capabilities (Pearson’s r=0.49, p<0.001) than we observed in non-athletes (Pearson’s r=0.19, p=0.10). Across all levels of experience, hitting showed a weak positive trend with both visual tracking and manual control; subdividing the players by experience (9 or more years) revealed strong correlation for more-experienced (Pearson’s r=0.68, p<0.05) but not less-experienced (Pearson’s r≤0.29, p≥0.45) players, with visual tracking being more predictive of hitting performance than manual control.

Conclusions. While professional baseball players excel in all aspects of visual and manual tracking relative to non-athletes, their visual-motor skills (e.g., batting, manual control) develop with experience. For the first time, our data show how batting performance develops within the dynamic visual abilities of pro ballplayers.

9:35 - 9:50

Pitch Tracking

A Jukebox Can Play Joe West

Harry Pavlidis (Pitch Info, Baseball Prospectus)

Who wouldn't want a perfect strike zone? Everyone who watches baseball knows the "K-Zone" is ready to replace the fallible humans behind the plate.

Or not.

9:50 - 10:20


A Q&A with Ben Cherington

Ben Cherington (Toronto Blue Jays)

A Q&A discussion with the former Red Sox General Manager and current Vice President of Baseball Operations for the Toronto Blue Jays.

10:20 - 10:50

Sports Medicine
Tommy John Surgery

The True Relationship between Pitch Velocity and Elbow Stress: A Biomechanical Study

Glenn Fleisig (American Sports Medicine Institute)

Analysts and reporters have suggested that the increasing rate of elbow injuries in professional pitchers is due to the documented increase in average fastball velocity. In this presentation, Dr. Fleisig will explain the motions and torque leading to common elbow problems (such as Tommy John injury), and explain how elbow torque can be measured with motion capture. Then, results will be presented from a new biomechanics study testing the link between fastball velocity and elbow torque. Results from this study can lend insight as to whether high velocity increases the risk of elbow injury.

10:50 - 11:00

Machine Learning
Neural Networks

Using Recurrent Neural Networks to Predict Player Performance

Kiri Oler

We’ve all heard the saying, “Past performance is the best indicator of future performance.” There’s room for debate here. On a basic level, the argument rings true, though perhaps oversimplified. What is surely an oversimplification is assuming future performance will be exactly the same as past performance. Thus, we try to account for aging curves, regression to the mean, recovery from injury, and any number of other human variables in play. However, human beings are complex; so complex that even naming all possible variables is a large task, and quantifying them for analysis is even more staggering. So why try? I am not suggesting we give up, but rather turn the task over to a mechanism not limited by human perspective, understanding, and bias. Recurrent neural networks (RNNs), a branch of machine learning, provide a means for predicting the future based on what happened in the past. RNNs have been used successfully for image processing, language prediction, etc., though they are best suited for sequential data, say data occurring over the course of a season, or career. This particular application of RNNs feeds the neural network vectors representing statistics for a single player during a set time window. The network is trained by showing it player vectors in succession over their full career, allowing it to pick up on patterns impossible to detect via simple human observation. Once the network has been trained, it predicts the future statistical performance of players based on the patterns present in their previous statistical performance. This presentation will report on the effectiveness of RNNs in predicting future performance, discuss the optimal combination of statistics to include in the player vectors, and consider the appropriate time window of data for each vector.
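
As a rough illustration of the mechanism described above, the sketch below runs the forward pass of a simple (Elman-style) recurrent network over a sequence of per-season stat vectors. It is not the presenter's model: the layer sizes, the random untrained weights, and the names `rnn_step`, `encode_career`, and `predict_next` are all assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h, w_x, w_h, b):
    """One recurrent step: h_new = tanh(W_x x + W_h h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_x[j], x))
            + sum(w * hi for w, hi in zip(w_h[j], h))
            + b[j]
        )
        for j in range(len(h))
    ]


def encode_career(seasons, w_x, w_h, b):
    """Feed season vectors in order; the final hidden state summarizes the career."""
    h = [0.0] * len(b)
    for x in seasons:
        h = rnn_step(x, h, w_x, w_h, b)
    return h


def predict_next(h, w_out):
    """Linear read-out from the hidden state to next-window stats."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]


# Hypothetical player: three seasons of (AVG, OBP, SLG). Made-up numbers.
career = [[0.250, 0.320, 0.400], [0.270, 0.340, 0.430], [0.265, 0.335, 0.450]]
rng = random.Random(0)
hidden, n_stats = 4, 3
W_x = make_matrix(hidden, n_stats, rng)
W_h = make_matrix(hidden, hidden, rng)
W_out = make_matrix(n_stats, hidden, rng)
bias = [0.0] * hidden
forecast = predict_next(encode_career(career, W_x, W_h, bias), W_out)
```

With untrained weights the forecast is meaningless; the point is only that the hidden state carries information from earlier seasons forward, which is what makes the architecture a natural fit for career-length sequences.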



Changing Gendermetrics in Baseball Front Offices

Jean Afterman & Tyler Tumminia

Yankees AGM Jean Afterman & Tyler Tumminia lead a panel discussion.



An Introduction to the Rapsodo Pitch Tracking System

Kelvin Yeo & Seth Daniels

The creators of the Rapsodo Pitch Tracking system will give a brief technical talk on how the device captures information useful to developing pitchers.




Live Demo

A Live Demo of the Rapsodo Pitch Tracking System and Motus Sleeve

Kevin Vance (URI), Dave Fischer, Seth Daniels (Rapsodo), Kelvin Yeo (Rapsodo) & Will Carroll (Motus)

A live demo of the Rapsodo pitch tracking system and Motus Sleeve.



The Physics of the Infield Bounce Throw

Andrew Dominijanni

Throws from infielders to first base are often bounced, whether intentionally or unintentionally. In a fraction of a second, the infielder decides how to get the ball to the first baseman in the shortest time possible. At every location on the infield, for a given possible throw release speed, it is assumed that a binary choice is made between bouncing a throw and reaching the first baseman on the fly. The range of possible initial throw angles for each type is determined by a reasonable vertical range of the first baseman's glove when receiving the throw. In this study, physical models of the ball in flight as well as the ball bouncing on the infield surface are used to simulate a range of release speeds at many different fielding locations. A mean time from release to glove is associated with each throw type and is used to determine the optimum choice for each scenario. The situations where a bounce throw may be most advantageous are discussed, and a method to empirically validate the model simulations is suggested.
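
The comparison between the two throw types can be sketched with a deliberately simplified model: no air drag, no spin, and a single bounce governed by fixed restitution factors. This is far cruder than the physical models the abstract describes. The release height and the factors `e_x` and `e_y` below are placeholder assumptions, not values from the study.

```python
import math

G = 32.174  # gravitational acceleration, ft/s^2


def direct_throw(speed, angle_deg, distance, release_h=6.0):
    """Drag-free throw on the fly: return (time, height on arrival)."""
    th = math.radians(angle_deg)
    vx, vy = speed * math.cos(th), speed * math.sin(th)
    t = distance / vx
    return t, release_h + vy * t - 0.5 * G * t * t


def bounce_throw(speed, angle_deg, distance, release_h=6.0, e_x=0.8, e_y=0.5):
    """Drag-free one-bounce throw: return (time, height on arrival) or None.

    e_x and e_y are assumed fractions of horizontal and vertical speed
    retained after the bounce (hypothetical values).
    """
    th = math.radians(angle_deg)
    vx, vy = speed * math.cos(th), speed * math.sin(th)
    # Time until the ball first reaches the ground (height = 0).
    t1 = (vy + math.sqrt(vy * vy + 2.0 * G * release_h)) / G
    x1 = vx * t1
    if x1 >= distance:
        return None  # ball would reach the base before bouncing
    vy_land = vy - G * t1  # negative: ball is moving downward
    vx2, vy2 = e_x * vx, -e_y * vy_land
    t2 = (distance - x1) / vx2
    h = vy2 * t2 - 0.5 * G * t2 * t2
    if h < 0.0:
        return None  # ball would bounce a second time first
    return t1 + t2, h
```

Sweeping the release angle for each throw type, keeping only arrivals within a chosen glove-height range, and comparing the resulting times would mirror the binary choice the abstract describes.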



Statcasting Sac Flies

Julia Prusaczyk (MLBAM)

Baserunners often need to decide if they should advance on a sacrifice fly. Runners must consider both their own skill as a baserunner and the skill of the outfielder who makes the play. Based on the speed of the runner, the arm strength of the outfielder, and the outfielder's exchange talent (quickness of release and number of feet he can cover during exchange), we can estimate the breakeven point on a sacrifice fly: when do the baserunner and ball meet exactly at home? Statcast has established a "scrimmage line" setup from which the runner must go through a go/no-go decision-making process. This has also led to the development of a new Statcast metric, Exchange Distance, which measures the ground distance the outfielder travels during the exchange period.
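
The breakeven idea can be illustrated with back-of-the-envelope kinematics. The sketch below is not Statcast's model: it assumes a runner at constant speed over 90 feet, a throw at constant speed with no arc or drag, and an exchange treated as a fixed delay. The function names are hypothetical.

```python
BASE_PATH_FT = 90.0  # third base to home


def runner_time(runner_speed_ft_s, distance_ft=BASE_PATH_FT):
    """Seconds for the runner to reach home at constant speed."""
    return distance_ft / runner_speed_ft_s


def ball_time(throw_distance_ft, exchange_s, throw_speed_ft_s):
    """Seconds from catch to the ball arriving at home."""
    return exchange_s + throw_distance_ft / throw_speed_ft_s


def breakeven_throw_distance(runner_speed_ft_s, exchange_s, throw_speed_ft_s):
    """Throw distance at which ball and runner arrive at the same instant."""
    return (runner_time(runner_speed_ft_s) - exchange_s) * throw_speed_ft_s


def should_go(runner_speed_ft_s, throw_distance_ft, exchange_s, throw_speed_ft_s):
    """True if the runner beats the ball under these simplified assumptions."""
    return runner_time(runner_speed_ft_s) < ball_time(
        throw_distance_ft, exchange_s, throw_speed_ft_s
    )
```

For example, a 27 ft/s runner against a 1.0 s exchange and a 120 ft/s throw breaks even at a 280 ft throw; catches deeper than that favor the runner in this toy model.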



Q&A with White Sox General Manager Rick Hahn

Rick Hahn

Rick Hahn answers your questions about his experience as the White Sox General Manager.


Cross Validation

Prediction: Motivations, Problems and Methods

Katherine Louise Evans (Harvard University)

Often in baseball analytics, and sports statistics in general, analysts are asked to perform a variety of predictions. The type of prediction can vary quite a bit. For example, an analyst may be asked to predict if a pitch will be called a strike, how many games a team will win in a season, or whether a draft prospect will be successful in his career. All of these predictions have different motivations and thus should be approached in different ways. A model for predicting strikes can account for more than just the location of the pitch, such as the stance of the batter or which umpire is behind the plate. Understanding how those variables can affect the probability of a strike can influence in-game pitching decisions. Therefore it might be preferable to use a prediction model that allows the analyst to clearly quantify the relationships between the dependent and independent variables. Furthermore, analysis can be narrowed to focus on the influence of a single variable. By contrast, predicting the success of a draft prospect is more focused on the outcome of his success rather than understanding precisely what predicts that success. Quantifying all the variables that predict success may not be as important as understanding who the best available player is. In such a situation, it is more important to have a prediction model that is highly accurate than it is to have a model which is easily interpreted. Data adaptive ensemble learners can combine numerous models to increase precision. Regardless of the model, it is important to have rich data sets to account for as many sources of variation as possible and to implement cross-validation in order to avoid overfitting.

In this presentation I will further discuss these, and other, scenarios along with potential methods with which to proceed.
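
For readers unfamiliar with cross-validation, a minimal k-fold sketch using only the Python standard library follows. It is an illustration rather than the presenter's method: the two toy models (`fit_mean`, `fit_line`) and the squared-error score are assumptions chosen to keep the example short.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds of n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def cv_mse(xs, ys, fit, k=5):
    """Mean squared error on held-out folds for a model-fitting function."""
    errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return sum(errors) / len(errors)


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Simple least-squares line through the training points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)
```

Comparing `cv_mse(xs, ys, fit_mean)` with `cv_mse(xs, ys, fit_line)` scores each model only on data it did not see during fitting, which is what guards against rewarding an overfit model.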


Deep Learning
Big Data

The application of deep learning to the analysis of baseball statistical data

Aaron Goldenberg (QuantRiskTrading)

With the advent of open source libraries for creating neural networks and the ease of use of cloud computing, using deep learning techniques to solve statistical problems in baseball has become easier than ever. This talk will focus on applying these techniques to achieve more accurate forecasting.





The MLB Prohibited Substances List: One Chemist's Perspective

Stephanie Springer

This presentation will provide a basic overview of the principles underlying mass spectrometry, and how a change in the analysis of samples may have resulted in an increase in positive tests for PEDs in 2016. The basics of human metabolism and the formation of metabolites will be covered, particularly as it pertains to the anabolic steroids and the amphetamine derivatives on the prohibited substances list. The presentation will touch upon the mechanisms of action of other substances on the list which fall outside the scope of anabolic steroids and amphetamines. If there is interest, the presentation will conclude with either new upcoming analytical techniques that might be used in the future, or a quick look at the health risks of taking a prohibited substance for an extended period.


Strike zone

Finding One Strike Zone Effectiveness Metric to Rule Them All

Jason Rollison and Jeremy Wyne (Pirates Breakdown)

Our research into strike zone efficiency began after longtime Pittsburgh Pirates pitching coach Ray Searage let slip to media that he would be instructing his charges to pitch “up in the zone” more frequently in 2017. As the Pirates have long been ground-ball enthusiasts obsessed with keeping the ball low, this struck us as peculiar to say the least. We then attempted to find a way to definitively measure a pitcher’s effectiveness in any particular part of the strike zone – in this case, the upper third. Using complete PITCHf/x data for 2016 and utilizing the least squares regression method, we developed a statistic to measure effectiveness. ZEF, or Zone Efficiency Factor (working titles), aims to roll up several important and common descriptors of pitcher performance into one universal weighted metric.


Sports Medicine

Saving the Pitcher, 2017: A Data Driven Approach

Will Carroll (Motus)

A presentation of recent experiments with the Motus Sleeve, and a discussion of updates to the technology.


Surplus Value

An Empirical Look at Surplus Value Calculations for Prospects

Dan Rausch

Over the past few years, the concept of Surplus Value has been popularized as the industry standard for determining the value of big league players relative to their remaining period of team control and the salary cost of those years. Effort has been made to extend this concept to prospects, with contributions made by Jeff Zimmerman in the Hardball Times, and Kevin Creagh and Steve DiMiceli at The Point Of Pittsburgh blog. In particular, their methodology looked at the historical results of prospects at various ranking levels, and based on that they were able to estimate a future Surplus Value for a prospect with a particular ranking.

This presentation will look at this valuation system empirically in an attempt to determine its accuracy. The methodology is to look at dozens of trades over the past several years and see if the Surplus Value - as calculated by their methods - accurately predicts the trade packages in real life. What I found is that the Surplus Value of prospects, as calculated by their methodology, is significantly over-estimated, at least from the standpoint of trade currency. I will explore some potential explanations for this discrepancy, and propose some modifications to their calculations to better reflect the empirical results.


Count Management

Battling Back: Which Hitters Can Flip an 0-2 Count, and Which Pitchers Can Flip a 3-0 Count

Matt Petitt

The terms “hitters count” and “pitchers count” have been around since the inception of baseball, but recently analysts have been able to prove the incredible significance that the ball/strike count can have in determining the potential run value of an at bat. As the below plot shows (originally created by Professor Jim Albert), the average value in terms of runs is significantly higher when hitters are ahead in the count (i.e. more balls than strikes), and lower when pitchers are ahead in the count (i.e. more strikes than balls).

This understanding of how certain counts impact run values (either advantageously or not) is at the core of the value of pitch framing, which is arguably the single concept that has had the greatest impact on the sport in the past ten years. But no one has measured a hitter’s or pitcher’s ability to “battle back,” or turn a count that on average is disadvantageous into a positive outcome.

Using historical data from Retrosheet, I am going to analyze which hitters are best at turning an at bat that starts with a no-balls, two-strikes count into either a walk or a base hit. From the other side, I am going to analyze which pitchers are best at turning an at bat that starts with a three-balls, no-strikes count into an out. Ultimately I want to determine if the ability to “battle back” is a repeatable skill over a player’s career, or simply random. Additionally, I will examine whether a player’s battling back ability correlates with their overall true talent, or if players who are not particularly noteworthy based on any other metric can excel at “battling back.”



Reexamining Runs Saved by The Shift

Joe Rosales (Baseball Info Solutions)

The use of defensive shifts has increased ten-fold since 2011 before finally leveling off this year, yet the question of its effectiveness continues to be raised. Some question whether shifting has ever been effective at all, while others debate whether teams have gone too far or not far enough. Baseball Info Solutions has been studying the effectiveness of defensive shifts in great depth for about a decade, and we recently took a new look at how we quantify how many runs teams save with their shifting. By accounting for factors such as the location and velocity of a batted ball, the speed of the batter, and the types of plays used as a basis for comparison, we have developed a more refined method for quantifying exactly how effective defensive shifting has been.


Replacement Level

Analyzing the Risk and Return of Baseball Players: An Application of Asset Pricing Theory

Rachel Heacock (University of Virginia; The Hardball Times)

This research is an application of Asset Pricing Theory – more specifically, the Capital Asset Pricing Model (CAPM) – to assess the value of a Major League Baseball player. The use of this model allows the financial concepts of systematic risk and return to be practically applied towards the evaluation of a player’s worth, both to the success of his team and the financial health of his organization. Using salary data from Lahman’s Database, performance data from FanGraphs, and other previous research, the variance in individual player performance and lineup performance is compared to the variance in the particular team’s performance overall. This comparison produces a coefficient (β) measuring the risk of a player or lineup. To apply the CAPM, a risk-free rate of return is identified, which is easily translated as the return on a replacement level player. The slope of the line between the point (replacement level β = 0, replacement level return) and the point (team β = 1.0, team return) defines a new “replacement level” specific to the particular team in question.* Whether a player or multiple players fall above or below the line can signify an imbalance in risk and return. The CAPM formula produces the expected return on a player/lineup relative to risk and can indicate when these may no longer be worth the risk. When applied properly (both with the use of other supporting evidence and when the relationship between player and team performance is significant), this research can aid teams in reconsidering their investment in players or making decisions about playing time and player contracts. It may also be relevant in assessing player and team cohesion. In addition, it introduces the idea that each team has a specific “replacement level” that varies from the theoretical, league wide concept of replacement level. Though there is controversy over the CAPM in finance, its application in this case controls for some of its issues in asset pricing.
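
The line described above follows the textbook CAPM relation, expected return = risk-free return + β × (market return − risk-free return), with the replacement level player standing in for the risk-free asset and the team for the market. The sketch below uses the standard finance definition of β (covariance with the team divided by the team's variance); whether the presenter computes β in exactly this way is an assumption, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def beta(player_returns, team_returns):
    """Standard CAPM beta: Cov(player, team) / Var(team)."""
    mp, mt = mean(player_returns), mean(team_returns)
    n = len(team_returns)
    cov = sum((p - mp) * (t - mt) for p, t in zip(player_returns, team_returns)) / n
    var = sum((t - mt) ** 2 for t in team_returns) / n
    return cov / var


def expected_return(player_beta, replacement_return, team_return):
    """Point on the line through (0, replacement return) and (1, team return)."""
    return replacement_return + player_beta * (team_return - replacement_return)


def excess_over_line(actual_return, player_beta, replacement_return, team_return):
    """Positive: player sits above the line; negative: below it."""
    return actual_return - expected_return(player_beta, replacement_return, team_return)
```

A player whose actual return falls below `expected_return` for his β would, in the abstract's terms, not be compensating the team for the risk he carries.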



Live Recording of Effectively Wild

Ben Lindbergh (The Ringer), Jeff Sullivan (FanGraphs), Dickie Thon

A live recording of the popular baseball podcast Effectively Wild, including a special guest appearance by ex-MLBer Dickie Thon.