Abstracts

An Exploration of the First Pitch in Baseball

by Ashley Spangler




Institution: Bowling Green State University
Department:
Year: 2017
Keywords: Statistics; baseball; swing; pitch count; count; logistic model; first pitch
Posted: 02/01/2018
Record ID: 2154928
Full text PDF: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1490300154782369


Abstract

Sabermetrics is the statistical analysis of baseball. This research was started in the 1950s and has since become increasingly popular. Over the last couple of years, the availability of data within the sport of baseball has exploded. From mainly three sources, we have access to a vast array of statistics. This research investigates the importance of the count and the first pitch in baseball. The first pitch determines whether the hitter or the pitcher has the advantage in the at-bat and can set the precedent for the rest of the at-bat. Exploratory methods are used to investigate and summarize the relationships between various variables through the use of tables, contour plots, scatterplots, and line graphs. As a pitcher's first-pitch strike percentage increases, the number of innings pitched per game increases, Walks plus Hits per Inning Pitched (WHIP) decreases, walk percentage decreases, and strikeout percentage increases. Sixty-four percent of the first pitches thrown are either four-seam or two-seam fastballs or sliders, which are all fast pitches. Over 50% of the first pitches are in the strike zone. Singles, doubles, triples, and home runs are more likely to be hit on the first pitch. Pitchers have the highest pitching statistics when the hitter swings and misses on the first pitch, compared to putting the ball in play, a called strike, or a ball. When the first pitch is a ball, the hitters have the highest hitting statistics. Generalized Additive Models (GAM) and Logistic Regression Models are used to discover the factors significant in predicting the probability that hitters swing. Logistic models were created for all pitches and then for first pitches for all players. Next, four logistic models were created for four different players.
In the majority of the models, count type (whether the count favored the pitcher, favored the hitter, or was neutral), the distance in feet of the pitch from the center of the strike zone, and whether runners were on base were significant in predicting the probability of swinging. Overall results suggest that hitters have different hitting strategies and that swinging on the first pitch, or in general, depends on the hitter. More mature hitters tend not to swing on the first pitch.

Advisors/Committee Members: Albert, James (Advisor).
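
As an illustration of the kind of logistic model the abstract describes, the sketch below fits a swing-probability model on synthetic data using the three predictors named above: count type, distance in feet from the center of the strike zone, and whether runners are on base. It is a minimal sketch and not the thesis's code or data. The function names, the made-up coefficients in `TRUE_W`, the simulated pitches, and the plain gradient-descent fit are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(count_type, distance_ft, runners_on):
    # Feature vector: intercept, hitter-count dummy, pitcher-count dummy
    # (a neutral count is the baseline), distance in feet, runners-on dummy.
    return [
        1.0,
        1.0 if count_type == "hitter" else 0.0,
        1.0 if count_type == "pitcher" else 0.0,
        distance_ft,
        1.0 if runners_on else 0.0,
    ]


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    # Batch gradient descent on the mean logistic log-loss.
    n, k = len(rows), len(rows[0])
    w = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(k):
                grad[j] += (p - y) * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict_swing(w, count_type, distance_ft, runners_on):
    # Estimated probability that the hitter swings.
    x = encode(count_type, distance_ft, runners_on)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Synthetic pitches only: TRUE_W is invented for this example and does
# not come from the thesis.
random.seed(0)
TRUE_W = [0.5, 0.3, 0.6, -1.5, 0.2]
rows, labels = [], []
for _ in range(1000):
    count_type = random.choice(["neutral", "hitter", "pitcher"])
    distance_ft = random.uniform(0.0, 2.5)
    runners_on = random.random() < 0.4
    x = encode(count_type, distance_ft, runners_on)
    p_swing = sigmoid(sum(wi * xi for wi, xi in zip(TRUE_W, x)))
    rows.append(x)
    labels.append(1 if random.random() < p_swing else 0)

weights = fit_logistic(rows, labels)
near = predict_swing(weights, "neutral", 0.2, False)
far = predict_swing(weights, "neutral", 2.0, False)
print("P(swing) near center:", round(near, 3), "far from center:", round(far, 3))
```

In practice a statistics package would normally be used instead of a hand-written fit, since it also reports standard errors and significance tests for each coefficient.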