Baseball is a sport where it is fairly easy, compared with other sports, to measure the difference in ability between teams. A baseball game consists to a large extent of the duel between the pitcher and the batter. The other players (fielders) play a role in deciding the outcome, but that role is minor compared with other sports. Because play stops and restarts between every pitch, even the smallest details of the game can be documented statistically. When the batter hits the ball, much less is happening on the field (and measurement is easier) than, for example, in a football match, where a player's good or bad performance can depend heavily on how his teammates or the opposing players are performing.
MLB teams play 162 regular-season games per season. Because of the nature of the game, individual games produce a lot of surprises. Over the season the best and worst teams stand out, even if the differences are usually small: the best teams win about 60% of their games and the worst about 40%. Even if a large part of the outcome of an individual baseball game can be explained by luck and randomness, it is possible to find a number (a probability) for the home or away team that tells what share of games that team would win if the teams faced each other enough times in the same game environment (lineups, stats, weather etc.). By comparing this number with the market odds, it is possible to measure whether there is value in betting on the game.
Many years of research have enabled the baseball community to recognize which factors are important for measuring the number of runs scored and allowed in a baseball game. These factors are no longer a secret. There are a few good ways to make estimates. The best known is probably OBP (on-base percentage), which correlates very well with runs scored; the book and movie Moneyball made the stat known to the general public. Another good stat for batters is SLG (slugging percentage). For measuring runs allowed and pitcher ability, K (strikeouts), BB (bases on balls) and HR (home runs) are widely used stats.
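As a quick illustration of the two batting stats mentioned above, here is a minimal sketch using the standard definitions of OBP and SLG. The function names and the sample stat line in the usage note are invented for illustration and are not taken from my database.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


def slugging_percentage(singles, doubles, triples, home_runs, ab):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / ab if ab else 0.0
```

For a made-up batter with 500 at-bats, 150 hits (100 singles, 30 doubles, 5 triples, 15 home runs), 50 walks, 5 hit-by-pitches and 5 sacrifice flies, `on_base_percentage(150, 50, 5, 500, 5)` is 205/560 and `slugging_percentage(100, 30, 5, 15, 500)` is 235/500.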
In my own analysis I only use stats and numbers whose relationship to my model, and effect on it, I fully understand. I find this very important because every now and then a situation will come up where you have to critically re-evaluate your model(s). If you don't fully understand the relationships within your model(s), it is very difficult to find possible problems and errors. I also find it very important to know the sport that is being handicapped. Otherwise it is difficult to understand which parts of the game affect the result.
The probability of different outcomes
My handicapping method is based on judging the ability of individual players. I calculate the probability of every possible outcome when a player is hitting or pitching. These probabilities are fixed at the beginning of the season and then change depending on the player's performance throughout the season. The possible outcomes are 1B (single), 2B (double), 3B (triple), HR (home run), BB (bases on balls/walk), IBB (intentional bases on balls), HBP (hit by pitch), K (strikeout) and GO/FO (groundout/flyout). For example, Buster Posey of the San Francisco Giants has the following probabilities against a starting pitcher of average ability at the beginning of the 2014 season:
I also take into account home/road splits and handedness splits for each player, and I use park factors to adjust the stats, because every ballpark has its own characteristics. For example, the San Francisco Giants' AT&T Park is a pitcher-friendly ballpark and the Colorado Rockies' Coors Field is a hitter-friendly ballpark. The differences stem from the weather conditions and the size and shape of the field.
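One simple way a park adjustment could work is sketched below. This is an assumption for illustration, not a description of my exact adjustment: each outcome probability is multiplied by a park factor (with 1.0 meaning a neutral park) and the result is rescaled so the probabilities still sum to 1. The function name, the outcome keys and all numbers in the usage note are hypothetical placeholders.

```python
def apply_park_factors(probs, factors):
    """Scale outcome probabilities by park factors and renormalize.

    probs   -- dict mapping outcome name to probability (should sum to 1)
    factors -- dict mapping outcome name to a multiplier; 1.0 = neutral park.
               Outcomes missing from `factors` are left unscaled.
    """
    scaled = {outcome: p * factors.get(outcome, 1.0) for outcome, p in probs.items()}
    total = sum(scaled.values())
    # Renormalize so the adjusted probabilities again sum to 1.
    return {outcome: value / total for outcome, value in scaled.items()}
```

With a placeholder table such as `{"1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.03, "BB": 0.08, "IBB": 0.005, "HBP": 0.01, "K": 0.17, "OUT": 0.50}` and a made-up hitter-friendly factor `{"HR": 1.2}`, the home run probability rises and every other outcome shrinks slightly to compensate.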
Once the player probabilities for the different outcomes are set, I run 100,000 simulations of the game being estimated. The result is a matrix from which the probabilities of home and away victories and of different final scores can be read. Below is an example of a matrix, from which you can see that the probability for an Under 7.5 bet is 61.6% (61,608/100,000). In other words, fewer than 7.5 runs were scored in 61,608 of the 100,000 simulated games.
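The idea of the simulation can be sketched in a few lines. This is a heavily simplified toy version and not my actual simulator. It assumes one outcome table per team instead of per player, runners advance exactly as many bases as the hit is worth, a walk only moves forced runners, and a game can only end between half-innings. All names and the placeholder probabilities are invented for illustration.

```python
import random
from collections import Counter

OUTCOMES = ["1B", "2B", "3B", "HR", "BB", "K", "OUT"]

# Invented placeholder probabilities, not real player data.
PLACEHOLDER_PROBS = {"1B": 0.155, "2B": 0.045, "3B": 0.005, "HR": 0.025,
                     "BB": 0.085, "K": 0.20, "OUT": 0.485}


def simulate_half_inning(probs, rng):
    """Play plate appearances until three outs; return runs scored."""
    weights = [probs[o] for o in OUTCOMES]
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        outcome = rng.choices(OUTCOMES, weights=weights)[0]
        if outcome in ("K", "OUT"):
            outs += 1
        elif outcome == "BB":
            # Only forced runners move up on a walk.
            if bases[0] and bases[1] and bases[2]:
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}[outcome]
            new_bases = [False, False, False]
            for i, occupied in enumerate(bases):
                if occupied:
                    if i + n >= 3:
                        runs += 1  # runner reaches home
                    else:
                        new_bases[i + n] = True
            if n == 4:
                runs += 1  # the batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
    return runs


def simulate_game(away_probs, home_probs, rng):
    """Return (away_runs, home_runs); extra innings break ties."""
    away = home = 0
    inning = 1
    while inning <= 9 or away == home:
        away += simulate_half_inning(away_probs, rng)
        # From the 9th on, the home team does not bat if it already leads.
        if not (inning >= 9 and home > away):
            home += simulate_half_inning(home_probs, rng)
        inning += 1
    return away, home


def run_simulations(away_probs, home_probs, n_games=100_000, seed=0):
    """Count how often each final score occurs (the result 'matrix')."""
    rng = random.Random(seed)
    matrix = Counter()
    for _ in range(n_games):
        matrix[simulate_game(away_probs, home_probs, rng)] += 1
    return matrix


def prob_under(matrix, line):
    """Share of simulated games with total runs below the line."""
    n_games = sum(matrix.values())
    return sum(c for (a, h), c in matrix.items() if a + h < line) / n_games


def prob_home_win(matrix):
    """Share of simulated games won by the home team."""
    n_games = sum(matrix.values())
    return sum(c for (a, h), c in matrix.items() if h > a) / n_games
```

Calling `run_simulations(PLACEHOLDER_PROBS, PLACEHOLDER_PROBS)` builds the score counts, and `prob_under(matrix, 7.5)` then reads off the Under 7.5 probability in the same way as in the example above: the number of games under the line divided by the number of games simulated.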
Implementation
My database is updated once a day. After the update I run the calculations for the upcoming games. A large part of my handicapping is fully automated. The testing and programming have taken some time, but I am very satisfied with the result. I haven't come up with anything revolutionary, but I have developed efficient methods of handling data and analysing the game.
The goal of the analysis is to find mispriced odds and situations where history is going to repeat itself. In other words, I try to find value (as every handicapper does). I compare my projections with the Pinnacle Sports odds, and if there seems to be some value, I look at the game once more and finalize my analysis and projections.
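The comparison between a projection and the market can be sketched as follows, assuming decimal odds. The function names and the `min_edge` threshold are hypothetical and only illustrate the arithmetic. Note that the implied probability of a quoted price includes the bookmaker's margin.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds (bookmaker margin included)."""
    return 1.0 / decimal_odds


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked, given the model's win probability."""
    return model_prob * decimal_odds - 1.0


def has_value(model_prob, decimal_odds, min_edge=0.02):
    """True if the expected value exceeds a chosen threshold (assumed 2%)."""
    return expected_value(model_prob, decimal_odds) > min_edge
```

For example, if a projection gives 61.6% for Under 7.5 and the market hypothetically offers decimal odds of 1.80, the implied probability is 1/1.80, or about 55.6%, and the expected value per unit staked is 0.616 × 1.80 − 1 = 0.1088, which would make the bet worth a closer look.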