But all is not lost. Even if the bookmakers are as good as, or maybe even better than, you at predicting game outcomes, there are still ways to at least identify what constitutes a safe bet and what constitutes a poor or risky bet by analysing the bookmakers’ information.
The following takes a look at how good the bookmakers are at predicting games, and at opportunities for potentially profitable staking strategies, using statistical analysis of the bookmakers’ odds for winning and losing predictions.
Bookmakers’ probability distribution
Now let’s have a look at how the bookmakers’ probabilities compare between the false positives (when they picked a team to win and it lost) and the true positives (when they picked a team to win and it won). This is going to tell us how well calibrated the probabilities are and will be useful for determining appropriate staking strategies.
First let’s take a quick look at the summary statistics for the normalised probabilities for true positive events and false positive events:
For true positive events:
bTP <- subset(bp,bp$TP ==1)
#save third quartile for use later
bpTPq <-quantile(bTP$bookie_back_prob_norm, c(.75))
print (summary(bTP$bookie_back_prob_norm))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5010 0.5870 0.6470 0.6561 0.7070 0.9380
For false positive events:
bFP <- subset(bp,bp$FP ==1)
#save third quartile for use later
bpFPq <-quantile(bFP$bookie_back_prob_norm, c(.75))
print (summary(bFP$bookie_back_prob_norm))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5010 0.5630 0.6180 0.6267 0.6740 0.9120
And let’s take a look at that visually in a simple box-plot:
p2 <- ggplot(bp,aes(factor(TP),bookie_back_prob_norm))
p2 <- p2 + geom_boxplot()
plot(p2)
Great! We can see straight away that the probabilities look well calibrated, because the mean (and median) probability of the false positive events is lower than that of the true positives. This means that even though the bookies have mis-predictions (false positives), their implied probability is lower for these events, which is really good for them and not so much for the punters.
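If you want a more direct look at calibration than comparing means, one option (not part of the analysis above, just a sketch) is to bin the implied probabilities and compare each bin’s average probability with the observed win rate. The data below is simulated purely for illustration; the column names bookie_back_prob_norm and TP mirror the ones used in this post, and the 0.1-wide bins are an arbitrary choice.

```r
# Toy data: simulated, NOT the real bookmaker data used in this post
set.seed(1)
n <- 2000
toy <- data.frame(bookie_back_prob_norm = runif(n, 0.5, 0.95))
# Simulate outcomes so that the favourite wins with the implied probability
toy$TP <- rbinom(n, 1, toy$bookie_back_prob_norm)

# Bin the implied probabilities into 0.1-wide buckets
toy$bin <- cut(toy$bookie_back_prob_norm,
               breaks = seq(0.5, 1, by = 0.1),
               include.lowest = TRUE)

# Per bin: average implied probability vs observed win rate
calib <- data.frame(
  bin               = levels(toy$bin),
  mean_implied_prob = as.vector(tapply(toy$bookie_back_prob_norm, toy$bin, mean)),
  observed_win_rate = as.vector(tapply(toy$TP, toy$bin, mean)),
  n_bets            = as.vector(table(toy$bin))
)
print(calib)
```

For well calibrated probabilities the two middle columns should sit close to each other in every bin.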
We also observe that the third quartile for false positives is 0.67 (equivalent to odds of 1.49) and that there are a large number of true positives >0.67. Now this is important: it means that when the bookmaker gives odds-on prices with >0.67 implied probability (<1.49 odds), then in the large majority of cases the bookmaker will accurately pick the winner. We can actually calculate the bookmaker’s precision in this region, which is 75%! This means that if you bet on anything with odds <1.49, you should win ~75% of your bets:
bp75 <- subset(b, b$bookie_back_prob_norm> bpTPq)
#add precision column
bp75$precision <- sum(bp75$TP)/(sum(bp75$FP) + sum(bp75$TP))
#report the overall precision
round (max(bp75$precision), digits=2)
## [1] 0.75
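As a quick aside on the conversion between implied probability and decimal odds used above: ignoring the bookmaker’s margin, one is simply the reciprocal of the other. The helper names below are my own, made up for illustration.

```r
# Hypothetical helpers: decimal odds and implied probability are reciprocals
# (this ignores the overround, i.e. it assumes already-normalised probabilities)
prob_to_odds <- function(p) 1 / p
odds_to_prob <- function(odds) 1 / odds

round(prob_to_odds(0.67), 2)  # 1.49
round(odds_to_prob(1.49), 2)  # 0.67
```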
If you want to look at it from an odds perspective, we can create the same box-plot, just with odds on the y axis instead of the probability. Note that the plot should be reversed, because higher (i.e. longer) odds mean lower probability. I personally prefer looking at it from a probability point of view, but in some scenarios it might be good to look from an odds point of view. Let’s see:
p3 <- ggplot(bp,aes(factor(TP),bookie_back_odds_norm))
p3 <- p3 + geom_boxplot()
plot(p3)
Yes, the box-plots are reversed, but you can make the same inference as above: if the odds are <1.49 then in the majority of cases the bookie will be correct!
But what does it all mean??
So what does all this mean for us?? Well, for one, the bookmaker is going to be damn hard to beat! They have well calibrated probabilities, a precise predictive model, and on top of this a little leeway due to the overround/vig. But like I said in the introduction, all is not lost; just in doing this simple exercise we have learnt a great deal about the bookmakers. And we can try to use this analysis to help us out in our own ‘informed’ gambling. So let’s review what we know and think about how we can use it logically.
Ok, so we know that the bookmaker predicts winners correctly about 75% of the time when the odds are <1.49. This means that if we pick games when the odds are <1.49 then we also have a 75% chance of a return on a bet… but does it mean we will make a profit? We should just be able to place bets on anything with odds of <1.49 and get a return, right?
Let’s see. I will quickly make a simulated bet with the logic of staking on everything which is >0.67 in probability (equivalent to <1.49 odds):
#create a new dataframe using bp for simple simulated staking
bss <- bp
#create a 'stake' column and just set it to a fixed staking value
bss$stake <- 10
#find the 3rd quantile of the bookie probability
bssFPq <-quantile(bFP$bookie_back_prob_norm, c(.75))
bssTPq <-quantile(bTP$bookie_back_prob_norm, c(.75))
#create stake column using logic
bss$stake <- ifelse(bss$bookie_back_prob_norm > bssFPq , bss$stake,0)
#now simulate profit/loss on the stake amount
bss$profit <- ifelse (bss$TP == 1, (bss$stake*bss$bookie_back_odds-bss$stake),-bss$stake)
#create a total profit/loss column
bss$profit_total <- sum(bss$profit)
#now create a cumulative profit for graphing
bss$profit_cum <- round(cumsum(bss$profit),2)
#print the amount of profit we would make
print(max(bss$profit_total))
## [1] -284.6
Bummer!! That’s -$284.60, so we actually made a loss. Seems it’s not so simple after all!! So what went wrong here? Well, there must be enough false positives above the threshold to offset our small profits on the winners.
Let’s take a look at how many false positives there are which are >0.67:
sum(ifelse(bss$bookie_back_prob_norm >bssFPq & bss$FP ==1, 1,0))
## [1] 120
So there are 120 of these occurrences. Ok, so that means straight up we are going to lose $1,200 ($10 stake * the number of false positives).
Now let’s count the number of occurrences of true positives which are >0.67 probability:
sum(ifelse(bss$bookie_back_prob_norm >bssFPq & bss$TP ==1, 1,0))
## [1] 338
Ok, so there are 338 of these. That seems like a lot more, but obviously not enough… let’s see why. The average of the bookmaker’s odds above 0.67 probability is 1.28:
mean(subset(bp$bookie_back_odds,bp$bookie_back_prob_norm >bssFPq))
## [1] 1.276114
So our winnings on the successful bets above 0.67 are (on average) going to be $946:
# (average odds * stake * number of winning bets) - stake * number of winning bets
(1.28 * 10 * 338) - (10 * 338)
## [1] 946.4
So we can see straight away that we are going to make a loss, because this value ($946) is smaller than the amount lost on the false positives at the same threshold ($1,200).
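Another way to see the same thing is the break-even win rate: at decimal odds d you need to win at least 1/d of your bets just to get your money back. Here is a small sketch using the numbers above (treating the 1.28 average as if every bet were struck at those odds, which is a simplification):

```r
# Numbers taken from the analysis above
avg_odds <- 1.28   # average bookmaker odds above the 0.67 threshold
n_tp     <- 338    # winning bets above the threshold
n_fp     <- 120    # losing bets above the threshold

# Win rate needed to break even at these odds (simplifying assumption:
# every bet is placed at the average odds)
breakeven_rate <- 1 / avg_odds

# Win rate we actually achieved in this region
strike_rate <- n_tp / (n_tp + n_fp)

round(c(breakeven = breakeven_rate, achieved = strike_rate), 3)
strike_rate < breakeven_rate   # TRUE means flat staking here loses money
```

So at such short odds even winning roughly three bets out of four is not enough to cover the losers.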
We can use this knowledge to create a simple formula to check if we can profit from any sort of ‘short odds’ betting. We can use:
((TPd*TPb)-TPb) - FPb
where
- TPd is the average decimal odds of the true positives above/within the threshold
- FPb is the number of False positive bets above/within the threshold
- TPb is the number of True positive bets above/within the threshold
So, using 1.28 as the average odds, we could reduce our previous staking simulation (per $1 staked) to: (1.28*338)-338 - 120
((1.28 *338)-338)-120
## [1] -25.36
Which is about -25. Since this is negative, we aren’t going to make any sort of money staking like this, because the losses on the false positives outweigh the winnings on the true positives.
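If you want to reuse this check for other thresholds, the formula can be wrapped in a small function. The function name and the optional stake argument are my own additions for illustration; the arguments follow the definitions above.

```r
# Hypothetical helper wrapping the formula ((TPd*TPb)-TPb) - FPb
# TPd: average decimal odds of the true positives in the region
# TPb: number of true positive bets, FPb: number of false positive bets
# stake: flat stake per bet (1 gives the result per unit staked)
short_odds_profit <- function(TPd, TPb, FPb, stake = 1) {
  stake * ((TPd * TPb - TPb) - FPb)
}

# The >0.67 region from above: a negative value means a loss
short_odds_profit(TPd = 1.28, TPb = 338, FPb = 120)
short_odds_profit(TPd = 1.28, TPb = 338, FPb = 120, stake = 10)
```

Anything below zero tells you the region is not worth flat staking on, whatever the stake size.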
Right! So is there anything that might work? Well, when we looked at the box plot there looked to be as many outliers in the false positive range above 0.67 as there were true positives in this range. So what if we just use the range between the 3rd quartiles (75th percentiles) of the false positives and true positives?
p2b <- ggplot(bp,aes(factor(TP),bookie_back_prob_norm))
p2b <- p2b + geom_boxplot()
p2b <- p2b + geom_hline(aes(yintercept=bpFPq), col = "coral", linetype = "dashed", size = 0.8)
p2b <- p2b + geom_hline(aes(yintercept=bpTPq), col = "coral", linetype = "dashed", size = 0.8)
plot(p2b)
Sounds like a pretty simple strategy, maybe too good to be true, so let’s quickly check how it performs. We’ll use a really simple fixed staking calculation on the data, with the logic that we bet $10 on every game which has an implied probability of between ~0.67 and ~0.71:
#create a 'stake' column and just set it to a fixed staking value
bp$stake <- 10
#save 75th percentile (3rd quartile) of true positives and false positives
bpTPq <-quantile(bTP$bookie_back_prob_norm, c(.75))
bpFPq <-quantile(bFP$bookie_back_prob_norm, c(.75))
#set up basic staking logic
#if the bookie back odds are between 3rd q of the FP and 3rd Q of the TP then we are going to stake $10 otherwise we are not going to stake ($0)
bp$stake <- ifelse(bp$bookie_back_prob_norm > bpFPq & bp$bookie_back_prob_norm < bpTPq, bp$stake,0)
#now simulate profit/loss on the stake amount
bp$profit <- ifelse (bp$TP == 1, (bp$stake*bp$bookie_back_odds-bp$stake),-bp$stake)
#create a total profit/loss column
bp$profit_total <- sum(bp$profit)
#now create a cumulative profit for graphing
bp$profit_cum <- round(cumsum(bp$profit),2)
#print the amount of profit we would make
print(max(bp$profit_total))
## [1] 42.2
Yay!!!! All that hard work and we got $42.20!! On my way to money town oh yeah!!
Ahem… back to reality. Let’s chart that baby up and see what it looks like…
p4 <- ggplot(bp, aes(match_id,profit_cum))
p4 <- p4 + geom_path(aes(col=profit_cum), size = 0.5)
p4 <- p4 + geom_hline(aes(yintercept=0.0), col = "red", linetype = "dashed", size = 0.8)
plot(p4)
Not so pretty… BUT what really stands out is that profit never dropped below 0… now that is interesting…
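Rather than eyeballing the chart, you can check this sort of thing directly from the cumulative profit. The sketch below uses a made-up vector of per-bet profits (not the real bp$profit column) to show the idea: the running minimum tells you whether you ever went below zero, and the largest drop from a previous peak gives the maximum drawdown.

```r
# Made-up per-bet profits for illustration only ($10 flat stake)
profit <- c(4.1, 3.8, -10, 4.5, 3.9, 4.2, -10, 4.0)

# Running (cumulative) profit after each bet
profit_cum <- cumsum(profit)

# Did the running profit ever drop below zero?
never_below_zero <- min(profit_cum) >= 0

# Maximum drawdown: biggest fall from a previous peak
max_drawdown <- max(cummax(profit_cum) - profit_cum)

print(profit_cum)
print(never_below_zero)
print(max_drawdown)
```

On the real data you would swap the toy vector for bp$profit.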
So it appears that betting in this region is profitable (just). Let’s explore a little bit further. The total amount staked across all seasons was $1,500:
sum(bp$stake)
## [1] 1500
So if we made $42 then return on investment (ROI) is ~3%:
(sum(bp$profit)/sum(bp$stake))
## [1] 0.02813333
That is, we invested $1,500 over the term and ended up with $1,542 (profit $42) for a total return of… ~3%… amazing… (I am being sarcastic, BUT making any profit, as you will come to know, is actually quite a challenge.)
So how many bets did we actually make?
sum(ifelse(bp$stake ==10, 1,0))
## [1] 150
150 bets! That’s not many… how many games did we forgo betting on?
sum(ifelse(bp$stake ==0, 1,0))
## [1] 1242
1,242! Ok, so we ended up betting and profiting small on a low number of games, but came out on top. What was our staking accuracy??
It’s the number of bets we placed that made a profit over the total number of bets we placed:
(sum(ifelse(bp$profit >0, 1,0)))/
(sum(ifelse(bp$stake >0, 1,0)))
## [1] 0.7466667
About 75%!! Well, that’s pretty damn good!
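To make it easier to try other probability bands, the steps above (stake, profit, ROI and staking accuracy) can be rolled into one function. This is only a sketch: the function name is made up, the toy data frame is invented for illustration, and it assumes the same column names used in this post (bookie_back_prob_norm, bookie_back_odds, TP).

```r
# Hypothetical helper: flat-stake every game whose implied probability
# falls strictly between lower and upper, then summarise the result
simulate_flat_stake <- function(df, lower, upper, stake = 10) {
  in_range <- df$bookie_back_prob_norm > lower & df$bookie_back_prob_norm < upper
  staked   <- ifelse(in_range, stake, 0)
  # Winners return stake * odds - stake, losers lose the stake
  profit   <- ifelse(df$TP == 1, staked * df$bookie_back_odds - staked, -staked)
  n_bets   <- sum(in_range)
  list(
    n_bets       = n_bets,
    total_staked = sum(staked),
    profit       = sum(profit),
    roi          = if (n_bets > 0) sum(profit) / sum(staked) else NA_real_,
    strike_rate  = if (n_bets > 0) sum(profit > 0) / n_bets else NA_real_
  )
}

# Toy data, invented for illustration (not the real bp data frame)
toy <- data.frame(
  bookie_back_prob_norm = c(0.68, 0.70, 0.69, 0.60, 0.80),
  bookie_back_odds      = c(1.40, 1.38, 1.42, 1.60, 1.20),
  TP                    = c(1, 1, 0, 1, 1)
)

res <- simulate_flat_stake(toy, lower = 0.674, upper = 0.707)
str(res)
```

With the real data you would call it as simulate_flat_stake(bp, bpFPq, bpTPq) and compare the result with the numbers above.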