NRL 2016 predictive model results to date (r05)

Introduction

This post summarizes the 2016 results to date (round 05) and compares them to the bookmakers’ predictions. Considering that round 6 is now going to be my first betting round (in lieu of round 5, purely due to the minimal time I have had to work on this), I thought it important to track how we are going to date. I am also going to discuss the results of a model re-run, since the results of rounds 4 and 5 were less than favorable.

Summary of results to date (r05)

We observe from the summary table/s and figures below that:

  • overall model accuracy/precision to date is 56%

  • overall bookmaker accuracy/precision to date is 66%

  • the model had higher accuracy than the bookmakers in rounds 2 and 3

  • the model’s round 4 and 5 accuracy was below 50%

  • Bookmaker precision was very poor (<30%) in round 3

  • In round 5, there were 6 away team wins vs 2 home team wins

  • In round 4 there was an even number of home and away team wins (4 home and 4 away)

  • In all other rounds the number of home team wins outweighed the number of away team wins

  • poor accuracy in round 4 is largely attributed to predictions being based on round 2 results (without knowing round 3 results) and high away team wins (4)

  • poor accuracy in round 5 is largely attributed to the high number of away team wins (6)

  • round 3 contained a drawn match which is historically rare (there have only been 8 drawn games since the inception of the golden point rule representing 0.5% of all games to date)

Results to date (r05)

Given:

mTP = Model’s true positives

mFP = Model’s false positives

mPrec = Model’s precision (equivalent to accuracy in this case)

bTP = Bookmaker’s true positives

bFP = Bookmaker’s false positives

bPrec = Bookmaker’s precision (equivalent to accuracy in this case)

Table of results

Predictive model vs bookmaker precision by round
round_no season mTP mFP bTP bFP mPrec bPrec
1 2016 5 3 5 3 0.625 0.625
2 2016 7 1 6 2 0.875 0.750
3 2016 4 3 2 5 0.571 0.286
4 2016 3 5 7 1 0.375 0.875
5 2016 3 5 6 2 0.375 0.750

Model’s average precision

## [1] 0.564

Bookmaker’s average precision

## [1] 0.657
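
The two averages can be reproduced from the per-round counts in the table. The snippet below is only a sketch of that arithmetic (it is not the code used to build this post, and the variable names are my own). It assumes the average is the unweighted mean of the per-round precisions, which is what matches the 0.564 and 0.657 printed above.

```r
# Per-round counts copied from the table of results.
results <- data.frame(
  round_no = 1:5,
  mTP = c(5, 7, 4, 3, 3),
  mFP = c(3, 1, 3, 5, 5),
  bTP = c(5, 6, 2, 7, 6),
  bFP = c(3, 2, 5, 1, 2)
)

# Precision = true positives / (true positives + false positives).
results$mPrec <- results$mTP / (results$mTP + results$mFP)
results$bPrec <- results$bTP / (results$bTP + results$bFP)

# Unweighted mean of the per-round precisions.
model_avg  <- mean(results$mPrec)
bookie_avg <- mean(results$bPrec)

print(round(c(model = model_avg, bookmaker = bookie_avg), 3))
```

Note that round 3 only counts 7 games (the draw is left out), so averaging per round is not quite the same as pooling all games together.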

Plots of results

Discussion of results

We can see from the results that the model performed particularly poorly through rounds 4 and 5 as compared to the bookmaker. This dramatically affected overall precision, which was 56% as compared to the bookmaker’s 66%. After reviewing the model, I determined that the main reason for poor performance through rounds 4 and 5 is the high importance of the feature ‘home_away’ in the model. Because the model (in most circumstances) preferentially picks home teams to win, the model suffered from a dramatic increase in away teams winning through rounds 4-5.

We also observe poor accuracy from the bookmaker in round 3, where there were a number of very close games and a draw. As the model was likely ignoring additional features the bookmaker was relying on in round 3 (and using home/away as a highly weighted feature), the model performed well by comparison.

Generally, I would have expected model accuracy to be increasing as rounds progress (based on historic analysis), however the importance of home/away in the model was such that it overwhelmed the importance of other features such as summary of results to date, previous match-ups etc. Re-running the model with less weight (less feature importance) on home/away confirms that accuracy would be higher by boosting the importance of other features. We have seen that in 2015, the importance of the home/away feature was much lower than in historic seasons (2009-2014), and 2016 may turn out to be similar in this regard.

Re-modelling

Since I didn’t get an opportunity to use round 3 results for round 4 predictions, I firstly wanted to re-model this to determine if round 4 accuracy would have been improved. Secondly, because I am generally not satisfied with the average precision to date, and the degradation in precision (we should actually be improving in precision), I wanted to re-model all rounds with an improved predictive model. Specifically, I am concerned that the original model/s place too much significance on the home/away feature and that (like in 2015) we may see reduced significance of this feature in 2016. Therefore I chose to ‘de-weight’ this feature so that it was not an overwhelming predictive feature in the model/s. As you will see, the new model improved average precision to date, generally improved precision across all rounds as compared to the current model, and had slightly better overall precision than the bookmaker.

Re-modeling of round 4 results

As stated, round 4 predictions were based on results to round 2. I re-ran the model to include round 3 results, however predictions were the same. I expected the predictions to be different due to the inclusion of round 3 results, however the weight of the home and away feature at this early stage in the model actually overwhelmed the weight of previous win/loss results. This largely drove low predictive accuracy in rounds 4 and 5.

Re-modelling of all rounds

Re-modelling of all rounds shows an overall increase in precision from 56.4% to 67.5%, which is slightly higher than the bookmaker’s overall precision of 65.7%. As stated, the re-modelling places less weight on the home/away feature and hence mitigates the dramatic decrease in accuracy across rounds 4 and 5. In fact, as shown below, the precision of the new model is similar in shape to the bookmaker’s and actually did slightly better in round 3.

Financial modelling

Finally, I wanted to determine how we would have gone using various staking strategies to bet on predictions to date. As stated previously, one of the main reasons for choosing to wait until round 5 or 6 to place bets was historically poor (50-60%) model performance in rounds 1-5. So it is worth finding out if we would have lost money betting in these rounds this year. I was also interested to know if my preferred staking strategy would have mitigated losses (avoided losing bets). I am going to keep it relatively simple and just compare a fixed staking strategy to my preferred staking strategy.

Fixed staking

First up let’s have a look at a simple fixed staking strategy. We will use $10 as a fixed bet, and see what happens if we place $10 on every game which we predicted to win. To do this, we simply use the rule that if mTP = 1 then the return equals the bookmaker’s odds * 10 (a profit of that return less the $10 stake), and if (conversely) mFP = 1 then loss = 10.
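
As a rough sketch of the fixed staking rule (not the code behind the charts), the function below assumes decimal odds, so a winning $10 bet returns odds * 10 and the net profit is that return minus the stake. The odds and outcomes are hypothetical, and `fixed_stake_profit` is just an illustrative name.

```r
# Profit per bet under fixed staking: a win earns stake * (odds - 1),
# a loss costs the full stake.
fixed_stake_profit <- function(odds, won, stake = 10) {
  ifelse(won, stake * (odds - 1), -stake)
}

# Hypothetical odds and outcomes, for illustration only.
odds <- c(1.50, 2.00, 1.80)
won  <- c(TRUE, FALSE, TRUE)

profit     <- fixed_stake_profit(odds, won)
cumulative <- cumsum(profit)  # running profit/loss, as plotted per game
print(cumulative)
```

The cumulative vector is the kind of series the charts track game by game.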

If we chart this up we see that the cumulative loss to date would have been $49.40.

We can see in the chart split by round that we would have had a steady increase in profit from game 12 through game 18 and then a rapid decline through rounds 4 and 5 (due to the poor predictive accuracy in these rounds).

Fixed staking with probability cut-off

Now let’s look at what we would have ended up with if we used the proposed Staking Strategy and only bet on those games in which the model’s perceived probability is greater than the bookmaker’s implied probability, and on the condition that the perceived probability is greater than the average of the model’s false positive perceived probability (FPpp). For reference, in this case the FPpp threshold was ~0.67.
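
The selection rule can be sketched as follows. This assumes the bookmaker’s implied probability is simply 1 / decimal odds (ignoring the bookmaker’s margin); the function names and the example probabilities and odds are hypothetical, and only the ~0.67 threshold comes from the post.

```r
# Implied probability from decimal odds (assumption: no margin adjustment).
implied_prob <- function(odds) 1 / odds

# Bet only when the model's probability beats both the bookmaker's implied
# probability and the FPpp threshold.
should_bet <- function(model_prob, odds, fp_threshold = 0.67) {
  model_prob > implied_prob(odds) & model_prob > fp_threshold
}

# Hypothetical model probabilities and odds, for illustration only.
model_prob <- c(0.75, 0.60, 0.70)
odds       <- c(1.45, 2.65, 1.30)
print(should_bet(model_prob, odds))
```

In this toy example the second game clears the implied probability but not the threshold, and the third clears the threshold but not the implied probability, so neither qualifies. Both conditions together are what keep the number of bets so small.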

If we do this, unfortunately, the number of games which satisfied the condition is limited (7 out of 40 games), which means the cumulative profit stayed the same across many games. This can be observed in the below chart. In addition, we can see that we make steady cumulative profit from the first 3 bets (in games 5, 8 and 15), which is wiped out by a single loss. We see the same pattern from game 23, where we make a small profit right up until game 40 (picked the Bulldogs to win against Canberra), on which we lost, wiping out all profit. In the end this staking strategy lost $0.40 (essentially broke even). It has highlighted a key issue with this strategy which I discussed in Bookmaker Analysis; that is, because we are betting on games which are short odds, our profit from bets is small, and single losses dramatically affect the total profit.

To put it another way, using this strategy we actually won on 5 of 7 bets (71% staking accuracy) but just two losses were enough to reduce the profit to essentially 0!

Financial modelling of new model

Finally, knowing that our original model predictions (as posted) would not have achieved a profit using a fixed staking method, and would have only broken even using the proposed staking strategy, let’s take a look at how the improved model would have fared.

We can see from the graphs below that both the fixed staking and the probability cut-off (preferred) staking methods yielded a profit with the new model. In fact the preferred staking method did not suffer a single loss (staking accuracy of 100%) and had a return on investment (ROI) of 26%. We can also see that the number of bets was small (again 7 out of 40 available) and through rounds 3 and 4, a very limited number of bets were recommended (mitigating poor round 3 results in the new model).

Despite this, the fixed stake method yielded higher profit simply because the accuracy of the new model was higher than the bookmaker’s, and hence it benefited from long odds bets.

Conclusion

I have shown that my strategy to wait until round 6 to make a bet was a good choice, as the original model suffered from low overall accuracy (56% to date). Despite this, I have also shown that my preferred staking strategy had high accuracy (71%) and was able to mitigate most losses, however it would have only broken even in terms of profit. I have also shown that my latest model would have produced much better results through rounds 1-5 (67.5% accuracy) and we would have made a profit no matter which staking strategy was used. I am going to utilize the latest model for round 6 predictions and, at this stage, I will utilize the risk-averse preferred staking strategy.

Appendix - Summary of all predictions to date

round_no game_no tip_text result
2810 1 1 Brisbane Broncos to win against Parramatta Eels while Away Correctly picked
2811 1 2 Manly Sea Eagles to win against Canterbury Bulldogs while Home Incorrectly picked
2813 1 3 Canberra Raiders to win against Penrith Panthers while Home Correctly picked
2816 1 4 New Zealand Warriors to win against Wests Tigers while Away Incorrectly picked
2817 1 5 North Queensland Cowboys to win against Cronulla Sharks while Home Correctly picked
2820 1 6 South Sydney Rabbitohs to win against Sydney Roosters while Away Correctly picked
2822 1 7 Newcastle Knights to win against Gold Coast Titans while Away Incorrectly picked
2823 1 8 Melbourne Storm to win against St George Illawarra Dragons while Home Correctly picked
2825 2 1 Penrith Panthers to win against Canterbury Bulldogs while Home Incorrectly picked
2827 2 2 Brisbane Broncos to win against New Zealand Warriors while Home Correctly picked
2829 2 3 Canberra Raiders to win against Sydney Roosters while Home Correctly picked
2831 2 4 South Sydney Rabbitohs to win against Newcastle Knights while Home Correctly picked
2833 2 5 Parramatta Eels to win against North Queensland Cowboys while Home Correctly picked
2835 2 6 Cronulla Sharks to win against St George Illawarra Dragons while Home Correctly picked
2837 2 7 Melbourne Storm to win against Gold Coast Titans while Home Correctly picked
2839 2 8 Wests Tigers to win against Manly Sea Eagles while Home Correctly picked
2841 3 1 North Queensland Cowboys to win against Sydney Roosters while Home Correctly picked
2843 3 2 Canterbury Bulldogs to win against Parramatta Eels while Home Incorrectly picked
2845 3 3 Newcastle Knights to win against Canberra Raiders while Home Incorrectly picked
2848 3 4 Brisbane Broncos to win against Penrith Panthers while Away Incorrectly picked
2849 3 5 Gold Coast Titans to win against Wests Tigers while Home Correctly picked
2852 3 6 Melbourne Storm to win against New Zealand Warriors while Away Correctly picked
2854 3 7 South Sydney Rabbitohs to win against St George Illawarra Dragons while Away Incorrectly picked
2855 3 8 Manly Sea Eagles to win against Cronulla Sharks while Home Correctly picked
2857 4 1 South Sydney Rabbitohs to win against Canterbury Bulldogs while Home Incorrectly picked
2859 4 2 Brisbane Broncos to win against North Queensland Cowboys while Home Correctly picked
2861 4 3 Canberra Raiders to win against Gold Coast Titans while Home Incorrectly picked
2863 4 4 Sydney Roosters to win against Manly Sea Eagles while Home Incorrectly picked
2865 4 5 St George Illawarra Dragons to win against Penrith Panthers while Home Correctly picked
2867 4 6 New Zealand Warriors to win against Newcastle Knights while Home Correctly picked
2869 4 7 Wests Tigers to win against Parramatta Eels while Home Incorrectly picked
2872 4 8 Melbourne Storm to win against Cronulla Sharks while Away Incorrectly picked
2873 5 1 Manly Sea Eagles to win against South Sydney Rabbitohs while Home Incorrectly picked
2876 5 2 Brisbane Broncos to win against Gold Coast Titans while Away Correctly picked
2877 5 3 Melbourne Storm to win against Newcastle Knights while Home Correctly picked
2879 5 4 Wests Tigers to win against Cronulla Sharks while Home Incorrectly picked
2881 5 5 North Queensland Cowboys to win against St George Illawarra Dragons while Home Correctly picked
2883 5 6 Sydney Roosters to win against New Zealand Warriors while Home Incorrectly picked
2885 5 7 Parramatta Eels to win against Penrith Panthers while Home Incorrectly picked
2887 5 8 Canterbury Bulldogs to win against Canberra Raiders while Home Incorrectly picked

NRL S2016 R02 Results

Hi guys, sorry this is not going to be a detailed overview, I’m a bit time poor at the moment! So time for a quick wrap-up of round 2.

This round we got 5 of 8 correct for an accuracy of 62.5%.

Tips and if they were correct:

  • Penrith Panthers to win against Canterbury Bulldogs while Home
    • (correct? N)
  • Brisbane Broncos to win against New Zealand Warriors while Home
    • (correct? Y)
  • Canberra Raiders to win against Sydney Roosters while Home
    • (correct? Y)
  • South Sydney Rabbitohs to win against Newcastle Knights while Home
    • (correct? Y)
  • North Queensland Cowboys to win against Parramatta Eels while Away
    • (correct? N)
  • Cronulla Sharks to win against St George Illawarra Dragons while Home
    • (correct? Y)
  • Melbourne Storm to win against Gold Coast Titans while Home
    • (correct? Y)
  • Manly Sea Eagles to win against Wests Tigers while Away
    • (correct? N)

To date this gives us a rolling accuracy of 68.8%

In round 2 of 2015, the model also got an accuracy of 62.5%, so we are about in line with historic results.

For those observant, I actually input old odds data into the round 2 detailed table and haven’t had a chance to retrospectively update the post. As indicated before, I have been improving the model iteratively and my latest model actually would have achieved a 7/8 result on this round! Unfortunately I couldn’t post before most of you would have put in your tips, but it’s nice to know the model is getting better (hopefully it will be raring to go for staking rounds!!).

I am working on trying to get a more standardised and automated layout for posting results and tips so please bear with me until I get this right!

NRL S2016 R01 Results

Summary

Off to a good start! The model correctly predicted 6 out of 8 matches for a 75% accuracy in the first round.

The two games the model mis-predicted were Manly vs Bulldogs and Roosters vs Rabbits.

Assuming a $1,000 initial bank, if we used a Kelly staking strategy to bet this round, we would have made a profit of $470. Despite high prediction accuracy, the accuracy of this staking strategy would have been 50% due to its recommendation to stake on two of the matches the model actually mis-predicted.

Using Kelly staking we would have bet big on the Tigers, and this is where most of the profit comes from; however if the Tigers had lost, we would have actually lost $483 for the round, and our final bank would have been $517!

If we had used the (more conservative) proposed staking strategy we would be up a modest $39. Using this staking strategy we would have staked on two matches and made a profit on both (100% staking accuracy).

Detailed round overview

The model correctly predicted 6 of 8 games (75% accuracy). The model’s accuracy was well above the average of the 2009-2015 historic round 1 results (53%) and was well above the 2015 results (25%).

The biggest upset to the bookmakers’ odds was Wests Tigers, who beat New Zealand. The bookmakers had Wests Tigers at 2.65 to win (indicating only a 38% chance of a win).

The model was successful in predicting two of the three upsets to the bookmakers’ odds (Wests Tigers and Gold Coast to win).

The two games the model mis-predicted were Manly v Bulldogs and Roosters v Rabbits, which were upsets to the bookmakers’ odds as well.

Despite our high predictive accuracy for the round, if we used the Kelly strategy to stake we would have staked on four games and lost on two of the matches (50% staking accuracy). But due to the win on the Tigers we would have still made a profit.

If we had used the proposed staking strategy, we would have staked on two games and won on two games (100% staking accuracy), but due to the conservative nature of the staking strategy would have only had a modest return of $39 for the round.

Detailed simulated staking

In accordance with our staking strategy we aren’t staking on anything until round 5, but let’s take a look at what would have happened if we did bet. Let’s assume we started with $1,000, which is the actual current bankroll in my account.

(Full) Kelly staking:

The first game we would have backed (first game which had a positive Kelly fraction) would have been Manly v Bulldogs at odds of 1.70 to win. Our Kelly fraction was 0.07, which means we would have bet 0.07*1000 ($70). The Bulldogs got up in an upset win (28 to 6 – yikes!). This would have reduced our bank to $930.
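
The post does not show how the Kelly fraction is calculated, so the sketch below uses the standard Kelly formula for decimal odds, f = (p * odds - 1) / (odds - 1), where p is the model’s win probability. Treat it as an assumption about the method rather than the model’s actual code; the inputs are hypothetical.

```r
# Standard Kelly fraction for decimal odds.
# A negative value means the bet has no edge and should be skipped.
kelly_fraction <- function(p, odds) {
  (p * odds - 1) / (odds - 1)
}

# Hypothetical inputs, for illustration only.
f     <- kelly_fraction(p = 0.60, odds = 2.00)
stake <- max(f, 0) * 1000  # stake as a share of a $1,000 bank
print(c(fraction = f, stake = stake))
```

A bet is only recommended when the fraction is positive, which is the "positive Kelly fraction" condition mentioned in the walkthrough.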

The next game we would have backed would have been Wests Tigers vs New Zealand, which was against the bookies’ tip by a very long shot (bookies had the Tigers at 2.65, indicating they thought they only had a 38% chance of winning). My model had them at a 60% chance of winning, so the discrepancy was BIG. The Kelly fraction was 0.41, which indicates we should bet 41% of our bank on this game… hmm, seems very risky, oh well, I have the excuse that the algorithm made me do it! The stake would have been 0.41*930 (our new bank after the Manly loss), which is $381.

Amazingly, the Tigers did actually get up with a 34 to 26 win. We would have won big on this game, with a profit of (2.65*$381)-$381 = $629. Adding this to our bank we would have a new bank of $1,559.

The next game we would have staked on would have been Roosters v Rabbits at 2.10 to win. Roosters got smashed 42 to 10 (great work algorithm lol). Kelly stake on this was pretty high at 0.18, so we would have lost 0.18*1,559 = $281.

And our bank would have gone back down to $1,278.

Our last bet would have been on the Gold Coast to win against Newcastle at odds of 2.0 to win. The Kelly fraction for this match was 0.15, so we would have staked 0.15*$1,278 = $192. Gold Coast did indeed get up (30-12), and so we would have got a reasonable return of $192 ((2.00*$192)-$192).

This would have taken our final bank to $1,470 which is a profit of $470 over our initial bank.
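
The walkthrough above can be replayed in a few lines. This is only a sketch of the arithmetic, using the Kelly fractions, odds and results quoted in the text. It uses odds of 2.0 for the Gold Coast bet, which is what the quoted $192 profit on a $192 stake implies. `replay_bank` is a hypothetical helper name.

```r
# The four round 1 Kelly bets, in the order they were placed.
bets <- data.frame(
  match    = c("Manly v Bulldogs", "Tigers v Warriors",
               "Roosters v Rabbitohs", "Titans v Knights"),
  odds     = c(1.70, 2.65, 2.10, 2.00),
  fraction = c(0.07, 0.41, 0.18, 0.15),
  won      = c(FALSE, TRUE, FALSE, TRUE)
)

# Stake a fraction of the current bank on each bet in turn.
replay_bank <- function(bets, bank = 1000) {
  for (i in seq_len(nrow(bets))) {
    stake <- bets$fraction[i] * bank
    # A win adds stake * (odds - 1); a loss removes the stake.
    change <- if (bets$won[i]) stake * (bets$odds[i] - 1) else -stake
    bank <- bank + change
  }
  bank
}

final_bank <- replay_bank(bets)

# What-if: same bets, but the Tigers lose.
bets_tigers_lose <- bets
bets_tigers_lose$won[2] <- FALSE
what_if_bank <- replay_bank(bets_tigers_lose)

print(round(c(actual = final_bank, tigers_lose = what_if_bank)))
```

This gives roughly $1,470 as played and roughly $517 if the Tigers had lost, so the whole round hinges on that one bet.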

Proposed staking strategy:

If we were using the proposed staking strategy, which is to use a fixed (5% of current bank) wager on games which have a perceived probability greater than the average of the perceived probability of historic false positives, then we would have only staked on two games:

Cowboys to win against the Sharks and Storm to win against the Dragons.

We would have first staked $50 on the Cowboys at 1.35, which would have given us a profit of $20, taking our bank to $1,020. We would have next staked 5% of $1,020 ($51) on the Storm at 1.38. This would have given us a $19 profit, taking our final bank for the round to $1,039.

Obviously this is a more risk-averse strategy and in this round would have yielded a lower profit than the Kelly strategy. However, keep in mind that if the Tigers had actually lost, the final Kelly bank for the round would have been $517. Such is the rollercoaster ride of the Kelly staking strategy.

Table Summary of predictions vs actual results

Tip | Score | Score Against | Result | True Positive | False Positive
Brisbane Broncos to win against Parramatta Eels while Away | 17 | 4 | correct prediction | 1 | 0
Manly Sea Eagles to win against Canterbury Bulldogs while Home | 6 | 28 | incorrect prediction | 0 | 1
Canberra Raiders to win against Penrith Panthers while Home | 30 | 22 | correct prediction | 1 | 0
Wests Tigers to win against New Zealand Warriors while Home | 34 | 26 | correct prediction | 1 | 0
North Queensland Cowboys to win against Cronulla Sharks while Home | 20 | 14 | correct prediction | 1 | 0
Sydney Roosters to win against South Sydney Rabbitohs while Home | 10 | 42 | incorrect prediction | 0 | 1
Gold Coast Titans to win against Newcastle Knights while Home | 30 | 12 | correct prediction | 1 | 0
Melbourne Storm to win against St George Illawarra Dragons while Home | 18 | 16 | correct prediction | 1 | 0

Penalties analysis

Who is the most penalised player in the last 10 years? What about last year (2015)?

In 2015 the Titans were perceived as being the bad boys of the NRL due to their high penalty count early in the season. Was this really the case? Titans coach Neil Henry vowed to correct and reduce this. Did he?

Guest poster Matt Dick had a look at these questions and tried to answer them (with statistics of course)! Let’s find out what he came up with!

Chapter 1

Beginnings

Hi all, my name is Matthew Dick and, like Kane, I have an interest in data science and machine learning. Kane has been kind enough to allow me to use maxwellAi as a platform to share some interesting stuff about the NRL stats that he has put together. This is the first article, but I’ve had a few ideas about concepts/theories I’d like to look into regarding the historical NRL data.
Hopefully some of this can be used to help the prediction model…

I took an interest in the National Rugby League (NRL) Experiment a couple of weeks ago and have been putting together some player data to join up with all the round-by-round data Kane had already collected.

The first chapter of this data analysis vignette is a look at the data collected from the NRL website, as well as from NRLStats and AFL Tables.

The Data

The dataset I am working from is still a CSV (comma-separated values) file that was created by joining several thousand individual CSVs from each team and round, for seasons 2005-2015.

A quick look at the variables…

str(nrl)
## 'data.frame':    72664 obs. of  66 variables:
##  $ X.2          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X.1          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ player_code  : Factor w/ 1171 levels "AaronCannings",..: 297 167 595 1007 237 320 1107 135 189 298 ...
##  $ P            : int  12 5 4 4 4 0 0 0 0 0 ...
##  $ T            : int  1 1 1 1 1 0 0 0 0 0 ...
##  $ G            : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ AR           : int  4 4 8 10 14 8 10 14 3 9 ...
##  $ ARM          : int  27 24 82 77 130 62 88 128 49 76 ...
##  $ LB           : int  0 1 1 0 0 0 1 0 0 0 ...
##  $ TA           : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ OL           : int  0 0 2 1 2 0 1 1 0 0 ...
##  $ TK           : int  17 14 2 18 31 40 30 17 14 19 ...
##  $ MT           : int  10 8 0 1 2 3 3 2 2 1 ...
##  $ E            : int  3 0 1 2 0 0 1 0 1 0 ...
##  $ K            : int  8 7 1 0 0 0 0 0 1 0 ...
##  $ LBA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ KM           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ KR           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CN           : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ FG           : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ X40          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ DH           : int  0 0 0 2 0 2 0 0 0 0 ...
##  $ S            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ SF           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PenF         : int  1 0 0 0 0 1 0 1 0 0 ...
##  $ CM           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ season       : int  2005 2005 2005 2005 2005 2005 2005 2005 2005 2005 ...
##  $ round        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ club_name    : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ round_name   : Factor w/ 31 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ date         : Factor w/ 1171 levels "1-Apr-05","1-Apr-06",..: 182 182 182 182 182 182 182 182 182 182 ...
##  $ venue        : Factor w/ 41 levels "Adelaide","Barlow",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ attendance   : int  43488 43488 43488 43488 43488 43488 43488 43488 43488 43488 ...
##  $ home_team    : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ home_away    : Factor w/ 2 levels "away","home": 2 2 2 2 2 2 2 2 2 2 ...
##  $ home_score   : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ home_tries   : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ home_goals   : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ home_fg      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ away_team    : Factor w/ 16 levels "Broncos","Bulldogs",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ away_score   : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ away_tries   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ away_goals   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ away_fg      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ referee1     : Factor w/ 29 levels "Adam Devcich",..: 27 27 27 27 27 27 27 27 27 27 ...
##  $ referee2     : Factor w/ 30 levels "","Adam Devcich",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ home_scrum   : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ home_penalty : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ away_scrum   : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ away_penalty : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ position     : Factor w/ 10 levels "BE","CN","FB",..: 4 5 3 2 9 9 7 1 1 1 ...
##  $ player_name  : Factor w/ 1170 levels "Aaron Cannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
##  $ player_code.y: Factor w/ 1170 levels "AaronCannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
##  $ first_name   : Factor w/ 441 levels "Aaron ","Adam ",..: 93 46 210 362 77 96 405 37 59 93 ...
##  $ surname      : Factor w/ 944 levels "Abraham","Achurch",..: 443 743 347 51 624 779 100 839 512 481 ...
##  $ team_code    : Factor w/ 16 levels "BR","CB","CN",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ tries        : int  1 1 1 1 1 0 0 0 0 0 ...
##  $ goals        : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ goal_attempts: int  6 0 0 0 0 0 0 0 0 0 ...
##  $ field_goals  : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ FPP          : num  35.7 33.4 26.2 35.7 56 46.2 44.8 31.8 18.9 26.6 ...
##  $ FPN          : int  28 16 2 6 4 8 8 6 6 2 ...
##  $ home_win     : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ FP           : num  7.7 17.4 24.2 29.7 52 38.2 36.8 25.8 12.9 24.6 ...
##  $ win          : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

Part 2: Penalties

One of the first stats that I thought might be interesting were the penalties against each player; recorded as the “PenF” variable.

Let’s see how many penalties per year there were.
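
A per-season total like the one below can be produced by summing “PenF” within each season. This is a minimal base R sketch on a made-up stand-in for the data (`nrl_toy` and its values are hypothetical), not the exact code used for the table.

```r
# Toy stand-in for the player-level data; the values are made up.
nrl_toy <- data.frame(
  season      = c(2005, 2005, 2006, 2006, 2006),
  player_name = c("A", "B", "A", "B", "C"),
  PenF        = c(1, 0, 2, 1, 0)
)

# Sum the PenF column within each season.
pen_by_season <- aggregate(PenF ~ season, data = nrl_toy, FUN = sum)
names(pen_by_season)[2] <- "total"
print(pen_by_season)
```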

season total
2005 2264
2006 2128
2007 2547
2008 2472
2009 2317
2010 2222
2011 2132
2012 2002
2013 17
2014 0
2015 2576

Whoops; looks like something is up. Either the referees weren’t looking in 2013 and 2014, or there was an issue with the data source.

Unfortunately it looks like the original data on the NRL website shows the same information. Whilst there are other sources for the penalty data, most of this is on a by game basis…and I’d like to look a little deeper.

In the meantime we’ll just have to exclude 2013 and 2014 from our analyses, until someone can help us out with the missing data (any volunteers?).

Who is the most penalised player in our data set?

player_name total
Anthony Watmough 165
Michael Ennis 157
Johnathan Thurston 126
Beau Scott 115
Cameron Smith 115

And on an average penalties per game basis.

player_name games pen_per_game
Jim Curtis 2 2.000000
Ben Walker 21 1.380952
Brett Oliver 1 1.000000
Daniel Fepuleai 1 1.000000
Daniel Rauicava 2 1.000000

Not very insightful…players with low game counts are going to potentially skew our numbers.

Let’s filter to players with at least 10 games.
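
The filter can be sketched like this, assuming one row per player per game so that the number of rows is the number of games played. The data, player names and `min_games` variable are hypothetical; only the 10-game cut-off comes from the text.

```r
# Toy stand-in: one row per player per game; names and values are made up.
games_toy <- data.frame(
  player_name = c(rep("Player A", 12), rep("Player B", 2), rep("Player C", 10)),
  PenF        = c(rep(c(1, 0), 6), 2, 2, rep(1, 10))
)

# Games played and total penalties per player.
games <- tapply(games_toy$PenF, games_toy$player_name, length)
total <- tapply(games_toy$PenF, games_toy$player_name, sum)

per_player <- data.frame(
  player_name  = names(games),
  games        = as.vector(games),
  pen_per_game = as.vector(total / games)
)

# Keep players with at least 10 games, then rank by penalties per game.
min_games <- 10
ranked <- per_player[per_player$games >= min_games, ]
ranked <- ranked[order(-ranked$pen_per_game), ]
print(ranked)
```

Player B has the highest rate in the toy data but drops out because of the two-game sample, which is the same effect the filter has on the real table.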

player_name games pen_per_game
Ben Walker 21 1.3809524
Glenn Morrison 45 1.0000000
Matt Adamson 16 0.8750000
Jason Smith 57 0.8596491
Travis Burns 127 0.8582677

And last year?

player_name games pen_per_game
Ryan James 21 1.4285714
Nate Myles 16 1.2500000
Andrew Fifita 18 1.0555556
Cameron Smith 25 1.0000000
James Maloney 26 0.9615385

The data seems to match some quick googling.

This article mentions the Gold Coast Titans as having a reputation as the “bad-boys” of the NRL.

“It is a concern,” Henry said of the Gold Coast’s discipline.

“We were down 6-3 (in the penalty count) at half-time (against Parramatta) and talked about avoidable penalties around the ruck.”

“We’ll be looking to reduce that.”

Let's have a look at the overall average penalties per game for each club in the 2015 season, to see where the Titans sit.

Hmmm - nothing so far

They don’t seem to stand out here.

The article is dated April, which is fairly early in the season. It is possible that the Titans had a higher penalty count in the early rounds of the season. Assuming that Neil Henry had a chat to his players after round 8, we’ll look at the average penalties in rounds 1-8, and 9-26 as two separate samples.

We’ll have to exclude the byes from rounds 8 to 26 so as to not unfairly lower the average penalties for the “post chewing out” period.

Looking promising…

The mean penalties for the two parts of the season certainly appear to support our assumption.

We can do a t-test to determine if the difference in the means is statistically significant. To do this, we have to rearrange the data a little bit.

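library(dplyr)

# Titans penalties per game for each 2015 round; rows with zero penalties are
# dropped (intended to exclude byes), and before_after flags rounds after round 8.
tmp <- nrl %>% group_by(season, round, club_name) %>%
  summarise(games = length(unique(round)), pen_per_game = sum(PenF)/games) %>%
  filter(season == 2015, club_name == "Titans", pen_per_game != 0) %>%
  mutate(before_after = round > 8)

# Two-sample t-test assuming equal variances
t.test(pen_per_game ~ before_after, data = tmp, var.equal = TRUE)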
## 
##  Two Sample t-test
## 
## data:  pen_per_game by before_after
## t = 2.491, df = 22, p-value = 0.02077
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2302531 2.5197469
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##               8.000               6.625

T-test interpretation

The t-test p-value of 0.02 is less than 0.05, so we reject the null hypothesis and conclude that the difference in the means is statistically significant.

The Titans lowered their average penalties per game after round 8 of 2015. Whether this is due to Neil Henry’s coaching, I’ll leave to the reader.