Who is the most penalised player in the last 10 years? What about last year (2015)?
In 2015 the Titans were perceived as the bad boys of the NRL because of their high penalty count early in the season. Was this really the case? Titans coach Neil Henry vowed to reduce it. Did he?
Guest poster Matt Dick had a look at these questions and tried to answer them (with statistics, of course)! Let's find out what he came up with!
Chapter 1
Matthew Dick
3 March 2016
Beginnings
Hi all, my name is Matthew Dick and, like Kane, I have an interest in data science and machine learning. Kane has been kind enough to allow me to use maxwellAi as a platform to share some interesting stuff about the NRL stats that he has put together. This is the first article, but I’ve had a few ideas about concepts and theories I’d like to look into using the historical NRL data.
Hopefully some of this can be used to help the prediction model…
I took an interest in the National Rugby League (NRL) Experiment a couple of weeks ago and have been putting together some player data to join up with all the round-by-round data Kane had already collected.
The first chapter of this data analysis vignette is a look at the data collected from the NRL website, as well as from NRLStats and AFL Tables.
The Data
The dataset I am working from is still a CSV (comma-separated values) file that was created by joining several thousand individual CSVs, one for each team and round, for seasons 2005-2015.
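As a rough illustration of that joining step, here is a minimal base R sketch. The folder layout, file names and the tiny example files are all hypothetical stand-ins; the real per-team, per-round files have many more columns.

```r
# Hypothetical layout: one CSV per team and round in a single folder.
# A fresh temporary folder with two tiny made-up files stands in for the real data.
csv_dir <- tempfile("nrl_csv_demo_")
dir.create(csv_dir)
write.csv(data.frame(player_code = c("A", "B"), PenF = c(1, 0), season = 2005, round = 1),
          file.path(csv_dir, "broncos_2005_r01.csv"), row.names = FALSE)
write.csv(data.frame(player_code = c("C", "D"), PenF = c(2, 1), season = 2005, round = 1),
          file.path(csv_dir, "cowboys_2005_r01.csv"), row.names = FALSE)

# List every CSV, read each one into a data frame, then stack them row-wise.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
pieces <- lapply(files, read.csv, stringsAsFactors = FALSE)
combined <- do.call(rbind, pieces)

# Save the combined file without row names, so that re-reading it
# does not add an extra index column.
out_file <- tempfile(fileext = ".csv")
write.csv(combined, out_file, row.names = FALSE)
```

Writing with `row.names = FALSE` is a deliberate choice: saving with the default row names and reading the file back adds an unnamed index column each time, which is probably where columns such as `X`, `X.1` and `X.2` in the listing that follows come from.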
A quick look at the variables…
str(nrl)
## 'data.frame': 72664 obs. of 66 variables:
## $ X.2 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ X.1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ player_code : Factor w/ 1171 levels "AaronCannings",..: 297 167 595 1007 237 320 1107 135 189 298 ...
## $ P : int 12 5 4 4 4 0 0 0 0 0 ...
## $ T : int 1 1 1 1 1 0 0 0 0 0 ...
## $ G : int 4 0 0 0 0 0 0 0 0 0 ...
## $ AR : int 4 4 8 10 14 8 10 14 3 9 ...
## $ ARM : int 27 24 82 77 130 62 88 128 49 76 ...
## $ LB : int 0 1 1 0 0 0 1 0 0 0 ...
## $ TA : int 0 1 0 0 0 0 0 0 0 0 ...
## $ OL : int 0 0 2 1 2 0 1 1 0 0 ...
## $ TK : int 17 14 2 18 31 40 30 17 14 19 ...
## $ MT : int 10 8 0 1 2 3 3 2 2 1 ...
## $ E : int 3 0 1 2 0 0 1 0 1 0 ...
## $ K : int 8 7 1 0 0 0 0 0 1 0 ...
## $ LBA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ KM : int 0 0 0 0 0 0 0 0 0 0 ...
## $ KR : int 0 0 0 0 0 0 0 0 0 0 ...
## $ CN : int 4 0 0 0 0 0 0 0 0 0 ...
## $ FG : int 0 1 0 0 0 0 0 0 0 0 ...
## $ X40 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ DH : int 0 0 0 2 0 2 0 0 0 0 ...
## $ S : int 0 0 0 0 0 0 0 0 0 0 ...
## $ SF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PenF : int 1 0 0 0 0 1 0 1 0 0 ...
## $ CM : int 0 0 0 0 0 0 0 0 0 0 ...
## $ season : int 2005 2005 2005 2005 2005 2005 2005 2005 2005 2005 ...
## $ round : int 1 1 1 1 1 1 1 1 1 1 ...
## $ club_name : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ round_name : Factor w/ 31 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ date : Factor w/ 1171 levels "1-Apr-05","1-Apr-06",..: 182 182 182 182 182 182 182 182 182 182 ...
## $ venue : Factor w/ 41 levels "Adelaide","Barlow",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ attendance : int 43488 43488 43488 43488 43488 43488 43488 43488 43488 43488 ...
## $ home_team : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ home_away : Factor w/ 2 levels "away","home": 2 2 2 2 2 2 2 2 2 2 ...
## $ home_score : int 29 29 29 29 29 29 29 29 29 29 ...
## $ home_tries : int 5 5 5 5 5 5 5 5 5 5 ...
## $ home_goals : int 4 4 4 4 4 4 4 4 4 4 ...
## $ home_fg : int 1 1 1 1 1 1 1 1 1 1 ...
## $ away_team : Factor w/ 16 levels "Broncos","Bulldogs",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ away_score : int 16 16 16 16 16 16 16 16 16 16 ...
## $ away_tries : int 3 3 3 3 3 3 3 3 3 3 ...
## $ away_goals : int 2 2 2 2 2 2 2 2 2 2 ...
## $ away_fg : int 0 0 0 0 0 0 0 0 0 0 ...
## $ referee1 : Factor w/ 29 levels "Adam Devcich",..: 27 27 27 27 27 27 27 27 27 27 ...
## $ referee2 : Factor w/ 30 levels "","Adam Devcich",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ home_scrum : int 10 10 10 10 10 10 10 10 10 10 ...
## $ home_penalty : int 7 7 7 7 7 7 7 7 7 7 ...
## $ away_scrum : int 7 7 7 7 7 7 7 7 7 7 ...
## $ away_penalty : int 4 4 4 4 4 4 4 4 4 4 ...
## $ position : Factor w/ 10 levels "BE","CN","FB",..: 4 5 3 2 9 9 7 1 1 1 ...
## $ player_name : Factor w/ 1170 levels "Aaron Cannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
## $ player_code.y: Factor w/ 1170 levels "AaronCannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
## $ first_name : Factor w/ 441 levels "Aaron ","Adam ",..: 93 46 210 362 77 96 405 37 59 93 ...
## $ surname : Factor w/ 944 levels "Abraham","Achurch",..: 443 743 347 51 624 779 100 839 512 481 ...
## $ team_code : Factor w/ 16 levels "BR","CB","CN",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ tries : int 1 1 1 1 1 0 0 0 0 0 ...
## $ goals : int 4 0 0 0 0 0 0 0 0 0 ...
## $ goal_attempts: int 6 0 0 0 0 0 0 0 0 0 ...
## $ field_goals : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FPP : num 35.7 33.4 26.2 35.7 56 46.2 44.8 31.8 18.9 26.6 ...
## $ FPN : int 28 16 2 6 4 8 8 6 6 2 ...
## $ home_win : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ FP : num 7.7 17.4 24.2 29.7 52 38.2 36.8 25.8 12.9 24.6 ...
## $ win : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
Part 2: Penalties
One of the first stats that I thought might be interesting was the number of penalties against each player, recorded as the “PenF” variable.
Let's see how many penalties per year there were.
season | total |
---|---|
2005 | 2264 |
2006 | 2128 |
2007 | 2547 |
2008 | 2472 |
2009 | 2317 |
2010 | 2222 |
2011 | 2132 |
2012 | 2002 |
2013 | 17 |
2014 | 0 |
2015 | 2576 |
Whoops, looks like something is up. Either the referees weren’t looking in 2013 and 2014, or there was an issue with the data source.
Unfortunately, it looks like the original data on the NRL website has the same gap. Whilst there are other sources for the penalty data, most of them report on a per-game basis…and I’d like to look a little deeper.
In the meantime we’ll just have to exclude 2013 and 2014 from our analyses, until someone can help us out with the missing data (any volunteers?).
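The season totals and the exclusion step could be sketched like this in base R. The `nrl_demo` data frame and its values are made up for illustration; only the `season` and `PenF` column names come from the real data.

```r
# Toy stand-in for the real `nrl` data frame (values are made up).
nrl_demo <- data.frame(
  season = c(2012, 2012, 2013, 2013, 2014, 2015, 2015),
  PenF   = c(2, 1, 0, 0, 0, 3, 1)
)

# Total penalties per season.
season_totals <- aggregate(PenF ~ season, data = nrl_demo, FUN = sum)
names(season_totals)[2] <- "total"
print(season_totals)

# Drop the two seasons whose penalty counts are missing at the source.
missing_seasons <- c(2013, 2014)
nrl_pen <- nrl_demo[!nrl_demo$season %in% missing_seasons, ]
```

Keeping the excluded seasons in a named vector makes it easy to put them back if the missing data turns up.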
Who is the most penalised player in our data set?
player_name | total |
---|---|
Anthony Watmough | 165 |
Michael Ennis | 157 |
Johnathan Thurston | 126 |
Beau Scott | 115 |
Cameron Smith | 115 |
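A ranking like the one above could be produced along these lines. This is a sketch on made-up data (the player names and counts in `nrl_demo` are placeholders), not necessarily the code used for the table.

```r
# Toy stand-in for the real data: one row per player per game (values made up).
nrl_demo <- data.frame(
  player_name = c("Player A", "Player B", "Player A", "Player C", "Player B", "Player A"),
  PenF        = c(2, 1, 1, 0, 2, 1)
)

# Sum penalties per player, then sort from most to least penalised.
player_totals <- aggregate(PenF ~ player_name, data = nrl_demo, FUN = sum)
names(player_totals)[2] <- "total"
player_totals <- player_totals[order(-player_totals$total), ]

# Show the top five.
head(player_totals, 5)
```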
And on an average penalties-per-game basis:
player_name | games | pen_per_game |
---|---|---|
Jim Curtis | 2 | 2.000000 |
Ben Walker | 21 | 1.380952 |
Brett Oliver | 1 | 1.000000 |
Daniel Fepuleai | 1 | 1.000000 |
Daniel Rauicava | 2 | 1.000000 |
Not very insightful…players with low game counts are going to skew our numbers.
Let's filter to players with at least 10 games.
player_name | games | pen_per_game |
---|---|---|
Ben Walker | 21 | 1.3809524 |
Glenn Morrison | 45 | 1.0000000 |
Matt Adamson | 16 | 0.8750000 |
Jason Smith | 57 | 0.8596491 |
Travis Burns | 127 | 0.8582677 |
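The per-game rate and the minimum-games filter could look roughly like this. The sketch assumes each row of the data is one player in one game, so a player's game count is simply their number of rows; the `nrl_demo` data and the `min_games` name are illustrative.

```r
# Toy data: one row per player per game (values made up).
nrl_demo <- data.frame(
  player_name = c(rep("Player A", 2), rep("Player B", 12), rep("Player C", 10)),
  PenF        = c(2, 2, rep(c(1, 0), 6), rep(1, 10))
)

# Games played and total penalties per player.
games  <- tapply(nrl_demo$PenF, nrl_demo$player_name, length)
totals <- tapply(nrl_demo$PenF, nrl_demo$player_name, sum)
per_player <- data.frame(
  player_name  = names(games),
  games        = as.integer(games),
  pen_per_game = as.numeric(totals / games)
)

# Keep only players with enough games for the rate to be meaningful,
# then rank by penalties per game.
min_games <- 10
qualified <- per_player[per_player$games >= min_games, ]
qualified <- qualified[order(-qualified$pen_per_game), ]
print(qualified)
```

In the toy data, Player A has the highest rate but only two games, so the filter removes them, just as it removes the one- and two-game players in the first table.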
And last year (2015)?
player_name | games | pen_per_game |
---|---|---|
Ryan James | 21 | 1.4285714 |
Nate Myles | 16 | 1.2500000 |
Andrew Fifita | 18 | 1.0555556 |
Cameron Smith | 25 | 1.0000000 |
James Maloney | 26 | 0.9615385 |
The data seems to match some quick googling.
This article mentions the Gold Coast Titans as having a reputation as the “bad-boys” of the NRL.
“It is a concern,” Henry said of the Gold Coast’s discipline.
“We were down 6-3 (in the penalty count) at half-time (against Parramatta) and talked about avoidable penalties around the ruck.”
“We’ll be looking to reduce that.”
Let's have a look at the overall average penalties per game for each club in the 2015 season, to see where the Titans sit.
Hmmm - nothing so far
They don’t seem to stand out here.
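A club-level comparison like this could be computed along the following lines. It is a two-step sketch on made-up data: first add up player penalties to get each club's total in each round, then average those totals across rounds.

```r
# Toy data: player rows for two clubs over two rounds (values made up).
nrl_demo <- data.frame(
  season    = 2015,
  round     = c(1, 1, 2, 2, 1, 1, 2, 2),
  club_name = c(rep("Titans", 4), rep("Broncos", 4)),
  PenF      = c(3, 4, 2, 5, 1, 2, 3, 2)
)
season_2015 <- nrl_demo[nrl_demo$season == 2015, ]

# Step 1: team penalties in each game (sum over players, per club and round).
per_game <- aggregate(PenF ~ club_name + round, data = season_2015, FUN = sum)

# Step 2: average those game totals for each club.
club_avg <- aggregate(PenF ~ club_name, data = per_game, FUN = mean)
names(club_avg)[2] <- "pen_per_game"
club_avg <- club_avg[order(-club_avg$pen_per_game), ]
print(club_avg)
```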
The article is dated April, which is fairly early in the season. It is possible that the Titans had a higher penalty count in the early rounds of the season. Assuming that Neil Henry had a chat to his players after round 8, we’ll look at the average penalties in rounds 1-8, and 9-26 as two separate samples.
We’ll have to exclude the byes from rounds 9 to 26 so as to not unfairly lower the average penalties for the “post chewing out” period.
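To see why the byes matter, here is a tiny sketch with made-up round totals. It assumes a bye shows up as a round with zero penalties, which is what the `pen_per_game != 0` filter in the t-test code further down implies.

```r
# Hypothetical team penalty totals for six later rounds; 0 marks a bye.
late_rounds <- c(7, 6, 0, 8, 0, 7)

# Averaging over all rounds treats each bye as a penalty-free game.
mean_with_byes <- mean(late_rounds)

# Dropping the zero rounds averages over games actually played.
mean_without_byes <- mean(late_rounds[late_rounds != 0])

print(c(with_byes = mean_with_byes, without_byes = mean_without_byes))
```

One caveat with this approach: a genuine game with zero penalties would be dropped as well, although that is very unlikely in practice.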
Looking promising…
The mean penalties for the two parts of the season certainly appear to support our assumption.
We can do a t-test to determine if the difference in the means is statistically significant. To do this, we have to rearrange the data a little bit.
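library(dplyr)
# Total penalties per club and round; "games" is always 1 within a round.
tmp = nrl %>% group_by(season, round, club_name) %>%
  summarise(games = length(unique(round)), pen_per_game = sum(PenF)/games) %>%
  # Keep the Titans' 2015 games; a zero total is treated as a bye.
  filter(season == 2015, club_name == "Titans", pen_per_game != 0) %>%
  # FALSE = rounds 1-8, TRUE = round 9 onwards.
  mutate(before_after = round > 8)
# Two-sample t-test assuming equal variances in both periods.
t.test(pen_per_game ~ before_after, data = tmp, var.equal = TRUE)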
##
## Two Sample t-test
##
## data: pen_per_game by before_after
## t = 2.491, df = 22, p-value = 0.02077
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2302531 2.5197469
## sample estimates:
## mean in group FALSE mean in group TRUE
## 8.000 6.625
T-test interpretation
The t-test p-value of 0.02 is less than 0.05, so we reject the null hypothesis of equal means and conclude that the difference in the means is statistically significant at the 5% level.
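For readers curious about what the equal-variance test is doing under the hood, the sketch below works through the pooled-variance t statistic by hand and checks it against `t.test()`. The two samples are made-up numbers, not the Titans' actual game totals.

```r
# Hypothetical per-game penalty counts for two periods (made-up numbers).
before <- c(9, 8, 7, 8, 10, 6, 8, 8)
after  <- c(7, 6, 8, 5, 7, 6, 7, 7)

n1 <- length(before)
n2 <- length(after)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
pooled_var <- ((n1 - 1) * var(before) + (n2 - 1) * var(after)) / (n1 + n2 - 2)

# t statistic: difference in means divided by its standard error.
t_stat <- (mean(before) - mean(after)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
dof <- n1 + n2 - 2

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), dof)

# The built-in equal-variance test should agree with the manual calculation.
builtin <- t.test(before, after, var.equal = TRUE)
print(c(manual = t_stat, builtin = unname(builtin$statistic)))

# Welch's test (R's default) drops the equal-variance assumption.
welch <- t.test(before, after)
```

Running the Welch version alongside the pooled one is a cheap robustness check: if the two periods have similar spread, the two p-values should be close.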
The Titans lowered their average penalties per game after round 8 of 2015. Whether this is due to Neil Henry’s coaching, I’ll leave to the reader.