Penalties analysis

Who is the most penalised player in the last 10 years? What about last year (2015)?

In 2015 the Titans were perceived as the bad boys of the NRL due to their high penalty count early in the season. Was this really the case? Titans coach Neil Henry vowed to reduce it. Did he?

Guest poster Matt Dick had a look at these questions and tried to answer them (with statistics, of course)! Let’s find out what he came up with!

Chapter 1

Beginnings

Hi all, my name is Matthew Dick and, like Kane, I have an interest in data science and machine learning. Kane has been kind enough to allow me to use maxwellAi as a platform to share some interesting stuff about the NRL stats that he has put together. This is the first article, but I’ve had a few ideas about concepts/theories I’d like to look into regarding the historical NRL data.
Hopefully some of this can be used to help the prediction model…

I took an interest in the National Rugby League (NRL) Experiment a couple of weeks ago and have been putting together some player data to join up with all the round-by-round data Kane had already collected.

The first chapter of this data analysis vignette is a look at the data collected from the NRL website, as well as from NRLStats and AFL tables.

The Data

The dataset I am working from is still a CSV (comma-separated values) file that was created by joining several thousand individual CSVs from each team and round, for seasons 2005-2015.
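Joining the individual files can be sketched roughly as below. The folder layout and file names are hypothetical (the real files aren't shown here), so the sketch first writes two tiny made-up CSVs to a temporary folder and then stacks them.

```r
# Hypothetical layout: one CSV per club and round, all in a single folder.
# The two files written here are made-up stand-ins for the real ones.
csv_dir <- file.path(tempdir(), "nrl_rounds")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(player_code = c("PlayerA", "PlayerB"), PenF = c(1, 0),
                     season = 2005, round = 1, club_name = "Broncos"),
          file.path(csv_dir, "2005_r01_broncos.csv"), row.names = FALSE)
write.csv(data.frame(player_code = c("PlayerC", "PlayerD"), PenF = c(0, 2),
                     season = 2005, round = 1, club_name = "Cowboys"),
          file.path(csv_dir, "2005_r01_cowboys.csv"), row.names = FALSE)

# Read every CSV in the folder and stack them into one data frame
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
nrl_toy <- do.call(rbind, lapply(files, read.csv))

str(nrl_toy)
```

Writing with row.names = FALSE is worth the extra typing: columns named X, X.1 and X.2 are typically what you get when row names are saved to a CSV and read back in a few times.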

A quick look at the variables…

str(nrl)
## 'data.frame':    72664 obs. of  66 variables:
##  $ X.2          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X.1          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ player_code  : Factor w/ 1171 levels "AaronCannings",..: 297 167 595 1007 237 320 1107 135 189 298 ...
##  $ P            : int  12 5 4 4 4 0 0 0 0 0 ...
##  $ T            : int  1 1 1 1 1 0 0 0 0 0 ...
##  $ G            : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ AR           : int  4 4 8 10 14 8 10 14 3 9 ...
##  $ ARM          : int  27 24 82 77 130 62 88 128 49 76 ...
##  $ LB           : int  0 1 1 0 0 0 1 0 0 0 ...
##  $ TA           : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ OL           : int  0 0 2 1 2 0 1 1 0 0 ...
##  $ TK           : int  17 14 2 18 31 40 30 17 14 19 ...
##  $ MT           : int  10 8 0 1 2 3 3 2 2 1 ...
##  $ E            : int  3 0 1 2 0 0 1 0 1 0 ...
##  $ K            : int  8 7 1 0 0 0 0 0 1 0 ...
##  $ LBA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ KM           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ KR           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CN           : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ FG           : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ X40          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ DH           : int  0 0 0 2 0 2 0 0 0 0 ...
##  $ S            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ SF           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PenF         : int  1 0 0 0 0 1 0 1 0 0 ...
##  $ CM           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ season       : int  2005 2005 2005 2005 2005 2005 2005 2005 2005 2005 ...
##  $ round        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ club_name    : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ round_name   : Factor w/ 31 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ date         : Factor w/ 1171 levels "1-Apr-05","1-Apr-06",..: 182 182 182 182 182 182 182 182 182 182 ...
##  $ venue        : Factor w/ 41 levels "Adelaide","Barlow",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ attendance   : int  43488 43488 43488 43488 43488 43488 43488 43488 43488 43488 ...
##  $ home_team    : Factor w/ 16 levels "Broncos","Bulldogs",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ home_away    : Factor w/ 2 levels "away","home": 2 2 2 2 2 2 2 2 2 2 ...
##  $ home_score   : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ home_tries   : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ home_goals   : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ home_fg      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ away_team    : Factor w/ 16 levels "Broncos","Bulldogs",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ away_score   : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ away_tries   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ away_goals   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ away_fg      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ referee1     : Factor w/ 29 levels "Adam Devcich",..: 27 27 27 27 27 27 27 27 27 27 ...
##  $ referee2     : Factor w/ 30 levels "","Adam Devcich",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ home_scrum   : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ home_penalty : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ away_scrum   : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ away_penalty : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ position     : Factor w/ 10 levels "BE","CN","FB",..: 4 5 3 2 9 9 7 1 1 1 ...
##  $ player_name  : Factor w/ 1170 levels "Aaron Cannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
##  $ player_code.y: Factor w/ 1170 levels "AaronCannings",..: 298 168 595 1006 238 321 1105 135 190 299 ...
##  $ first_name   : Factor w/ 441 levels "Aaron ","Adam ",..: 93 46 210 362 77 96 405 37 59 93 ...
##  $ surname      : Factor w/ 944 levels "Abraham","Achurch",..: 443 743 347 51 624 779 100 839 512 481 ...
##  $ team_code    : Factor w/ 16 levels "BR","CB","CN",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ tries        : int  1 1 1 1 1 0 0 0 0 0 ...
##  $ goals        : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ goal_attempts: int  6 0 0 0 0 0 0 0 0 0 ...
##  $ field_goals  : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ FPP          : num  35.7 33.4 26.2 35.7 56 46.2 44.8 31.8 18.9 26.6 ...
##  $ FPN          : int  28 16 2 6 4 8 8 6 6 2 ...
##  $ home_win     : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ FP           : num  7.7 17.4 24.2 29.7 52 38.2 36.8 25.8 12.9 24.6 ...
##  $ win          : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

Part 2: Penalties

One of the first stats that I thought might be interesting was the number of penalties against each player, recorded as the “PenF” variable.

Let’s see how many penalties per year there were.

season total
2005 2264
2006 2128
2007 2547
2008 2472
2009 2317
2010 2222
2011 2132
2012 2002
2013 17
2014 0
2015 2576
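A season-by-season count like the one above can be produced with a group-and-sum. This is a minimal sketch on a made-up data frame (nrl_toy is a stand-in, not the real nrl data), assuming one row per player per game with the penalties in PenF.

```r
library(dplyr)

# Made-up stand-in for the real `nrl` data frame (one row per player per game)
nrl_toy <- data.frame(
  season = c(2005, 2005, 2005, 2006, 2006),
  PenF   = c(1, 0, 2, 1, 1)
)

# Total penalties conceded in each season
season_totals <- nrl_toy %>%
  group_by(season) %>%
  summarise(total = sum(PenF))

print(season_totals)
```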

Whoops, looks like something is up. Either the referees weren’t looking in 2013 and 2014, or there was an issue with the data source.

Unfortunately, it looks like the original data on the NRL website shows the same information. Whilst there are other sources for the penalty data, most of them are on a per-game basis, and I’d like to look a little deeper.

In the meantime we’ll just have to exclude 2013 and 2014 from our analyses, until someone can help us out with the missing data (any volunteers?).

Who is the most penalised player in our data set?

player_name total
Anthony Watmough 165
Michael Ennis 157
Johnathan Thurston 126
Beau Scott 115
Cameron Smith 115
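One way to build a leaderboard like this is to drop the two unreliable seasons, sum PenF per player and sort. The sketch below uses made-up players and numbers, and whether the exclusion was applied before the table above is an assumption on my part.

```r
library(dplyr)

# Made-up stand-in data; "Player A" etc. are not real players
nrl_toy <- data.frame(
  player_name = c("Player A", "Player B", "Player A",
                  "Player C", "Player B", "Player A"),
  season      = c(2012, 2012, 2013, 2015, 2015, 2015),
  PenF        = c(2, 1, 5, 0, 1, 1)
)

most_penalised <- nrl_toy %>%
  filter(!season %in% c(2013, 2014)) %>%   # seasons with missing penalty data
  group_by(player_name) %>%
  summarise(total = sum(PenF)) %>%
  arrange(desc(total))                     # most penalised first

print(head(most_penalised, 5))
```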

And on an average penalties per game basis.

player_name games pen_per_game
Jim Curtis 2 2.000000
Ben Walker 21 1.380952
Brett Oliver 1 1.000000
Daniel Fepuleai 1 1.000000
Daniel Rauicava 2 1.000000

Not very insightful…players with low game counts are going to potentially skew our numbers.

Let’s filter to players with at least 10 games.

player_name games pen_per_game
Ben Walker 21 1.3809524
Glenn Morrison 45 1.0000000
Matt Adamson 16 0.8750000
Jason Smith 57 0.8596491
Travis Burns 127 0.8582677
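The per-game rate with a minimum-games cut-off might be computed along these lines. It is only a sketch on made-up data, and it assumes every row is one player in one game, so counting rows counts games.

```r
library(dplyr)

# Made-up data: Player A has 12 games, Player B only 2
nrl_toy <- data.frame(
  player_name = c(rep("Player A", 12), rep("Player B", 2)),
  PenF        = c(rep(c(1, 0), 6), 2, 2)
)

pen_rates <- nrl_toy %>%
  group_by(player_name) %>%
  summarise(games = n(),                        # rows per player = games played
            pen_per_game = sum(PenF) / games) %>%
  filter(games >= 10) %>%                       # drop small samples
  arrange(desc(pen_per_game))

print(pen_rates)
```

Without the filter, Player B would top the list at 2 penalties per game from just two appearances, which is the same skew seen in the unfiltered table.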

And last year?

player_name games pen_per_game
Ryan James 21 1.4285714
Nate Myles 16 1.2500000
Andrew Fifita 18 1.0555556
Cameron Smith 25 1.0000000
James Maloney 26 0.9615385

The data seems to match some quick googling.

This article mentions the Gold Coast Titans as having a reputation as the “bad-boys” of the NRL.

“It is a concern,” Henry said of the Gold Coast’s discipline.

“We were down 6-3 (in the penalty count) at half-time (against Parramatta) and talked about avoidable penalties around the ruck.”

“We’ll be looking to reduce that.”

Let's have a look at the overall average penalties per game for each club in the 2015 season, to see where the Titans sit.

Hmmm - nothing so far

They don’t seem to stand out here.
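A club-level comparison like this could be computed in two steps: total the penalties for each club in each round, then average those totals per club. The data below is made up and the step order is my assumption, not necessarily how the original chart was built.

```r
library(dplyr)

# Made-up stand-in data for two clubs over two rounds
nrl_toy <- data.frame(
  season    = 2015,
  round     = c(1, 1, 2, 2, 1, 1, 2, 2),
  club_name = c(rep("Titans", 4), rep("Broncos", 4)),
  PenF      = c(3, 4, 2, 3, 1, 2, 2, 1)
)

club_avg <- nrl_toy %>%
  filter(season == 2015) %>%
  group_by(club_name, round) %>%
  summarise(penalties = sum(PenF), .groups = "drop") %>%  # one row per club per game
  group_by(club_name) %>%
  summarise(pen_per_game = mean(penalties)) %>%
  arrange(desc(pen_per_game))

print(club_avg)
```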

The article is dated April, which is fairly early in the season. It is possible that the Titans had a higher penalty count in the early rounds of the season. Assuming that Neil Henry had a chat to his players after round 8, we’ll look at the average penalties in rounds 1-8, and 9-26 as two separate samples.

We’ll have to exclude the byes from rounds 9 to 26 so as to not unfairly lower the average penalties for the “post chewing out” period.

Looking promising…

The mean penalties for the two parts of the season certainly appear to agree with our assumption.
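Splitting the season at round 8 and comparing the means can be sketched as follows. The per-game counts are made up, and treating a zero as a bye is an assumption carried over from the exclusion described above.

```r
library(dplyr)

# Made-up per-game penalty counts for one club; the 0 stands for a bye
games_toy <- data.frame(
  round        = c(1, 2, 3, 9, 10, 11, 12),
  pen_per_game = c(9, 8, 7, 6, 0, 7, 5)
)

period_means <- games_toy %>%
  filter(pen_per_game != 0) %>%                                   # drop byes
  mutate(period = ifelse(round <= 8, "rounds 1-8", "rounds 9-26")) %>%
  group_by(period) %>%
  summarise(games = n(), mean_pen = mean(pen_per_game))

print(period_means)
```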

We can do a t-test to determine if the difference in the means is statistically significant. To do this, we have to rearrange the data a little bit.

library(dplyr)

# One row per Titans game in 2015, holding the total penalties conceded
tmp = nrl %>%
  filter(season == 2015, club_name == "Titans") %>%
  group_by(round) %>%
  summarise(pen_per_game = sum(PenF)) %>%
  filter(pen_per_game != 0) %>%        # drop bye rounds (no penalties recorded)
  mutate(before_after = round > 8)     # FALSE: rounds 1-8, TRUE: rounds 9-26

t.test(pen_per_game ~ before_after, data = tmp, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  pen_per_game by before_after
## t = 2.491, df = 22, p-value = 0.02077
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2302531 2.5197469
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##               8.000               6.625

T-test interpretation

The t-test p-value of 0.02 is less than 0.05, so we reject the null hypothesis of equal means and conclude that the difference in the means is statistically significant at the 5% level.
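To see where the t statistic and p-value come from, here is the equal-variance calculation done by hand and checked against t.test. The two samples are made up and are not the Titans' real numbers.

```r
# Made-up penalty counts for two periods (not the Titans' real numbers)
before <- c(9, 8, 7, 8)
after  <- c(6, 7, 5, 6, 7, 6)

n1 <- length(before)
n2 <- length(after)

# Pooled variance, then the t statistic for a difference in means
pooled_var <- ((n1 - 1) * var(before) + (n2 - 1) * var(after)) / (n1 + n2 - 2)
t_manual   <- (mean(before) - mean(after)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

# The built-in equal-variance test should agree with the manual numbers
fit <- t.test(before, after, var.equal = TRUE)
print(c(manual = t_manual, builtin = unname(fit$statistic)))
print(c(manual = p_manual, builtin = fit$p.value))
```

The degrees of freedom are the two sample sizes added together, minus 2, which is why the real test on 24 games reports df = 22.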

The Titans lowered their average penalties per game after round 8 of 2015. Whether this is due to Neil Henry’s coaching, I’ll leave to the reader.