7  Exercises

7.1 Exercise 1

We are given a dataset containing daily counts of diseases y from one geographical area. We want to identify:

  • Is there a general yearly trend (i.e. increasing or decreasing from year to year?)
  • Does seasonality exist (use the categorical variable “season”)?
  • What season has the most cases? (Spring/summer/autumn/winter?)
  • Is numberOfCows associated with the outcome y?

The data for this chapter is available at: https://www.csids.no/longitudinal-analysis-for-surveillance/data/exercise_1.csv

library(data.table)
set.seed(4)

d <- data.table(date=seq.Date(
  from=as.Date("2010-01-01"),
  to=as.Date("2015-12-31"),
  by=1))

d[,numberOfCows:=rpois(.N,5)]

d[,year:=as.numeric(format.Date(date,"%G"))]
d[,week:=as.numeric(format.Date(date,"%V"))]
d[,month:=as.numeric(format.Date(date,"%m"))]
d[,season:="Winter"]
d[month %in% c(3:5), season:="Spring"]
d[month %in% c(6:8), season:="Summer"]
d[month %in% c(9:11), season:="Autumn"]

d[,seasonIntercept:=0]
d[season=="Spring",seasonIntercept:=1]
d[season=="Summer",seasonIntercept:=2]

d[,yearMinus2000:=year-2000]
d[,dayOfSeries:=1:.N]

d[,mu := round(exp(0.1 + yearMinus2000*0.2 + seasonIntercept + 0.2*numberOfCows))]
d[,y:=rpois(.N,mu)]

7.2 Exercise 2

We are given a dataset containing daily counts of diseases y from three geographical areas (fylke). We want to identify:

  • Is there a general yearly trend (i.e. increasing or decreasing from year to year?)
  • Does seasonality exist (use the categorical variable “season”)?
  • What season has the most cases? (Spring/summer/autumn/winter?)
  • Is numberOfCows associated with the outcome y?

The data for this chapter is available at: https://www.csids.no/longitudinal-analysis-for-surveillance/data/exercise_2.csv

library(data.table)
set.seed(4)

d <- data.table(date=seq.Date(
  from=as.Date("2010-01-01"),
  to=as.Date("2015-12-31"),
  by=1))

temp <- vector("list",length=3)
for(i in 1:3){
  temp[[i]] <- copy(d)
  temp[[i]][,fylke:=i]
}
d <- rbindlist(temp)

d[,numberOfCows:=rpois(.N,5)]

d[,year:=as.numeric(format.Date(date,"%G"))]
d[,week:=as.numeric(format.Date(date,"%V"))]
d[,month:=as.numeric(format.Date(date,"%m"))]
d[,season:="Winter"]
d[month %in% c(3:5), season:="Spring"]
d[month %in% c(6:8), season:="Summer"]
d[month %in% c(9:11), season:="Autumn"]

d[,seasonIntercept:=0]
d[season=="Spring",seasonIntercept:=1]
d[season=="Summer",seasonIntercept:=2]

d[,yearMinus2000:=year-2000]
d[,dayOfSeries:=1:.N,by=fylke]

d[,mu := round(exp(0.1 + yearMinus2000*0.2 + seasonIntercept + 0.0*numberOfCows + 0.1*(fylke-2)))]
d[,y:=rpois(.N,mu)]
for(i in 1:3) d[fylke==i,y:=round(as.numeric(arima.sim(model=list("ar"=c(0.5)), rand.gen = rpois, n=.N, lambda=mu)))]

7.3 Exercise 3

We are given a dataset containing counts of diseases y from three geographical areas (fylke). We want to identify:

  • Is there a general yearly trend (i.e. increasing or decreasing from year to year?)
  • Does seasonality exist (use the categorical variable “season”)?
  • What season has the most cases? (Spring/summer/autumn/winter?)
  • Is numberOfCows associated with the outcome y?

The data for this chapter is available at: https://www.csids.no/longitudinal-analysis-for-surveillance/data/exercise_3.csv

library(data.table)
set.seed(4)

d <- data.table(date=seq.Date(
  from=as.Date("2010-01-01"),
  to=as.Date("2015-12-31"),
  by=1))

temp <- vector("list",length=3)
for(i in 1:3){
  temp[[i]] <- copy(d)
  temp[[i]][,fylke:=i]
}
d <- rbindlist(temp)

d[,numberOfCows:=rpois(.N,5)]

d[,year:=as.numeric(format.Date(date,"%G"))]
d[,week:=as.numeric(format.Date(date,"%V"))]
d[,month:=as.numeric(format.Date(date,"%m"))]
d[,season:="Winter"]
d[month %in% c(3:5), season:="Spring"]
d[month %in% c(6:8), season:="Summer"]
d[month %in% c(9:11), season:="Autumn"]

d[,seasonIntercept:=0]
d[season=="Spring",seasonIntercept:=1]
d[season=="Summer",seasonIntercept:=2]

d[,yearMinus2000:=year-2000]

d <- d[sample(1:.N,600)]

d[,mu := round(exp(0.1 + yearMinus2000*0.2 + seasonIntercept + 0.0*numberOfCows + 0.1*(fylke-2)))]
d[,y:=rpois(.N,mu)]