The data contain 20 geographical areas (fylke) with 100 observations each, but the sampling was irregular: some years have multiple measurements, while other years have none.
For this scenario, we use the lme4::glmer function in R. We add a (1|fylke) term, which gives each geographical area (i.e. cluster) its own random intercept. In Stata we use the meglm command and add a || fylke: term for the same purpose.
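Written out, the model that the (1|fylke) and || fylke: terms describe is a random-intercept Poisson regression. The notation below is only a sketch: the symbols (i for an observation, j for a fylke, the betas, u_j, and sigma_u) are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
y_{ij} \mid u_j &\sim \text{Poisson}(\mu_{ij}) \\
\log \mu_{ij} &= \beta_0 + \beta_1 x_{ij} + \beta_2\,(\text{year}_{ij} - 2000) + u_j \\
u_j &\sim N(0, \sigma_u^2)
\end{aligned}
```

Here u_j is the fylke-specific shift in the log rate that is shared by all observations from fylke j, and sigma_u^2 is the variance reported for the fylke intercept in the random effects part of the output.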
// STATA CODE STARTS
insheet using "chapter_5.csv", clear
gen yearMinus2000 = year-2000
meglm y x yearMinus2000 || fylke:, family(poisson)
// STATA CODE ENDS
# R CODE
d[,yearMinus2000:=year-2000]
summary(fit <- lme4::glmer(y~x + yearMinus2000 + (1|fylke),data=d,family=poisson()))
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: y ~ x + yearMinus2000 + (1 | fylke)
   Data: d

     AIC      BIC   logLik deviance df.resid
 15502.5  15524.9  -7747.2  15494.5     1996

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.3647 -0.6802 -0.0153  0.6681  4.3638

Random effects:
 Groups Name        Variance Std.Dev.
 fylke  (Intercept) 0.6114   0.7819
Number of obs: 2000, groups:  fylke, 20

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.375e+00  1.749e-01  19.298   <2e-16 ***
x             3.002e+00  6.000e-03 500.407   <2e-16 ***
yearMinus2000 8.884e-05  7.270e-05   1.222    0.222
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) x
x           -0.024
yearMns2000  0.006  0.033
optimizer (Nelder_Mead) convergence code: 0 (OK)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
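You can see that the fixed effects table has the same format as in an ordinary regression. The main addition is the "Random effects" block, which reports the variance and standard deviation of the fylke intercepts.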
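Because the model uses a log link, the fixed-effect estimates are on the log scale, and exponentiating them gives rate ratios, just as in an ordinary Poisson regression. The sketch below does this by hand from the numbers printed in the output. The Wald-type 95% intervals (estimate plus or minus 1.96 standard errors) are an assumption made here for illustration, not something the original analysis reports, and the object names est, se, and irr are hypothetical.

```r
# Fixed-effect estimates and standard errors copied from the summary() output
est <- c(x = 3.002, yearMinus2000 = 8.884e-05)
se  <- c(x = 6.000e-03, yearMinus2000 = 7.270e-05)

# Exponentiate to move from the log scale to rate ratios.
# The intervals are Wald-type (normal approximation), an assumption of this sketch.
irr <- data.frame(
  rate_ratio = exp(est),
  lower_95   = exp(est - 1.96 * se),
  upper_95   = exp(est + 1.96 * se)
)

print(round(irr, 4))
```

With the fitted object available, exp(lme4::fixef(fit)) should give the same rate ratios without copying numbers by hand. The interval for yearMinus2000 includes 1, which matches its non-significant p-value of 0.222.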