Further examples of areal data modeling: Child poverty in London

1 Code to reproduce Figure11.8
2 Code for modeling

rm(list=ls())
knitr::opts_knit$set(root.dir = '~/Dropbox/sks/bookproject/rbook/chapter_code_files')
figurepath <- "~/Dropbox/sks/bookproject/bmstdr_Sahu/figures/chap11figures/london"
filepath <- "~/Dropbox/sks/bookproject/bmstdr_Sahu/chapter_code_files/txttables/"
mappath <- "mapfiles"
dpath <- "datafiles/"
knitr::opts_chunk$set(collapse = TRUE)
load(file=paste0(dpath, "C11childpovertylondon.RData"))
dim(childpoverty)

## [1] 330  19

colnames(childpoverty)

##  [1] "id"              "Areacode"        "LA"              "Areaname"       
##  [5] "Areahectares"    "landareaSqKm"    "imdavscore07"    "year"           
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "mincome"        
## [17] "log10price"      "logitpall"       "logitinactive"

summary(childpoverty)

##        id       Areacode              LA              Areaname        
##  Min.   : 0   Length:330         Length:330         Length:330        
##  1st Qu.: 8   Class :character   Class :character   Class :character  
##  Median :16   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :16                                                           
##  3rd Qu.:24                                                           
##  Max.   :32                                                           
##   Areahectares    landareaSqKm     imdavscore07        year     
##  Min.   :  290   Min.   :  2.90   Min.   : 9.55   Min.   :2006  
##  1st Qu.: 2681   1st Qu.: 26.82   1st Qu.:16.21   1st Qu.:2008  
##  Median : 3762   Median : 37.61   Median :25.10   Median :2010  
##  Mean   : 4764   Mean   : 47.64   Mean   :25.68   Mean   :2010  
##  3rd Qu.: 5642   3rd Qu.: 56.41   3rd Qu.:33.33   3rd Qu.:2013  
##  Max.   :15013   Max.   :150.15   Max.   :46.10   Max.   :2015  
##    population      medianincome    econinactive     houseprice     
##  Min.   :  7300   Min.   :15200   Min.   :14.70   Min.   : 160000  
##  1st Qu.:203275   1st Qu.:20900   1st Qu.:21.62   1st Qu.: 245000  
##  Median :248792   Median :23100   Median :24.85   Median : 290000  
##  Mean   :246210   Mean   :24745   Mean   :25.28   Mean   : 337575  
##  3rd Qu.:293950   3rd Qu.:26075   3rd Qu.:28.20   3rd Qu.: 385000  
##  Max.   :379691   Max.   :65300   Max.   :41.30   Max.   :1197500  
##  nGPregistration    punder16          pall          mincome       
##  Min.   :  115   Min.   : 8.10   Min.   : 8.30   Min.   :-1.3441  
##  1st Qu.: 4280   1st Qu.:19.20   1st Qu.:18.60   1st Qu.:-0.5414  
##  Median : 6290   Median :25.55   Median :25.20   Median :-0.2317  
##  Mean   : 6464   Mean   :26.36   Mean   :26.33   Mean   : 0.0000  
##  3rd Qu.: 8576   3rd Qu.:32.58   3rd Qu.:32.77   3rd Qu.: 0.1873  
##  Max.   :25429   Max.   :63.00   Max.   :63.60   Max.   : 5.7107  
##    log10price      logitpall       logitinactive    
##  Min.   :5.204   Min.   :-2.4023   Min.   :-1.7583  
##  1st Qu.:5.389   1st Qu.:-1.4763   1st Qu.:-1.2877  
##  Median :5.462   Median :-1.0880   Median :-1.1066  
##  Mean   :5.497   Mean   :-1.0874   Mean   :-1.0999  
##  3rd Qu.:5.585   3rd Qu.:-0.7184   3rd Qu.:-0.9346  
##  Max.   :6.078   Max.   : 0.5580   Max.   :-0.3516

## Loading required package: Rcpp
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "numericVector" of class "Mnumeric"; definition not updated
## Loading required package: databmstdr
## Loading required package: ggsn
## Loading required package: grid
## Loading required package: sp
## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: /usr/share/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: /usr/share/proj
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Loading required package: MASS

1 Code to reproduce Figure11.8



## Get the average fix data
head(lainfo)
##   id  Areacode             Areaname  hectares Abbrevname  centerx  centery
## 1  0 E09000021 Kingston upon Thames  3726.117   Kingston 519297.6 166820.0
## 2  1 E09000008              Croydon  8649.441    Croydon 533290.2 163541.2
## 3  2 E09000006              Bromley 15013.487    Bromley 542895.5 165655.5
## 4  3 E09000018             Hounslow  5658.541   Hounslow 513515.5 175643.2
## 5  4 E09000009               Ealing  5554.428     Ealing 515887.9 181715.5
## 6  5 E09000016             Havering 11445.735   Havering 554049.0 187392.0
head(childpoverty)
##     id  Areacode   LA             Areaname Areahectares landareaSqKm
## 201  0 E09000021 00AX Kingston upon Thames         3726        37.25
## 202  0 E09000021 00AX Kingston upon Thames         3726        37.25
## 203  0 E09000021 00AX Kingston upon Thames         3726        37.25
## 204  0 E09000021 00AX Kingston upon Thames         3726        37.25
## 205  0 E09000021 00AX Kingston upon Thames         3726        37.25
## 206  0 E09000021 00AX Kingston upon Thames         3726        37.25
##     imdavscore07 year population medianincome econinactive houseprice
## 201         13.1 2006     153700        22400         22.2     249999
## 202         13.1 2007     154500        22900         25.4     290000
## 203         13.1 2008     156000        24050         23.4     285000
## 204         13.1 2009     157300        25200         21.6     270000
## 205         13.1 2010     158600        25600         22.4     300000
## 206         13.1 2011     160400        24800         25.9     300000
##     nGPregistration punder16 pall      mincome log10price logitpall
## 201            3622     16.1 15.2 -0.330229493   5.397938 -1.719000
## 202            3721     16.8 16.1 -0.259822636   5.462398 -1.650806
## 203            3700     16.2 15.7 -0.097886866   5.454845 -1.680721
## 204            3576     16.1 15.8  0.064048904   5.431364 -1.673185
## 205            3632     15.0 14.9  0.120374389   5.477121 -1.742466
## 206            3950     13.6 13.8  0.007723419   5.477121 -1.832002
##     logitinactive
## 201     -1.254049
## 202     -1.077391
## 203     -1.185861
## 204     -1.289131
## 205     -1.242506
## 206     -1.051173
colnames(childpoverty)
##  [1] "id"              "Areacode"        "LA"              "Areaname"       
##  [5] "Areahectares"    "landareaSqKm"    "imdavscore07"    "year"           
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "mincome"        
## [17] "log10price"      "logitpall"       "logitinactive"
u <- aggregate.data.frame(childpoverty[, c(7, 9:15)], by=list(id=childpoverty$id), FUN = mean)
head(lainfo)
##   id  Areacode             Areaname  hectares Abbrevname  centerx  centery
## 1  0 E09000021 Kingston upon Thames  3726.117   Kingston 519297.6 166820.0
## 2  1 E09000008              Croydon  8649.441    Croydon 533290.2 163541.2
## 3  2 E09000006              Bromley 15013.487    Bromley 542895.5 165655.5
## 4  3 E09000018             Hounslow  5658.541   Hounslow 513515.5 175643.2
## 5  4 E09000009               Ealing  5554.428     Ealing 515887.9 181715.5
## 6  5 E09000016             Havering 11445.735   Havering 554049.0 187392.0
lonfixdata <- merge(lainfo, u)
head(lonfixdata)
##   id  Areacode             Areaname hectares Abbrevname  centerx  centery
## 1  0 E09000021 Kingston upon Thames 3726.117   Kingston 519297.6 166820.0
## 2  1 E09000008              Croydon 8649.441    Croydon 533290.2 163541.2
## 3 10 E09000022              Lambeth 2724.940    Lambeth 530844.7 174356.4
## 4 11 E09000028            Southwark 2991.340  Southwark 533820.6 176665.8
## 5 12 E09000023             Lewisham 3531.706   Lewisham 537670.4 173980.7
## 6 13 E09000011            Greenwich 5044.190  Greenwich 542908.1 176875.7
##   imdavscore07 population medianincome econinactive houseprice nGPregistration
## 1        13.10   161472.5        25265        22.95   314599.9          3792.0
## 2        21.31   360603.1        22310        21.44   233495.0          6211.6
## 3        34.94   301793.1        23910        20.27   323925.0          7454.5
## 4        33.33   287420.1        23375        25.18   329775.0          7886.7
## 5        31.04   276252.5        22315        23.27   253029.8          5513.4
## 6        33.94   252620.3        21295        25.64   256749.0          6006.2
##   punder16  pall
## 1    14.25 14.04
## 2    24.73 23.95
## 3    31.77 31.63
## 4    31.25 31.11
## 5    30.51 30.02
## 6    29.97 29.48
head(lainfo)
##   id  Areacode             Areaname  hectares Abbrevname  centerx  centery
## 1  0 E09000021 Kingston upon Thames  3726.117   Kingston 519297.6 166820.0
## 2  1 E09000008              Croydon  8649.441    Croydon 533290.2 163541.2
## 3  2 E09000006              Bromley 15013.487    Bromley 542895.5 165655.5
## 4  3 E09000018             Hounslow  5658.541   Hounslow 513515.5 175643.2
## 5  4 E09000009               Ealing  5554.428     Ealing 515887.9 181715.5
## 6  5 E09000016             Havering 11445.735   Havering 554049.0 187392.0

lonfixdata <- lonfixdata [order(as.numeric(lonfixdata$id)), ]
head(lonfixdata)
##    id  Areacode             Areaname  hectares Abbrevname  centerx  centery
## 1   0 E09000021 Kingston upon Thames  3726.117   Kingston 519297.6 166820.0
## 2   1 E09000008              Croydon  8649.441    Croydon 533290.2 163541.2
## 13  2 E09000006              Bromley 15013.487    Bromley 542895.5 165655.5
## 24  3 E09000018             Hounslow  5658.541   Hounslow 513515.5 175643.2
## 28  4 E09000009               Ealing  5554.428     Ealing 515887.9 181715.5
## 29  5 E09000016             Havering 11445.735   Havering 554049.0 187392.0
##    imdavscore07 population medianincome econinactive houseprice nGPregistration
## 1         13.10   161472.5        25265        22.95   314599.9          3792.0
## 2         21.31   360603.1        22310        21.44   233495.0          6211.6
## 13        14.36   311275.7        24970        20.33   283400.0          1914.2
## 24        23.20   250287.0        21425        22.40   269389.5          8796.2
## 28        25.10   332975.9        21355        25.19   307380.0         11921.5
## 29        16.07   237488.5        22230        21.60   228175.0          1157.8
##    punder16  pall
## 1     14.25 14.04
## 2     24.73 23.95
## 13    16.65 15.98
## 24    24.53 24.33
## 28    25.01 25.12
## 29    19.05 18.08
colnames(lonfixdata)
##  [1] "id"              "Areacode"        "Areaname"        "hectares"       
##  [5] "Abbrevname"      "centerx"         "centery"         "imdavscore07"   
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"
b <- c(0, 20, 25, 30, 40, 100)
labss <- c("< 20", "20-25", "25-30", "30-40", "> 40")
lonfixdata$cutpall <- factor(cut(lonfixdata$pall, breaks=b,  labels=labss))
lonfixdata$cutpunder16 <- factor(cut(lonfixdata$punder16, breaks=b,  labels=labss))

bdf <- merge(adf, lonfixdata, by="id")
head(bdf)
##   id     long      lat order  hole piece group  Areacode             Areaname
## 1  0 516401.6 160201.8     1 FALSE     1   0.1 E09000021 Kingston upon Thames
## 2  0 516407.3 160210.5     2 FALSE     1   0.1 E09000021 Kingston upon Thames
## 3  0 516413.3 160217.4     3 FALSE     1   0.1 E09000021 Kingston upon Thames
## 4  0 516419.9 160224.2     4 FALSE     1   0.1 E09000021 Kingston upon Thames
## 5  0 516427.9 160231.3     5 FALSE     1   0.1 E09000021 Kingston upon Thames
## 6  0 516453.8 160253.0     6 FALSE     1   0.1 E09000021 Kingston upon Thames
##   hectares Abbrevname  centerx centery imdavscore07 population medianincome
## 1 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
## 2 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
## 3 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
## 4 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
## 5 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
## 6 3726.117   Kingston 519297.6  166820         13.1   161472.5        25265
##   econinactive houseprice nGPregistration punder16  pall cutpall cutpunder16
## 1        22.95   314599.9            3792    14.25 14.04    < 20        < 20
## 2        22.95   314599.9            3792    14.25 14.04    < 20        < 20
## 3        22.95   314599.9            3792    14.25 14.04    < 20        < 20
## 4        22.95   314599.9            3792    14.25 14.04    < 20        < 20
## 5        22.95   314599.9            3792    14.25 14.04    < 20        < 20
## 6        22.95   314599.9            3792    14.25 14.04    < 20        < 20

# 1 = City of London, 2 = Tower Hamlets,
# 3 = Islington, 4 = Camden, 5 = Westminster, 6 = Kensington and Chelsea,
# 7 = Hammersmith and Fulham.

londonframe <- ggplot(data=adf, aes(x=long, y=lat, group = group, fill="White")) +
  # scale_fill_gradientn(colours=com, na.value="black",limits=range(bdf$no2))  +
  geom_path(colour='red',size=0.5) +
  geom_text(data=lainfo, inherit.aes=F, fontface="bold",  aes(label =Abbrevname, x=centerx, y=centery),  size = 3) +
  coord_equal()  +
  theme_bw()+ # theme(text=element_text(family="Times")) +
  labs(title= "A map of 33 boroughs of London",  x="", y = "", size=3) +
  theme(axis.text.x = element_blank(), axis.text.y = element_blank(),axis.ticks = element_blank())   +
  ggsn::scalebar(data=adf, dist =5, location = "topleft", transform=F, dist_unit = "km",
                 st.dist = .05, st.size =4, height = .06, st.bottom=T)
londonframe


ggsave(filename=paste0(figurepath, "london_frame.png"))
## Saving 7 x 5 in image


colnames(childpoverty)
##  [1] "id"              "Areacode"        "LA"              "Areaname"       
##  [5] "Areahectares"    "landareaSqKm"    "imdavscore07"    "year"           
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "mincome"        
## [17] "log10price"      "logitpall"       "logitinactive"
dp1 <- childpoverty[, c(16, 17, 19, 18)]


p3 <- dp1 %>% ggpairs(., # mapping = ggplot2::aes(colour="blue"),
            lower = list(continuous = wrap("smooth", alpha = 0.3, size=0.1),
                         discrete = "blank", combo = wrap("box_no_facet", alpha=0.5)),
            diag = list(discrete="barDiag",
                        continuous = wrap("densityDiag", alpha=0.5 )),
            upper = list(combo = "blank", discrete = "blank", continuous = wrap("cor", family="sans", size=4, align_percent=0.5))) +
  theme(panel.grid.major = element_blank())
p3

ggsave(filename=paste0(figurepath, "pairs_logit.png"))
## Saving 7 x 5 in image

2 Code for modeling


colnames(childpoverty)
##  [1] "id"              "Areacode"        "LA"              "Areaname"       
##  [5] "Areahectares"    "landareaSqKm"    "imdavscore07"    "year"           
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "mincome"        
## [17] "log10price"      "logitpall"       "logitinactive"
# data transform
childpoverty$mincome <- (childpoverty$medianincome - mean(childpoverty$medianincome))/sd(childpoverty$medianincome)
childpoverty$logitpall <- log(childpoverty$pall/(100-childpoverty$pall))
childpoverty$logitinactive <- log(childpoverty$econinactive/(100-childpoverty$econinactive))
childpoverty$log10price <- log10(childpoverty$houseprice)
childpoverty$id <- as.numeric(childpoverty$id)
childpoverty <- childpoverty[order(childpoverty$id, childpoverty$year), ]

N <- 120000
burn.in <- 20000
thin <- 100

#N <- 220000
#burn.in <- 20000
#thin <- 100

#N <- 2000
#burn.in <- 1000
#thin <- 10


childpoverty$yt <- (childpoverty$year - mean(2006:2015))/sd(2006:2015)
f1 <- logitpall ~   mincome + logitinactive + log10(houseprice) + yt
f2 <- logitpall ~   mincome + logitinactive + log10(houseprice)


m01  <- Bcartime(formula  = f1, data =childpoverty,   family ="gaussian",
                 model="glm",  package="CARBayes",
                 N=N, burn.in = burn.in, thin=thin, verbose=FALSE)
## ##
## # Total time taken:: 9.81  - Sec.

summary(m01)
## 
##  The  glm  model has been fitted using the  CARBayes  package.
## Call:
## Bcartime(formula = f1, data = childpoverty, family = "gaussian", 
##     package = "CARBayes", model = "glm", N = N, burn.in = burn.in, 
##     thin = thin, verbose = FALSE)
## ##
## # Total time taken:: 9.81  - Sec.
## Model formula
## logitpall ~ mincome + logitinactive + log10(houseprice) + yt
## 
## 
## Parameter Estimates:
##                   Median   2.5%  97.5% n.sample % accept n.effective
## (Intercept)       -6.260 -8.093 -4.246     1000      100        1000
## mincome           -0.342 -0.398 -0.291     1000      100        1000
## logitinactive      1.086  0.934  1.248     1000      100        1000
## log10(houseprice)  1.162  0.788  1.494     1000      100        1000
## yt                -0.118 -0.164 -0.070     1000      100        1000
## nu2                0.122  0.104  0.144     1000      100        1000
##                   Geweke.diag
## (Intercept)               0.1
## mincome                  -0.7
## logitinactive             0.0
## log10(houseprice)        -0.1
## yt                        0.7
## nu2                      -0.5
## 
## Model Choice Statistics:
##           DIC           p.d          WAIC           p.w          LMPL 
##       235.902         5.879       235.484         5.320      -117.751 
## loglikelihood 
##      -112.072

m12  <- Bcartime(formula  = f1, data =childpoverty,  W = Wlon, family ="gaussian",
                 model="anova",  package="CARBayesST",  interaction=T,
                 scol="id", tcol="year", N=N,
                 burn.in = burn.in, thin=thin, verbose=FALSE)
## ##
## # Total time taken:: 45.63  - Sec.
summary(m12)
## 
##  The  anova  model has been fitted using the  CARBayesST  package.
## Call:
## Bcartime(formula = f1, data = childpoverty, family = "gaussian", 
##     scol = "id", tcol = "year", package = "CARBayesST", model = "anova", 
##     W = Wlon, interaction = T, N = N, burn.in = burn.in, thin = thin, 
##     verbose = FALSE)
## ##
## # Total time taken:: 45.63  - Sec.
## Model formula
## logitpall ~ mincome + logitinactive + log10(houseprice) + yt
## 
## 
## Parameter Estimates:
##                   Median   2.5%  97.5% n.sample % accept n.effective
## (Intercept)        0.249 -2.228  2.520     1000    100.0      1156.4
## mincome           -0.245 -0.295 -0.199     1000    100.0      1000.0
## logitinactive      0.727  0.550  0.888     1000    100.0      1000.0
## log10(houseprice) -0.098 -0.515  0.356     1000    100.0      1158.1
## yt                -0.084 -0.125 -0.049     1000    100.0      1000.0
## tau2.S             0.073  0.046  0.129     1000    100.0      1000.0
## tau2.T             0.219  0.114  0.537     1000    100.0       872.0
## nu2                0.066  0.056  0.078     1000    100.0      1000.0
## rho.S              0.813  0.504  0.979     1000     47.8      1000.0
## rho.T              0.460  0.040  0.900     1000     88.8      1136.7
##                   Geweke.diag
## (Intercept)               0.8
## mincome                   2.3
## logitinactive            -1.2
## log10(houseprice)        -0.9
## yt                       -0.7
## tau2.S                   -0.3
## tau2.T                    0.0
## nu2                       0.4
## rho.S                     0.0
## rho.T                    -1.5
## 
## Model Choice Statistics:
##           DIC           p.d          WAIC           p.w          LMPL 
##        51.738        39.208        50.302        34.144       -25.626 
## loglikelihood 
##        13.338

m13  <- Bcartime(formula  = f1, data =childpoverty,  W = Wlon, family ="gaussian",
                  model="adaptive",  package="CARBayesST",
                  scol="id", tcol="year", N=N,
                  burn.in = burn.in, thin=thin, verbose=FALSE)
## ##
## # Total time taken:: 3  - Mins. 45.06  - Sec.
summary(m13)
## 
##  The  adaptive  model has been fitted using the  CARBayesST  package.
## Call:
## Bcartime(formula = f1, data = childpoverty, family = "gaussian", 
##     scol = "id", tcol = "year", package = "CARBayesST", model = "adaptive", 
##     W = Wlon, N = N, burn.in = burn.in, thin = thin, verbose = FALSE)
## ##
## # Total time taken:: 3  - Mins. 45.06  - Sec.
## Model formula
## logitpall ~ mincome + logitinactive + log10(houseprice) + yt
## 
## 
## Parameter Estimates:
##                   Median   2.5%   97.5% n.sample % accept n.effective
## (Intercept)       -1.027 -3.553   1.888     1000    100.0      1000.0
## mincome           -0.268 -0.324  -0.210     1000    100.0      1000.0
## logitinactive      0.623  0.432   0.802     1000    100.0      1000.0
## log10(houseprice)  0.110 -0.404   0.565     1000    100.0      1000.0
## yt                -0.104 -0.151  -0.059     1000    100.0      1000.0
## nu2                0.043  0.034   0.054     1000    100.0      1000.0
## tau2               0.063  0.045   0.087     1000    100.0       864.1
## rho.S              0.967  0.928   0.988     1000     45.8      1000.0
## rho.T              0.054  0.002   0.214     1000    100.0       890.0
## tau2.w            72.474 32.658 141.337     1000    100.0       693.9
##                   Geweke.diag
## (Intercept)              -1.0
## mincome                   0.0
## logitinactive            -2.7
## log10(houseprice)         0.8
## yt                       -1.6
## nu2                       0.6
## tau2                      1.2
## rho.S                    -2.3
## rho.T                    -0.7
## tau2.w                   -0.5
## 
## Model Choice Statistics:
##           DIC           p.d          WAIC           p.w          LMPL 
##       -20.614       124.904       -38.726        85.289         7.214 
## loglikelihood 
##       135.212


m14  <- Bcartime(formula  = f1, data =childpoverty,  W = Wlon, family ="gaussian",
                 model="ar",  package="CARBayesST",  interaction=T,
                 scol="id", tcol="year", N=N,
                 burn.in = burn.in, thin=thin, verbose=FALSE)
## ##
## # Total time taken:: 30.27  - Sec.



summary(m14)
## 
##  The  ar  model has been fitted using the  CARBayesST  package.
## Call:
## Bcartime(formula = f1, data = childpoverty, family = "gaussian", 
##     scol = "id", tcol = "year", package = "CARBayesST", model = "ar", 
##     W = Wlon, interaction = T, N = N, burn.in = burn.in, thin = thin, 
##     verbose = FALSE)
## ##
## # Total time taken:: 30.27  - Sec.
## Model formula
## logitpall ~ mincome + logitinactive + log10(houseprice) + yt
## 
## 
## Parameter Estimates:
##                   Median   2.5%  97.5% n.sample % accept n.effective
## (Intercept)       -0.980 -3.460  1.654     1000    100.0      1000.0
## mincome           -0.265 -0.323 -0.213     1000    100.0      1000.0
## logitinactive      0.621  0.425  0.788     1000    100.0      1000.0
## log10(houseprice)  0.107 -0.373  0.567     1000    100.0      1000.0
## yt                -0.104 -0.146 -0.055     1000    100.0      1127.9
## tau2               0.067  0.048  0.094     1000    100.0      1000.0
## nu2                0.044  0.036  0.055     1000    100.0      1000.0
## rho.S              0.967  0.924  0.988     1000     45.2      1063.8
## rho.T              0.055  0.002  0.228     1000    100.0      1000.0
##                   Geweke.diag
## (Intercept)              -1.6
## mincome                  -2.0
## logitinactive             0.3
## log10(houseprice)         1.7
## yt                       -0.7
## tau2                      0.3
## nu2                       0.5
## rho.S                    -0.1
## rho.T                    -0.8
## 
## Model Choice Statistics:
##           DIC           p.d          WAIC           p.w          LMPL 
##       -14.126       118.934       -28.760        83.532         2.249 
## loglikelihood 
##       125.997

## Tables 11.4 and 11.5

calerrss <- function(modfit) {
# fits <- modfit$fitted.values
# errorss <- sum((childpoverty$logitpall - fits)^2)
# errorss ## Logit scale
x <- modfit$samples
y <- 100*exp(x$fitted)/(1+exp(x$fitted))
dim(y)
pfits <- apply(y, 2, mean)
length(pfits)
u <- childpoverty$pall - pfits
sqrt(errss <- sum(u^2)/330)
}

sd(childpoverty$pall)
## [1] 9.559422
ss01 <- calerrss(m01$fit)
ss01
## [1] 6.320632
ss12 <- calerrss(m12$fit)
ss12
## [1] 4.260659
ss13 <- calerrss(m13$fit)
ss13
## [1] 2.647707
ss14 <- calerrss(m14$fit)
ss14
## [1] 2.767647
# m12 ANOVA
# m14 AR
# m13 Adaptive


u <- rbind(m01$mchoice, m12$mchoice, m14$mchoice, m13$mchoice)
u <- u[, 1:4]
u
##            DIC        p.d      WAIC       p.w
## [1,] 235.90205   5.878743 235.48423  5.320421
## [2,]  51.73850  39.207523  50.30171 34.144182
## [3,] -14.12644 118.934056 -28.76045 83.531758
## [4,] -20.61420 124.904497 -38.72571 85.289259
class(u)
## [1] "matrix" "array"
a <- c(ss01, ss12, ss14, ss13)
table11.4 <- data.frame(u[, c(2, 1, 4, 3)], rmse=a)
dput(table11.4, file=paste0(filepath, "table11.4.txt"))

table11.5 <- m13$params
dput(table11.5, file=paste0(filepath, "table11.5.txt"))

modfit <- m13$fit
names(modfit)
##  [1] "summary.results"     "samples"             "fitted.values"      
##  [4] "residuals"           "modelfit"            "accept"             
##  [7] "localised.structure" "formula"             "model"              
## [10] "X"
colnames(childpoverty)
##  [1] "id"              "Areacode"        "LA"              "Areaname"       
##  [5] "Areahectares"    "landareaSqKm"    "imdavscore07"    "year"           
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "mincome"        
## [17] "log10price"      "logitpall"       "logitinactive"   "yt"

fits <- modfit$fitted.values
errorss <- sum((childpoverty$logitpall - fits)^2)
errorss ## Logit scale
## [1] 6.875655

names(modfit)
##  [1] "summary.results"     "samples"             "fitted.values"      
##  [4] "residuals"           "modelfit"            "accept"             
##  [7] "localised.structure" "formula"             "model"              
## [10] "X"
x <- modfit$samples
names(x)
## [1] "beta"   "phi"    "rho"    "tau2"   "nu2"    "w"      "fitted"
y <- 100*exp(x$fitted)/(1+exp(x$fitted))
dim(y)
## [1] 1000  330

pfits <- apply(y, 2, mean)
length(pfits)
## [1] 330

fullres <- data.frame(childpoverty[, c("id", "Areaname", "year", "pall")], pfits)
head(fullres)
##     id             Areaname year pall    pfits
## 201  0 Kingston upon Thames 2006 15.2 19.63309
## 202  0 Kingston upon Thames 2007 16.1 21.04756
## 203  0 Kingston upon Thames 2008 15.7 20.80568
## 204  0 Kingston upon Thames 2009 15.8 16.31415
## 205  0 Kingston upon Thames 2010 14.9 15.56187
## 206  0 Kingston upon Thames 2011 13.8 17.76220

mean(fullres$pfits[fullres$id==26])
## [1] 43.17342


lafits <- tapply(pfits, childpoverty$id, FUN=mean)
lafits
##        0        1        2        3        4        5        6        7 
## 16.51804 21.26114 16.63437 22.46444 24.65485 19.90265 21.76556 20.65532 
##        8        9       10       11       12       13       14       15 
## 26.35972 22.06127 29.65368 30.89916 30.82941 28.74513 21.10275 29.58257 
##       16       17       18       19       20       21       22       23 
## 27.28293 22.66896 16.28389 12.01424 20.73031 20.72185 27.67475 28.82767 
##       24       25       26       27       28       29       30       31 
## 34.56025 34.39132 43.17342 37.40044 38.25495 34.16961 36.03272 32.47360 
##       32 
## 15.27744
res <- data.frame(lonfixdata[, c("id", "Areaname", "pall")], fits=lafits)


colnames(res)
## [1] "id"       "Areaname" "pall"     "fits"
# seeres <- res[, c("id", "Areaname", "pall", "fits", "sd", "low", "up")]
# seeres

## find la means

acodes <- childpoverty$id
library(doBy)
callameans <- function(x, lapattern=acodes) {
  u <- data.frame(x=x, lacodes=lapattern)
  v <- summaryBy(x~lacodes, data=u)
  as.vector(v$x.mean)
}
u <- callameans(y[1,])
u
##  [1] 16.55167 21.56213 16.95523 22.32532 25.14940 20.44220 21.69079 20.30508
##  [9] 26.27977 20.66385 28.50892 29.46138 31.49624 29.16358 20.99575 28.78621
## [17] 29.15046 24.38189 16.35726 12.25856 20.41130 21.36683 26.61867 27.99601
## [25] 35.12074 33.52926 43.62984 36.92885 39.46274 35.48965 35.80175 31.66643
## [33] 14.08150
a <- apply(y, 1, callameans)
dim(a)
## [1]   33 1000
lafits <- apply(a, 1, mean)
summary(lafits)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.01   20.73   26.36   26.21   30.90   43.17
lalims <-  apply(a, 1, FUN=quantile, probs=c(0.025, 0.975))
lalims
##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## 2.5%  15.05716 19.52701 15.23392 20.90580 22.87180 18.38590 20.26256 18.93596
## 97.5% 18.01118 23.01347 18.04198 24.03797 26.42565 21.60674 23.43514 22.39419
##           [,9]    [,10]    [,11]    [,12]    [,13]    [,14]    [,15]    [,16]
## 2.5%  24.59906 20.25581 27.49846 28.66139 28.67832 26.96024 19.40374 27.47540
## 97.5% 28.24810 23.97672 31.76133 33.14261 32.97958 30.48123 22.92177 31.73856
##          [,17]    [,18]    [,19]    [,20]    [,21]    [,22]    [,23]    [,24]
## 2.5%  25.52488 20.82629 14.92230 10.89133 19.05183 19.04312 25.56244 26.79243
## 97.5% 29.15129 24.52287 17.74039 13.17237 22.65770 22.55394 29.84278 30.90785
##          [,25]    [,26]    [,27]    [,28]    [,29]    [,30]    [,31]    [,32]
## 2.5%  32.40247 32.25738 40.88652 34.98639 35.90501 31.97256 33.76040 30.18432
## 97.5% 37.01251 36.61264 45.38455 39.74407 40.64608 36.43148 38.44218 34.84163
##          [,33]
## 2.5%  13.69458
## 97.5% 17.01549
sdfit <-  apply(a, 1, sd)

#head(lainfo)

fits <- data.frame(id=as.character(0:32), fits=lafits, sd=sdfit, low=lalims[1, ], up=lalims[2, ] )
head(fits)
##   id     fits        sd      low       up
## 1  0 16.51804 0.7583289 15.05716 18.01118
## 2  1 21.26114 0.9137625 19.52701 23.01347
## 3  2 16.63437 0.6991910 15.23392 18.04198
## 4  3 22.46444 0.8009900 20.90580 24.03797
## 5  4 24.65485 0.9407411 22.87180 26.42565
## 6  5 19.90265 0.8139314 18.38590 21.60674
colnames(lonfixdata)
##  [1] "id"              "Areacode"        "Areaname"        "hectares"       
##  [5] "Abbrevname"      "centerx"         "centery"         "imdavscore07"   
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "cutpall"        
## [17] "cutpunder16"
res <- merge(lonfixdata, fits)
class(lonfixdata$id)
## [1] "character"
class(fits$id)
## [1] "character"
# head(res)
dim(res)
## [1] 33 21
res <- res[order(as.numeric(res$id)), ]
colnames(res)
##  [1] "id"              "Areacode"        "Areaname"        "hectares"       
##  [5] "Abbrevname"      "centerx"         "centery"         "imdavscore07"   
##  [9] "population"      "medianincome"    "econinactive"    "houseprice"     
## [13] "nGPregistration" "punder16"        "pall"            "cutpall"        
## [17] "cutpunder16"     "fits"            "sd"              "low"            
## [21] "up"
seeres <- res[, c("id", "Areaname", "pall", "fits", "sd", "low", "up")]


b <- c(0, 20, 25, 30, 40, 100)
labss <- c("< 20", "20-25", "25-30", "30-40", "> 40")
res$cutpfits <- factor(cut(res$fits, breaks=b,  labels=labss))
table(res$cutpfits)
## 
##  < 20 20-25 25-30 30-40  > 40 
##     6    10     7     9     1
table(res$cutpall)
## 
##  < 20 20-25 25-30 30-40  > 40 
##     9     6     5    12     1

dim(x$w)
## [1] 1000   68
head(x$w[, 1:10])
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 7 
## Thinning interval = 1 
##            2.3       4.5       4.7        5.7       5.8       7.8         5.9
## [1,] 0.9989650 0.9999857 0.9999961 0.99999840 0.9915676 0.9999985 0.009729045
## [2,] 0.9872969 0.9999957 0.9998559 0.96805343 0.9991653 0.9999974 0.999970850
## [3,] 0.9039917 0.9999561 0.9999992 0.99999742 0.9999723 0.9999995 0.999997442
## [4,] 0.9999964 0.9999962 0.9998972 0.08713447 0.9147984 0.9979626 0.998915643
## [5,] 0.9999729 0.9999992 0.9999213 0.99981437 0.4226217 0.9999872 0.949751818
## [6,] 0.9992246 0.9999767 0.9979245 0.99998143 0.9972251 0.9997651 0.992438208
## [7,] 0.9996779 0.9999630 0.9936820 0.98865416 0.9998494 0.9999119 0.999788793
##            8.9      8.10      9.10
## [1,] 0.9999767 0.9999984 0.9998465
## [2,] 0.4653634 0.9888685 0.9754024
## [3,] 0.9663152 0.9968180 0.9999993
## [4,] 0.9995880 0.9999389 0.5732566
## [5,] 0.9995084 0.9995878 0.9998910
## [6,] 0.9999972 0.9998667 0.9999991
## [7,] 0.9999996 0.9999959 0.9999545
colnames(x$w)
##  [1] "2.3"   "4.5"   "4.7"   "5.7"   "5.8"   "7.8"   "5.9"   "8.9"   "8.10" 
## [10] "9.10"  "2.11"  "3.11"  "3.12"  "11.12" "3.13"  "12.13" "3.14"  "13.14"
## [19] "3.15"  "14.15" "10.16" "16.17" "6.18"  "17.18" "1.19"  "2.19"  "1.20" 
## [28] "4.20"  "1.21"  "2.21"  "11.21" "19.21" "1.22"  "11.22" "20.22" "21.22"
## [37] "4.23"  "5.23"  "9.23"  "9.24"  "23.24" "9.25"  "24.25" "9.26"  "10.26"
## [46] "25.26" "26.28" "17.29" "27.29" "28.29" "10.30" "16.30" "17.30" "26.30"
## [55] "28.30" "29.30" "17.31" "18.31" "27.31" "29.31" "6.32"  "18.32" "31.32"
## [64] "25.33" "26.33" "27.33" "28.33" "29.33"
y <- modfit$localised.structure
names(y)
## [1] "Wmedian" "W99"
Wmed <- modfit$localised.structure$Wmedian
W99 <- modfit$localised.structure$W99

border.locations <- W99
boundary.final <- highlight.borders(border.locations=W99,
                                    spdata=london)

boundary.final
## class       : SpatialPoints 
## features    : 13052 
## extent      : 507187.7, 549856.1, 159658.1, 198211.5  (xmin, xmax, ymin, ymax)
## crs         : NA
hbs <- as.data.frame(boundary.final)
head(hbs)
##          X        Y
## 1 539596.3 160796.3
## 2 539595.5 160800.1
## 3 539592.8 160816.1
## 4 539590.5 160832.6
## 5 539588.5 160866.6
## 6 539597.4 160883.8
dim(hbs)
## [1] 13052     2

justres <- res[, c("Areacode", "fits", "cutpfits" )]
head(justres)
##     Areacode     fits cutpfits
## 1  E09000021 16.51804     < 20
## 2  E09000008 21.26114    20-25
## 13 E09000006 16.63437     < 20
## 24 E09000018 22.46444    20-25
## 28 E09000009 24.65485    20-25
## 29 E09000016 19.90265     < 20

bdf <- merge(adf, lonfixdata, by="id")
# head(bdf)
ddf <- merge(bdf, justres, by="Areacode")
head(ddf)
##    Areacode id     long      lat order  hole piece group       Areaname
## 1 E09000001 32 531145.1 180782.1     1 FALSE     1  32.1 City of London
## 2 E09000001 32 531143.8 180799.3     2 FALSE     1  32.1 City of London
## 3 E09000001 32 531143.5 180844.6     3 FALSE     1  32.1 City of London
## 4 E09000001 32 531144.3 180858.7     4 FALSE     1  32.1 City of London
## 5 E09000001 32 531145.6 180880.9     5 FALSE     1  32.1 City of London
## 6 E09000001 32 531142.8 180890.6     6 FALSE     1  32.1 City of London
##   hectares Abbrevname  centerx  centery imdavscore07 population medianincome
## 1  314.942          1 532479.6 181271.8        12.84       7656        57005
## 2  314.942          1 532479.6 181271.8        12.84       7656        57005
## 3  314.942          1 532479.6 181271.8        12.84       7656        57005
## 4  314.942          1 532479.6 181271.8        12.84       7656        57005
## 5  314.942          1 532479.6 181271.8        12.84       7656        57005
## 6  314.942          1 532479.6 181271.8        12.84       7656        57005
##   econinactive houseprice nGPregistration punder16 pall cutpall cutpunder16
## 1        38.33     510285           218.6    15.22 15.9    < 20        < 20
## 2        38.33     510285           218.6    15.22 15.9    < 20        < 20
## 3        38.33     510285           218.6    15.22 15.9    < 20        < 20
## 4        38.33     510285           218.6    15.22 15.9    < 20        < 20
## 5        38.33     510285           218.6    15.22 15.9    < 20        < 20
## 6        38.33     510285           218.6    15.22 15.9    < 20        < 20
##       fits cutpfits
## 1 15.27744     < 20
## 2 15.27744     < 20
## 3 15.27744     < 20
## 4 15.27744     < 20
## 5 15.27744     < 20
## 6 15.27744     < 20
com <- c("lightyellow2", "lightyellow3" , "yellow2", "firebrick2", "purple")
lonfitmap <- ggplot(data=ddf, aes(x=long, y=lat, group = group, fill=cutpfits)) +
  geom_polygon(colour='black',size=0.25) +
  geom_point(data=hbs, aes(x=X, y=Y), inherit.aes = FALSE, size = 0.2, colour = "blue") +
  geom_text(data=lainfo, inherit.aes=F, fontface="bold",  aes(label =Abbrevname, x=centerx, y=centery),  size = 3) +
  scale_fill_manual(values =com, guides("Percentage"), guide = guide_legend(reverse=TRUE)) +
  theme_bw()+theme(text=element_text(family="Times")) +
  coord_equal()  +
  theme(legend.position = c(0.9, 0.25)) +
  labs(title= "Fitted percentage of children living in poverty, 2006-15",  x="", y = "", size=2.5) +
  theme(axis.text.x = element_blank(), axis.text.y = element_blank(),axis.ticks = element_blank())
lonfitmap


ggsave(filename=paste0(figurepath, "fitted_london_poverty.png"))
## Saving 7 x 5 in image

Further examples of areal data modeling: Child poverty in London

Sujit K. Sahu

2021-08-11

1 Code to reproduce Figure11.8

2 Code for modeling