library(sf)
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
wards <- sf::st_read("./DataRegression/LondonWards.shp")
## Reading layer `LondonWards' from data source
## `C:\Users\rodri\OneDrive - stud.sbg.ac.at\CDE\2S_SpatialStatistics\Assignments\Assignment4\DataRegression\LondonWards.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 626 features and 73 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
## Projected CRS: OSGB36 / British National Grid
ggplot(data = wards) +
  geom_sf(aes(fill = AGc_2)) +
  scale_fill_viridis_c(direction = -1) +
  labs(fill = "Average GCSE")
# Scatter plot of average GCSE score against median age (MdA_2013)
ggplot(data = wards, aes(x = MdA_2013, y = AGc_2)) +
  geom_point() +
  xlab("Median Age - 2013") +
  ylab("Average GCSE - 2014") +
  theme_minimal()
ggplot(data = wards, aes(x = MdA_2013, y = AGc_2)) +
  geom_point() +
  xlab("Median Age - 2013") +
  ylab("Average GCSE - 2014") +
  geom_smooth(method = "lm", color = "red", fill = "#69b3a2", se = FALSE) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
model1 <- lm(AGc_2 ~ MdA_2013, data=wards)
summary(model1)
##
## Call:
## lm(formula = AGc_2 ~ MdA_2013, data = wards)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.834 -12.915 -1.517 11.265 65.587
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 234.6639 6.6078 35.51 <2e-16 ***
## MdA_2013 2.6867 0.1904 14.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.8 on 624 degrees of freedom
## Multiple R-squared: 0.2419, Adjusted R-squared: 0.2407
## F-statistic: 199.2 on 1 and 624 DF, p-value: < 2.2e-16
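The linear regression of the average GCSE score on median age (2013) has a positive slope: each additional year of median age is associated with an increase of about 2.69 points in the average GCSE score. The intercept, 234.66, is the predicted score when the median age is 0, which is an extrapolation with no practical meaning. The model explains 24.19% of the variance in GCSE score. This shows a positive correlation between the two variables; however, it does not imply causality.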
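To make the interpretation of the coefficients concrete, the following is a minimal sketch that plugs the estimates printed by `summary(model1)` into the fitted line. The two median ages (30 and 40) are hypothetical values chosen only for illustration, and the variable names are not part of the original analysis.

```r
# Estimates copied from the summary(model1) output above
intercept <- 234.6639   # (Intercept)
slope     <- 2.6867     # MdA_2013
r_squared <- 0.2419     # Multiple R-squared

# Hypothetical median ages, chosen only for illustration
median_age <- c(30, 40)

# Fitted line: predicted average GCSE = intercept + slope * median age
predicted_gcse <- intercept + slope * median_age
predicted_gcse

# A 10-year difference in median age corresponds to 10 * slope points
diff(predicted_gcse)

# In a simple regression, R-squared is the squared Pearson correlation;
# the sign of the correlation matches the sign of the slope
correlation <- sign(slope) * sqrt(r_squared)
correlation

# With the fitted model in the session, the same predictions could be
# obtained with:
# predict(model1, newdata = data.frame(MdA_2013 = median_age))
```

These numbers only describe the association captured by the fitted line; they say nothing about whether median age causes the difference in scores.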
Other factors are likely to have a more direct impact on GCSE scores, such as income, class absence, and the number of hours studied. It should also be taken into account that a student who is one or two years younger can score better than an older one, especially if he or she was more dedicated, studied more, or had better educational or financial support.
Although the model is highly significant (p-value < 2.2e-16), it does not imply causality.