THIS README IS FOR THE DATA/MODELS/MINIMUM_CONTRAST DIRECTORY.

THIS DIRECTORY CONTAINS TWO R SCRIPTS. THE FIRST R SCRIPT PRODUCES THE DATA FOR THE 2014 SOCIO-ECONOMIC VARIABLES (POPULATION AND AVERAGE INCOME) ON THE LOS ANGELES-ONLY CENSUS TRACTS (AS FOR THE 2015 DATA IN THE DATA/RAW_DATA/COVARIATES DIRECTORY). WE THEN USE THESE TO IMPLEMENT THE minimum.contrast FUNCTION FROM THE rLGCP R PACKAGE, USING BOTH RIPLEY'S K FUNCTION AND THE PAIR CORRELATION FUNCTION UNDER DIFFERENT TRANSFORMATIONS, TO OBTAIN DIFFERENT ESTIMATES OF THE RANGE AND VARIANCE THAT MINIMISE THE SQUARED DISCREPANCY BETWEEN THE EMPIRICAL AND THEORETICAL FORMS OF THESE FUNCTIONS. THE SECOND R SCRIPT GENERATES THE COUNT DATA FOR THE 2014 LOS ANGELES CRIMES ON THE RELEVANT GRID, NAMELY THE 200m-BY-200m GRID, FOR BRIEF GLM FITS USING INLA. THE RESULTS OF THIS SECOND R SCRIPT WERE FOR INTEREST AND FOR COMPARISON WITH THE SIZE AND DIRECTION OF THE COVARIATE EFFECTS IN THE GRID-MESH OPTIMISATION IMPLEMENTATION ON THE LOS ANGELES POLYGON.
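
The minimum contrast idea described above can be illustrated with a small base R sketch. It is not the code in MinimumContrast_final.R and does not call the package function. It assumes an exponential covariance for the log-Gaussian Cox process, so that the theoretical pair correlation function is g(r) = exp(sigma2 * exp(-r / phi)), and it uses a simulated "empirical" curve in place of an estimate from the crime data. The names g_theo, contrast, sigma2 and phi are hypothetical.

```r
# Theoretical pair correlation function of a log-Gaussian Cox process with an
# exponential covariance function (an assumption made for this sketch only).
g_theo <- function(r, sigma2, phi) exp(sigma2 * exp(-r / phi))

# Squared discrepancy between the transformed empirical and theoretical curves.
# The parameters are optimised on the log scale so that both stay positive.
contrast <- function(log_par, r, g_emp, transform = log) {
  sigma2 <- exp(log_par[1])
  phi    <- exp(log_par[2])
  sum((transform(g_emp) - transform(g_theo(r, sigma2, phi)))^2)
}

# Stand-in for an empirical estimate: the theoretical curve at known values.
r     <- seq(0.05, 2, by = 0.05)
g_emp <- g_theo(r, sigma2 = 1.5, phi = 0.4)

# Minimise the contrast over (log sigma2, log phi) with the default Nelder-Mead.
fit <- optim(c(0, 0), contrast, r = r, g_emp = g_emp, transform = log)
est <- exp(fit$par)
names(est) <- c("sigma2", "phi")
print(round(est, 2))
```

Passing transform = identity, or a fourth-root function such as function(x) x^0.25, in place of log changes how distances with large and small function values are weighted, which is why different transformations can give different estimates on real data.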

THE MANIPULATED DATA FILES (CREATED WITHIN THIS DIRECTORY) ARE NOT ALL CONTAINED WITHIN THIS ARCHIVED FOLDER, BUT CAN BE CREATED USING THE R SCRIPTS IN THIS DIRECTORY. IMPORTANTLY, THE COUNT DATA OVER THE CENSUS TRACTS AND GRIDS FOR THE 2014 DATA ARE ARCHIVED, BUT THE MANIPULATED SOCIO-ECONOMIC VARIABLE DATA, OF THE FORM LA_CT*_14_proj.rds, ARE NOT. ALTHOUGH NOT ALL THE DATA ARE ARCHIVED WITHIN THIS DIRECTORY, WE DISCUSS THE NAMING CONVENTIONS AS THEY ARE USED WITHIN THE R SCRIPTS IN THIS DIRECTORY.

- MinimumContrast_final.R: this R script generates the socio-economic variables (total population and average income, with the latter as in CovDataGen_Inc_final.R) for Los Angeles for the year 2014 over the required Los Angeles City census tracts in order to generate the 2014 census tract count data sets. The count data, in combination with the incidents of crime that occurred in 2014, were initially used to calculate and minimise the squared discrepancy for the Ripley's K and pair correlation functions using their inhomogeneous formulations. We therefore keep this code even though these data sets are no longer directly used. Instead, we use the separate population and average income data sets on the census tracts (with the income interpolated) that are generated within this R script. These are then used to calculate and minimise the squared discrepancy for the Ripley's K and pair correlation functions using their inhomogeneous formulations, which requires the ppm R function from the spatstat R package to model the points as a point pattern with the population and average income at the census tract level as the spatial covariates. The estimated intercept and covariate effects can be used to generate a spatially varying estimate of the mean field for the inhomogeneous Ripley's K and pair correlation functions. We consider the identity and log transforms for the two functions, and later considered the fourth root when using the Ripley's K function. This is done for both the homicide and motor vehicle theft point patterns for Los Angeles in 2014.
	-- Outputs: the only outputs from the MinimumContrast_final.R script are:
		--- LA_CTPop_14_proj.rds: the total population from the 2014 ACS data in DATA/RAW_DATA/COVARIATES on the Los Angeles City census tracts.
		--- LA_CTInc_14_0imp_proj.rds: the average income from the 2014 ACS data in DATA/RAW_DATA/COVARIATES on the Los Angeles City census tracts. In this we set any missing data to zero if the total households estimate for that census tract is zero. Then any remaining missing data are interpolated using the average of the neighbouring census tract values. (As in CovDataGen_Inc_final.R).
		--- LA2014CTCountData_proj_Final.rda: this contains the count data for the 2014 crime data in Los Angeles along with the total population and average income over the corresponding census tracts.
		--- LA2014CTSFCountData_proj_Final.rda: this is the same as the census tract count data above but stored as an sf object.
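
The two-step treatment of missing average income described for LA_CTInc_14_0imp_proj.rds can be sketched in base R as follows. This is an illustration on made-up values, not the code in MinimumContrast_final.R: the function impute_income, the toy vectors and the neighbour list are all hypothetical, and the real script works on the census tract geometries rather than a hand-written adjacency list.

```r
# Hypothetical toy data: five tracts, with the indices of each tract's neighbours.
income     <- c(50000, NA, 70000, NA, 60000)
households <- c(1200, 0, 900, 400, 1500)
neighbours <- list(c(2, 3), c(1, 3), c(1, 2, 4), c(3, 5), c(4))

impute_income <- function(income, households, neighbours) {
  out <- income
  # Step 1: missing income in a tract with zero households is set to zero.
  out[is.na(out) & households == 0] <- 0
  # Step 2: any remaining missing value becomes the mean of its neighbours'
  # values, taken from a snapshot so that the order of tracts does not matter.
  base <- out
  for (i in which(is.na(base))) {
    out[i] <- mean(base[neighbours[[i]]], na.rm = TRUE)
  }
  out
}

income_imp <- impute_income(income, households, neighbours)
print(income_imp)
```

Here tract 2 has no households and is set to zero, while tract 4 has households and takes the mean of tracts 3 and 5. Skipping step 1 would instead treat every missing value the same way and interpolate it from its neighbours.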

- LA2014CountDataGenandModelRun_final.R: this R script quickly produces a gridded count data frame for the 2014 data, set up in the same way as the 2015 data for the Grid-Mesh Optimisation method, where all missing values in the average income variable are treated the same and interpolated, regardless of the corresponding total population estimate for that census tract (this income data is generated within the script, as in CovDataGen_final.R, while the population data is assumed to have already been generated within the MinimumContrast_final.R script). The areal interpolation for the average income then uses the proportion of the census tract areas intersected with the grid cells, as in the GMO data set-up. We can then use the created data sets to quickly run some INLA models to check the magnitude and direction of the parameter values. While we considered the 2015 data in order to begin investigating possible parameter values and prior parameterisations, the 2014 data was also interesting to check and compare in general, but did not play a large role in the selection of the parameter values and prior parameterisations (mostly for the fixed effects) for the Grid-Mesh Optimisation method.
	-- Outputs: the only outputs from this R script are:
		--- LA_CTInc_14_imp_proj.rds: the average income from the 2014 ACS data in DATA/RAW_DATA/COVARIATES on the Los Angeles City census tracts. In this we treat all missing data the same even if the estimated total households in the census tracts are zero. (As generated in CovDataGen_final.R).
		--- LA2014CT236359CountData_proj.rda: this is the output count data for both homicides (hom_countdf) and motor vehicle thefts (gta_countdf) on a discretisation grid with dimensions 236 x 259 (0.2km) over the Los Angeles study region, with the socio-economic variables population and average income interpolated. Population is interpolated using the proportion of the area of the census tract intersected with each grid cell, with a similar approach for the average income variable. Additionally, all missing data in the average income are treated the same and interpolated with the average of the values for the neighbouring census tracts, even though some of the missing data values may correspond to census tracts with an estimated total population of zero. (These interpolations follow the methodology in the R code to generate the 2015 gridded data in DATA/PROCESSED_DATA/CRIME/COUNT_DATA_GMO/CountDataGen_GMO_final.R)
		--- LA2014CT236359SFCountData_proj.rda: this is the output count data for both homicides (sf_homcount) and motor vehicle thefts (sf_gtacount) as above but saved as class sf.
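
The area-proportion interpolation from census tracts to grid cells can be sketched in base R with a small matrix of intersection areas. This is an illustration on made-up numbers, not the code in LA2014CountDataGenandModelRun_final.R or CountDataGen_GMO_final.R, and all object names are hypothetical. The population rule follows the description above. The income rule shown, an area-weighted mean of the tracts overlapping each cell, is only one possible reading of "a similar approach" and is an assumption.

```r
# Hypothetical intersection areas (km^2): rows are census tracts, columns are
# grid cells; entry [t, c] is the area of tract t that falls inside cell c.
inter_area <- matrix(c(0.03, 0.01, 0.00,
                       0.00, 0.03, 0.04),
                     nrow = 2, byrow = TRUE)
tract_area <- rowSums(inter_area)  # tracts fully covered by the grid here
tract_pop  <- c(400, 700)
tract_inc  <- c(50000, 80000)

# Population: split each tract's total across cells according to the
# proportion of the tract's area that intersects each cell, then sum by cell.
# Dividing a matrix by a vector of length nrow scales each row by its entry.
prop_of_tract <- inter_area / tract_area
cell_pop <- colSums(prop_of_tract * tract_pop)

# Average income (assumed rule): area-weighted mean of the overlapping tracts.
cell_inc <- colSums(inter_area * tract_inc) / colSums(inter_area)

print(cell_pop)
print(cell_inc)
```

With this rule the gridded population sums to the tract total when the grid covers every tract, whereas average income is averaged over the tracts in a cell and not split between cells.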