THIS README IS FOR THE DATA/PROCESSED_DATA/CRIME/COUNT_DATA_GMO DIRECTORY.

THIS DIRECTORY CONTAINS THE CODE TO CREATE COUNT DATA FRAMES FOR LOS ANGELES OVER A GRID, WITH THE SOCIO-ECONOMIC VARIABLES INTERPOLATED OVER THE GRID CELLS FROM THE CENSUS TRACT LEVEL VARIABLES. THE GRID DATA GENERATED IN THIS DIRECTORY IS INTENDED FOR USE IN THE IMPLEMENTATION OF THE GRID-MESH OPTIMISATION METHOD ON THE LOS ANGELES POLYGON, MAINLY THROUGH THE SOCIO-ECONOMIC VARIABLES TOTAL POPULATION AND AVERAGE INCOME.

THE MANIPULATED DATA FILES (CREATED WITHIN THIS DIRECTORY) ARE CONTAINED WITHIN THIS ARCHIVED FOLDER.


- CountDataGen_GMO_final.R: takes the census tract level socio-economic variables stored in PROCESSED_DATA/COVARIATES (copied over from RAW_DATA/COVARIATES) and the crime data stored in PROCESSED_DATA/CRIME/POINT_PATTERN (copied over from RAW_DATA/CRIME) and generates the count data over grids of different resolutions. Importantly, this produces the gridded count data used to create the covariates for the Grid-Mesh Optimisation method implemented on the Los Angeles window (Chapter 4). The second covariate is created from the average income in LA_CTInc_15_imp_proj.rds, in which the missing data for all census tracts are interpolated, regardless of the total household estimates for the census tract. The average income is then interpolated slightly differently, using the proportion of the area of the census tract contained within each grid cell as the weights for the areal interpolation. This is discussed in more detail in Chapter 4 of my thesis.
	-- Outputs:
		--- LA2015CTXYCountData_proj.rda: the output count data for both homicides (hom_countdf) and motor vehicle thefts (gta_countdf) on a discretisation grid with dimensions X x Y over the Los Angeles study region. The socio-economic variables are interpolated as described above; in particular, the average income is treated slightly differently in order to generate the second covariate for the Grid-Mesh Optimisation method, which is based on the average income but is not necessarily the average income variable itself.
		--- LA2015CTXYSFCountData_proj.rda: this is the output count data for both homicides (sf_homcount) and motor vehicle thefts (sf_gtacount) as above but saved as class sf.
		--- LA_Cov_XY_proj.rda: the data frames for the intersections of the census tracts and the grid with resolution X x Y. These contain the ids of the census tracts and grid cells, the areas of the intersections and of the corresponding census tracts and grid cells, and the proportions of the areas of intersection with respect to the total census tract and total grid cell areas. The data frames also contain the value of each variable for the corresponding census tract, as well as these values multiplied by (i) the proportion of the area of the relevant census tract contained within the relevant grid cell and (ii) the proportion of the area of the relevant grid cell contained within the relevant census tract, for each intersection. These are intended to be combined as necessary to form the interpolated variables.
		--- LAGridCellsXY_proj.rda: the grid cells for dimensions X x Y, saved as class sf in the UTM projection.

where X and Y are chosen to give grid cell widths of approximately 5km, 2km, 1km, 0.5km and 0.2km, using the following numbers of cells in the x and y directions:
X = 10, 24, 48, 95, 236
Y = 15, 36, 72, 144, 359
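
As a quick check on the resolutions listed above, the following base R sketch pairs each (X, Y) with its nominal cell width and computes the total number of cells and the extent that the nominal widths would imply. The data frame and column names are hypothetical, and because the quoted widths are only approximate, the implied extents are rough figures rather than the true dimensions of the study region.

```r
# Numbers of cells quoted above, paired with the nominal cell widths (km).
# The object and column names are illustrative only.
grids <- data.frame(
  width_km = c(5, 2, 1, 0.5, 0.2),
  X = c(10, 24, 48, 95, 236),
  Y = c(15, 36, 72, 144, 359)
)

# Total number of grid cells at each resolution.
grids$n_cells <- grids$X * grids$Y

# Extent implied by the nominal widths. The widths are approximate,
# so these are rough figures, not the exact size of the window.
grids$extent_x_km <- grids$X * grids$width_km
grids$extent_y_km <- grids$Y * grids$width_km

print(grids)
```

Under these nominal widths the grids span roughly 47 to 50 km in the x direction and 72 to 75 km in the y direction, and the finest grid (236 x 359) has 84724 cells.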

The key data set here is the one aggregated over the 236x359 grid, as the socio-economic variables interpolated over this grid are used for the data simulation in the Grid-Mesh Optimisation method for Chapter 4.
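
The area-weighted interpolation described for LA_Cov_XY_proj.rda can be illustrated with a small base R sketch. This is a toy example under stated assumptions, not the code in CountDataGen_GMO_final.R: the intersection table, its column names and all values are hypothetical, and the real script works with sf geometries rather than hand-entered areas. It uses the proportion of each census tract contained within each grid cell as the weight, as described above for the second covariate.

```r
# Toy intersection table: one row per (census tract, grid cell) overlap.
# All column names and values are hypothetical.
ints <- data.frame(
  tract_id   = c("A", "A", "B", "B"),
  cell_id    = c(1, 2, 1, 2),
  int_area   = c(3, 1, 1, 3),                  # area of the overlap
  tract_area = c(4, 4, 4, 4),                  # total area of the tract
  cell_area  = c(4, 4, 4, 4),                  # total area of the grid cell
  pop        = c(400, 400, 800, 800),          # tract total population
  inc        = c(50000, 50000, 30000, 30000)   # tract average income
)

# Proportion of each tract inside the cell, and of each cell inside the tract.
ints$prop_tract <- ints$int_area / ints$tract_area
ints$prop_cell  <- ints$int_area / ints$cell_area

# Tract values multiplied by the proportion of the tract within the cell.
ints$pop_w <- ints$pop * ints$prop_tract
ints$inc_w <- ints$inc * ints$prop_tract

# Sum the weighted values over all tracts that overlap each grid cell.
cell_vals <- aggregate(cbind(pop_w, inc_w) ~ cell_id, data = ints, FUN = sum)
print(cell_vals)
```

With tract-proportion weights the total population is preserved: the cell values sum to the combined tract population of 1200. Applying the same weights to average income gives a quantity that is based on the average income but is not, in general, an average income itself, which matches the description of the second covariate. Weighting by prop_cell instead would be the usual choice for an averaged variable.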

NOTE THAT THE OUTPUTS LA2015CTXYCountData_proj.rda AND LA2015CTXYSFCountData_proj.rda CONTAIN ADDITIONAL SOCIO-ECONOMIC VARIABLES. HOWEVER, ANY USE OF THESE DATA SETS WITHIN THE THESIS IS FOCUSSED ONLY ON THE TOTAL POPULATION AND AVERAGE INCOME. THEREFORE, ANY MENTION OF THESE ADDITIONAL VARIABLES IN THE RELEVANT README DOCUMENTS IS AN ASIDE, AND THE R CODE TO INCLUDE THESE ADDITIONAL VARIABLES IS COMMENTED OUT. THE CODE AND THE ACCESS TO THESE VARIABLES ARE STILL AVAILABLE IF REQUIRED.