THIS README IS FOR THE DATA/PROCESSED_DATA/CRIME/COUNT_DATA_FINAL DIRECTORY. EACH SUB-DIRECTORY WILL ALSO CONTAIN ITS OWN README FILE.

THIS DIRECTORY CONTAINS TWO R SCRIPTS THAT GENERATE COUNT DATA FRAMES FOR EACH CITY OF INTEREST OVER GRIDS OF DIFFERENT RESOLUTIONS, WITH THE SOCIO-ECONOMIC VARIABLES INTERPOLATED ONTO THE GRID CELLS FROM THE CENSUS TRACT LEVEL VARIABLES. THE FIRST R SCRIPT GENERATES THE PROJECTED COUNT DATA OVER THE GRIDS WITH THE INTERPOLATED SOCIO-ECONOMIC VARIABLES. THE SECOND R SCRIPT SCALES AND SHIFTS THE DATA SETS PRODUCED AND OUTPUTS THE SHIFTED AND SCALED GRIDDED COUNT DATA, AS WELL AS OTHER OUTPUTS SUCH AS MESHES, FOR MODELLING THE CRIME DATA USING INLA OR THE UNIVARIATE AND MULTIVARIATE INLA WITHIN MCMC ALGORITHMS.

THE MANIPULATED DATA FILES (CREATED WITHIN THIS DIRECTORY) ARE CONTAINED WITHIN THIS ARCHIVED FOLDER.

- CountDataGen_final.R: takes the census tract level socio-economic variables stored in PROCESSED_DATA/COVARIATES (copied over from RAW_DATA/COVARIATES) and the crime data stored in PROCESSED_DATA/CRIME/POINT_PATTERN (copied over from RAW_DATA/CRIME) and generates the count data on grids of different resolutions in the projected UTM coordinates. Unlike the count data generated in DATA/PROCESSED_DATA/CRIME/COUNT_DATA_GMO, we use the average income in *_CTInc_15_0imp_proj.rds (as with the census tract count data sets), where missing values for census tracts with an estimated zero total households are set to zero instead of being imputed. Additionally, the interpolation of the average income onto the grids uses the proportion of the area of each grid cell contained within each census tract. This is discussed in more detail in Chapter 4 of my thesis.
	-- Outputs:
		--- *QuadXY_projFinal.rda: the quadrats for grid dimension X x Y used to aggregate the point patterns to the count data.
		--- *GridCellsXY_projFinal.rda: the grid cells for dimension X x Y as sf class at UTM projection produced from the quadrats saved in *QuadXY_projFinal.rda.
		--- *_Cov_XY_projFinal.rda: the data frames for the intersection of the census tracts and the grid with resolution X x Y. Each data frame contains the ids of the census tracts and the grid cells, the areas of the intersections and of the corresponding census tracts and grid cells, and the proportions of each intersection area with respect to the total census tract area and the total grid cell area. It also contains the value of each variable for the corresponding census tract, as well as these values multiplied by both the proportion of the area of the relevant census tract contained within the relevant grid cell and the proportion of the area of the relevant grid cell contained within the relevant census tract, for each intersection. These are then intended to be combined as necessary for the interpolated variables.
		--- *2015CTXYCountData_projFinal.rda: the aggregated points on the X x Y grid cells along with the interpolated socio-economic variables as well as the area of the intersection of the grid cells with the city window.
		--- *2015CTXYSFCountData_projFinal.rda: the above data stored as an sf object.
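
The way the intersection rows in *_Cov_XY_projFinal.rda can be combined into grid-level variables may be easier to see with a small worked example. The following is a minimal sketch, written in Python purely for illustration and not taken from CountDataGen_final.R. All names and values are hypothetical. The average income is weighted by the proportion of the grid cell covered by each census tract, as described above; weighting the total population by the proportion of the census tract that falls in the grid cell is an assumption on my part about how a count variable would be combined.

```python
# Illustrative sketch only: rows, field names and values are hypothetical and
# are not read from *_Cov_XY_projFinal.rda.
# One row per census tract / grid cell intersection.
intersections = [
    {"tract": "A", "cell": 1, "area": 25.0, "tract_area": 50.0,
     "cell_area": 100.0, "income": 40000.0, "population": 2000.0},
    {"tract": "B", "cell": 1, "area": 75.0, "tract_area": 200.0,
     "cell_area": 100.0, "income": 60000.0, "population": 8000.0},
    {"tract": "B", "cell": 2, "area": 100.0, "tract_area": 200.0,
     "cell_area": 100.0, "income": 60000.0, "population": 8000.0},
]


def interpolate_income(rows):
    """Sum tract income weighted by the share of the grid cell the tract covers."""
    out = {}
    for row in rows:
        prop_of_cell = row["area"] / row["cell_area"]
        out[row["cell"]] = out.get(row["cell"], 0.0) + row["income"] * prop_of_cell
    return out


def interpolate_population(rows):
    """Sum tract population weighted by the share of the tract inside the cell.

    Assumption: counts are apportioned by the tract proportion, not the cell one.
    """
    out = {}
    for row in rows:
        prop_of_tract = row["area"] / row["tract_area"]
        out[row["cell"]] = out.get(row["cell"], 0.0) + row["population"] * prop_of_tract
    return out


income_by_cell = interpolate_income(intersections)
population_by_cell = interpolate_population(intersections)
print(income_by_cell)
print(population_by_cell)
```

In this toy example grid cell 1 is covered one quarter by tract A and three quarters by tract B, so its interpolated income lies between the two tract incomes, while grid cell 2 lies wholly inside tract B and inherits that tract's income.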

- CountDataGen_Scale_final.R: takes the count data frames produced by CountDataGen_final.R, together with the grids used to create them, and scales them so that a unit shift in the x or y direction corresponds to a 10km shift rather than a 1m shift. It then shifts the locations so that the bottom-left corner of the bounding box of the city's polygon is at (0,0).
	-- Outputs:
		--- *WindowProjScale.rda: the scaled and shifted city window, so that a unit shift in the x or y direction is a shift of 10km and the bottom-left corner of the window's bounding box is at the origin, (0,0).
		--- *OrdDFXY_projFinalScale.rda: a data frame with a set of indices that allows us to re-order the cells from the discretisation so that we go down the y-axis through the grid cells before incrementing across the x-axis.
		--- *CoordXY_projFinalScale.rda: the adjusted coordinates for the intersection of the grid cells and the city window. We adjust the coordinates because, due to the irregularity of the study region, the grid centres produced initially may lie outside the study region. Therefore, we have a function that shifts these particular coordinates to the closest possible point inside the study region. These coordinates are still in the projected UTM coordinates.
		--- *2015CTXYCountData_projFinalScale.rda: the scaled and shifted count data over the grid cells. It starts from the count data generated by CountDataGen_final.R, includes the adjusted coordinates, and scales and shifts them with the same values used for the city's window, so that the coordinates lie within the transformed city window in *WindowProjScale.rda. The rows are also re-ordered using the indices in *OrdDFXY_projFinalScale.rda.
		--- *2015CTXYSFCountData_projFinalScale.rda: the same as in *2015CTXYCountData_projFinalScale.rda but stored as an sf object. 
		--- *MeshXY_projFinalScale.rda: this takes in the coordinates from either the homicide or motor vehicle theft count data frames in *2015CTXYCountData_projFinalScale.rda and produces a mesh over city * using these coordinates and matching the maximum mesh edge with the relevant grid cell size for the resolution X x Y.
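
The scale, shift and re-ordering steps described above can be sketched as follows. This is a minimal illustration in Python, not the code in CountDataGen_Scale_final.R. The function names, the bounding-box corner and the cell centres are all hypothetical, and reading "down the y-axis" as decreasing y within each column is an assumption.

```python
# Illustrative sketch only: names and coordinates are hypothetical.
METRES_PER_UNIT = 10_000.0  # one unit after scaling corresponds to 10km


def scale_and_shift(coords, bbox_min, metres_per_unit=METRES_PER_UNIT):
    """Put the bounding-box corner at (0, 0) and express distances in 10km units."""
    x0, y0 = bbox_min
    return [((x - x0) / metres_per_unit, (y - y0) / metres_per_unit)
            for x, y in coords]


def column_order(centres):
    """Indices that visit the cells column by column.

    Assumption: within a column (fixed x) the cells are visited from the
    largest y to the smallest, and columns are visited in increasing x.
    """
    return sorted(range(len(centres)),
                  key=lambda i: (centres[i][0], -centres[i][1]))


# Hypothetical bottom-left corner of the window's bounding box (UTM metres).
bbox_min = (380_000.0, 3_730_000.0)
# Hypothetical centres of a 2 x 2 grid (UTM metres), stored row by row.
centres_utm = [
    (385_000.0, 3_735_000.0),
    (395_000.0, 3_735_000.0),
    (385_000.0, 3_745_000.0),
    (395_000.0, 3_745_000.0),
]

centres_scaled = scale_and_shift(centres_utm, bbox_min)
order = column_order(centres_scaled)
centres_ordered = [centres_scaled[i] for i in order]
print(centres_ordered)
```

Applying the same corner and scale factor to the city window and to the cell coordinates is what keeps the transformed coordinates inside the transformed window.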

where X and Y are specified to give grid cell widths of approximately 5km, 2km, 1km and 0.5km (and 0.2km for interest), using the following numbers of cells in the x and y directions for the different cities:
- LA:
	-- X = 10, 24, 48, 95 (, 236)
	   Y = 15, 36, 72, 144 (, 359)
- NYC:
	-- X = 10, 24, 47, 94 (, 235)
	   Y = 10, 24, 48, 96 (, 239)
- Portland:
	-- X = 8, 19, 38, 76 (, 190)
	   Y = 6, 13, 26, 52 (, 129)
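
The relationship between a target cell width and the number of cells can be sketched as below. This is an illustration in Python only, with loud assumptions: the bounding-box extents are not recorded in this README, so the extent used here is a hypothetical value, and rounding up (so that no cell is wider than the target) is an assumed rule rather than something confirmed by the R scripts. The hypothetical extent was chosen only because it happens to reproduce the LA counts listed above under that rule.

```python
# Illustrative sketch only: the extent is hypothetical and the rounding rule
# (ceiling) is an assumption, not taken from the R scripts.
TARGET_WIDTHS_M = [5000, 2000, 1000, 500, 200]  # 5km, 2km, 1km, 0.5km, 0.2km


def cells_needed(extent_m, target_width_m):
    """Smallest number of equal cells whose width does not exceed the target."""
    return -(-extent_m // target_width_m)  # integer ceiling division


def cell_width_m(extent_m, n_cells):
    """Actual cell width once the extent is split into n_cells equal cells."""
    return extent_m / n_cells


# Hypothetical bounding-box extent in metres (x direction, y direction).
extent_x_m, extent_y_m = 47_100, 71_700

x_counts = [cells_needed(extent_x_m, t) for t in TARGET_WIDTHS_M]
y_counts = [cells_needed(extent_y_m, t) for t in TARGET_WIDTHS_M]
x_widths = [cell_width_m(extent_x_m, n) for n in x_counts]
print(x_counts, y_counts)
print(x_widths)
```

Because the extent is rarely an exact multiple of the target width, the resulting cells are slightly narrower than the target, which is why the widths are described as approximate.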

NOTE THAT THE OUTPUTS *2015CTXYCountData_projFinal.rda AND *2015CTXYSFCountData_projFinal.rda CONTAIN ADDITIONAL SOCIO-ECONOMIC VARIABLES. HOWEVER, ANY USE OF THESE DATA SETS WITHIN THE THESIS IS FOCUSSED ONLY ON THE TOTAL POPULATION AND AVERAGE INCOME. THEREFORE, ANY MENTION OF THESE ADDITIONAL VARIABLES IN RELEVANT README DOCUMENTS IS AN ASIDE, AND THE R CODE TO INCLUDE THESE ADDITIONAL VARIABLES IS COMMENTED OUT. THE CODE AND THE ACCESS TO THESE VARIABLES ARE STILL AVAILABLE IF REQUIRED.

