THIS README IS FOR THE DATA DIRECTORY. EACH SUB-DIRECTORY ALSO CONTAINS ITS OWN README FILE.

THIS DIRECTORY CONTAINS THE CODE AND OUTPUTS FROM THE DATA MANIPULATION FOR THE CRIME POINT PATTERNS AND THE SOCIO-ECONOMIC VARIABLES. THERE ARE SEVERAL DATA OUTPUTS. FIRSTLY, THE CRIME POINT DATA AND SOCIO-ECONOMIC VARIABLES ARE EXTRACTED FOR THE RELEVANT CRIMES AND CENSUS TRACTS, RESPECTIVELY. ADDITIONALLY, WE GENERATE COUNT DATA OVER THE CENSUS TRACTS AS WELL AS TWO DIFFERENT GRIDDED AGGREGATIONS OF THE CRIME DATA: ONE FOR USE IN CHAPTER 4 FOR THE GRID-MESH METHOD (ONLY GENERATED FOR LOS ANGELES) AND THE OTHER FOR CHAPTERS 4 AND 5 FOR THE FINAL MODEL FITS. WE ALSO GENERATE THE RELEVANT POLYGONS AND MESHES FOR THE INLA AND INLA WITHIN MCMC IMPLEMENTATIONS. THE DIRECTORY ALSO CONTAINS THE OUTPUTS FOR THE EXPLORATORY DATA ANALYSIS AND INITIAL MODELS FOR THE CRIME DATA: THE GLM MODELS FOR THE CENSUS TRACT COUNT DATA, AND THE RIPLEY'S K (DISCUSSED IN CHAPTER 1 + APPENDIX A) AND MINIMUM CONTRAST (FOR THE SET-UP OF THE GRID-MESH METHOD IN CHAPTER 4) IMPLEMENTATIONS FOR THE POINT PATTERNS.

THE RAW DATA FILES AND SOME OF THE MANIPULATED DATA FILES ARE NOT CONTAINED WITHIN THESE ARCHIVED FOLDERS. THE NECESSARY RAW DATA FOR EACH SUB-DIRECTORY CAN BE ACCESSED AS DISCUSSED IN DataAccessInformation.pdf AS WELL AS IN APPENDIX F OF MY THESIS, AND THE MANIPULATED DATA CAN BE CREATED THROUGH THE R FILES IN EACH RELEVANT SUB-DIRECTORY. ALTHOUGH THESE FILES ARE NOT ARCHIVED, WE DISCUSS THEIR NAMING CONVENTIONS BELOW, AS THEY ARE USED WITHIN THE R SCRIPTS FOR EACH SUB-DIRECTORY.

- RAW_DATA: this contains 3 sub-directories, one each for the crime point patterns, the socio-economic variables and the shapefiles. They contain the original data files, the code to extract the required crimes (homicide and motor vehicle theft) and the code to extract the required census tract data for the socio-economic variables. We also have code to extract the census tracts within the cities of interest where the census tract data was downloaded for the whole state containing the city.

- PROCESSED_DATA: this also contains 3 sub-directories, labelled similarly to the RAW_DATA sub-directories. In this directory the COVARIATE and SHAPEFILE sub-directories only contain the outputs from the R scripts in the RAW_DATA sub-directories. The CRIME sub-directory contains the code that creates the count data sets from the census tracts, point patterns and socio-economic variables, and that creates the grids onto which the socio-economic variables are interpolated using areal interpolation.

- EDA: this contains plots of the crime count data on the census tracts and of the data interpolated onto the grids, as well as some census tract plots and point pattern plots.

- MODELS: this contains 2 sub-directories. One holds the code to implement the Ripley's K and MCMC GLM models for Chapter 1. The other runs the Minimum Contrast method to estimate the range and standard deviation for the LA crime point patterns, using the 2014 data, which is generated in this sub-directory. Additionally, we generate 2014 count data over the finest grid in order to run a quick Poisson GLM on the generated grid data within INLA, out of interest in the direction and magnitude of the covariate effects. The interest here lies in the choice of the fixed values for the Grid-Mesh Optimisation method on the LA polygon.
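
As an illustration of the gridded aggregation described above, the following is a minimal sketch of counting crime points in the cells of a regular grid. It is not the code used in this directory (the scripts here are written in R); it is written in Python, and the function name `grid_counts`, its parameters and the example coordinates are all hypothetical. It assumes square cells on a rectangular bounding box, whereas the actual grids are built over the city polygons.

```python
import math
from collections import Counter


def grid_counts(points, xmin, ymin, cell_size, ncols, nrows):
    """Count the points falling in each cell of a regular grid.

    The grid has its lower-left corner at (xmin, ymin) and square cells of
    side cell_size. Points outside the grid, including points lying exactly
    on its upper or right edge, are dropped. Returns a list of rows, with
    row 0 being the bottom row of the grid.
    """
    counts = Counter()
    for x, y in points:
        col = math.floor((x - xmin) / cell_size)
        row = math.floor((y - ymin) / cell_size)
        if 0 <= col < ncols and 0 <= row < nrows:
            counts[(col, row)] += 1
    return [[counts[(c, r)] for c in range(ncols)] for r in range(nrows)]


# Hypothetical example: five points on a 2 x 2 grid of unit cells.
example_points = [(0.5, 0.5), (1.5, 0.2), (1.9, 0.9), (0.1, 1.7), (5.0, 5.0)]
print(grid_counts(example_points, 0.0, 0.0, 1.0, 2, 2))  # [[1, 2], [1, 0]]
```

Cell counts of this kind are the form of response that a Poisson GLM on gridded data takes, with the covariates interpolated onto the same cells.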