THIS README IS FOR THE DATA/EDA DIRECTORY.

THIS DIRECTORY CONTAINS 4 R SCRIPTS AND 3 SUB-DIRECTORIES, ONE FOR EACH CITY. THE SUB-DIRECTORIES HOLD THE OUTPUT OF THE R SCRIPTS FOR THE EXPLORATORY DATA ANALYSIS OF THE CRIME DATA AND SOCIO-ECONOMIC DATA.

THE OUTPUT PLOTS ARE CONTAINED WITHIN THIS ARCHIVED FOLDER AND WERE GENERATED BY THE RELEVANT R SCRIPTS IN THIS DIRECTORY. BELOW WE DESCRIBE EACH R SCRIPT AND THE NAMING CONVENTION FOR ITS PLOT OUTPUT.

NOTE: the only socio-economic variables used within my thesis were the total population and average income, so the final R code for data generation was tailored accordingly, with the code for including other socio-economic variables commented out. We proceed similarly for the plotting of the data: for any additional variables, their inclusion and plot outputs are commented out, and these lines can easily be uncommented if required to generate the relevant plots.
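As a rough illustration of the commenting-out pattern described in the note, the sketch below plots a crime count against each active socio-economic variable. It is not taken from the scripts in this directory: the toy data, the column names (crime_count, total_population, average_income, median_age) and the output file names are all hypothetical, and the plots are written to a temporary directory.

```r
# Toy census-tract data. All values and column names are hypothetical.
tracts <- data.frame(
  crime_count      = c(12, 30, 7, 19),
  total_population = c(2500, 4100, 1800, 3200),
  average_income   = c(41000, 38000, 67000, 52000),
  median_age       = c(34, 29, 45, 38)
)

# Variables to plot. Additional variables stay commented out until needed.
socio_vars <- c(
  "total_population",
  "average_income"
  # , "median_age"   # uncomment to include this variable in the plots
)

out_dir <- tempdir()          # hypothetical output location
plot_files <- character(0)

for (v in socio_vars) {
  # Hypothetical file name ending in "CT.pdf"
  out_file <- file.path(out_dir, paste0("crime_vs_", v, "_CT.pdf"))
  pdf(out_file)
  plot(tracts[[v]], tracts$crime_count, xlab = v, ylab = "crime count")
  dev.off()
  plot_files <- c(plot_files, out_file)
}
```

Uncommenting the median_age line would add a third plot without any other change to the loop.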

- EDA_CT_final.R: this R script plots the crime data for each city with respect to the census tracts. For example, we plot the crimes against the census tract-level socio-economic variables. Plots saved with suffix: *CT.pdf.

- EDA_GRID_final.R: this R script plots the crime data more generally, for example the number of motor vehicle thefts per year. Importantly, we also plot maps of the crimes in each city, both for all years in the data and for the year 2015 alone. We also plot the gridded data, for example the gridded socio-economic variables for each resolution, as well as the count data against the grid-interpolated socio-economic variables. Finally, we plot heat maps of the census tract-level and grid-interpolated socio-economic variables. Plots saved with suffix: *_final.pdf.

- EDA_TRANSFORMEDGRID_final.R: similarly to EDA_GRID_final.R, we plot the crime count data from the scaled gridded data against the socio-economic variables. This R script also produces plots of the scaled gridded socio-economic variables over a map. These should be similar to the plots from the un-scaled data, but they are produced from the final data sets that will be used for modelling. Plots saved with suffix: *_finalscale.pdf.

- AreaHistograms_final.R: this R script produces histograms of the census tract areas for Los Angeles and New York, as well as of the grid cell areas for the different resolutions. The aim is to compare the distribution of the census tract areas against the areas of the grid cells. Plots saved with suffix: *CTAreaHistograms*.pdf or *CTGridAreaHistograms*.pdf.
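
The naming conventions above can be used to work out which script produced a given plot. The following is a minimal sketch only: the patterns are copied from this README, but the helper script_for_plot and the example file names are hypothetical and do not appear in any of the scripts.

```r
# Glob-style patterns quoted from this README, keyed by the script that
# writes them. AreaHistograms_final.R has two patterns.
suffix_patterns <- c(
  "EDA_CT_final.R"              = "*CT.pdf",
  "EDA_GRID_final.R"            = "*_final.pdf",
  "EDA_TRANSFORMEDGRID_final.R" = "*_finalscale.pdf",
  "AreaHistograms_final.R"      = "*CTAreaHistograms*.pdf",
  "AreaHistograms_final.R"      = "*CTGridAreaHistograms*.pdf"
)

# Hypothetical helper: return the script(s) whose pattern matches file_name.
script_for_plot <- function(file_name) {
  hits <- vapply(
    suffix_patterns,
    function(p) grepl(glob2rx(p), file_name),  # glob2rx converts a glob to a regex
    logical(1)
  )
  unique(names(suffix_patterns)[hits])
}

# Hypothetical file names, for illustration only.
example_files <- c(
  "LA_income_CT.pdf",
  "NY_theft_map_final.pdf",
  "LA_income_finalscale.pdf",
  "LA_CTGridAreaHistograms_1.pdf"
)
matched <- vapply(example_files, script_for_plot, character(1))
print(matched)
```

To apply this to the real output, the example vector could be replaced with basename(list.files(".", pattern = "\\.pdf$", recursive = TRUE)) run from this directory. In that case sapply is the safer choice than vapply, since a file that matches no pattern returns a zero-length result.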