This folder contains data from the locust experiments of Buhl et al. (2006), together with the code written by the authors to clean the data and perform the analysis presented in the manuscript.

############## FOLDER ORGANISATION ###############

There are three folders:

raw_data: contains the original raw data provided by the authors of Buhl et al. (2006).

tracked_data: contains the locust tracks obtained by processing the raw data with our own tracking code.

spatial_tests: contains the code and simulation data used to test the spatial hypothesis explored in the manuscript and Supplementary Information.

The code files in this folder are described below and documented with inline comments. Additional README files are provided within each folder where needed to give further details on the data and code.

############ ORIGINAL DATA AND CITING ############

The original data was provided to us by the authors of Buhl et al. (2006). It is shared with the consent of the authors Camille Buhl and Iain Couzin. If you use the raw data, please cite the following paper, and refer to it for details on the data collection:

C. Buhl et al., From Disorder to Order in Marching Locusts. Science 312, 1402-1406 (2006). DOI:10.1126/science.1125142

If you use our processed tracked data, please cite our publication.

######## DATA ORGANISATION AND FORMATTING #########

ORIGINAL RAW DATA:

The raw data is organised as follows. Each folder contains data from one or more experiments, with the number of locusts indicated by the folder name. Each file in a folder corresponds to a different experiment. File names follow the pattern 'raw_[group size]_[experiment number]' (e.g., 'raw_5_28' contains the raw data for experiment 28, with 5 locusts in the arena). Within each file, the data is arranged in rows, with columns separated by commas.
Each line contains the following data:

[ Object ID | x coordinate (in pixels) | y coordinate (in pixels) | object area ]

A frame of the video ends when the Object ID resets to 1. Note that occlusions and shadows may cause the number of objects to change between frames. An object can also correspond to more than one locust, to a shadow, or to another item in the arena. In the original data, objects are ordered by y coordinate; hence, a given ID does not always correspond to the same object.

PROCESSED TRACKED DATA:

We have processed the original raw data to track objects consistently, identify occlusions, and remove objects that do not correspond to locusts (such as shadows). This 'cleaned' data is made available alongside the original data (see the "tracked_data" folder).

As with the raw data, each folder contains data from one or more experiments, with the number of locusts indicated by the folder name. Each file in a folder corresponds to a different experiment. File names follow the pattern 'raw_[group size]_[experiment number]' (e.g., 'raw_5_28' contains the raw data for experiment 28, with 5 locusts in the arena). Some experiments were found to have a different group size from the one originally labelled, so their group sizes have been changed accordingly.

Within each file, the data is arranged in rows, with columns separated by commas. Each line contains the following data:

[ Object ID | x coordinate (in pixels) | y coordinate (in pixels) | object area | angular position in the arena (radians) | displacement (10 frames window, 2 seconds) | classification (moving clockwise, stopped, moving anticlockwise) | displacement (1 frame window, 0.2 seconds) | frame number ]

To classify the locusts as moving clockwise (value -1), stopped (value 0) or moving anticlockwise (value 1), we used the 10-frame displacement of each locust (based on its angular position in the arena).
This reduces the effect of measurement noise in the position of the locusts in each frame.

################### CODE FILES ####################

The code is placed in the folder so that all required folder navigation is defined relative to the root folder where the code is located.

Python files were developed with version 3.9.16. Install any additional required packages as indicated at the start of each file.

C++ code requires gcc version 13.1.0 or newer to compile.

MATLAB code was developed with version R2022a. Additional packages, such as the optimisation toolbox, might be required.

Each script comes with inline comments that provide context and explain what task the code is performing. Here is a brief description of each script.

txt2mat.m is a required function that reads the raw data files (text) and converts them into arrays that MATLAB can work with.

cleaning_data.m is the MATLAB script containing the specific cleaning directives for each experimental data file. These directives come from "pre-cleaning" tests and observations that help determine the number of locusts observed in the data and the initialisations for the tracking. cleaning_data.m calls the function clean_data.m to perform the tracking of the locusts in each experiment given the required directives. This script generates both the tracked data, stored in the "tracked_data" folder, and the directional files, which indicate the number of clockwise and anticlockwise moving locusts observed in each frame of the data. The latter are stored in the "dir_files" folder.

equation_free_analysis_2d.m uses the equation-free method in 2 dimensions to estimate the drift and diffusion of the SDE underlying the data, fitting the functional forms predicted by the theory for the hyperbolic manifold projection. We fit the data by minimising weighted squared errors using MATLAB's particleswarm algorithm. The script generates files with the fitted rates ("rates_eqfree2d_[group size]_[experiment number]"), stored in the "dir_files" folder.

data_heatmaps.py is a script that plots heatmaps of the data (dir_* files), together with the theoretical flow fields and stationary probability distributions obtained from the parameters fitted with the equation-free method (rates_eqfree2d_* files).

spontaneous_switches.m performs the interaction radius analysis leading to Figure S5 of the Supplementary Information. It checks how frequently locusts change direction when "isolated", i.e., when no other locusts are within the predefined "interaction radius". The code generates estimates of the probability of changing direction when "isolated" for different values of the radius and for experiments with different group sizes.

estimate_radius_speed.m computes the average length of the arena run by the locusts and uses it to estimate the ratio between the interaction radius (20 cm) and the arena length, L. It also computes the average speed of moving locusts in each case. These values are saved in the "radius_speed" file in the "spatial_tests" folder, and are used to plot the crosses in the Wasserstein metric figure (Figure S6 of the Supplementary Information).
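
########## EXAMPLE: READING THE DATA FILES ##########

The file formats described under DATA ORGANISATION AND FORMATTING can be illustrated with a short Python sketch. It is not part of the provided code and does not reproduce the MATLAB pipeline. The function names (parse_rows, split_raw_frames, count_directions) are hypothetical. The sketch assumes the files have no header row and contain only numeric, comma-separated values. It shows how raw rows can be grouped into frames using the Object ID reset, and how the classification column of the tracked data can be tallied per frame.

```python
import csv
import io
from collections import defaultdict


def parse_rows(text):
    """Parse comma-separated numeric rows, skipping empty lines.

    Assumption: no header row, every field is numeric.
    """
    reader = csv.reader(io.StringIO(text))
    return [[float(value) for value in row] for row in reader if row]


def split_raw_frames(rows):
    """Group raw rows [ID, x, y, area] into video frames.

    A new frame is started each time the Object ID is 1 again.
    """
    frames = []
    for row in rows:
        if int(row[0]) == 1 or not frames:
            frames.append([])
        frames[-1].append(row)
    return frames


def count_directions(rows):
    """Count (clockwise, anticlockwise) objects per frame in tracked rows.

    Assumption: 9 columns per row, with the classification in column 7
    (-1 clockwise, 0 stopped, 1 anticlockwise) and the frame number in
    column 9, as listed in this README.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = counts[int(row[8])]  # also registers frames with only stopped objects
        state = int(row[6])
        if state == -1:
            entry[0] += 1
        elif state == 1:
            entry[1] += 1
    return {frame: tuple(c) for frame, c in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up rows for illustration only; replace with the contents of a real file,
    # e.g. open(path).read() for a file in "raw_data" or "tracked_data".
    raw_text = "1,10.0,5.0,30\n2,40.5,9.0,28\n1,11.0,5.5,31\n2,41.0,9.5,27\n"
    for index, frame in enumerate(split_raw_frames(parse_rows(raw_text)), start=1):
        print("frame", index, "contains", len(frame), "objects")
```

Because object IDs in the raw data are assigned by y coordinate in each frame, the frames returned by split_raw_frames do not identify individual locusts over time; the tracked data should be used for that purpose.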