Dataset for "Battle for Britain: Analyzing Events as Drivers of Political Tribalism in Twitter Discussions of Brexit"

In this study, we investigate how Brexit tribalism has unfolded over time on Twitter. The dataset contains a corpus of tweets posted to Twitter during a period of 32 months following the 2016 UK European Union membership referendum. The tweets were selected by keyword search: first for "Brexiteer" and "Remainer", then for "Brextremist" and "Remoaner". The CSV file in this dataset contains both sets of results. The file has two columns, timestamp and tweet text, which are sufficient to replicate our process. Tweet IDs were removed to preserve user anonymity. First, we characterize the nature of the discussion by comparing language use patterns between tweets containing the Brexiteer/Remainer keywords and tweets containing the Brextremist/Remoaner keywords. We find that Brextremist/Remoaner are more commonly used in a derogatory way. We also find that all four group identity keywords are used more frequently over time, suggesting an increase in tribal interactions. Finally, we find evidence of a relationship between real-life Brexit events and spikes in tribal responses online.

Twitter data, Social media, political tribalism, partisan politics, Brexit, Intergroup conflict, Polarization
Information and communication technologies

Cite this dataset as:
North, S., Piwek, L., Joinson, A., 2020. Dataset for "Battle for Britain: Analyzing Events as Drivers of Political Tribalism in Twitter Discussions of Brexit". Bath: University of Bath Research Data Archive. Available from:



application/zip (587MB)
Creative Commons: Attribution 4.0

This file contains tweets for both sets of keywords (Brexiteer/Remainer and Brextremist/Remoaner). Tweet IDs have been removed to preserve user anonymity. There are two columns in the file: timestamp and tweet text.
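As a rough illustration of how the two-column file might be used, the sketch below counts tweets that mention each of the four keywords, grouped by month. It is not the authors' analysis code. It assumes the CSV has no header row, that the columns appear in the order timestamp then tweet text, and that each timestamp begins with YYYY-MM; none of these details is confirmed by this record. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# The four group identity keywords named in the dataset description.
KEYWORDS = ("brexiteer", "remainer", "brextremist", "remoaner")


def monthly_keyword_counts(csv_file):
    """Count tweets mentioning each keyword, keyed by (month, keyword).

    Assumes two columns per row (timestamp, tweet text), no header row,
    and timestamps that start with YYYY-MM. These are assumptions.
    """
    counts = Counter()
    for timestamp, text in csv.reader(csv_file):
        month = timestamp[:7]  # e.g. "2016-06"
        lowered = text.lower()
        for keyword in KEYWORDS:
            # Case-insensitive substring match, so plurals also count.
            if keyword in lowered:
                counts[(month, keyword)] += 1
    return counts


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    '2016-06-24 08:15:00,"Example text mentioning a Remainer"\n'
    '2016-06-30 21:02:10,"Example text with a Brexiteer and a Remoaner"\n'
    '2019-02-13 10:00:00,"Another Brexiteer example"\n'
)
counts = monthly_keyword_counts(sample)
```

To run this on the real file, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) in place of `sample`, and adjust the timestamp handling once the actual format is known.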


Samantha North
University of Bath

Lukasz Piwek
University of Bath

Adam Joinson
University of Bath


Collection date(s):

From 13 February 2019 to 13 February 2019

Temporal coverage:

From 1 June 2016 to 13 February 2019


Data collection method:

Data was originally extracted from Twitter's Historical PowerTrack API, via Crimson Hexagon's ForSight platform (Crimson Hexagon, 2019). We queried first for the keywords "Brexiteer" and "Remainer", then for "Brextremist" and "Remoaner", producing two separate datasets in raw JavaScript Object Notation (JSON) format, including all tweet object fields. The tweet object encompasses fundamental variables obtained for each tweet such as unique ID, date, and text, along with information about retweets, favorites, and hashtags. The raw JSON files were very large (over 80 GB), so we used a Python script to extract only the four key variables required for analysis: unique tweet ID, date and time of tweet, text of tweet, and keyword associated with tweet. The original number of users in the dataset was n = 844,881. The resulting tweets were stored in a CSV file, which included a total of n = 9,027,822 tweets posted between June 1, 2016 and February 13, 2019. For the purposes of this data upload, we retained only the date and time of each tweet and its text, which is necessary to keep users anonymous.
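The extraction step described above could look roughly like the following. This is a minimal sketch, not the script used in the study. It assumes the raw export holds one JSON object per line and that the relevant fields are named `id`, `date`, and `text`; the real export's layout and field names are not given in this record. Deriving the keyword by matching the tweet text is also an assumption, and the function names and sample records are hypothetical.

```python
import csv
import io
import json

# Query keywords named in the collection method.
KEYWORDS = ("brexiteer", "remainer", "brextremist", "remoaner")


def extract_rows(json_lines):
    """Yield (id, date, text, keyword) from newline-delimited JSON.

    Field names "id", "date" and "text" are assumed, not confirmed.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        text = tweet["text"]
        lowered = text.lower()
        # First matching query keyword, or "" if none matches (assumption).
        keyword = next((k for k in KEYWORDS if k in lowered), "")
        yield tweet["id"], tweet["date"], text, keyword


def write_csv(json_lines, out_file):
    """Stream extracted rows to CSV and return the number written."""
    writer = csv.writer(out_file)
    written = 0
    for row in extract_rows(json_lines):
        writer.writerow(row)
        written += 1
    return written


# Made-up records for illustration only; they are not real tweets.
raw = io.StringIO(
    '{"id": "1", "date": "2016-06-24T08:15:00", '
    '"text": "A Remainer example", "retweets": 3}\n'
    '{"id": "2", "date": "2019-02-13T10:00:00", '
    '"text": "A Brextremist example", "favorites": 1}\n'
)
out = io.StringIO()
written = write_csv(raw, out)
```

Reading line by line keeps memory use flat regardless of input size, which matters for raw files on the order of 80 GB. Dropping the first and last columns from each row would give the two-column form deposited here.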


Engineering and Physical Sciences Research Council (EPSRC)

Cyber Security Across the LifeSpan (cSalsa)

Publication details

Publication date: 12 August 2020
by: University of Bath

Version: 1


URL for this record:

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Samantha North


School of Management
Information, Decisions & Operations