Dataset for "Identifying misleading corporate narratives: The application of linguistic and qualitative methods to commercial determinants of health research"

This dataset concerns the language used by Philip Morris International (PMI) in its external communications. The data were collected to identify patterns in the language used across different types of corporate communication.

This dataset includes annual reports, investor reports, investor day slide decks, transcripts and other reports. It is a record of the inductive coding conducted for this project.

The dataset contains the final codebook, which was agreed and approved by all coders.

Corporate narratives, NVivo, Qualitative coding, Corporate communications, Corporate determinants of health
Social policy
Tools, technologies and methods

Cite this dataset as:
Fitzpatrick, I., Bertscher, A., Gilmore, A., 2022. Dataset for "Identifying misleading corporate narratives: The application of linguistic and qualitative methods to commercial determinants of health research". Bath: University of Bath Research Data Archive. Available from:




PMI SiteMap 22 July 2019.xlsx
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (63kB)
Registered users only

Site map data for PMI's website, extracted on 22 July 2019 and converted to .xlsx.

Access on request: Data files created for this research may be accessed on reasonable request by academics and other bona fide researchers.


Adam Bertscher
University of Bath

Anna Gilmore
University of Bath


Joanne Cranwell
University of Bath


Collection date(s):

From 1 July 2019 to 1 September 2019

Temporal coverage:

From 1 January 2012 to 31 December 2019


Data collection method:

We developed a mixed-methods protocol that combined methods and tools from corpus linguistics (CL) with inductive coding. For the “investor” targeted communications, we sampled annual reports, investor reports and presentations to shareholders, including any associated slides and scripts. For the “wider audience” targeted communications, we sampled corporate webpages and corporate YouTube videos.

Annual reports, investor reports and presentations to shareholders were accessed and downloaded via PMI’s investor web pages. Corporate YouTube videos were accessed directly through PMI’s dedicated account. Automatically generated audio transcripts of the videos on a single playlist, titled “Inside Us”, were saved in NVivo and checked for accuracy by IF (Iona Fitzpatrick) to allow for qualitative coding. The sample of corporate webpages was downloaded manually according to a sampling filter developed specifically for this research.

Webpage sampling
As no site map was readily available through PMI's website, we used a freely available online site mapping tool to generate an XML site map of PMI's website and sampled webpages according to the underlying logic of the page hierarchy. This sample consisted of pages at the “top” level and any pages nested under “home”, “investor relations”, “our business” or “who are we”. Target pages were captured using NVivo and Greenshot, and archived using the Wayback Machine.
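The page-hierarchy sampling rule described above can be illustrated with a short Python sketch. It is not the authors' procedure (the site map was generated with an online tool and pages were captured manually); it only shows one way the rule could be applied to a standard sitemaps.org XML file. The section slugs in `SECTION_SLUGS`, the `example.com` URLs and the helper names `sitemap_urls` and `in_sample` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# HYPOTHETICAL path slugs standing in for the "home", "investor relations",
# "our business" and "who are we" sections; the real PMI paths are not recorded here.
SECTION_SLUGS = {"home", "investor-relations", "our-business", "who-are-we"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in an XML sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def in_sample(url: str) -> bool:
    """Keep top-level pages and any page nested under one of the target sections."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) <= 1:
        # The site root or a page directly below it counts as "top" level.
        return True
    # Deeper pages are kept only if they sit under a target section.
    return segments[0] in SECTION_SLUGS


if __name__ == "__main__":
    # Toy sitemap on a HYPOTHETICAL domain, for illustration only.
    example = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.example.com/</loc></url>"
        "<url><loc>https://www.example.com/careers</loc></url>"
        "<url><loc>https://www.example.com/investor-relations/reports</loc></url>"
        "<url><loc>https://www.example.com/careers/students/internships</loc></url>"
        "</urlset>"
    )
    for url in sitemap_urls(example):
        print(url, in_sample(url))
```

With this toy input, the root page, the top-level careers page and the page under the investor relations section are kept, while the deeply nested careers page is excluded. Matching on the first path segment is a simplifying assumption; a real site may not map its navigation sections onto URL paths this cleanly.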

Technical details and requirements:

Coded in NVivo release 1.5.1 (940), QSR International; subject to licence agreement. The file was converted so that it can be read in the latest version of NVivo at the time of deposit, release 1.6.1 (1137). Sketch Engine was used for corpus analysis; subject to licence agreement (the University of Bath licence was not active as of 1 September 2022).

Additional information:

Data files in NVivo are sorted according to data type/source.

Documentation Files

Data Source … Web sources.xlsx
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (21kB)
Registered users only

Record of all captured webpages and YouTube videos and their corresponding web archive URLs. URLs were active at the time of submission.


Bloomberg Philanthropies

Bloomberg STOP

Publication details

Publication date: 24 October 2022
by: University of Bath

Version: 1


URL for this record:

Related papers and books

Fitzpatrick, I., Bertscher, A., and Gilmore, A. B., 2022. Identifying misleading corporate narratives: The application of linguistic and qualitative methods to commercial determinants of health research. PLOS Global Public Health, 2(11), e0000379. Available from:

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Iona Fitzpatrick


Faculty of Humanities & Social Sciences

Research Centres & Institutes
Tobacco Control Research Group (TCRG)