The SPADE Consortium

The SPADE project and the development of the ISCAN software would not be possible without the Data Guardians who agreed to share their corpora with the project. This input has been so crucial that we consider the Data Guardians co-authors of all SPADE-related output that makes use of private, project-external datasets. Listing all Data Guardians as authors is impractical, so we group them into ‘the SPADE Consortium’ to permit abbreviation of the author list. In so doing we have followed many of the conventions adopted by the Atlas of Pidgin and Creole Language Structures (APiCS) (https://apics-online.info/about), and we thank the potential Data Guardian who drew our attention to this method of acknowledging the time and effort taken by the Data Guardians to create their corpora.

NB: For simplicity, we have adopted this convention even when an output does not use all of the corpora listed below. The specific corpora used for a particular output are listed and acknowledged within the text of that output. In addition, we acknowledge the specific publicly available corpora used; an overall list of all such datasets gathered for the project is also provided below.

This page will be updated as more corpora are received.


The SPADE project team would like to thank the following Data Guardians and SPADE Consortium members for allowing their corpora to be used in developing the ISCAN software:

  • The late Farhana Shaukat Alam, the Glaswasian Corpus
    • Alam, Farhana (2016). ‘Glaswasian’?: A Sociophonetic Analysis of Glasgow-Asian Accent and Identity. PhD Dissertation. University of Glasgow.
  • Molly Babel and the University of British Columbia, the DRAWL Corpus
    • Babel, Molly, A. Cardoso, K Hayter, R. Pritchard and K Xu, “Populating the map of British Columbia English”. Canadian Linguistic Association CLA-ACL 2019, Vancouver, BC, June 1 – 3, 2019.
    • Cardoso, Amanda, K. Xu, M. Babel and R. Pritchard, “Different Means to a Similar End: Apparent time change in British Columbian Englishes” New Ways of Analyzing Variation 48 (NWAV 48), Eugene, Oregon, October 10-12, 2019.
  • Karen Corrigan, the DECTE Corpus
    • Arts and Humanities Research Board RE11776 + Arts and Humanities Research Council AH/H037691/1
    • Allen, W., Beal, J.C., Corrigan, K.P., Moisl, H. and Maguire, W. (2007). The Newcastle Electronic Corpus of Tyneside English, in Beal, J.C., Corrigan, K.P. and Moisl, H. (eds.) Creating and Digitizing Language Corpora: Vol. 2, Diachronic Databases, pp.16-48. Houndmills: Palgrave Macmillan.
    • Corrigan, K.P., Moisl, H.L. and Beal, J.C. (2005). A Linguistic ‘Time-Capsule’: The Newcastle Electronic Corpus of Tyneside English. https://research.ncl.ac.uk/necte/
    • Corrigan, K.P., Buchstaller, I., Mearns, A. and Moisl, H. (2012). The Diachronic Electronic Corpus of Tyneside English. Newcastle University. https://research.ncl.ac.uk/decte/
    • Mearns, A.J., Corrigan, Karen P. and Buchstaller, I. (2016). The Diachronic Electronic Corpus of Tyneside English and The Talk of the Toon: Issues in Preservation and Public Engagement, in Corrigan, K.P. and Mearns, A.J. (eds.) Creating and Digitizing Language Corpora, Volume 3: Corpora for Public Engagement. Houndmills, Basingstoke: Palgrave-Macmillan, pp.177-210.
  • Robin Dodsworth, the Raleigh Corpus
    • Dodsworth, R. & Benton, R. (2019). Language variation and change in social networks: A bipartite approach. Routledge.
    • Dodsworth, R. & Benton, R. (2017). Social network cohesion and the retreat from Southern vowels in Raleigh. Language in Society 46, 371-405.
  • Anne Fabricius, the Modern RP Corpus
    • Fabricius, A. H. (2000). T-glottalling between stigma and prestige: a sociolinguistic study of Modern RP. Unpublished Ph.D. thesis. Copenhagen, Denmark: Copenhagen Business School. URL: https://forskning.ruc.dk/da/publications/t-glottalling-between-stigma-and-prestige-a-sociolinguistic-study.
  • Ulrike Gut, the ICE-Scotland Corpus
  • Lauren Hall-Lew, the Sunset Corpus
    • Cardoso, Amanda, Lauren Hall-Lew, Yova Kemenchedjieva, and Ruaridh Purse. (2016). Between California and the Pacific Northwest: The Front Lax Vowels in San Francisco English. In Valerie Fridland, Betsy Evans, Tyler Kendall, and Alicia Wassink, eds. Speech in the Western States, Volume 1: The Coastal States, pp. 33-54. Publication of the American Dialect Society. Durham, NC: Duke University Press.
    • Hall-Lew, Lauren. (2013). ‘Flip-flop’ and mergers-in-progress. English Language and Linguistics, 17(2), 359-390.
  • Kirk Hazen, the West Virginia Dialect Corpus
    • Hazen, K. (2018). Listening to Rural Voices: Sociolinguistic Variation in West Virginia. Christine Mallinson & Elizabeth Seale (eds.). Rural Voices: Language, Identity, and Social Change across Place. Washington, DC: Rowman & Littlefield. 75-90.
    • Hazen, K., Lovejoy, J., Daugherty, J. & Vandevender, M. (2016). Continuity and change of English consonants in Appalachia. William Schumann & Rebecca Adkins Fletcher (eds.). Appalachia Revisited: New Perspectives on Place, Tradition, and Progress. Lexington, KY: University Press of Kentucky. 119–138.
  • Sophie Holmes-Elliott, the Hastings Corpus
    • http://sophieholmeselliott.com/
    • Holmes-Elliott, S. & Turner, J. The emergence of gendered production between childhood and adolescence: A real time analysis of /s/ in Southern British English. Proceedings from the XIX International Congress of Phonetic Sciences, Melbourne, Australia.
    • Holmes-Elliott, S. (2015). London calling: assessing the spread of metropolitan features in the southeast. PhD thesis, University of Glasgow.
  • Mary Kohn, the Kansas Speaks Corpus
    • García, T. & Kohn, M. (2018) Lateral production in Liberal, Kansas: Minority alignment to the new majority. Paper presented at NWAV 47: New Ways of Analyzing Language Variation. New York, NY. October.
    • Villarreal, D. & Kohn, M. (2018) Local meaning for supra-local change: A perception study of TRAP backing in Kansas. Paper presented at Sociolinguistic Symposium 22. Auckland, NZ. June.
  • Eleanor Lawson, the Devon Adolescent Speech Corpus
  • Adrian Leemann, the English Dialects App Corpus
    • Leemann, A., Kolly, M-J., & Britain, D. (2018). The English Dialects App: the creation of a crowdsourced dialect corpus. Ampersand 5, 1-17.
  • Sylvain Navarro & Anne Przewozny, the PAC-Lancashire Corpus
    • Navarro, S. (2013). Rhoticité et ‘r’ de sandhi en anglais: du Lancashire à Boston. PhD Dissertation, University Toulouse Jean Jaurès
    • Noël, E. (2003). English phonology in central Lancashire : a dialectological study. Master’s Dissertation, University Toulouse 2 Le Mirail
  • Jonathan Morris, the North Wales Corpus
    • Morris, J. (2017). Sociophonetic variation in a long-term language contact situation: /l/-darkening in Welsh-English bilingual speech. Journal of Sociolinguistics 21(2), 183-207.
  • Caroline Myrick, the Saban English Corpus
    • Myrick, C., Eberle, N., Schneier, J. and Reaser, J. (2020). Mapping Linguistic Diversity in the English-Speaking Caribbean. Stanley D. Brunn & Roland Kehrein (eds.). Handbook of the Changing World Language Map. Springer. pages 1469-1487
    • Myrick, C. (2014). Putting Saban English on the Map: A descriptive analysis of English language variation on Saba. English World-Wide. 35:2, 161-192.
  • Panayiotis Pappas, Simon Fraser University, the British Columbian Vernacular English Corpus
  • Monika Pukli & Anne Przewozny, the PAC-Ayrshire Corpus
  • Nicole Rosen, the University of Manitoba, the Languages in the Prairies Project Corpus
  • Sadie Ryan, the Polish-Scottish Corpus
  • Vijay Solanki, the Glasgow Brains in Dialogue Corpus
    • Solanki, V.J. (2017). Brains in dialogue: investigating accommodation in live conversational speech for both speech and EEG data. PhD thesis, University of Glasgow.
    • Solanki V., Vinciarelli A., Stuart-Smith J., & Smith R. (2016). When the Game Gets Difficult, then it is Time for Mimicry. In: Esposito A. et al. (eds). Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham.
  • James Stanford, the Dartmouth New England English Database
    • Stanford, J. (2019). New England English: Large-Scale Acoustic Sociophonetics and Dialectology. New York: Oxford University Press.
    • Stanford, J. (forthcoming). A modern update on New England dialectology: Introducing the Dartmouth New England English Database (DNEED). American Speech.
  • Jane Stuart-Smith, the Sounds of the City and Carnegie Corpora
    • Leverhulme Trust RPG-142 & Carnegie Trust for the Universities of Scotland
    • Stuart-Smith, J., José, B., Rathcke, T., Macdonald, R. and Lawson, E. (2017) Changing sounds in a changing city: an acoustic phonetic investigation of real-time change over a century of Glaswegian. In: Montgomery, C. and Moore, E. (eds.) Language and a Sense of Place: Studies in Language and Region. Cambridge University Press: Cambridge, pp. 38-64.
    • Stuart-Smith, J., Sonderegger, M., Rathcke, T. and Macdonald, R. (2015) The private life of stops: VOT in a real-time corpus of spontaneous Glaswegian. Laboratory Phonology, 6(3-4), pp. 505-549.
  • Jennifer Smith, the One Speaker Two Dialects Corpus
    • ESRC Grant no ES/K000861/1
    • Smith, J. & Holmes Elliott, S. (2018). The unstoppable glottal: tracking rapid change in an iconic British variable. English Language and Linguistics, 22(3), 323-355.
    • Holmes Elliott, S. & Smith, J. (2018). Dressing down up north: DRESS-lowering and /l/ allophony in a Scottish dialect. Language Variation and Change, 30(1), 23-50.
  • Gerard Van Herk, the Petty Harbour Corpus
    • Childs, B., Van Herk, G. & Thorburn, J. (2011). Safe Harbour: Ethics and Accessibility in Sociolinguistic Corpus Building. CLLT 7(1), 163-180.
  • Cécile Viollain & Sylvain Navarro, the PAC-Boston Corpus
    • Navarro, S. (2013). Rhoticité et ‘r’ de sandhi en anglais: du Lancashire à Boston. PhD Dissertation, University Toulouse Jean Jaurès
    • Viollain, C. (2010). Sociophonologie de l’anglais à Boston : Une étude de la rhoticité et de la liaison. Master’s Dissertation, University Toulouse 2 Le Mirail
  • Kevin Watson, The Origins of Liverpool English (OLIVE) Corpus
    • ESRC Grant no RES-061-25-0458
    • Watson, Kevin, & Clark, Lynn (2017). The Origins of Liverpool English. In R. Hickey (Ed.), Listening to the Past: Audio Records of Accents of English (Studies in English Language, pp. 114-141). Cambridge: Cambridge University Press.
  • Alicia Beckford Wassink, the Pacific Northwest English Corpus
    • Beckford Wassink, A. (2016). The vowels of Washington State. Fridland, Beckford Wassink, Kendall & Evans (eds.). Speech in the Western States Volume 1: The Coastal States. Publication of the American Dialect Society (PADS). Volume 101:1. Durham, NC: Duke University Press. pages 77-105.
    • Beckford Wassink, A. (2015). Sociolinguistic patterns in Seattle English. Language Variation and Change 27:1, 31-58.
  • Jessica Wormald, the PEBL Corpus
    • Wormald, J. (2016). Regional Variation in Panjabi-English. PhD thesis, University of York.
    • Wormald, J. (2015). Dynamic Variation in ‘Panjabi-English’: Analysis of F1 & F2 Trajectories for FACE /eɪ/ and GOAT /əʊ/. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: the University of Glasgow. ISBN 978-0-85261-941-4. Paper number 0809 retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0809.pdf

The SPADE project team would also like to thank the SPADE Consortium members and Data Guardians of the following corpora for allowing their corpora to be used in developing the ISCAN software:

  • The WYRED corpus
    • Gold, E., Ross, S., & Earnshaw, K. (2018). The ‘West Yorkshire Regional English Database’: Investigation into the generalizability of reference populations for forensic speaker comparison casework. Proc. Interspeech, Sep 2-6 2018, Hyderabad, 2748-2752.

The SPADE project team would furthermore like to thank those corpus Data Guardians and SPADE Consortium members who wish to remain anonymous for allowing their corpora to be used in developing the ISCAN software.


The SPADE project team would additionally like to acknowledge the collection and/or use of the following publicly available datasets:

  • Audio BNC
    • Coleman, J., Baghai-Ravary, L., Pybus, J. & Grau, S. (2012). Audio BNC: the audio edition of the Spoken British National Corpus. Phonetics Laboratory, University of Oxford. http://www.phon.ox.ac.uk/AudioBNC
  • Buckeye
    • https://buckeyecorpus.osu.edu/
    • Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational Speech (2nd release) [www.buckeyecorpus.osu.edu] Columbus, OH: Department of Psychology, Ohio State University (Distributor).
  • CORAAL
    • http://lingtools.uoregon.edu/coraal/
    • Kendall, T. & Farrington, C. (2018). The Corpus of Regional African American Language. Version 2018.10.06. Eugene. The Online Resources for African American Language Project. http://coraal.uoregon.edu/coraal
  • Doubletalk
    • http://espf.ppls.ed.ac.uk/
    • Scobbie, J.M., Turk, A., Geng, C., King, S., Lickley, R., & Richmond, K. (2013). The Edinburgh Speech Production Facility DoubleTalk Corpus. Proceedings of 14th Interspeech, Lyon.
    • Geng, C, Turk, A, Scobbie, JM, Macmartin, C, Hoole, P, Richmond, K, Wrench, A, Pouplier, M, Bard, E, Campbell, Z, Dickie, C, Dubourg, E, Hardcastle, W, Kainada, E, King, S, Lickley, R, Nakai, S, Renals, S, White, K & Wiegand, R. (2013). Recording speech articulation in dialogue: Evaluating a synchronized double electromagnetic articulography setup. Journal of Phonetics, 41(6), 421-431. DOI: 10.1016/j.wocn.2013.07.002
    • Further thanks to Alice Turk for drawing our attention to this corpus.
  • DyViS
    • Nolan, F., Dynamic Variability in Speech: a Forensic Phonetic Study of British English, 2006-2007 [computer file]. Colchester, Essex: UK Data Archive [distributor], July 2011. SN: 6790, http://dx.doi.org/10.5255/UKDA-SN-6790-1. ESRC Grant no RES-000-23-1248.
    • Nolan, F., McDougall, K., de Jong, G. & Hudson, T. (2009). ‘The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research’, International Journal of Speech, Language and the Law 16(1), 31-57.
  • Edinburgh (Arthur the Rat)
    • University of Edinburgh. School of Philosophy, Psychology, and Language Sciences. Department of Linguistics and English Language. (2013). Arthur the Rat, 1949-1966 [sound]. dx.doi.org/10.7488/ds/163. https://datashare.is.ed.ac.uk/handle/10283/392
    • Further thanks to James Kirby for aligned TextGrids.
  • ICE-CAN
    • http://ice-corpora.net/ice/
  • IViE
    • Grabe, E., Nolan, F. & Post, B. English Intonation in the British Isles: The IViE Corpus. Phonetics Laboratory, University of Oxford, Department of Linguistics, University of Cambridge, ESRC Grant R000237145, 1997-2002. http://www.phon.ox.ac.uk/files/apps/IViE/
  • LUCID
    • Baker, R & Hazan, V. (2010). LUCID: A corpus of spontaneous and read clear speech in British English. In: (Proceedings) DISS-LPSS Joint workshop, Tokyo, Japan, 24-25 September 2010, pp 3-6.
    • Further thanks to Valerie Hazan for drawing our attention to this corpus.
  • Northern Englishes
    • Haddican, W., Foulkes, P. (2013). A comparative study of language change in Northern Englishes. [data collection]. UK Data Service. SN: 851013, http://doi.org/10.5255/UKDA-SN-851013
    • Further thanks to Márton Sóskuthy for aligned TextGrids.
  • Santa Barbara
    • http://www.linguistics.ucsb.edu/research/santa-barbara-corpus
    • Du Bois, John W., Wallace L. Chafe, Charles Meyer, Sandra A. Thompson, Robert Englebretson, and Nii Martey. 2000-2005. Santa Barbara corpus of spoken American English, Parts 1-4. Philadelphia: Linguistic Data Consortium.
  • The SCOTS corpus
    • https://www.scottishcorpus.ac.uk/
    • Anderson, J., Beavan, D. & Kay, C. (2007). Scots: Scottish corpus of texts and speech. In: Creating and digitizing language corpora. Springer, 17–34.
  • The Sound Atlas of Irish English corpus
    • Hickey, R. (2004). A Sound Atlas of Irish English. Berlin, Boston: De Gruyter Mouton.
  • Switchboard
    • https://catalog.ldc.upenn.edu/ldc97s62
    • Godfrey, John, and Edward Holliman. Switchboard-1 Release 2 LDC97S62. Web Download. Philadelphia: Linguistic Data Consortium, 1993.
  • TIMIT
    • https://catalog.ldc.upenn.edu/LDC93S1
    • Garofolo, John S., et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web Download. Philadelphia: Linguistic Data Consortium, 1993.

Further Acknowledgements

We gratefully acknowledge the use of webMAUS and the Montreal Forced Aligner in SPADE dataset preparation.

Kisler, Thomas, Uwe D. Reichel, and Florian Schiel (2017): Multilingual processing of speech via web services, Computer Speech & Language, Volume 45, September 2017, pages 326–347.
https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface

McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner, and Morgan Sonderegger (2017). Montreal Forced Aligner: trainable text-speech alignment using Kaldi. In Proceedings of the 18th Conference of the International Speech Communication Association.
https://montreal-forced-aligner.readthedocs.io/