The SPADE Consortium

The SPADE project and the development of the ISCAN software would not be possible without the Data Guardians who agreed to share their corpora with the project. This input has been so crucial that we consider the Data Guardians co-authors of all SPADE-related output that makes use of private, project-external datasets. Because listing every Data Guardian as an author is impractical, we group them into ‘the SPADE Consortium’ to permit abbreviation of the author list. In so doing we have followed many of the conventions adopted by the Atlas of Pidgin and Creole Language Structures (APiCS) (https://apics-online.info/about), and we thank the potential Data Guardian who drew our attention to this method of acknowledging the time and effort the Data Guardians took to create their corpora.

NB: For simplicity, we have adopted this convention even when an output does not use all of the corpora listed below. The specific corpora used for a particular output are listed and acknowledged within the text of that output. We also acknowledge the specific publicly available corpora used; an overall list of all such datasets gathered for the project is provided below.

This page will be updated as more corpora are received.


The SPADE project team would like to thank the following Data Guardians and SPADE Consortium members for allowing their corpora to be used in developing the ISCAN software:

  • The late Farhana Shaukat Alam, the Glaswasian Corpus
    • Alam, Farhana (2016). ‘Glaswasian’?: A Sociophonetic Analysis of
      Glasgow-Asian Accent and Identity. PhD Dissertation. University of
      Glasgow.
  • Karen Corrigan, the DECTE Corpus
    • Arts and Humanities Research Board RE11776 + Arts and Humanities Research Council AH/H037691/1
    • Allen, W., Beal, J.C., Corrigan, K.P., Moisl, H. and Maguire, W. (2007). The Newcastle Electronic Corpus of Tyneside English, in Beal, J.C., Corrigan, K.P. and Moisl, H. (eds.)
      Creating and Digitizing Language Corpora: Vol. 2, Diachronic Databases, pp.16-48. Houndmills: Palgrave Macmillan.
    • Corrigan, K.P., Moisl, H.L. and Beal, J.C. (2005). A Linguistic ‘Time-Capsule’: The Newcastle Electronic Corpus of Tyneside English. https://research.ncl.ac.uk/necte/
    • Corrigan, K.P., Buchstaller, I., Mearns, A. and Moisl, H. (2012). The Diachronic Electronic Corpus of Tyneside English. Newcastle University. https://research.ncl.ac.uk/decte/
    • Mearns, A.J., Corrigan, Karen P. and Buchstaller, I. (2016). The Diachronic Electronic Corpus of Tyneside English and The Talk of the Toon: Issues in Preservation and Public
      Engagement, in Corrigan, K.P. and Mearns, A.J. (eds.) Creating and Digitizing Language Corpora, Volume 3: Corpora for Public Engagement. Houndmills, Basingstoke:
      Palgrave-Macmillan, pp.177-210.
  • Robin Dodsworth, the Raleigh Corpus
    • Dodsworth, R. & Benton, R. (2019). Language variation and change in social networks: A bipartite approach. Routledge.
    • Dodsworth, R. & Benton, R. (2017). Social network cohesion and the retreat from Southern vowels in Raleigh. Language in Society 46, 371-405.
  • Anne Fabricius, the Modern RP Corpus
    • Fabricius, A. H. (2000). T-glottalling between stigma and prestige: a
      sociolinguistic study of Modern RP. Unpublished Ph.D. thesis.
      Copenhagen, Denmark: Copenhagen Business School. URL:
      https://forskning.ruc.dk/da/publications/t-glottalling-between-stigma-and-prestige-a-sociolinguistic-study.
  • Lauren Hall-Lew, the Sunset Corpus
    • Cardoso, Amanda, Lauren Hall-Lew, Yova Kemenchedjieva, and Ruaridh Purse. (2016). Between California and the Pacific Northwest: The Front Lax Vowels in San Francisco English. In Valerie Fridland, Betsy Evans, Tyler Kendall, and Alicia Wassink, eds. Speech in the Western States, Volume 1: The Coastal States, pp. 33-54. Publication of the American Dialect Society. Durham, NC: Duke University Press.
    • Hall-Lew, Lauren. (2013). ‘Flip-flop’ and mergers-in-progress. English Language and Linguistics, 17(2), 359-390.
  • Sophie Holmes-Elliott, the Hastings Corpus
    • http://sophieholmeselliott.com/
    • Holmes-Elliott, S. & Turner, J. The emergence of gendered production between childhood and adolescence: A real time analysis of /s/ in Southern British English. Proceedings from the XIX International Congress of Phonetic Sciences, Melbourne, Australia.
    • Holmes-Elliott, S. (2015). London calling: assessing the spread of metropolitan features in the southeast. PhD thesis, University of Glasgow.
  • Eleanor Lawson, the Devon Adolescent Speech Corpus
  • Adrian Leemann, the English Dialects App Corpus
    • Leemann, A., Kolly, M-J., & Britain, D. (2018). The English Dialects App: the creation of a crowdsourced dialect corpus. Ampersand 5, 1-17.
  • Jonathan Morris, the North Wales Corpus
    • Morris, J. (2017). Sociophonetic variation in a long-term language contact situation: /l/-darkening in Welsh-English bilingual speech. Journal of Sociolinguistics 21(2), 183-207.
  • Nicole Rosen, the University of Manitoba, the Languages in the Prairies Project Corpus
  • Vijay Solanki, the Glasgow Brains in Dialogue Corpus
    • Solanki, V.J. (2017). Brains in dialogue: investigating
      accommodation in live conversational speech for both speech and EEG
      data. PhD thesis, University of Glasgow.
    • Solanki V., Vinciarelli A., Stuart-Smith J., & Smith R. (2016). When
      the Game Gets Difficult, then it is Time for Mimicry. In: Esposito A.
      et al. (eds). Recent Advances in Nonlinear Speech Processing. Smart
      Innovation, Systems and Technologies, vol 48. Springer, Cham.
  • Jane Stuart-Smith, the Sounds of the City and Carnegie Corpora
    • Leverhulme Trust RPG-142 & Carnegie Trust for the Universities of Scotland
    • Stuart-Smith, J., José, B., Rathcke, T., Macdonald, R. and Lawson, E. (2017) Changing sounds in a changing city: an acoustic phonetic investigation of real-time change over a century of Glaswegian. In: Montgomery, C. and Moore, E. (eds.) Language and a Sense of Place: Studies in Language and Region. Cambridge University Press: Cambridge, pp. 38-64.
    • Stuart-Smith, J., Sonderegger, M., Rathcke, T. and Macdonald, R. (2015) The private life of stops: VOT in a real-time corpus of spontaneous Glaswegian. Laboratory Phonology, 6(3-4), pp. 505-549.
  • Jennifer Smith, the One Speaker Two Dialects Corpus
    • ESRC Grant no ES/K000861/1
    • Smith, J. & Holmes Elliott, S. (2018). The unstoppable glottal: tracking rapid change in an iconic British variable. English Language and Linguistics, 22(3), 323-355.
    • Holmes Elliott, S. & Smith, J. (2018). Dressing down up north: DRESS-lowering and /l/ allophony in a Scottish dialect. Language Variation and Change, 30(1), 23-50.
  • Gerard Van Herk, the Petty Harbour Corpus
    • Childs, B., Van Herk, G. & Thorburn, J. (2011). Safe Harbour: Ethics and Accessibility in Sociolinguistic Corpus Building. CLLT 7(1), 163-180.
  • Jessica Wormald, the PEBL Corpus
    • Wormald, J. (2016). Regional Variation in Panjabi-English. PhD
      thesis, University of York.
    • Wormald, J. (2015). Dynamic Variation in ‘Panjabi-English’: Analysis
      of F1 & F2 Trajectories for FACE /eɪ/ and GOAT /əʊ/. In The Scottish
      Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th
      International Congress of Phonetic Sciences. Glasgow, UK: the
      University of Glasgow. ISBN 978-0-85261-941-4. Paper number 0809
      retrieved from
      https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0809.pdf

The SPADE project team would also like to thank the SPADE Consortium members and Data Guardians of the following corpora for allowing their corpora to be used in developing the ISCAN software:

  • The WYRED corpus
    • Gold, E., Ross, S., & Earnshaw, K. (2018). The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework. Proc. Interspeech, Sep 2-6 2018, Hyderabad, 2748-2752.

The SPADE project team would furthermore like to thank those corpus Data Guardians and SPADE Consortium members who wish to remain anonymous for allowing their corpora to be used in developing the ISCAN software.


The SPADE project team would additionally like to acknowledge the collection and/or use of the following publicly available datasets:

  • Audio BNC
    • Coleman, J., Baghai-Ravary, L., Pybus, J. & Grau, S. (2012). Audio BNC: the audio edition of the Spoken British National Corpus.
      Phonetics Laboratory, University of Oxford. http://www.phon.ox.ac.uk/AudioBNC
  • Buckeye
    • https://buckeyecorpus.osu.edu/
    • Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume,
      E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational
      Speech (2nd release) [www.buckeyecorpus.osu.edu] Columbus, OH:
      Department of Psychology, Ohio State University (Distributor).
  • CORAAL
    • http://lingtools.uoregon.edu/coraal/
    • Kendall, T. & Farrington, C. (2018). The Corpus of Regional
      African American Language. Version 2018.10.06. Eugene. The Online
      Resources for African American Language Project.
      http://coraal.uoregon.edu/coraal
  • Doubletalk
    • http://espf.ppls.ed.ac.uk/
    • Scobbie, J.M., Turk, A., Geng, C., King, S., Lickley, R., & Richmond,
      K. (2013). The Edinburgh Speech Production Facility DoubleTalk
      Corpus. Proceedings of 14th Interspeech, Lyon.
    • Geng, C, Turk, A, Scobbie, JM, Macmartin, C, Hoole, P, Richmond, K,
      Wrench, A, Pouplier, M, Bard, E, Campbell, Z, Dickie, C, Dubourg, E,
      Hardcastle, W, Kainada, E, King, S, Lickley, R, Nakai, S, Renals, S,
      White, K & Wiegand, R. (2013). Recording speech articulation in
      dialogue: Evaluating a synchronized double electromagnetic
      articulography setup. Journal of Phonetics, 41(6), 421-431. DOI:
      10.1016/j.wocn.2013.07.002
    • Further thanks to Alice Turk for drawing our attention to this corpus.
  • DyViS
    • Nolan, F., Dynamic Variability in Speech: a Forensic Phonetic Study
      of British English, 2006-2007 [computer file]. Colchester, Essex: UK
      Data Archive [distributor], July 2011. SN: 6790,
      http://dx.doi.org/10.5255/UKDA-SN-6790-1. ESRC Grant no
      RES-000-23-1248.
    • Nolan, F., McDougall, K., de Jong, G. & Hudson, T. (2009). ‘The DyViS
      database: style-controlled recordings of 100 homogeneous speakers for
      forensic phonetic research’, International Journal of Speech,
      Language and the Law 16(1), 31-57.
  • Edinburgh (Arthur the Rat)
    • University of Edinburgh. School of Philosophy, Psychology, and
      Language Sciences. Department of Linguistics and English Language.
      (2013). Arthur the Rat, 1949-1966 [sound]. dx.doi.org/10.7488/ds/163.
      https://datashare.is.ed.ac.uk/handle/10283/392
    • Further thanks to James Kirby for aligned TextGrids.
  • ICE-CAN
    • http://ice-corpora.net/ice/
  • IViE
    • Grabe, E., Nolan, F. & Post, B. English Intonation in the British
      Isles: The IViE Corpus. Phonetics Laboratory, University of Oxford,
      Department of Linguistics, University of Cambridge, ESRC Grant
      R000237145, 1997-2002.
      http://www.phon.ox.ac.uk/files/apps/IViE/
  • LUCID
    • Baker, R & Hazan, V. (2010). LUCID: A corpus of spontaneous and read clear speech in British English.
      In: (Proceedings) DISS-LPSS Joint workshop, Tokyo, Japan, 24-25 September 2010, pp 3-6.
    • Further thanks to Valerie Hazan for drawing our attention to this corpus.
  • Northern Englishes
    • Haddican, W., Foulkes, P. (2013). A comparative study of language
      change in Northern Englishes. [data collection]. UK Data Service. SN:
      851013, http://doi.org/10.5255/UKDA-SN-851013
    • Further thanks to Márton Sóskuthy for aligned TextGrids.
  • Santa Barbara
    • http://www.linguistics.ucsb.edu/research/santa-barbara-corpus
    • Du Bois, John W., Wallace L. Chafe, Charles Meyer, Sandra A.
      Thompson, Robert Englebretson, and Nii Martey. 2000-2005. Santa
      Barbara corpus of spoken American English, Parts 1-4. Philadelphia:
      Linguistic Data Consortium.
  • The SCOTS corpus
    • https://www.scottishcorpus.ac.uk/
    • Anderson, J., Beavan, D. & Kay, C. (2007). Scots: Scottish corpus of texts and speech. In: Creating and digitizing language corpora. Springer, 17–34.
  • The Sound Atlas of Irish English corpus
    • Hickey, R. (2004). A Sound Atlas of Irish English. Berlin, Boston: De Gruyter Mouton.
  • Switchboard
    • https://catalog.ldc.upenn.edu/ldc97s62
    • Godfrey, John, and Edward Holliman. Switchboard-1 Release 2 LDC97S62.
      Web Download. Philadelphia: Linguistic Data Consortium, 1993.
  • TIMIT
    • https://catalog.ldc.upenn.edu/LDC93S1
    • Garofolo, John S., et al. TIMIT Acoustic-Phonetic Continuous Speech
      Corpus LDC93S1. Web Download. Philadelphia: Linguistic Data
      Consortium, 1993.