Historically, linguistic research has tended to carry out fine-grained analysis of a few aspects of speech from one or a few languages or dialects. The current scale of speech research studies has shaped our understanding of spoken language and the kinds of questions that we ask. Today, speech research is entering its own ‘big data’ revolution – massive digital collections of transcribed speech are available from many different languages, gathered for many different purposes: from oral histories, to large datasets for training speech recognition systems, to legal and political interactions. Sophisticated speech processing tools exist to analyse these data, but require substantial technical skill. Given this confluence of data and tools, linguists have a new opportunity to answer fundamental questions about the nature and development of spoken language. SPADE seeks to establish the key tools to enable large-scale speech research. We seek to exploit methods from computing science and put them to work with tools and methods from speech science, linguistics and digital humanities, to discover how much the sounds of English across the Atlantic vary over space and time.
The project seeks to develop innovative and user-friendly software which exploits the availability of existing speech data and speech processing tools to facilitate large-scale, integrated speech corpus analysis across many datasets. The gains of such an approach are substantial: linguists will be able to scale up answers to existing research questions from one to many varieties of a language, and ask new and different questions about spoken language within and across social, regional, and cultural contexts. Researchers in areas such as computational linguistics, speech technology, forensic linguistics, and clinical linguistics who engage with variability in spoken language will also benefit directly from our software. This project will also open up vast potential for those who already use digital scholarship for spoken language collections in the humanities and social sciences more broadly. The possibility of ethically non-invasive inspection of speech will allow analysts to uncover far more than is possible through textual analysis alone.
In addition to new insights into spoken English, this project will also lay the crucial groundwork for large-scale speech studies across many datasets from different languages, of different formats and structures.
- Technical: To use new computational methods to develop usable, open-access software to link and analyse spoken datasets of multiple forms and sources
- Research: To gain an improved understanding of stability and variation in spoken language, specifically in spoken Cross-Atlantic English (British Isles and North America), across space and time, through innovative large-scale speech data analysis
- Dissemination: To release software, deposit research data, and disseminate research findings within and across academic audiences, and to the international public