In this talk, we outline how computational tools can be used to obtain a large corpus of Tweets, and discuss trends identified in the use of Māori loanwords in this data.
Following a wealth of studies which document the use of Māori loanwords in newspaper language (Calude et al. in press, Deverson 1991, Davies and Maclagan 2006, Levendis and Calude 2019, Macalister 2006, 2007, 2008, 2009) and a smaller number considering spoken language (from the late 1990s, Kennedy 2001, Calude et al. 2017), children’s picture books (Daly 2007, 2008, 2017) and TV news broadcasts (De Bres 2006), we complement this body of data with analyses of social media language. To this end, we propose a novel method of building a corpus of New Zealand English (NZE) Tweets which is both (relatively) clean and large, using machine learning techniques. The MLT Corpus (Māori Loanword Twitter Corpus) affords the study of Twitter language diachronically (over a ten-year period) and idiolectally (by user ID profile). Our data comprises a mix of manually labelled and automatically categorised Tweets (in total, approx. 1 million Tweets and nearly 20 million word tokens).
Because our main interest lies with the use of Māori loanwords, we analyse patterns observed in the use of hybrid hashtags containing (at least) one Māori word and (at least) one native English word, e.g., #tereostories, #growingupkiwi. We first extracted all the hashtags in the MLT Corpus, and then manually inspected them in order to find the 100 most frequently occurring hybrid hashtags. In the talk, we discuss (1) diachronic patterns of these hashtags over the ten-year period analysed, (2) their syntactic structure (categorising them into compound hybrid hashtags and phrasal hybrid hashtags; see Caleffi 2015), (3) their discourse function, and (4) trends in their position within Tweets (that is, whether they occur within the main text of the Tweet, or as annexed tags at its periphery). Our findings point to creative and novel uses of Māori loanwords on Twitter, not unlike the phenomena classified under “word play” by Zirker and Winter-Froemel (2015).
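The hashtag extraction and position coding described above could be sketched roughly as follows. This is only an illustrative Python sketch, not the procedure actually used for the MLT Corpus: the function names, the regular expression, and the sample Tweets are all assumptions made for the example, and "peripheral" is simplified here to mean that nothing but further hashtags follows the tag up to the end of the Tweet.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters.
# \w is Unicode-aware in Python 3, so macronised vowels (e.g. ā) are kept.
HASHTAG_RE = re.compile(r"#\w+")

# A (possibly empty) run of hashtags and whitespace, used to detect
# tags annexed at the end of a Tweet.
TRAILING_TAGS_RE = re.compile(r"(?:\s*#\w+)*\s*")


def extract_hashtags(tweet):
    """Return the hashtags in a Tweet, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def hashtag_counts(tweets):
    """Count hashtag frequencies across an iterable of Tweet texts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


def hashtag_position(tweet, tag):
    """Classify the first occurrence of `tag` as 'inline' or 'peripheral'.

    Simplifying assumption: a tag is 'peripheral' when only other hashtags
    and whitespace follow it. Returns None if the tag does not occur.
    """
    match = re.search(re.escape(tag) + r"(?!\w)", tweet, flags=re.IGNORECASE)
    if match is None:
        return None
    rest = tweet[match.end():]
    if TRAILING_TAGS_RE.fullmatch(rest):
        return "peripheral"
    return "inline"


# Hypothetical sample Tweets, invented for illustration only.
sample_tweets = [
    "Loving #TeReoStories today",
    "Great read #tereostories #growingupkiwi",
    "No tags here",
]

counts = hashtag_counts(sample_tweets)
# Frequency-ranked candidates; identifying which of these are hybrid
# hashtags would still be a manual step, as in the abstract.
candidates = counts.most_common(100)
```

A frequency-ranked list of this kind only narrows the search: deciding whether a hashtag combines a Māori word with an English word remains a matter of manual inspection.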
We hope that this work can contribute to current knowledge of the use of Māori loanwords and to methods in large-scale corpus building.