Collotextualisation: An alternative approach to studying loanwords

Date:


Joint presentation with Andreea S. Calude.

Word borrowing has traditionally been investigated through frequency-based measures, such as the number of types and tokens in a corpus (see Poplack 2018 and references therein; New Zealand English examples include Davies & Maclagan 2006, de Bres 2006 and Macalister 2006). This talk introduces an alternative approach to the study of loanwords: building networks of collocation (Firth 1957; see also its grammatical counterpart, collostruction, Stefanowitsch & Gries 2003) from sets of borrowings that co-occur within the same text. Collocation is usually operationalised a priori with a fixed window size (e.g. five words to the left or right of the keyword). However, the texts analysed are typically much longer than this window and may vary in length. We therefore extend the notion of collocation to what we term “collotextualisation”: co-occurrence across the entire text, regardless of its length.

We present a case study of how collotextualisation can complement conventional frequency-of-use measures in the study of loanwords. We compare Māori loanword use across three corpora of New Zealand English newspaper articles and report three main findings. First, most loanwords in our data occur together with several others, supporting the notion that loanwords occur in sets rather than in isolation (see also Macdonald & Daly 2013). Second, there is an inverse relationship between the length of a set and frequency of use: newspaper articles are unlikely to contain infrequent loanwords without also containing frequent ones. This is consistent with the idea that loanwords may occur in vocabulary frequency bands (as proposed for measuring L2 vocabulary; see Laufer & Nation 1995 and Nation 2006). Third, frequent loanwords take part in more distinct and recurrent relationships than infrequent ones, and are typically the first to occur in a given text.
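The quantities behind these findings can be sketched per loanword over a corpus. This is a minimal sketch under stated assumptions, not the authors' operationalisation: frequency is taken as token count, "distinct relationships" as the number of different co-occurring loanwords, "recurrent relationships" as partners that co-occur in at least two texts, and "first to occur" as the first loanword token in a text. The function name `loanword_profile` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def loanword_profile(texts, loanwords):
    """Per-loanword summary over tokenised texts (hypothetical measures)."""
    freq = Counter()    # token frequency of each loanword
    first = Counter()   # texts in which the word is the first loanword
    edges = Counter()   # text-level co-occurrence counts (collotextualisation)
    set_sizes = []      # number of distinct loanwords in each text that has any
    for tokens in texts:
        found = [t for t in tokens if t in loanwords]
        if not found:
            continue
        freq.update(found)
        first[found[0]] += 1
        present = sorted(set(found))
        set_sizes.append(len(present))
        edges.update(combinations(present, 2))
    partners = defaultdict(set)
    recurrent = Counter()  # partners shared in at least two texts
    for (a, b), n in edges.items():
        partners[a].add(b)
        partners[b].add(a)
        if n >= 2:
            recurrent[a] += 1
            recurrent[b] += 1
    profile = {
        w: {"freq": freq[w], "partners": len(partners[w]),
            "recurrent": recurrent[w], "first": first[w]}
        for w in freq
    }
    return profile, set_sizes


# Hypothetical toy corpus, for illustration only.
LOANWORDS = {"iwi", "marae", "kai", "haka", "whānau"}
texts = [s.split() for s in [
    "the iwi met at the marae to share kai",
    "kids performed a haka at the marae",
    "the iwi and local whānau welcomed visitors to the marae",
    "kai was served",
    "no loanwords here",
]]
profile, set_sizes = loanword_profile(texts, LOANWORDS)
print(set_sizes)         # [3, 2, 3, 1]
print(profile["marae"])  # {'freq': 3, 'partners': 4, 'recurrent': 1, 'first': 0}
```

On real data, one could then relate `freq` to `partners`, `recurrent` and `first`, or to the sizes of the sets each word appears in; the toy corpus only shows the mechanics and says nothing about the reported findings.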