Joint presentation with Andreea S. Calude for “the first international Twitter conference on linguistics”.
Word borrowing is often investigated using frequency-based measures, such as types and tokens in a corpus. We introduce an alternative approach to studying loanwords, which involves building collocation networks, based on sets of borrowings that co-occur within the same text.
For more than a hundred years, linguists have puzzled over questions regarding word borrowing. Empirical studies generally capture loanword use either by analysing types and tokens of borrowed words in a corpus, or by comparing type/token frequency with near-synonyms native to the receiver language. In such studies, the unit of measurement for investigating loanwords is frequency of use.
This presentation will introduce an alternative approach to studying loanwords. Our method involves building networks of collocation by extracting sets of borrowings that co-occur within the same text. We refer to these sets as “intra-textual relationships”. Collocation is usually operationalised a priori with a specified window size (e.g. five words to the left or right of the keyword); however, the texts analysed are typically much larger than this window and may differ in length. Consequently, we extend the notion of collocation to what we term “collotextualisation”: capturing co-occurrence across the entire text, regardless of size.
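The contrast between window-based collocation and whole-text co-occurrence can be illustrated with a minimal Python sketch. The loanword list, the example sentence, and the function names `window_collocates` and `text_level_set` are all hypothetical and chosen for illustration only; they are not the tooling used in the study.

```python
# Hypothetical set of loanwords of interest (illustrative only).
LOANWORDS = {"matariki", "whānau", "kai", "iwi"}


def window_collocates(tokens, keyword, window=5):
    """Loanwords found within `window` tokens left or right of the keyword."""
    found = set()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            for other in left + right:
                if other in LOANWORDS and other != keyword:
                    found.add(other)
    return found


def text_level_set(tokens):
    """All distinct loanwords in the whole text, whatever its length."""
    return {tok for tok in tokens if tok in LOANWORDS}


# Invented example sentence, tokenised on whitespace.
tokens = ("matariki is celebrated across the country each winter and many "
          "families gather to share kai with their whānau").split()

near_matariki = window_collocates(tokens, "matariki")  # nothing within 5 tokens
whole_text = text_level_set(tokens)                    # all three loanwords
```

In this toy sentence the five-word window around "matariki" captures no other loanword, whereas the text-level set links all three, which is the kind of relationship the window-based definition misses.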
We present a case study of how collotextualisation can complement conventional frequency-of-use measures when exploring loanwords. Our data consist of New Zealand English newspaper articles, which we use to study indigenous Māori words. The corpus is themed around Matariki, the Māori New Year, and spans ten years (2007–2016). It comprises 91,958 words in 194 texts, with a borrowing rate of 29 loanwords per 1,000 words. After extracting the 107 borrowings that occur at least five times in the corpus, we analysed the data using a special type of network, called a hypergraph, that preserves intra-textual relationships involving multiple loanwords. This allowed us to bypass the limitations of a standard network, which flattens the data into (less meaningful) pairwise co-occurrences.
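Why flattening loses information can be shown with a small Python sketch. The four texts below are invented, and the helper `pairwise_edges` is a hypothetical name; the sketch uses plain sets rather than any particular hypergraph library.

```python
from itertools import combinations

# Invented toy data: each text is reduced to the set of loanwords it contains.
texts = {
    "t1": {"matariki", "whānau", "kai"},
    "t2": {"matariki", "whānau"},
    "t3": {"matariki", "kai"},
    "t4": {"whānau", "kai"},
}

# Hypergraph view: one hyperedge per text, kept as a whole set.
hyperedges = {text_id: frozenset(words) for text_id, words in texts.items()}


def pairwise_edges(text_sets):
    """Standard-network view: flatten every set into two-word edges."""
    edges = set()
    for words in text_sets.values():
        for a, b in combinations(sorted(words), 2):
            edges.add((a, b))
    return edges


# Remove the only text in which all three loanwords occur together.
without_t1 = {tid: words for tid, words in texts.items() if tid != "t1"}

same_graph = pairwise_edges(texts) == pairwise_edges(without_t1)
has_triple = frozenset({"matariki", "whānau", "kai"}) in hyperedges.values()
```

Here `same_graph` is true: the pairwise network looks identical whether or not the three-loanword text exists, while the hyperedge for `t1` records that the three words occurred together in a single text.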
We show that hypergraphs can uncover fresh insights into loanword use, especially when explored over time or by examining the (average) size of the intra-textual relationships. We report three main findings. First, most loanwords in our data occur with at least four others (i.e. loans occur in sets rather than in isolation). Second, there is an inverse relationship between intra-textual co-occurrence size and frequency of use: a newspaper article is unlikely to contain an infrequent loanword without also containing frequent ones. This is consistent with the idea that loanwords might occur in vocabulary frequency bands. Third, frequent loanwords take part in more distinct and recurrent relationships than infrequent ones, and are typically the first to occur in a given text.
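One possible way to relate set size to frequency is sketched below in Python. This is an assumption about how such a measure could be operationalised, not a description of the analysis reported here: the data are invented, frequency is approximated by the number of texts a loanword appears in (not its token count), and the name `summary` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented toy data: one set of loanwords per text.
loanword_sets = [
    {"matariki", "whānau", "kai", "iwi"},
    {"matariki", "whānau"},
    {"matariki"},
    {"matariki", "kai", "hāngī", "whānau", "marae"},
]

# For each loanword, record the size of every set it belongs to.
memberships = defaultdict(list)
for words in loanword_sets:
    for word in words:
        memberships[word].append(len(words))

# Number of texts containing the loanword, and mean size of those sets.
summary = {
    word: {"texts": len(sizes), "mean_set_size": mean(sizes)}
    for word, sizes in memberships.items()
}
```

With real data, plotting `mean_set_size` against a frequency measure for each loanword would be one way to inspect the relationship; the toy numbers here carry no empirical weight.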