Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus

Kalvin Hartwig, Evan Lucas, Timothy Havens


Abstract
The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method that uses support vector machines to distinguish two dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90% under five-fold cross-validation, and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are released openly on GitHub at [link to be inserted for final version, working demonstration notebook uploaded with paper].
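The abstract does not say which features or SVM implementation were used. The sketch below is a minimal, standard-library-only illustration under stated assumptions. It uses character n-gram counts as features and trains a linear SVM with Pegasos-style subgradient descent on the hinge loss. Pegasos is a swapped-in training method, not necessarily the authors'. Accuracy is estimated with five-fold cross-validation. All function names, the +1/-1 dialect codes and the toy strings are hypothetical placeholders, not Ojibwe data. Because the features are character n-grams, a model trained on sentences can score a single word with the same code path, which mirrors the sentence-to-word transfer described above.

```python
import random
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams (n_min..n_max) of a lowercased string (assumed feature set)."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def train_linear_svm(features, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1/-1 (hypothetical dialect codes)."""
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x, y = features[i], labels[i]
            margin = y * sum(w.get(k, 0.0) * v for k, v in x.items())
            # Regularization step: shrink every existing weight by (1 - eta * lam).
            scale = 1.0 - eta * lam
            for k in w:
                w[k] *= scale
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Return +1 if the linear score is non-negative, else -1."""
    score = sum(w.get(k, 0.0) * v for k, v in x.items())
    return 1 if score >= 0 else -1


def k_fold_accuracy(texts, labels, k=5, seed=0):
    """Shuffle, split into k folds, train on k-1 folds and test on the held-out fold."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    feats = [char_ngrams(t) for t in texts]
    correct = 0
    for f in range(k):
        held_out = set(folds[f])
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([feats[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(w, feats[i]) == labels[i] for i in folds[f])
    return correct / len(texts)


if __name__ == "__main__":
    # Toy, clearly non-Ojibwe strings: class +1 uses only 'a'/'b', class -1 only 'x'/'y'.
    texts = ["ab" * n for n in range(1, 11)] + ["xy" * n for n in range(1, 11)]
    labels = [1] * 10 + [-1] * 10
    print(k_fold_accuracy(texts, labels, k=5))
```

With real data, the toy lists would be replaced by sentences labeled with their dialect. To reproduce the word-level setting, one would train once on all sentences with `train_linear_svm` and call `predict(w, char_ngrams(word))` on each word.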
Anthology ID: 2023.americasnlp-1.8
Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue: AmericasNLP
Publisher: Association for Computational Linguistics
Pages: 58–66
URL: https://aclanthology.org/2023.americasnlp-1.8
DOI: 10.18653/v1/2023.americasnlp-1.8
Cite (ACL): Kalvin Hartwig, Evan Lucas, and Timothy Havens. 2023. Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 58–66, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus (Hartwig et al., AmericasNLP 2023)
PDF: https://aclanthology.org/2023.americasnlp-1.8.pdf