Lenin Laitonjam


2023

pdf bib
Findings of the WMT 2023 Shared Task on Low-Resource Indic Language Translation
Santanu Pal | Partha Pakray | Sahinur Rahman Laskar | Lenin Laitonjam | Vanlalmuansangi Khenglawt | Sunita Warjri | Pankaj Kundan Dadure | Sandeep Kumar Dash
Proceedings of the Eighth Conference on Machine Translation

This paper presents the results of the low-resource Indic language translation task organized alongside the Eighth Conference on Machine Translation (WMT) 2023. In this task, participants were asked to build machine translation systems for any of four language pairs, namely, English-Assamese, English-Mizo, English-Khasi, and English-Manipuri. For this task, the IndicNE-Corp1.0 dataset is released, which consists of parallel and monolingual corpora for northeastern Indic languages such as Assamese, Mizo, Khasi, and Manipuri. The evaluation will be carried out using automatic evaluation metrics (BLEU, TER, RIBES, COMET, ChrF) and human evaluation.

2021

pdf bib
Manipuri-English Machine Translation using Comparable Corpus
Lenin Laitonjam | Sanasam Ranbir Singh
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

Unsupervised Machine Translation (MT) model, which has the ability to perform MT without parallel sentences using comparable corpora, is becoming a promising approach for developing MT in low-resource languages. However, majority of the studies in unsupervised MT have considered resource-rich language pairs with similar linguistic characteristics. In this paper, we investigate the effectiveness of unsupervised MT models over a Manipuri-English comparable corpus. Manipuri is a low-resource language having different linguistic characteristics from that of English. This paper focuses on identifying challenges in building unsupervised MT models over the comparable corpus. From various experimental observations, it is evident that the development of MT over comparable corpus using unsupervised methods is feasible. Further, the paper also identifies future directions of developing effective MT for Manipuri-English language pair under unsupervised scenarios.

2016

pdf bib
Automatic Syllabification for Manipuri language
Loitongbam Gyanendro Singh | Lenin Laitonjam | Sanasam Ranbir Singh
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Development of hand crafted rule for syllabifying words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in Manipuri language. Manipuri is one of the scheduled Indian languages. First, we propose a language-independent rule-based approach formulated using entropy based phonotactic segmentation. Second, we project the syllabification problem as a sequence labeling problem and investigate its effect using various sequence labeling approaches. Third, we combine the effect of sequence labeling and rule-based method and investigate the performance of the hybrid approach. From various experimental observations, it is evident that the proposed methods outperform the baseline rule-based method. The entropy based phonotactic segmentation provides a word accuracy of 96%, CRF (sequence labeling approach) provides 97% and hybrid approach provides 98% word accuracy.