Leveraging Loanword Constraints for Improving Machine Translation in Low-resource Settings
- Felermino Ali
Translating from high-resource languages into low-resource languages such as Emakhuwa remains challenging because of limited parallel data, orthographic variation, and frequent loanwords and code-switching. In this talk, Felermino will discuss how to apply lexicon-guided neural machine translation, integrating bilingual dictionaries and loanword mappings into the training process, to address this challenge.
Our method uses over 8,000 dictionary entries and 12,000 loanword mappings to build sentence-specific glossaries, which are incorporated via input augmentation. Experiments on FLORES+ show improved lexical coverage, reduced inconsistencies, and more contextually accurate translations. These results suggest a promising direction for low-resource MT: bridging data scarcity and vocabulary gaps through structured lexical integration.
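The abstract does not describe the exact augmentation format, so the following is only a rough sketch of what "sentence-specific glossaries incorporated via input augmentation" could look like. It assumes word-level tokenization, single-word lexicon entries, loanword mappings taking precedence over dictionary entries, and a hypothetical `<sep>` marker that appends `source = target` hints to the source sentence before it reaches the NMT model. The function names and the lexicon entries (e.g. `emk_house`) are illustrative placeholders, not real Emakhuwa words or the speaker's implementation.

```python
import re
from typing import Dict, List, Tuple


def build_glossary(
    sentence: str,
    dictionary: Dict[str, str],
    loanwords: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Collect (source, target) hints for words in this sentence only.

    Assumption: loanword mappings are checked before dictionary entries,
    and each source word is added at most once, in order of appearance.
    """
    glossary: List[Tuple[str, str]] = []
    seen = set()
    for token in re.findall(r"\w+", sentence.lower()):
        if token in seen:
            continue
        if token in loanwords:
            glossary.append((token, loanwords[token]))
            seen.add(token)
        elif token in dictionary:
            glossary.append((token, dictionary[token]))
            seen.add(token)
    return glossary


def augment_input(
    sentence: str,
    glossary: List[Tuple[str, str]],
    sep: str = " <sep> ",
) -> str:
    """Append glossary hints to the source sentence (hypothetical format)."""
    if not glossary:
        return sentence
    hints = " ; ".join(f"{src} = {tgt}" for src, tgt in glossary)
    return f"{sentence}{sep}{hints}"


if __name__ == "__main__":
    # Placeholder lexicons standing in for the real dictionary and loanword lists.
    dictionary = {"water": "emk_water", "house": "emk_house"}
    loanwords = {"computer": "emk_computer"}
    src = "The computer is in the house."
    print(augment_input(src, build_glossary(src, dictionary, loanwords)))
```

Because the glossary is built per sentence, the model only sees hints for words that actually occur in the input, which keeps the augmented sequence short; sentences with no lexicon matches pass through unchanged.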
Felermino Ali
ML Researcher
Watch next
Session on Compute & Trust (Systems)
- Ashish Panwar,
- Aditya Desai,
- Abhilash Jindal
Multimodal & Embodied Intelligence (Pt 1), Panel on Multimodal AI: Progress, Pitfalls, Possibilities
- Madhava Krishna,
- Sriram Ganapathy,
- Somak Aditya
Session on Compute & Trust (Security)
- Krishna Pillutla,
- Danish Pruthi
Session on Reasoning
- Hongxiang Fan,
- Nagarajan Natarajan
Session on Retrieval
- Lokesh Nagalapatti,
- Soumen Chakrabarti