Function Word Generation in Statistical Machine Translation Systems

MT Summit 2011

Function words play an important role in sentence structure and express grammatical relationships with other words. Most statistical machine translation (SMT) systems do not pay enough attention to the translation of function words, which is noisy because of data sparseness and word alignment errors. In this paper, we design a novel method that separates the generation of target function words from the generation of target content words in SMT decoding. With this method, the target function words are deleted before translation modeling and inserted back into the translations during SMT decoding. To guide the insertion of target function words, a new statistical model is proposed and integrated into the log-linear model for SMT, which can lead to better reordering and better ranking of partial hypotheses. The experimental results show that our approach significantly improves SMT performance on a Chinese-English translation task.
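
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function word list, the one-insertion-per-hypothesis limit, and the feature names (`tm`, `lm`, `fw_insert`) are all assumptions made for illustration. The sketch shows three steps: stripping function words from target sentences before translation modeling, enumerating candidate re-insertions at decoding time, and scoring a hypothesis with a log-linear model that includes an extra function word insertion feature.

```python
# Hypothetical closed-class word list; the inventory actually used is not stated here.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "is", "are"}


def strip_function_words(tokens):
    """Remove target function words before translation modeling."""
    return [t for t in tokens if t.lower() not in FUNCTION_WORDS]


def insertion_candidates(content_tokens):
    """Yield the content-only hypothesis, then every variant with exactly
    one function word inserted at any gap (an assumed, simplified search)."""
    yield list(content_tokens)
    for i in range(len(content_tokens) + 1):
        for word in sorted(FUNCTION_WORDS):
            yield content_tokens[:i] + [word] + content_tokens[i:]


def log_linear_score(features, weights):
    """Weighted sum of feature values, as in a standard log-linear SMT model.
    Feature values are assumed to be log-probabilities or counts."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    target = ["The", "cat", "is", "on", "the", "mat"]
    content = strip_function_words(target)
    print(content)

    # Hypothetical feature values for one candidate hypothesis.
    features = {"tm": -1.0, "lm": -2.0, "fw_insert": -0.5}
    weights = {"tm": 1.0, "lm": 0.5, "fw_insert": 2.0}
    print(log_linear_score(features, weights))
```

In a real decoder the insertion feature would come from a trained statistical model and the weights would be tuned together with the other features; here they are fixed placeholder numbers.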