The Impact of Word Segmentation on CCG-based Arabic-English SMT
Abstract
This paper presents a comparative study of two approaches to statistical machine translation (SMT) within a factored translation framework for the Arabic–English language pair. We describe the pre-processing step applied to the Arabic source language and the new factors added to the English target language. Experiments that annotate the English side with part-of-speech (POS) tags and Combinatory Categorial Grammar (CCG) supertags, and that segment the Arabic sentences, show considerable improvement in BLEU scores. The first experiments establish baseline phrase-based models under two approaches, segmented (#S1) and non-segmented (#S2). In both approaches, the CCG models achieve the highest BLEU scores. The segmented approach with CCG supertags yields the highest score, 30.91% (1.84% above the baseline). Overall, translating segmented Arabic into English gives considerably better results than translating non-segmented Arabic.
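As a rough illustration of what adding POS and CCG factors to the English target side can look like, the sketch below joins three parallel annotation layers into pipe-separated factored tokens, a convention used by factored SMT toolkits such as Moses. The abstract does not state which toolkit or format was used, so the separator, the function name `to_factored`, and the hand-written example tags are all assumptions made for illustration.

```python
# Illustrative only: the sentence and its tags are hand-written examples,
# not data or tagger output from the paper.

def to_factored(tokens, pos_tags, supertags, sep="|"):
    """Join parallel annotation layers into factored tokens: word|POS|CCG."""
    if not (len(tokens) == len(pos_tags) == len(supertags)):
        raise ValueError("all annotation layers must have the same length")
    # Each target word carries its surface form, POS tag, and CCG supertag.
    return " ".join(sep.join(triple) for triple in zip(tokens, pos_tags, supertags))


tokens = ["the", "boy", "reads"]
pos_tags = ["DT", "NN", "VBZ"]          # hypothetical POS tags
supertags = ["NP/N", "N", "S\\NP"]      # hypothetical CCG supertags

factored_line = to_factored(tokens, pos_tags, supertags)
print(factored_line)
```

Each output token keeps the surface word first, so a phrase-based baseline can be recovered by reading only the first factor.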
Keywords
Phrase-based translation model, Combinatory Categorial Grammar, Part-of-speech, Factored translation model, Word segmentation
DOI
10.12783/dtcse/aita2017/16013