Ergun Biçici. Machine Translation Performance Prediction System: Optimal Prediction for Optimal Translation. Springer Nature Computer Science, 3, 2022. ISSN: 2661-8907. [doi:10.1007/s42979-022-01183-0]
The machine translation performance prediction (MTPP) system (MTPPS) is an automatic and accurate prediction model that is independent of language and of natural language processing (NLP) output. MTPPS is optimal in that it can predict translation performance without using the translation at all: it relies only on the source text and bypasses the complexity of the MT model. MTPPS has been applied to tasks involving text similarity in machine translation (MT), semantic similarity, and sentence parsing. We present large-scale modeling and prediction experiments on the MTPP dataset (MTPPDAT), covering $3800$ document-level and $380000$ sentence-level predictions in $7$ different domains using $3800$ different MT systems. We provide theoretical and experimental results and empirical lower and upper bounds on the prediction tasks, rank the features used, and present current results. We show that only $57$ labeled instances at the document level and $17$ at the sentence level are needed to reach the current prediction results. MTPPS achieves an error rate of $4\%$ at the document level and $45\%$ at the sentence level relative to the magnitude of the target. This is $61\%$ and $27\%$ relatively better than a mean predictor, respectively, and $40\%$ better than the nearest neighbor baseline. Referential translation machines use MTPPS and achieve top results.
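The abstract reports errors "relative to the magnitude of the target" and improvements "relatively better than a mean predictor" and a nearest neighbor baseline, but it does not define these quantities. The sketch below shows one plausible reading. It assumes that relative error is the mean absolute error divided by the mean absolute target, that "relatively better" means $1 - e_{\text{model}} / e_{\text{baseline}}$, and that the nearest neighbor baseline is a Euclidean 1-NN regressor. These definitions, the function names, and the toy data are all hypothetical illustrations, not the paper's metrics or data.

```python
import numpy as np

# Assumed metric: mean absolute error divided by mean absolute target value.
def relative_error(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true))

# Baseline: always predict the mean of the training targets.
def mean_predictor(y_train, n):
    return np.full(n, np.mean(y_train))

# Assumed baseline: 1-nearest-neighbor regression with Euclidean distance.
def nearest_neighbor_predictor(x_train, y_train, x_test):
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    # Pairwise distances, shape (n_test, n_train).
    dists = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=2)
    return np.asarray(y_train, dtype=float)[np.argmin(dists, axis=1)]

# Assumed reading of "relatively better": fractional reduction in error.
def relative_improvement(err_model, err_baseline):
    return 1.0 - err_model / err_baseline

# Hypothetical toy data: one feature per instance, scores as targets.
x_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.1, 0.2, 0.3, 0.4]
x_test = [[0.9], [2.1]]
y_test = [0.22, 0.28]
model_pred = [0.21, 0.29]  # hypothetical predictions from some trained model

err_model = relative_error(y_test, model_pred)
err_mean = relative_error(y_test, mean_predictor(y_train, len(y_test)))
err_nn = relative_error(y_test, nearest_neighbor_predictor(x_train, y_train, x_test))

print(f"model relative error:       {err_model:.3f}")
print(f"mean predictor error:       {err_mean:.3f}")
print(f"1-NN baseline error:        {err_nn:.3f}")
print(f"improvement over mean:      {relative_improvement(err_model, err_mean):.3f}")
print(f"improvement over 1-NN:      {relative_improvement(err_model, err_nn):.3f}")
```

Under these assumptions, the toy model has a relative error of about $0.04$, which is about $67\%$ better than the mean predictor and $50\%$ better than the 1-NN baseline. Reporting errors relative to the target magnitude and to simple baselines makes results comparable across document- and sentence-level tasks, whose targets can differ in scale.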