August 19, 2021

Parallel Feature Weight Decay Algorithms for Fast Development of Machine Translation Models

Ergun Biçici. Parallel Feature Weight Decay Algorithms for Fast Development of Machine Translation Models. Machine Translation, volume 35, pages 239–263, 2021. ISSN: 0922-6567.

Parallel feature weight decay algorithms, parfwd, are engineered for language- and task-adaptive instance selection to build distinct machine translation (MT) models and enable the fast development of accurate MT using less data and less computation. parfwd decay the weights of both source and target features to increase their average coverage. At the Conference on Machine Translation (WMT), parfwd achieved the lowest translation error rate from French to English in 2015, and a rate 11.7% lower than the top phrase-based statistical MT (PBSMT) system in 2017. parfwd also achieved a rate 5.8% lower than the top system in TweetMT and the top result from Catalan to English. BLEU upper bounds identify the translation directions that offer the largest room for relative improvement, as well as the MT models that use additional data. The performance trend angle shows the power of an MT model to convert unit data into unit translation results, that is, how much more BLEU it obtains for an increase in coverage. The source coverage angle of parfwd in the 2013–2019 WMT reached 35° for translation into English, +6° better than the top system, and 22° overall, +1.4° better than the top system.
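The idea of decaying feature weights during instance selection can be illustrated with a small sketch. This is not the parfwd implementation: the function names (`ngrams`, `select_instances`), the use of word n-grams as features, the initial weight of 1.0, the multiplicative decay of 0.5, and the length-normalized score are all assumptions made for illustration, and only source-side features are decayed here for brevity, whereas the abstract states that parfwd decay both source and target features.

```python
def ngrams(tokens, max_n=2):
    """Return all word n-grams of length 1..max_n as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def select_instances(candidates, test_sentences, k, decay=0.5, max_n=2):
    """Greedily pick up to k candidate sentences that cover test-set features.

    Hypothetical sketch: every feature found in the test sentences starts
    with weight 1.0 (an assumption), and each time a selected sentence
    contains a feature, that feature's weight is multiplied by `decay`.
    """
    # Features we want covered, with their current weights.
    weights = {
        f: 1.0
        for s in test_sentences
        for f in ngrams(s.split(), max_n)
    }

    def score(sentence):
        tokens = sentence.split()
        feats = set(ngrams(tokens, max_n))
        # Sum of current feature weights, normalized by sentence length.
        return sum(weights.get(f, 0.0) for f in feats) / max(len(tokens), 1)

    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        if score(best) == 0.0:
            break  # nothing left that covers any test feature
        selected.append(best)
        remaining.remove(best)
        # Decay the weights of the features the chosen sentence covers,
        # so later picks favor features that are still poorly covered.
        for f in set(ngrams(best.split(), max_n)):
            if f in weights:
                weights[f] *= decay
    return selected
```

With the candidates "the cat sat", "the cat sat down", "a dog barked", "the dog sat" and the single test sentence "the dog sat", selecting three instances yields "the dog sat", "the cat sat", "a dog barked". Without decay, "the cat sat down" would outrank "a dog barked" for the third slot; decaying the already covered features "the" and "sat" shifts the choice toward the sentence that adds coverage for "dog".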