December 13, 2015

Referential translation machines for predicting semantic similarity

Ergun Biçici and Andy Way. Referential translation machines for predicting semantic similarity. Language Resources and Evaluation, pp 1-27, 2015. ISSN: 1574-020X. doi:10.1007/s10579-015-9322-7

Referential translation machines (RTMs) are a computational model effective at judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We use RTMs to predict the semantic similarity of text and present state-of-the-art results showing that RTMs can achieve better results on the test set than on the training set. RTMs judge the quality or the semantic similarity of texts by using relevant retrieved training data as interpretants for reaching shared semantics. Interpretants are used to derive features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation, which may be observed ubiquitously in communication. RTMs achieve top performance in various semantic similarity prediction tasks at SemEval, as well as in similarity prediction tasks in bilingual settings. We define MAER, the mean absolute error relative to the magnitude of the target, and MRAER, the mean absolute error relative to the absolute error of a predictor that always predicts the target mean, assuming that the target mean is known. Sorting RTM test performance on various tasks by MRAER can help identify which tasks and subtasks require more work by design.
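The two error measures defined above can be made concrete with a short sketch. Read literally, MAER averages |prediction − target| / |target| over instances, and MRAER averages |prediction − target| / |mean(target) − target|, where the mean is computed from the known targets. The floor value `eps`, which keeps a zero denominator from raising an error, is an assumption made for this illustration; the paper may handle zero denominators differently. The function names `maer` and `mraer` are likewise hypothetical.

```python
from typing import Sequence


def _check(predictions: Sequence[float], targets: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before averaging."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")


def maer(predictions: Sequence[float], targets: Sequence[float], eps: float = 1e-6) -> float:
    """Mean absolute error relative to the magnitude of each target.

    eps is an assumed floor on the denominator so that zero targets do not divide by zero.
    """
    _check(predictions, targets)
    total = sum(abs(p - t) / max(abs(t), eps) for p, t in zip(predictions, targets))
    return total / len(targets)


def mraer(predictions: Sequence[float], targets: Sequence[float], eps: float = 1e-6) -> float:
    """Mean absolute error relative to the absolute error of a predictor that
    always outputs the target mean (the mean is assumed known, i.e. taken from the targets).

    eps is an assumed floor for instances whose target equals the mean.
    """
    _check(predictions, targets)
    mean = sum(targets) / len(targets)
    total = sum(abs(p - t) / max(abs(mean - t), eps) for p, t in zip(predictions, targets))
    return total / len(targets)


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0]
    predictions = [1.5, 2.0, 2.5]
    print(maer(predictions, targets))   # (0.5/1 + 0/2 + 0.5/3) / 3
    print(mraer(predictions, targets))  # (0.5/1 + 0 + 0.5/1) / 3
```

Because MRAER divides each error by the mean predictor's error on the same instance, predicting the mean everywhere yields a per-instance ratio of 1 wherever the target differs from the mean. A score near 1 therefore indicates little gain over that baseline, which is why MRAER is convenient for comparing tasks whose targets have different scales.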