Erik Novak, Luka Bizjak, Dunja Mladenić
Modern cross-lingual document retrieval models can find documents relevant to a query, but they cannot explain why a document is relevant. This paper proposes LM-EMD, a novel learning-to-rank model that uses the multilingual BERT language model and Earth Mover’s Distance (EMD) to measure a document’s relevance to the input query and to provide interpretable insights into why the document is relevant. The model takes the contextual embeddings of the query and document tokens generated by multilingual BERT and measures their pairwise distances in the embedding space. EMD then uses these distances to calculate the document’s relevance score and to identify which document tokens contribute the most to its relevance. We evaluate the model on five language pairs of varying degrees of similarity and analyze its performance. We find that the model (1) performs …
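
The EMD step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine ground distance, the uniform token weights, the function names `cosine_distance` and `emd_with_flow`, and the toy vectors standing in for multilingual BERT embeddings are all assumptions made for illustration. The transport problem is solved with a small pure-Python successive-shortest-path routine so that the example has no third-party dependencies.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed ground distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def emd_with_flow(query_vecs, doc_vecs):
    """Return (EMD, flow), where flow[i][j] is the mass moved from query token i
    to document token j. Uniform token weights are an assumption of this sketch."""
    n, m = len(query_vecs), len(doc_vecs)
    cost = [[cosine_distance(q, d) for d in doc_vecs] for q in query_vecs]
    # Weights 1/n (query) and 1/m (document), scaled by n*m to integer masses:
    # each query token supplies m units, each document token demands n units.
    src, snk = n + m, n + m + 1
    graph = [[] for _ in range(n + m + 2)]
    edges = []  # [head, residual capacity, cost]; edge i ^ 1 is the reverse of edge i

    def add_edge(u, v, cap, c):
        graph[u].append(len(edges)); edges.append([v, cap, c])
        graph[v].append(len(edges)); edges.append([u, 0, -c])

    pair_edge = [[0] * m for _ in range(n)]
    for i in range(n):
        add_edge(src, i, m, 0.0)
        for j in range(m):
            pair_edge[i][j] = len(edges)
            add_edge(i, n + j, min(m, n), cost[i][j])
    for j in range(m):
        add_edge(n + j, snk, n, 0.0)

    total = 0.0
    while True:
        # Bellman-Ford shortest path from src in the residual graph.
        dist = [math.inf] * (n + m + 2)
        prev = [-1] * (n + m + 2)
        dist[src] = 0.0
        for _ in range(n + m + 1):
            changed = False
            for u in range(n + m + 2):
                if dist[u] == math.inf:
                    continue
                for ei in graph[u]:
                    v, cap, c = edges[ei]
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, ei, True
            if not changed:
                break
        if dist[snk] == math.inf:
            break  # all mass has been transported
        # Find the bottleneck capacity along the path, then push that much flow.
        push, v = math.inf, snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            ei = prev[v]
            edges[ei][1] -= push
            edges[ei ^ 1][1] += push
            v = edges[ei ^ 1][0]
        total += push * dist[snk]

    scale = n * m
    # The residual capacity of a reverse edge equals the flow on the forward edge.
    flow = [[edges[pair_edge[i][j] ^ 1][1] / scale for j in range(m)] for i in range(n)]
    return total / scale, flow

# Toy 2-d vectors standing in for contextual token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score, flow = emd_with_flow(query, document)
```

In this sketch a smaller `score` means the document tokens sit closer to the query tokens, and the columns of `flow` show how much query mass each document token receives. Reading those columns as per-token contributions is one plausible interpretation of the abstract, not a description of how LM-EMD itself derives its relevance score or explanations.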