Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Email: andros.tjandra.ai6@is.naist.jp

Tested on Google Chrome


In this paper, we explore a method for training speech-to-speech translation without any transcription or linguistic supervision. Our proposed method consists of two steps. First, we train a discrete quantized autoencoder for unsupervised term discovery and use it to generate discrete representations of speech. Second, we train a sequence-to-sequence model that directly maps source-language speech to the target language's discrete representation. The method can generate target speech directly, without any auxiliary or pre-training steps that rely on source or target transcriptions. To the best of our knowledge, this is the first work to perform pure speech-to-speech translation between untranscribed unknown languages.
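To make the idea of a "discrete representation" concrete, here is a minimal sketch of nearest-neighbour vector quantization, one common way for a quantized autoencoder to turn continuous frame vectors into code indices. It is an illustration under stated assumptions and not the implementation used in this work: the function names `squared_distance` and `quantize`, the Euclidean distance, the toy 2-dimensional codebook, and the frame values are all hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame vector to the index of its nearest codebook entry."""
    codes = []
    for frame in frames:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(frame, codebook[k]))
        codes.append(nearest)
    return codes


# Toy values (assumptions): three code vectors and four encoder output frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7], [0.2, 0.1]]

codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 0]
```

In a setup like the one described above, a sequence of such code indices extracted from target-language speech would serve as the output sequence that the sequence-to-sequence model learns to predict from source-language speech.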

ARXIV URL: https://arxiv.org/abs/1910.00795

Translation results from [ French , Japanese ] to English

Source JA | Source FR | Ground truth EN | FR-EN Speech2Code | JA-EN Speech2Code | FR-EN Topline | JA-EN Topline