Google: Translate AI translates thanks to a self-learned language

Google now translates between all of its languages with AI

Truly multilingual: the new Google Translate algorithm even accepts input sentences that switch language midway through.

Google Translate now uses a single neural network to translate any of its 103 supported languages into any other, even when it could not learn from example sentences for a given language pair. Translations for rare languages benefit most.

Google has converted its translation service Translate to a single neural network that handles all 103 languages. A separate neural network no longer takes care of each pair of source and target language. With the multilingual Translate, Google saves the effort of maintaining thousands of translation systems for individual language pairs, while improving translation quality for pairs for which Google has little training data.

In a blog post, Google writes that the system even translates meaningfully between languages for which no direct translation examples existed in training. Trained on example sentences between English and Japanese and between English and Korean, the system was afterwards also able to translate between Japanese and Korean. Google therefore suspects that its translation system has internally learned a kind of universal language.

The technology behind Google Translate

Google Translate consists of three neural networks working in concert. A recurrent neural network with eight layers of LSTM neurons (Long Short-Term Memory), the encoder, reads the word fragments of the input. Its first layer reads the sentence both from front to back and from back to front. LSTMs can store information they encountered earlier in the sequence, so the first layer can interpret word fragments whose meaning is influenced by other word fragments in the sentence. From layer two on, the encoder network operates only from front to back.
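The layer layout described above can be sketched in a few lines. This is not Google's code: the `lstm` function below is a hypothetical stand-in that merely accumulates a running summary of everything seen so far, standing in for a real LSTM layer. The point is the wiring: layer one reads the token sequence in both directions and pairs the results position by position, while the remaining layers read front to back only.

```python
def lstm(seq):
    # Hypothetical stand-in for an LSTM layer: tags each position
    # with a running summary of everything seen so far.
    out, state = [], ()
    for tok in seq:
        state = state + (tok,)
        out.append(state)
    return out

def encoder(tokens, num_layers=8):
    # Layer 1: bidirectional - run the sequence forward and backward,
    # then pair up the results position by position.
    fwd = lstm(tokens)
    bwd = list(reversed(lstm(list(reversed(tokens)))))
    seq = list(zip(fwd, bwd))
    # Layers 2..8: unidirectional, front to back only.
    for _ in range(num_layers - 1):
        seq = lstm(seq)
    return seq

# One output state per input word fragment.
states = encoder(["ich", "bin", "hier"])
```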

Stacked LSTMs with more than four layers usually no longer train efficiently. Google therefore adds a residual connection to each layer that adds the layer's inputs to its outputs, letting the signal effectively skip the layer. With this trick, Microsoft succeeded in training a very deep convolutional neural network and thereby winning the last ImageNet competition for image recognition.
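The residual trick itself is tiny. A minimal sketch, with plain Python lists of floats standing in for the real state vectors and a made-up halving function standing in for the LSTM transformation:

```python
def lstm_layer(xs):
    # Hypothetical stand-in for an LSTM layer's transformation.
    return [0.5 * x for x in xs]

def residual_lstm_layer(xs):
    ys = lstm_layer(xs)
    # Residual connection: add the layer's input to its output
    # element-wise, so gradients can bypass the transformation.
    return [x + y for x, y in zip(xs, ys)]

result = residual_lstm_layer([1.0, 2.0])  # [1.5, 3.0]
```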

The output of the encoder is a series of multi-dimensional vectors of fixed width. These represent the "words" of the universal language Google suspects. They serve as input to the decoder network, which likewise consists of eight LSTM layers. The decoder additionally receives guidance from a small network of ordinary neurons with only three layers: this attention network directs the decoder's focus to the important parts of the sentence.
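The attention step can be illustrated with a toy numpy sketch. Note one simplification: simple dot-product scoring stands in here for the small three-layer feed-forward scoring network the article describes; the shapes and the softmax-weighted mixing are the same idea.

```python
import numpy as np

def attention(decoder_state, encoder_outputs):
    # Score each encoder output against the current decoder state.
    scores = encoder_outputs @ decoder_state
    # Normalise the scores into weights (softmax).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Mix the encoder outputs with those weights into one context vector.
    return weights @ encoder_outputs

# Made-up encoder outputs for a three-word source sentence,
# each a vector of width 2.
enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ctx = attention(np.array([1.0, 0.0]), enc)
```

The decoder state "asks" for the parts of the source it currently needs; positions whose encoder output matches it get the largest share of the mix.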

During decoding, a beam search algorithm looks for a sentence in the target language that matches the input as closely as possible while still translating all of the input's words. To assess results during training, Google uses a fitness function based on the BLEU metric. The accompanying paper, "Google's Neural Machine Translation System", explains the details.
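Beam search keeps only the few most promising partial sentences at each step instead of exploring every possible continuation. A minimal sketch, where the hypothetical `step` function stands in for the decoder's next-word distribution (the real decoder scores the whole vocabulary):

```python
from math import log

def step(prefix):
    # Toy stand-in for the decoder: always offers the same
    # three continuations with fixed probabilities.
    return [("gut", 0.6), ("schlecht", 0.3), ("<eos>", 0.1)]

def beam_search(beam_width=2, max_len=3):
    beams = [([], 0.0)]  # (words so far, log-probability)
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            if words and words[-1] == "<eos>":
                candidates.append((words, score))  # finished hypothesis
                continue
            for word, p in step(words):
                candidates.append((words + [word], score + log(p)))
        # Keep only the `beam_width` highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]

best = beam_search()
```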

Zero-Shot Translation

The vectors within the neural network at the interface between encoder and decoder lie close together for sentences with the same meaning in different languages. (Image: Google)

In a second paper (PDF), Google's research team explains why the network has probably learned a universal language. The researchers projected the vectors at the encoder-decoder interface into three-dimensional space using the t-SNE algorithm. Sentences with the same meaning from different source languages produced points that lay close together. When the network translated between a language pair for which there were no examples in the training data, those vectors also lay close to the ones for the other languages.
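The geometric claim itself is easy to illustrate. The vectors below are made up for the sake of the example; only the distance comparison matters: interface vectors for the same sentence in different languages sit near each other, while an unrelated sentence sits far away.

```python
import numpy as np

def dist(a, b):
    # Euclidean distance between two interface vectors.
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

# Hypothetical interface vectors for one sentence in three languages...
same_meaning = {
    "en": [0.90, 0.10, 0.20],
    "ja": [0.85, 0.15, 0.25],
    "ko": [0.88, 0.12, 0.18],
}
# ...and for an unrelated sentence.
other = [0.10, 0.90, 0.70]

d_same = dist(same_meaning["ja"], same_meaning["ko"])
d_diff = dist(same_meaning["ja"], other)
# Same-meaning sentences cluster; the unrelated one does not.
```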

The intermediate representation does not seem to be entirely universal, since Google embeds the target language as a special word fragment in the input data. But even if Google has not yet found the holy grail of linguistics, the Translate AI demonstrates that it encodes the meaning of sentences to a certain extent. That brings AI a big step closer to natural language understanding.
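Feeding the target language in as part of the input is a simple preprocessing step. A sketch of the idea, with a made-up token spelling (`<2ja>`) and token list; the actual token format is defined in Google's paper:

```python
def prepare_input(tokens, target_lang):
    # Prepend a special token naming the target language, so a single
    # model can serve every language pair - including pairs it never
    # saw together in training.
    return ["<2%s>" % target_lang] + tokens

inp = prepare_input(["Hello", "world"], "ja")  # ['<2ja>', 'Hello', 'world']
```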


(JME)