The blog of DataDiggers


Google’s Translatotron converts one spoken language to another, no text involved

Posted on May 15, 2019 in Artificial Intelligence, Google, machine learning, machine translation, Science, Translation

Every day we creep a little closer to Douglas Adams’ famous and prescient Babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another — but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.

Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.

Translating speech is usually done by breaking the problem down into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step is prone to its own types of errors, and these can compound one another.
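To make that cascade concrete, here is a minimal Python sketch of the three-step chain. The stage functions are placeholder parameters standing in for whatever STT, machine translation and TTS systems might be used; nothing here is a real Google API.

from typing import Callable

def cascade_translate(
    audio_in: bytes,
    speech_to_text: Callable[[bytes], str],  # step 1: STT, prone to transcription errors
    translate_text: Callable[[str], str],    # step 2: MT, prone to translation errors
    text_to_speech: Callable[[str], bytes],  # step 3: TTS, discards the original speaker's voice
) -> bytes:
    # Each stage's mistakes feed into the next, which is how errors compound.
    source_text = speech_to_text(audio_in)
    target_text = translate_text(source_text)
    return text_to_speech(target_text)

Because the chain only ever sees text in the middle, anything that does not survive transcription, such as tone, emphasis or the speaker’s voice, is gone by the time the final audio is produced.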

Furthermore, it’s not really how multilingual people translate in their own heads, if their accounts of their own thought processes are anything to go by. Exactly how it works is impossible to say with certainty, but few would say that they break the speech down into text, visualize it changing into the new language, and then read the new text back out. Human cognition is frequently a guide for how to advance machine learning algorithms.

[Image: Spectrograms of source and translated speech. The translation, let us admit, is not the best. But it sounds better!]

To that end, researchers began looking into converting the spectrogram of speech in one language, a detailed frequency breakdown of the audio, directly into a spectrogram in another language. This is a very different process from the three-step one, and it has its own weaknesses, but it also has advantages.
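For readers unfamiliar with the term, a spectrogram is essentially the magnitude of a short-time Fourier transform: the audio is sliced into overlapping frames and each frame is converted into a set of frequency strengths. The rough NumPy sketch below only illustrates the idea; the frame and hop sizes are arbitrary choices for illustration, not Translatotron’s actual front end.

import numpy as np

def spectrogram(waveform: np.ndarray, frame_len: int = 512, hop: int = 128) -> np.ndarray:
    # Slice the waveform into overlapping windowed frames, then take the
    # magnitude of each frame's Fourier transform: a time-by-frequency grid.
    window = np.hanning(frame_len)
    frames = [
        waveform[start:start + frame_len] * window
        for start in range(0, len(waveform) - frame_len + 1, hop)
    ]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=-1))

# Toy usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
print(spectrogram(np.sin(2 * np.pi * 440 * t)).shape)  # (122, 257): frames x frequency bins

A direct speech-to-speech model then learns to map one such time-by-frequency grid in the source language to a corresponding grid in the target language, with a vocoder turning the output grid back into audio, rather than ever passing through text.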

One advantage is that, while complex, it is essentially a single-step process rather than a multi-step one, which means that, given enough processing power, Translatotron could work faster. More importantly for many, the process makes it easy to retain the character of the source voice, so the translation doesn’t come out sounding robotic but carries the tone and cadence of the original sentence.

Naturally this has a huge impact on expression, and someone who relies on translation or voice synthesis regularly will appreciate that not only what they say comes through, but also how they say it. It’s hard to overstate how important this is for regular users of synthetic speech.

The accuracy of the translation, the researchers admit, is not as good as the traditional systems, which have had more time to hone their accuracy. But many of the resulting translations are (at least partially) quite good, and being able to include expression is too great an advantage to pass up. In the end, the team modestly describes their work as a starting point demonstrating the feasibility of the approach, though it’s easy to see that it is also a major step forward in an important domain.

The paper describing the new technique was published on arXiv, and you can browse samples of speech, from source to traditional translation to Translatotron, at this page. Just be aware that these are not all selected for the quality of their translation, but serve more as examples of how the system retains expression while getting the gist of the meaning.


Source: TechCrunch


Google Home can now translate conversations on-the-fly

Posted on Feb 5, 2019 in Gadgets, Google, google home, TC, Translation

Just last month, Google showed off an “Interpreter mode” that would let Google Home devices act as an on-the-fly translator. One person speaks one language, the other person speaks another, and Google Assistant tries to be the middleman between the two.

Google was only testing it in select locations (hotel front desks, mostly) at the time, but it looks like it has now gotten a much wider rollout.

Though Google hasn’t officially announced it, AndroidPolice noticed that a support page for the feature just went public. We tested it on our own Google Home devices, and sure enough: interpreter mode fired right up.

To get started, you just say something like “Hey Google, be my Spanish interpreter,” or “Hey Google, help me speak Italian.”

Curiously, you currently have to say the initial command in English, French, German, Italian, Japanese or Spanish, but once it’s up and running you should be able to translate between the following languages:


• Czech
• Danish
• Dutch
• English
• Finnish
• French
• German
• Greek
• Hindi
• Hungarian
• Indonesian
• Italian
• Japanese
• Korean
• Mandarin
• Polish
• Portuguese
• Romanian
• Russian
• Slovak
• Spanish
• Swedish
• Thai
• Turkish
• Ukrainian
• Vietnamese

It works pretty well for basic conversations in our quick testing, but it has its quirks. Saying “Goodbye,” for example, ends the translation rather than translating it into the target language, which might be a little confusing if one half of the conversation didn’t realize the chat was nearing its end.

The new feature should work on any Google Home device — and if it’s one with a screen (like Google’s Home Hub), you’ll see the words as they’re translated.


Source: TechCrunch


Judge says ‘literal but nonsensical’ Google translation isn’t consent for police search

Posted on Jun 15, 2018 in Artificial Intelligence, Google, Google Translate, Government, machine translation, Translation

Machine translation of foreign languages is undoubtedly a very useful thing, but if you’re going for anything more than directions or recommendations for lunch, its shallowness is a real barrier. And when it comes to the law and constitutional rights, a “good enough” translation doesn’t cut it, a judge has ruled.

The ruling (PDF) is not hugely consequential, but it is indicative of the evolving place translation apps occupy in our lives and our legal system. We are fortunate to live in a multilingual society, but for the present and foreseeable future it seems humans are still needed to bridge language gaps.

The case in question involved a Mexican man named Omar Cruz-Zamora, who was pulled over by cops in Kansas. When they searched his car, with his consent, they found quite a stash of meth and cocaine, which naturally led to his arrest.

But there’s a catch: Cruz-Zamora doesn’t speak English well, so the consent to search the car was obtained via an exchange facilitated by Google Translate — an exchange that the court found was insufficiently accurate to constitute consent given “freely and intelligently.”

The Fourth Amendment prohibits unreasonable search and seizure, and lacking a warrant or probable cause, the officers needed Cruz-Zamora to understand that he could refuse to let them search the car. That understanding is not evident from the exchange, during which both sides repeatedly failed to comprehend what the other was saying.

Not only that, but the actual translations provided by the app weren’t good enough to accurately communicate the question. For example, the officer asked “¿Puedo buscar el auto?” — the literal meaning of which is closer to “can I find the car,” not “can I search the car.” There’s no evidence that Cruz-Zamora made the connection between this “literal but nonsensical” translation and the real question of whether he consented to a search, let alone whether he understood that he had a choice at all.

With consent invalidated, the search of the car was rendered unconstitutional, and the evidence it produced was suppressed.

It doesn’t mean that consent is impossible via Google Translate or any other app; if Cruz-Zamora had himself opened his trunk or doors to allow the search, for example, that likely would have constituted consent. But it’s clear that app-based interactions are not a sure thing. This will be a case to consider not just for cops on the beat looking to help or investigate people who don’t speak English, but for courts as well.

Providers of machine translation services would have us all believe that those translations are accurate enough to use in most cases, and that in a few years they will replace human translators in all but the most demanding situations. This case suggests that machine translation can fail even the most basic tests, and as long as that possibility remains, we have to maintain a healthy skepticism.


Source: TechCrunch
