05.11.2010 by Peter Argondizzo

Google Translate and the Power of Plagiarism

The efficacy of Google’s translation engine was recently put to the test by a New York Times article. After reviewing the article and the results of the various tests, I couldn’t help but notice that the power of Google’s translation efforts is rooted in plagiarism.

Most providers of translation services use a technology known as Translation Memory. This technology stores translations created and reviewed by human translators in a database. The idea is to store each sentence so that if the same sentence comes up again, the same translation is reused. This technology provides for faster completion times, higher accuracy, more consistency and, of course, a lower cost.
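To make the idea concrete, here is a minimal sketch of a translation memory in Python. It is only an illustration of the store-and-reuse principle described above, not how any particular commercial tool works. The class name TranslationMemory, its methods, and the whitespace and case normalization are all assumptions made for this example. Real tools also offer "fuzzy" matching of near-identical sentences, which this sketch leaves out.

```python
from typing import Dict, Optional, Tuple


class TranslationMemory:
    """Toy translation memory: exact-match reuse of human translations.

    Hypothetical example class; not the API of any real product.
    """

    def __init__(self) -> None:
        # Key: (source language, target language, normalized source sentence)
        self._segments: Dict[Tuple[str, str, str], str] = {}

    @staticmethod
    def _normalize(sentence: str) -> str:
        # Assumption: collapse whitespace and ignore case so that trivially
        # different copies of the same sentence still match.
        return " ".join(sentence.split()).lower()

    def store(self, source_lang: str, target_lang: str,
              source: str, translation: str) -> None:
        """Save a translation that a human has created and reviewed."""
        key = (source_lang, target_lang, self._normalize(source))
        self._segments[key] = translation

    def lookup(self, source_lang: str, target_lang: str,
               source: str) -> Optional[str]:
        """Return the stored translation, or None if the sentence is new."""
        key = (source_lang, target_lang, self._normalize(source))
        return self._segments.get(key)


tm = TranslationMemory()
tm.store("fr", "en", "Bonjour le monde.", "Hello, world.")

# The same sentence comes up again, so the earlier translation is reused.
reused = tm.lookup("fr", "en", "  bonjour   le monde. ")

# A sentence that has never been translated still needs a human.
missing = tm.lookup("fr", "en", "Bonsoir.")
```

Every hit in the lookup is a sentence nobody has to translate twice, which is where the savings in time and cost come from.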

Google is basically building a Translation Memory engine. We all know that Google is incredibly good at indexing information. The effort behind Google Translate indexes web content as well as many published works through its Google Books project. The New York Times article shows that Google does a reasonably good job with the translation of published works. This is the expected result for books that have already been translated into many languages. The passage from The Little Prince, for example, comes out quite well. The book has been translated into more than 180 languages and has sold over 80 million copies worldwide.

So it is easy to understand why the dialog from The Little Prince and other popular books should be easy for Google to convert into other languages. After all, the work has already been done by humans. Google is just indexing the material.

Isn’t this a form of plagiarism? If I submit a passage for translation to Google and it spits back text coming from the translation of a published work, that is plagiarism. Plagiarism is generally defined as the unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one’s own original work. I wonder if Google checked with Antoine de Saint-Exupéry’s estate before indexing his book and the subsequent translations of The Little Prince?

The natural question is: how do language service providers handle the large body of content in their own translation memories? In our firm, we store the data separately for each client. This is by design. We have nondisclosure agreements with most of our clients. We translate some very sensitive information, such as market research, clinical studies, marketing materials, internal memos and legal documentation. Sharing this information without regard for our clients is not a good idea.
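As a rough illustration of what keeping the data separated by client can look like, here is a short Python sketch. It is a simplified assumption for the sake of example, not a description of our production systems. The class name ClientScopedMemory and the client identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class ClientScopedMemory:
    """Toy translation memory partitioned by client (hypothetical example)."""

    def __init__(self) -> None:
        # One independent sentence-to-translation table per client.
        self._by_client: Dict[str, Dict[str, str]] = defaultdict(dict)

    def store(self, client_id: str, source: str, translation: str) -> None:
        """Save a translation in the named client's table only."""
        self._by_client[client_id][source] = translation

    def lookup(self, client_id: str, source: str) -> Optional[str]:
        """Search only the named client's table, never anyone else's."""
        return self._by_client.get(client_id, {}).get(source)


memory = ClientScopedMemory()
memory.store("client_a", "Résultats de l'étude clinique", "Clinical study results")

# The client that paid for the translation gets it back on reuse.
own_hit = memory.lookup("client_a", "Résultats de l'étude clinique")

# Another client submitting the same sentence gets nothing from client_a's data.
other_hit = memory.lookup("client_b", "Résultats de l'étude clinique")
```

The trade-off is deliberate: a shared database would produce more matches, but one client's wording would then leak into another client's documents.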

The other side of this discussion is: what does Google do with materials you translate and refine using its tools? It reuses them, of course. The idea behind the newly minted DocTranslator is that you will pretranslate a document using the Google translation engine and then refine the translation to meet your needs. The refined translation then goes back into the database for reuse.

The internet has become a valuable tool for competitive intelligence. Will translation be the next valuable tool for competitive intelligence? Will applications come out that query the Google engine for terminology in an effort to discover what competitors are up to? I would assume that if companies start openly using Google as their translation engine, this could happen.