Google Translate and the Power of Plagiarism

A recent New York Times article put Google's translation engine to the test. After reviewing the article and the results of its various tests, I couldn't help but notice that the power of Google's translation efforts is rooted in plagiarism.

Most providers of translation services use a technology known as Translation Memory. This technology stores translations created and reviewed by human translators in a database. The idea is to store each sentence alongside its translation so that the same translation is reused whenever that sentence comes up again. The result is faster completion times, higher accuracy, more consistency, and a lower cost.
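To make the idea concrete, here is a minimal sketch of an exact-match translation memory. It is only an illustration: the class name, method names, and the whitespace and case normalization are my own assumptions, not the design of any particular commercial tool, and real products typically add fuzzy matching on top of this.

```python
from typing import Dict, Optional, Tuple


class TranslationMemory:
    """Hypothetical exact-match store of human-reviewed translations."""

    def __init__(self) -> None:
        # Key: (source language, target language, normalized source sentence)
        self._segments: Dict[Tuple[str, str, str], str] = {}

    @staticmethod
    def _normalize(sentence: str) -> str:
        # Assumption: ignore case and extra whitespace when comparing sentences.
        return " ".join(sentence.split()).casefold()

    def store(self, source_lang: str, target_lang: str,
              source: str, translation: str) -> None:
        """Save a reviewed translation so it can be reused later."""
        key = (source_lang, target_lang, self._normalize(source))
        self._segments[key] = translation

    def lookup(self, source_lang: str, target_lang: str,
               source: str) -> Optional[str]:
        """Return the stored translation, or None if the sentence is new."""
        key = (source_lang, target_lang, self._normalize(source))
        return self._segments.get(key)


tm = TranslationMemory()
tm.store("en", "fr", "Take one tablet daily.", "Prenez un comprimé par jour.")

# The same sentence comes up again, so the human translation is reused.
reused = tm.lookup("en", "fr", "Take one tablet daily.")

# A sentence that has never been translated still needs a translator.
missing = tm.lookup("en", "fr", "Take two tablets daily.")
```

A hit costs nothing more than a dictionary lookup, which is where the speed, consistency, and cost savings come from. A miss goes to a human translator, whose work is then stored for next time.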

Google is building a Translation Memory engine. We all know that Google is incredibly good at indexing information. Google Translate draws on indexed web content and on the many published works in its Google Books project. The New York Times article shows that Google does a reasonably good job translating published works. That is an obvious consequence of those books having already been translated into many languages. The passage from The Little Prince is a case in point: the result is quite good, and the book has been translated into more than 180 languages and has sold over 80 million copies worldwide.

It is no surprise that dialog from The Little Prince and other famous books is easy for Google to convert into other languages. After all, the work has already been done by humans. Google is just indexing the material.

Isn't this a form of plagiarism? If I submit a passage to Google for translation and it spits back text taken from the translation of a published work, that is plagiarism. Plagiarism is generally defined as the unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one's own original work. I wonder whether Google checked with Antoine de Saint-Exupéry's estate before indexing his book and the subsequent translations of The Little Prince.

The natural question is how language service providers handle the large body of content in their own translation memories. At our firm, we store each client's data separately. This strategy is by design. We have nondisclosure agreements with most of our clients, and we translate some very sensitive information, such as market research, clinical studies, marketing materials, internal memos, and legal documentation. Sharing this information without regard for our clients would not be a good idea.
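As a rough illustration of what storing data separately by client means in practice, the sketch below keeps one memory per client and never searches across them. The names are hypothetical and this is not a description of our production system, only the general shape of the idea.

```python
from collections import defaultdict
from typing import Dict, Optional


class ClientScopedMemory:
    """Hypothetical translation memory partitioned by client."""

    def __init__(self) -> None:
        # Each client ID maps to its own private {source: translation} store.
        self._by_client: Dict[str, Dict[str, str]] = defaultdict(dict)

    def store(self, client_id: str, source: str, translation: str) -> None:
        self._by_client[client_id][source] = translation

    def lookup(self, client_id: str, source: str) -> Optional[str]:
        # Only the requesting client's store is consulted, never another's.
        return self._by_client.get(client_id, {}).get(source)


memory = ClientScopedMemory()
memory.store("client_a", "Confidential study results", "Résultats d'étude confidentiels")

# The client that owns the segment gets its translation back.
own_hit = memory.lookup("client_a", "Confidential study results")

# Another client submitting the identical sentence gets nothing.
other_hit = memory.lookup("client_b", "Confidential study results")
```

The trade-off is deliberate: a shared memory would produce more matches, but a match would also reveal that someone else had already translated that exact sentence.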

The other side of this discussion is what Google does with the materials you translate and refine using its tools. It reuses them, of course. The idea behind DocTranslator or any other tool in the Google Translate suite of products is that you pre-translate a document using the Google translation engine and then refine the translation to meet your needs. The refined translation then goes back into the database for reuse.

The internet has become a valuable tool for competitive intelligence. Will translation be the next one? Will applications appear that query the Google engine for terminology to discover what competitors are up to? If companies start openly using Google as their translation engine, this could happen.