Note: this blog post was updated in October 2016.
The efficacy of machine translators like Google Translate has been tested for years, but how does it relate to the theory of plagiarism? To understand this, let’s take a quick look at machine and human translation processes.
Most providers of translation services use a technology known as Translation Memory. This technology stores translations created and reviewed by human translators in a database. The idea is to store each sentence so that if the same sentence comes up again, the same translation is reused. This technology provides faster completion times, higher accuracy, more consistency and, of course, a lower cost.
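To make the reuse idea concrete, here is a minimal sketch of a translation memory in Python. It is only an illustration under simple assumptions: matching is exact (after ignoring case and extra spaces), and the names `TranslationMemory`, `translate`, and `human_translate` are hypothetical, not taken from any real product. Commercial tools typically also offer partial ("fuzzy") matching, which is left out here.

```python
class TranslationMemory:
    """Toy translation memory: exact-match reuse of reviewed translations."""

    def __init__(self):
        # Maps a normalized source sentence to its reviewed translation.
        self._segments = {}

    @staticmethod
    def _normalize(sentence):
        # Assumption: sentences differing only in case or spacing are "the same".
        return " ".join(sentence.split()).casefold()

    def store(self, source, target):
        self._segments[self._normalize(source)] = target

    def lookup(self, source):
        # Returns the stored translation, or None if the sentence is new.
        return self._segments.get(self._normalize(source))


def translate(sentences, memory, human_translate):
    """Reuse stored translations; send only new sentences to the translator."""
    results = []
    for sentence in sentences:
        translation = memory.lookup(sentence)
        if translation is None:
            # New sentence: translate it once, then remember it for next time.
            translation = human_translate(sentence)
            memory.store(sentence, translation)
        results.append(translation)
    return results
```

In this sketch, `human_translate` stands in for the human translator: it is called only for sentences the memory has not seen before, which is where the savings in time and cost come from.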
Google Translate operates on what Google calls Statistical Machine Translation. Using monolingual text and aligned text, or existing human translations, the software creates a translation model for each language. Google is essentially building a kind of Translation Memory engine, and we all know that Google is incredibly good at indexing information.
In 2010, the efficacy of Google Translate was challenged in a New York Times article. After reviewing the article and the results of the various tests, I couldn’t help but notice that the power of Google’s translation efforts is rooted in plagiarism.
The article references the published book “The Little Prince” as an example. How does Google translate that? It’s a very popular book that has been translated into more than 180 languages and sold over 80 million copies worldwide. So it is easy to understand that the dialogue from “The Little Prince” and other popular books should be easy for Google to convert into other languages. After all, the work has already been done by humans. Google is just indexing the material.
But if I submit a passage for translation to Google and it spits back text taken from the translation of a published work, then that is considered plagiarism. Plagiarism is generally defined as the unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one’s own original work.
So how do Language Service Providers handle the large body of content in their own translation memories, and is there a risk of plagiarism? At Argo Translation, we store the data separately for each client, which essentially eliminates the risk of plagiarism. This is by design. We have non-disclosure agreements with many of our clients, as we translate some very sensitive information like market research, clinical studies, marketing materials, internal memos, and legal documentation. Sharing this information without regard for our clients is not a good idea.
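As a rough illustration of what separating data by client can look like, here is a small Python sketch. It is an assumption-laden example, not a description of Argo Translation's actual system: the class name `ClientScopedMemory` and the client identifiers are hypothetical.

```python
from collections import defaultdict


class ClientScopedMemory:
    """Toy translation memory that keeps each client's segments separate."""

    def __init__(self):
        # One independent dictionary of source -> translation per client.
        self._by_client = defaultdict(dict)

    def store(self, client_id, source, target):
        self._by_client[client_id][source] = target

    def lookup(self, client_id, source):
        # Only the requesting client's own memory is searched, so one
        # client's translations are never returned to another client.
        return self._by_client.get(client_id, {}).get(source)
```

Because every lookup requires a client identifier, a sentence stored for one client simply does not exist from another client's point of view, even if the source text is identical.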
From the Machine Translation side: what does Google do with materials you translate and refine using their tools? Re-use it, of course. The idea behind Google’s DocTranslator app is that you will pre-translate a document using the Google Translation software and then refine the translation to meet your needs. The refined translation then goes back into the database for reuse.
The Google Translate software recently received an accuracy boost with the new Google Neural Machine Translation system. The new system boasts a reduction in errors of up to 60% by using a neural network, similar to human memory. However, the system is still rooted in indexing, so it isn’t expected to change the current theory of plagiarism.
For more about machine vs. human translation, check out this blog post.