With all of the jokes and Facebook posts about auto-translate errors that float around, you would think there’s every reason to question machine translations.
Let’s look at some examples of how machine translation really works and what that means for the quality of the end result.
Consider how a person learns a language: through the gradual collection of words and syntax by repetition over time. That produces one language’s worth of understanding.
For those exposed to multiple languages growing up or coming of age, the sense of play between competing syntaxes allows for a level of malleability.
In contrast, standard machine translation does its work by using dictionaries and set vocabularies and comparing one language to another. When a machine translates a phrase, it does so by generating candidate words. The reasons for choosing a word and the ability to rank the likelihood of candidates come from computational linguistics.
Different algorithms will weight the candidates and return the most likely choice. These algorithms, much like the syntax play of a multilingual child, can create undesirable and often humorous results.
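The candidate-weighting step can be sketched in a few lines of Python. This is a toy illustration, not how any particular translation engine works: the word ‘honey,’ its English glosses, and every weight below are invented for the example.

```python
# Hypothetical likelihood weights for possible readings of one source word.
# The numbers are invented for illustration only.
CANDIDATE_WEIGHTS = {
    "honey": {"bee product": 0.6, "sweetener": 0.3, "term of address": 0.1},
}


def rank_candidates(word, weights=CANDIDATE_WEIGHTS):
    """Return (candidate, weight) pairs sorted from most to least likely."""
    options = weights.get(word.lower(), {})
    return sorted(options.items(), key=lambda item: item[1], reverse=True)


def most_likely(word, weights=CANDIDATE_WEIGHTS):
    """Pick the top-weighted candidate, or echo the word if none is known."""
    ranked = rank_candidates(word, weights)
    return ranked[0][0] if ranked else word


print(rank_candidates("honey"))
print(most_likely("Honey"))  # the weights alone cannot see who is being addressed
```

The top-weighted reading wins every time, whether or not it fits the sentence, which is where the humorous results come from.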
A human translator applies more than probability weights to a sentence. That is because they understand the context, register, and intent that separate one language from another.
Neural machine translation, which can improve over time and draws on enormous amounts of online text to weigh potential translation choices, is significantly more accurate. In 2016, Google reported that its translation tool scored 5.43 for translation accuracy on a scale of 0-6, while human translators scored 5.5. The gap between humans and machines is undoubtedly closing, but we’re still not quite there for a variety of reasons.
Let’s look at some of the stumbling blocks that create problems for machines.
Content vs. Context
Particularly when faced with idioms and slang, a machine translation system will fail to understand the context while still capturing the content.
Look at a simple example such as the English preposition ‘in.’ In translations from English to Arabic, a machine will default ‘in’ to ‘by.’ This seems adequate until it is used in a sentence such as, “The omelet was made in his home.”
The problem is that ‘in his home’ refers to the location in which the making occurred. A swap to ‘by his home’ gives the impression of a sentient house.
So an English label stating, “Made in an allergy-free factory” translates into Arabic as if the product itself were made to be allergy-free. A product made in a factory that is careful not to cross-contaminate with allergens may still be an allergen itself.
The machine can’t mark this subtle difference because it treats the prepositions as interchangeable as a matter of common use.
Addresser and Addressee
In languages that use honorifics and formal reference hierarchies, a machine defaults to a middle-of-the-road style of translation.
When a computer handles Khmer, it will mistake both the addresser and the addressee. This will flatten the communication into peer-to-peer only. In the case of a child addressing a monk, this would be seen as highly rude.
Any type of advertisement or correspondence mistranslated in such a way would be an insult by default. On top of that, it may completely miss the target demographic.
In a language with more than seven ways to express ‘to eat,’ you don’t want a machine picking the most basic, ‘hop,’ the eating of peasants, when you invite over an honored guest. ‘Borepok’ will make a person feel honored, but even that might be too much honor, and ‘pisaa’ may be most appropriate.
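To make the register problem concrete, here is a minimal Python sketch. The three verbs come from the paragraph above, but the register labels (`plain`, `polite`, `elevated`) and the function name are assumptions made for this example, not a real linguistic classification or library.

```python
# Register labels are assumptions for this sketch; the verbs come from the text.
EAT_BY_REGISTER = {
    "plain": "hop",
    "polite": "pisaa",
    "elevated": "borepok",
}


def choose_eat_verb(register=None):
    """Return a Khmer verb for 'to eat' matching the given register.

    With no register information, fall back to the plainest form, which
    mirrors a system that does not know who is speaking to whom.
    """
    return EAT_BY_REGISTER.get(register, "hop")


print(choose_eat_verb())          # no context: the plainest form is chosen
print(choose_eat_verb("polite"))  # a human who knows the guest picks this
```

The lookup itself is trivial. The hard part is the `register` argument, which a human translator supplies from knowledge of the relationship and which the machine usually does not have.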
The intention of a sentence can easily be lost in a machine translation today. After all, the machine doesn’t know if it is translating a VCR manual or a novel.
Notable foreign authors have relied on consistent translating partners, and re-releases or retranslations of books have changed meaning dramatically. Gabriel Garcia Marquez worked with the same translators over many years to issue his works in English from his native Spanish.
Franz Kafka’s work, upon its first issue in translation, was in many people’s opinion gutted of the nuance and subtle sarcasm of the original. The imposition of religious ideology on top of the author’s words created issues. This was a human translation, but it illustrates that a flawed intent can radically change everything that follows.
Words, Not Meanings
The ultimate failing of any programmed machine translation will come down to the concept of words, not meanings. Without going into the complexities of semiology and how we determine the difference between the signified and the signifier, we just have to understand that words have meaning.
Meaning isn’t intrinsic to the word, but it is intrinsic to the use. A word that is a strong curse or taboo in one language may carry no such meaning in another.
Theater and theatre are both correct spellings depending on the geographic location. A machine can be told it is dealing with a particular area, but that requires the area itself to have a clearly defined boundary that can be programmed. Few linguistic regions are so cut and dried.
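Telling a machine which area it is dealing with might look something like the Python sketch below. The table and function name are hypothetical, and the locale tags follow the common `en-US` / `en-GB` style. Only two regions are listed, which is exactly the limitation in question.

```python
# A deliberately tiny, hypothetical spelling table keyed by locale tag.
SPELLING_BY_LOCALE = {
    "en-US": {"theatre": "theater"},
    "en-GB": {"theater": "theatre"},
}


def localize_spelling(word, locale):
    """Swap a word for the regional spelling if the locale has a rule for it.

    Words or locales missing from the table pass through unchanged.
    """
    return SPELLING_BY_LOCALE.get(locale, {}).get(word, word)


print(localize_spelling("theatre", "en-US"))
print(localize_spelling("theater", "en-GB"))
print(localize_spelling("theater", "en-IN"))  # no rule for this region
```

Every region the table does not name falls through untouched, and real usage rarely follows borders as neatly as a lookup key does.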
Consider the use of words within regional contexts. In the American South, referring to someone as ‘Honey’ shows neither affection nor familiarity. It is simply another default placeholder for a name or ‘you.’ Outside of the South, this is often interpreted poorly, even among other Americans.
A machine will be even more likely to produce translations about bees or sugar substitutes, further confusing the matter.
While machine translations offer utility in speed, they often lack precision. Gleaning the basic details from a translated web page may be enough to understand what the page is offering. Actually using the website, however, may not go so well.
Humans continue to offer the most thorough and precise translation services. When it comes to your business endeavors, being understood has no middle ground.