For most of us, machine translation means Google Translate and all of its silly mistakes that make good memes. You can read some of them on Engrish.com. But what is MT in fact, and what can we do with it? Read the interview with Marta Bartnicka – IT specialist and machine learning expert! We are talking sex, protein and context.

For starters, something that keeps every translator out there awake at night: when will human translators (whom you tend to call “protein translators”) be replaced by AI-powered MT? Will our industry turn into an abandoned ghost town?

No, not yet. I reply to the standard question about the end of our profession with a standard answer: it’s not the twilight; it’s the dust settling after the explosion. MT is in fact stimulating the market to grow and provides human translators with work. How does it work? AI is becoming cheaper and better; therefore, it can handle tasks such as chatbots, comment moderation for online shops and other (sorry for phrasing it like this) junk stuff, meaning tasks where there is neither budget nor time for professional translation.

At the same time, there is a growing hunger for communication in the user’s native language. Translations into many languages are now needed for text elements that not long ago usually existed only in English (or Chinese), and no one found that problematic. Such elements include chatbot FAQs or Chinese shop catalogues. Even if we use MT here, we have to edit the output, and this is a job for a human expert.

Recently, there has been a big technological leap in MT powered by Artificial Intelligence: Neural MT. Now we work with machines trained by human translators, which have mechanisms designed to resemble human neural structure. It came to life a couple of years ago, and 2017 marked its entry into the market. Neural MT has been introduced by Google, Facebook, Kantan and IBM Watson. Recently, a new player has entered the game: the German company DeepL.

The leap was a success: Neural MT is much better than the previous generation, SMT (statistical machine translation), especially in terms of language fluency and “catching up with” grammar. In terms of compatibility and precision, it is as good as SMT. And what do the new technology providers do with it? Of course, they make money. I predict that NMT will not evolve as fast from now on. Probably, in the privacy of server rooms, a new generation has already been quietly designed, but we will only learn about it after some time. By the way, “quiet” doesn’t seem to be the best word here, as NMT runs on giant servers, which can be loud. DeepL, for example, runs its servers in Iceland, where cooling and electricity are cheap.


Is the Polish language really “difficult” for MT to learn? Apart from declension, is there anything else that makes it a greater challenge?

Sex, or rather gender. Polish is difficult for MT in the same way as all Slavic languages; it’s a group with complicated morphology. There is one more thing: one has to move a lot of words around to translate an English phrase into Polish or Russian. Romance languages, like Italian or Spanish, are much easier for MT. Even with SMT, the quality of such translation was acceptable.

But we are not the worst case. Languages such as Finnish or Hungarian – this is fun! The MT industry is really looking forward to anyone introducing good NMT to cover this language family. I think there is a strong chance of that with eTranslation, financed by the European Commission. They’ve decided to move from SMT to NMT starting with the most difficult language pairs.

How did you get into localisation and MT?

For money. Having completed IT studies in Wrocław, I worked as an IT expert and earned extra money by translating technical texts. When the income from translating exceeded my IT salary, I thought it over and said goodbye to programming.

It started in a funny way, because a simple-sounding task, “you will be arranging windows after a translation”, was in fact a six-month project with millions of words. After that, everything went more easily.

MT was a consequence: it arrived in the industry and it had to be checked and introduced. My technical education made it easier for me; without proper training, I wouldn’t have been able to set up an MT server. I understand the specs and, most of all, I understand what the creators of a particular solution are trying to say. The basics of computational linguistics made it easier to review MT in terms of its ability to help translators.

Find out more about Marta’s work in our podcast, where she and Agenor Hofmann-Delbor speak about their book. (The podcast is in Polish.)

A common problem is that many LSPs and translators try to deal with localisation without any knowledge of the process. They treat it as a “different kind of translation”. How can one learn it? Where should one look for help?

It is difficult to answer that without promoting myself. The handbook on software localisation that I co-wrote with Agenor Hofmann-Delbor (published in 2017 by Helion) is the only up-to-date one. I would like to know if anything newer has been released.

I also teach software localisation together with Agenor as part of our Localize.pl project. 

There are of course other LSPs that offer their own courses on CAT and other software, as well as software producers who teach how to use their own products. There are also some online courses; sadly, none of them are in Polish. I can also name some events where I go to train people: Translation and Localization Conference, Soap! and EAMT.

What are the most common mistakes that translators, software designers and project managers make?

People have kept on making the same mistakes for about 20 years! Software designers prepare the interface and messages by assembling the text dynamically from fragments. What’s more, even Slavs do it and often forget that their own language works in a different way: it’s inflected. Software will do it this way:

String1=New
String2=Open
String3=Save
String4=Configuration
Option1=String1+String4
Option2=String2+String4
Option3=String3+String4

… and even with a proper translation the interface ends up wrong.
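
The effect is easy to reproduce. Below is a minimal Python sketch of the pattern, not taken from any real product: the dictionaries, the function names and the Polish translations are assumptions chosen for illustration. Each fragment is translated correctly on its own, yet the concatenated labels come out ungrammatical, because Polish needs a different form of the adjective and the noun in each phrase.

```python
# Illustrative Polish translations of the isolated fragments (assumed for this sketch).
FRAGMENTS_PL = {
    "New": "Nowy",                    # default masculine form of the adjective
    "Open": "Otwórz",
    "Save": "Zapisz",
    "Configuration": "Konfiguracja",  # nominative case
}

# The same options stored as whole phrases, each translated in context.
PHRASES_PL = {
    "New Configuration": "Nowa konfiguracja",     # feminine adjective
    "Open Configuration": "Otwórz konfigurację",  # accusative noun
    "Save Configuration": "Zapisz konfigurację",  # accusative noun
}


def concatenated_label(action: str, noun: str) -> str:
    """Build a label the way the String1+String4 pattern does."""
    return f"{FRAGMENTS_PL[action]} {FRAGMENTS_PL[noun]}"


def whole_phrase_label(action: str, noun: str) -> str:
    """Look up a label that was translated as one complete phrase."""
    return PHRASES_PL[f"{action} {noun}"]


if __name__ == "__main__":
    for action in ("New", "Open", "Save"):
        print(concatenated_label(action, "Configuration"),
              "->", whole_phrase_label(action, "Configuration"))
```

Keeping every option as a complete, separately translatable string costs a few more entries in the resource file, but it lets the translator choose the right gender and case for each phrase.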


Another really common problem is delivering source texts (strings) in Excel spreadsheets, sometimes sorted alphabetically. Why do they do it? Because they are used to doing it, generation after generation. Translators work only in Word and Excel, right? Well, no. Sometimes they are literate in Java, XML, JSON and other formats. Sometimes they are eager to learn them. Transferring the text to Excel results in losing the context, and often the author’s comments as well. There is also a big risk of introducing more mistakes while the text is transferred back into the proper file format.
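
If a spreadsheet really cannot be avoided, the damage can at least be limited. The following Python sketch assumes a hypothetical JSON resource file in which every string has a key, a text and a developer comment; the file layout, the column names and the function names are all invented for illustration. The export keeps the key and the comment next to each string, and the import merges translations back by key, so sorting the sheet alphabetically does no harm.

```python
import copy
import csv
import io
import json

# Hypothetical resource file: two identical source strings with different meanings.
RESOURCE_JSON = """
{
  "menu.file.open": {"text": "Open", "comment": "Verb; File menu command"},
  "status.door.open": {"text": "Open", "comment": "Adjective; door sensor state"}
}
"""

COLUMNS = ["key", "source", "comment", "target"]


def export_rows(resource: dict) -> list:
    """Flatten the resource into spreadsheet rows, keeping key and comment."""
    return [
        {"key": key, "source": entry["text"],
         "comment": entry["comment"], "target": ""}
        for key, entry in resource.items()
    ]


def to_csv(rows: list) -> str:
    """Serialise the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_translations(resource: dict, csv_text: str) -> dict:
    """Merge translated rows back by key, so the row order does not matter."""
    translated = copy.deepcopy(resource)
    for row in csv.DictReader(io.StringIO(csv_text)):
        translated[row["key"]]["text"] = row["target"]
    return translated


if __name__ == "__main__":
    resource = json.loads(RESOURCE_JSON)
    print(to_csv(export_rows(resource)))
```

Without the key and comment columns, the two “Open” rows would be indistinguishable, although one is a command and the other a state.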

There are many other funny ways to make mistakes when localising the user interface (UI) and its help (UA). It seems unheard of, but sometimes people hire two separate LSPs or translators to localise the two. I always tell translators that when users press F1 or look for “help” on the Internet, they already have a problem, so why should we cause them any more? A boring, laborious but effective solution is preparing a glossary of the interface options and delivering it to translators with the project materials.

Last but not least – never forget about the context! In software and web-page localisation, or even dynamically generated documentation, context is often left behind and lost. The programmer should provide translators with comments. From these, one can learn that the mysterious “{0} at {1}” means “{day} at {time}”. That means that “at” should be translated as a time indicator. The translator will use such a hint, provided the whole chain works like dominoes: the person preparing the translation package has to pass the information on properly, and the CAT operator has to be able to see the comments section.
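
Here is a minimal Python sketch of a message that travels together with its comment. The `Message` class, the `render` helper and the Polish translation are assumptions made for illustration, not part of any particular localisation toolkit. Thanks to the comment, the translator knows that “at” introduces a clock time and can pick the matching Polish preposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A translatable string stored together with the developer's comment."""
    template: str
    comment: str


# Source string with a comment that explains the placeholders.
SOURCE = Message(
    template="{0} at {1}",
    comment="{0} = day of the week, {1} = time of day",
)

# Illustrative Polish translation: "o" is the preposition used with clock times.
TRANSLATION_PL = Message(
    template="{0} o {1}",
    comment=SOURCE.comment,
)


def render(message: Message, *values: str) -> str:
    """Fill the positional placeholders of a message."""
    return message.template.format(*values)


if __name__ == "__main__":
    print(render(SOURCE, "Monday", "10:00"))
    print(render(TRANSLATION_PL, "poniedziałek", "10:00"))
```

Had the comment said that {1} is a place, the same English “at” would call for a different preposition in Polish, which is exactly the information that gets lost when comments are stripped.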

_______________________

The interview was conducted by Tatiana Saternus and published on the “Babole” fan page.
