Translation Memory Introduction

The translation business is centered around one key metric: the number of words to translate. The quantity of untranslated text, along with its type (technical, medical, fiction, certificate, etc.), determines the amount of effort that will go into the project and, ultimately, the cost to the customer.

There are many tools that help translators keep translations consistent from one project to the next. They can significantly speed up translation while shortening turnaround time and lowering the overall cost. These tools are called Translation Memory (or TM) tools, and they play a vital role in any long-term translation cooperation. They identify repeating blocks of text and previously completed translations, substitute those translations in as the translator works on the text, and flag the places where additional editing is needed. With each subsequent translation your Translation Memory grows, and the effort of translating new text decreases. A TM also allows the language vendor to assign another translator, or to spread the work among team members, while keeping terminology consistent.

Translation Memory Process

Consider the following scenario: you are localizing a website or an application. You have a resource file with strings in it, manuals or online help that go along with it, and marketing collateral as well. You need all of that translated into a specific target language (let’s say, Latin American Spanish). You package up these materials and send them off to a translation company for a quote.

When the language vendor receives your files, they load them into TM Studio and run an analysis on the contents. If this is the first time these materials are being translated, a new project is started and a new Translation Memory is created. The vendor has no previous translations specific to you, so what they can do is run a repetition and context match analysis on all the files grouped together, to see how much effort can be saved when TM Studio identifies segments (groups of words or whole sentences) that repeat across the files.

Once the analysis is complete, TM Studio produces a full report showing the total volume of words that need to be translated and how closely the segments resemble each other. A high level of repetition is common in technical documentation, because the same segments appear in the help manuals, the marketing materials and the resource files.
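The repetition part of such an analysis can be sketched in a few lines. This is a minimal illustration, not how TM Studio works internally: it assumes a naive segmentation rule (split after `.`, `!` or `?`), and the function names are hypothetical. The first occurrence of a segment counts as new text; every later identical occurrence, in any file, counts as a repetition.

```python
import re

def split_segments(text: str) -> list[str]:
    # Naive segmentation: break after ., ! or ? followed by whitespace.
    # Real TM tools use configurable segmentation rules; this is a simplification.
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]

def repetition_analysis(files: dict[str, str]) -> dict[str, int]:
    """Word counts for first occurrences ("new") vs. later identical ones ("repetitions")."""
    seen: set[str] = set()
    counts = {"new": 0, "repetitions": 0}
    for text in files.values():  # all files are analyzed as one group
        for segment in split_segments(text):
            key = " ".join(segment.split())  # normalize internal whitespace
            bucket = "repetitions" if key in seen else "new"
            counts[bucket] += len(segment.split())
            seen.add(key)
    return counts

files = {
    "strings.txt": "Save your changes. Open a file.",
    "manual.txt": "Click Save. Save your changes.",
}
print(repetition_analysis(files))  # {'new': 8, 'repetitions': 3}
```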

The Studio will find “100% match repetitions” (identical segments) as well as fuzzy matches: segments that resemble each other but differ to some degree, anywhere from 99% to 50% similarity. Anything below 50% is essentially original, untranslated text. In practice, anything below 75% is usually too different to save the translator much time. It will not be a quick review with a preposition or syntax adjustment; the translator will have to read the original carefully and then translate it, with perhaps a few words automatically substituted by TM Studio.
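To make the match bands concrete, here is a sketch that scores a segment against previously translated source segments and sorts it into the same bands used in analysis reports. It assumes Python’s `difflib` character-level similarity as a stand-in; Trados and other TM tools compute their fuzzy scores with their own algorithms, so real percentages will differ. The helper names are hypothetical.

```python
from difflib import SequenceMatcher

# Lower bound (in percent) and label of each band, as in an analysis report.
BANDS = [(100, "100%"), (95, "95%-99%"), (85, "85%-94%"), (75, "75%-84%"), (50, "50%-74%")]

def similarity(a: str, b: str) -> int:
    """Character-level similarity in whole percent (floored), via difflib."""
    matcher = SequenceMatcher(None, a, b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Same formula as SequenceMatcher.ratio(), but in integers so that
    # a near-identical segment is never rounded up to a 100% match.
    return 100 if total == 0 else (200 * matched) // total

def band(score: int) -> str:
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "No match"  # below 50%: treated as new text

def best_match(segment: str, memory: list[str]) -> tuple[str, int]:
    """Band and score of the closest previously translated source segment."""
    score = max((similarity(segment, source) for source in memory), default=0)
    return band(score), score

memory = ["Save your changes.", "Open a file."]
print(best_match("Save your changes.", memory))  # ('100%', 100)
print(best_match("Save your change.", memory))   # ('95%-99%', 97)
```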

There is a variety of tools on the market. At Bilingva we prefer the most popular and well-developed one: SDL Trados Studio.

Initial Estimate

Now that the language vendor has analyzed your files and generated a report, they can provide you with an estimate. For the initial work, your savings will come mostly from 100% repetitions: segments of text that fully match each other. As the translator works through the text, each time an already translated segment is encountered again, the TM tool automatically substitutes the previous translation. Likewise, if the translator later decides there is a better translation for a specific segment and updates it, the TM tool can update every occurrence to keep the text consistent.
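The substitution and update behavior described above can be sketched with an exact-match memory. This is a simplified model under stated assumptions: the `TranslationMemory` class and `pretranslate` helper are hypothetical, and “propagation” is modeled as simply re-running the lookup after an entry is revised, whereas a real TM tool updates confirmed segments inside the editor.

```python
from typing import Optional

class TranslationMemory:
    """Minimal exact-match TM: maps source segments to approved translations."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        # Adding an existing source segment overwrites (revises) its translation.
        self._entries[source] = target

    def lookup(self, source: str) -> Optional[str]:
        return self._entries.get(source)

def pretranslate(segments: list[str], tm: TranslationMemory) -> list[Optional[str]]:
    """Fill each segment from the TM; None marks a segment the translator must handle."""
    return [tm.lookup(segment) for segment in segments]

doc = ["Save your changes.", "Open a file.", "Save your changes."]
tm = TranslationMemory()
tm.add("Save your changes.", "Guarde sus cambios.")
print(pretranslate(doc, tm))  # ['Guarde sus cambios.', None, 'Guarde sus cambios.']

# The translator revises the wording once; every occurrence picks it up.
tm.add("Save your changes.", "Guarda tus cambios.")
print(pretranslate(doc, tm))  # ['Guarda tus cambios.', None, 'Guarda tus cambios.']
```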

The estimate the translation agency provides will look something like this:

Bilingva Estimate #2450
ACME Fantastic App documents
Source language: English
Target language: Latin American Spanish
Total number of words: 15,000
Repetitions: 2,500 x $0.10/word = $250.00
No match: 12,500 x $0.17/word = $2,125.00
Total: $2,375.00

As you can see, repetitions are priced lower than no-match words (those that are entirely unique), because the translator has less work to do and can complete them more quickly. They still carry a rate, because the translator must verify that each and every segment is translated properly: even with a 100% match, the context in which the segment appears makes a difference, so a blind “search and replace” without further review is not possible.

Even at this first stage, the TM tool saves you $175: the 2,500 repeated words are billed at $0.07 less per word than new text, and 2,500 x $0.07 = $175 (all rates are sample figures for the purposes of this example).
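The arithmetic behind the sample estimate and its savings can be checked with a short script. It uses the example rates from the estimate above (which are themselves samples) and works in integer cents to avoid floating-point rounding in money amounts; the `estimate` helper is hypothetical.

```python
def estimate(items: list[tuple[str, int, int]]) -> int:
    """Total cost in cents for (category, word_count, rate_in_cents) line items."""
    return sum(words * rate for _, words, rate in items)

items = [("Repetitions", 2_500, 10), ("No match", 12_500, 17)]
total = estimate(items)
# Baseline: every word billed at the no-match rate, i.e. no TM discount.
baseline = estimate([("All words", 15_000, 17)])

print(f"Total: ${total / 100:,.2f}")                  # Total: $2,375.00
print(f"Savings: ${(baseline - total) / 100:,.2f}")   # Savings: $175.00
```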


If you and the translation vendor agree on the estimate, you sign off on it and the agency gets to work. A few days later your translations are returned. You may wonder: how will the agency preserve all the formatting, special characters and tags in my resource files and online help, as well as the carefully selected fonts and styles in my manuals that will later go to print?

TM tools help with this as well. When your source files are loaded into TM Studio, it locks the tags in the editor environment that translators use, so they cannot accidentally remove a bracket or a special character. In fact, most translators work with the tags hidden from view, so they don’t get in the way of the actual text. Once translation is complete, the agency exports the file back into its original format, with the Studio reinserting the localized text into the original markup.
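The idea of locking tags can be illustrated with placeholders: markup is swapped out before translation and put back afterwards. This is only a sketch; TM Studio parses each file format with its own filters, while this example assumes a simple regex for angle-bracket tags and a hypothetical `⟦n⟧` placeholder that must never occur in the source text itself.

```python
import re

TAG = re.compile(r"<[^>]+>")
# Assumption: the ⟦n⟧ markers never appear in the original text.
PLACEHOLDER = re.compile(r"⟦(\d+)⟧")

def protect_tags(text: str) -> tuple[str, list[str]]:
    """Replace every markup tag with a numbered placeholder the translator cannot break."""
    tags: list[str] = []

    def to_placeholder(match: re.Match) -> str:
        tags.append(match.group(0))
        return f"⟦{len(tags) - 1}⟧"

    return TAG.sub(to_placeholder, text), tags

def restore_tags(text: str, tags: list[str]) -> str:
    """Put the original tags back in place of their placeholders after translation."""
    return PLACEHOLDER.sub(lambda m: tags[int(m.group(1))], text)

source = 'Click <b>Save</b> to keep your <a href="#">changes</a>.'
masked, tags = protect_tags(source)
print(masked)  # Click ⟦0⟧Save⟦1⟧ to keep your ⟦2⟧changes⟦3⟧.

translated = "Haga clic en ⟦0⟧Guardar⟦1⟧ para conservar sus ⟦2⟧cambios⟦3⟧."
print(restore_tags(translated, tags))
# Haga clic en <b>Guardar</b> para conservar sus <a href="#">cambios</a>.
```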

After the export, the agency verifies the general look and feel and delivers the translated package back to you, the customer.

Next Iteration

Now that you have your localized materials back, you continue with your release process, and the website, app or article goes live. Some time later you have the next release coming up, or perhaps another article on the same subject. You send the edited versions of your resource files, manuals and marketing collateral back to the translation agency for another round.

Without a TM tool in place, the agency would have an extremely hard time tracking the changes and estimating the translation work. Even with Word documents, the markup gets in the way of the built-in comparison, and for non-Word formats such as Excel, resource strings, PO files and many others, a reliable comparison may not exist at all. The agency would then have to charge you as if the project started from scratch, for the full amount of text.

At Bilingva we, of course, leverage TM tools that were built with this process in mind. When the updated files arrive, they are loaded back into TM Studio and analyzed, not only for new content but also for repetitions and matches against the Translation Memory from the previous project. Even if some content has changed, the Studio performs a fuzzy match and assigns every segment it encounters a percentage indicating how closely it matches a previously translated segment.

The new report will look something like this (simplified; we will dig into the technical details later):

File: Fantastic App Document Rev. 2
Total word count: 15,500
Translated: 10,125
Not translated: 5,375
  100%: 400
  95%-99%: 750
  85%-94%: 550
  75%-84%: 1,230
  50%-74%: 2,445

The agency now has a clear picture of how much content has changed and how much work is needed to translate the second revision. As the report shows, 10,125 words have not changed and do not require translation. New content amounts to only 5,375 words.

Of those, 400 words are 100% matches with segments in the previous Translation Memory. The translator still needs to verify that these translations are correct in the context of the changed file, but that is a faster process. The rest of the new content falls anywhere on the spectrum from 99% (very little change) to 50% (mostly new content). The agency can now offer the customer different rates based on these percentages, the difficulty of the text and the document as a whole.
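As an illustration of band-based pricing, the sketch below prices the Rev. 2 report. The word counts come from the report above, but the per-band rates are entirely hypothetical (chosen only for this example, in cents per word); the 50%-74% band is billed at the full no-match rate, in line with the observation that matches below 75% rarely save the translator much time.

```python
# Word counts from the Rev. 2 report; per-word rates (in cents) are hypothetical.
REPORT = {"100%": 400, "95%-99%": 750, "85%-94%": 550, "75%-84%": 1_230, "50%-74%": 2_445}
RATES_CENTS = {"100%": 5, "95%-99%": 7, "85%-94%": 10, "75%-84%": 13, "50%-74%": 17}
FULL_RATE_CENTS = 17  # sample no-match rate from the initial estimate

def quote(report: dict[str, int], rates: dict[str, int]) -> int:
    """Total in cents: each band's word count times that band's rate."""
    return sum(words * rates[band] for band, words in report.items())

new_words = sum(REPORT.values())
cost = quote(REPORT, RATES_CENTS)
print(f"Words to process: {new_words:,}")                       # Words to process: 5,375
print(f"Estimate: ${cost / 100:,.2f}")                          # Estimate: $703.05
print(f"Without TM: ${15_500 * FULL_RATE_CENTS / 100:,.2f}")    # Without TM: $2,635.00
```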

As you can see, with every new iteration the chance of having seen a segment before grows, and your translation costs benefit significantly from the tool.

As another benefit to you as a customer, should you ever want to switch agencies, your current agency can provide you with the Translation Memory file, which you can use with another agency to retain all your past translations and terminology.
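Portability works because TM tools, Trados Studio included, can exchange memories in TMX (Translation Memory eXchange), an XML standard. The sketch below writes a minimal TMX 1.4 document with Python’s standard library. It is an illustration under assumptions, not an exporter: the header values (such as the `example-script` tool name) are placeholders, and `es-419` is the standard language tag for Latin American Spanish.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def to_tmx(pairs: list[tuple[str, str]], src_lang: str = "en", tgt_lang: str = "es-419") -> str:
    """Build a minimal TMX 1.4 document from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

pairs = [("Save your changes.", "Guarde sus cambios."), ("Open a file.", "Abra un archivo.")]
tmx_text = to_tmx(pairs)
root = ET.fromstring(tmx_text)
print(len(root.findall("body/tu")))  # 2
```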

As another benefit to you as a customer, should you ever watch to switch the agency, they can provide you with the Translation Memory file which you can use with another agency – and retain all your past translations and terminology.