Many customers come to us with legacy translation projects that are not contained in a translation memory database. The question is how to get that content into the database.
Many modern translation management tools include an alignment function that takes a source document and a translated version of that same document and aligns them sentence by sentence, so that the content can be committed to a translation memory database. However, there are a few key points to consider before using this method.
The structure of the content must be identical
Because the alignment process simply runs through the documents sentence by sentence and keys on structural items such as hard returns, soft returns, bullets, indentation, styles and special formatting, the two documents must be exactly in sync. Any deviation in structure will cause mismatched sentences. A quick visual on what that looks like:
A few stray hard returns cause this mismatch. The best way to provide content that you would like added to a translation memory is in a two-column table, either in MS Word or in MS Excel. If the content contains HTML or XML markup, it is better to provide the two documents themselves for alignment. The same is true for highly formatted documents in programs like QuarkXPress or Adobe InDesign.
The same caveats apply, though: the documents have to be completely identical in content and structure. If either document has more or less content than the other, the whole process goes from something quite easy to manage to a completely manual clean-up process. The time required for this work can be significant and typically requires additional compensation. If the alignment issues aren’t fixed, your import into the translation memory will contain errors and will be essentially useless.
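To make the effect of a stray hard return concrete, here is a minimal Python sketch of position-based alignment. It is a deliberate simplification and not how any particular translation management tool works: real aligners also key on styles, bullets and other formatting. The function names and the sample sentences are invented for illustration.

```python
# Simplified illustration: treat every hard return as a segment boundary
# and pair segments purely by position. All names here are hypothetical.

def split_segments(text):
    """Split text on hard returns, keeping empty lines as segments."""
    return text.split("\n")

def align_by_position(source_text, target_text):
    """Pair the n-th source segment with the n-th target segment."""
    return list(zip(split_segments(source_text), split_segments(target_text)))

def counts_match(source_text, target_text):
    """Cheap pre-check: do both documents have the same number of segments?"""
    return len(split_segments(source_text)) == len(split_segments(target_text))

source = "Open the cover.\nRemove the filter.\nClose the cover."
# The translated file has one stray hard return after the first sentence.
target = "Abra la tapa.\n\nRetire el filtro.\nCierre la tapa."

pairs = align_by_position(source, target)
for src, tgt in pairs:
    print(repr(src), "<->", repr(tgt))

print("Segment counts match:", counts_match(source, target))
```

In this sketch the first pair is correct, but every pair after the stray hard return is shifted by one: "Remove the filter." is paired with an empty segment, "Close the cover." is paired with the translation of the previous sentence, and the final translated sentence is dropped entirely. A segment-count comparison like `counts_match` can flag that something is wrong before the import, but it cannot say where the drift starts, which is why the clean-up is manual.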
Garbage in, Garbage out
The quality of the input should also be considered. Please be sure that the documents you submit have been approved and are considered final. Once you commit the materials to the translation memory, your translation team will rely on them as accurate and acceptable for use. Some translation workflows call for existing materials in the memory to be locked and left untouched when reused in subsequent projects. This is the same approach used by popular content management systems. So the old adage of garbage in, garbage out definitely applies. Be very careful in choosing the documents you submit for alignment.