Post-editing of machine translation output (MTPE) is the process of reviewing machine translated content and editing it in such a way that it meets the requirements of the client.
Two types of post-editing (PE)
Before starting work on a post-editing job, check whether the task is full post-editing or light post-editing.
- Full post-editing
- human-like publishable quality (same as if no MT had been used)
The goal of full post-editing is to make the most of the usable parts of the MT text, and at the same time, to make the translation linguistically correct, stylistically good, terminologically accurate, and consistent.
- Light post-editing
- understandable level of translation (making the most of the MT, focusing on speed over quality)
The goal of light post-editing is to make the MT text understandable and to adhere to the client's specific requirements concerning the quality of certain elements of the text. For example, the client may ask you to make sure that product names are left untranslated or are always capitalised.
In most Sandberg MTPE projects, MT is used as a productivity tool in the same way as CAT tools are used. The quality expected of the post-edited MT output (the final product) on delivery needs to be clearly defined during the project's set-up, and any specific client requirements concerning translation quality need to be included in the project's instructions.
Our clients usually order full MTPE, where the final product should be of the same high quality as human translation.
If the client specifically requests light MTPE, where the translation quality is lower, their decision is usually financially motivated. Light post-editing is cheaper and less time-consuming, so post-editors will not be paid more even if they spend more time producing better quality than requested.
Important MTPE details
Always consider the points below when post-editing a project that involves MT:
MT suggestions in CAT tools
Usually, the client provides pretranslated files, in which the MT output is already inserted in the target segments in the CAT tool used in a particular project. In most cases, any matches above 70-75% come from TM(s), and are handled as in any project. Anything below a 70-75% match is machine-translated content and requires post-editing (full or light). If there is no existing TM leverage, the whole text may be machine translated. Post-editors can choose whether to opt for the MT match, a lower TM fuzzy match, or come up with their own translation. But the goal of post-editing is to identify usable parts of MT text and build around these rather than ignore MT suggestions completely and start translating from scratch.
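The match-percentage split described above can be sketched in a few lines of Python. This is only an illustration: the function name `segment_origin` and the default cut-off of 75 are assumptions, since the text gives a 70-75% range and the exact threshold depends on the project and CAT tool.

```python
def segment_origin(match_percent: float, tm_threshold: float = 75.0) -> str:
    """Classify a pretranslated segment by its match percentage.

    tm_threshold is an assumption: the guideline gives a 70-75% range,
    and the real cut-off is set per project.
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    # At or above the threshold: handle the segment as a normal TM match.
    # Below it: treat the target as raw MT that needs post-editing.
    return "TM" if match_percent >= tm_threshold else "MT"


for pct in (100, 82, 60, 0):
    print(pct, segment_origin(pct))
```

Whatever the label, the post-editor still decides per segment whether to build on the MT, use a lower fuzzy match, or translate from scratch.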
MTPE training
MTPE is a different challenge from regular translation, and we recommend undertaking some training to help you become a more efficient post-editor.
RWS offers a free course in MTPE which we recommend for anyone new to post-editing. It has good content for basic MTPE training to help you build on your post-editing skills. It takes about an hour to complete. The course requires registration, but is free once you have made an account. You can sign up for the course here
Sandberg also has access to an MTPE course offered by TAUS. This course is longer (12 hours) and we helped develop it when it was first launched. The course is not free but do consider it if you are interested in deepening your MTPE skills.
Post-editing: Speed
The speed at which a linguist can carry out post-editing is directly linked to the quality of the raw MT output and the post-editor's experience.
As a guideline, a linguist may be expected to process 2500-3000 words a day instead of the standard 2000 words, provided that the MT engine is well trained and produces good quality output. The post-editing speed is linked to the price charged for MTPE.
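To put the guideline figures into perspective, the short Python sketch below works out the relative productivity gain and the number of working days for a job. The function names and the 30,000-word job size are illustrative assumptions; only the daily throughput figures come from the text.

```python
import math

BASELINE_WORDS_PER_DAY = 2000  # standard translation throughput from the guideline


def productivity_gain(mtpe_words_per_day: int) -> float:
    """Relative gain over the standard daily throughput (0.25 means 25%)."""
    return (mtpe_words_per_day - BASELINE_WORDS_PER_DAY) / BASELINE_WORDS_PER_DAY


def working_days(total_words: int, words_per_day: int) -> int:
    """Whole working days needed, rounding partial days up."""
    return math.ceil(total_words / words_per_day)


job_words = 30_000  # hypothetical job size
for speed in (2000, 2500, 3000):
    print(speed, productivity_gain(speed), working_days(job_words, speed))
# 2000 0.0 15
# 2500 0.25 12
# 3000 0.5 10
```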
Post-editing vs Translating:
Post-editing is a very different process from translation. The table below outlines the central differences.
| POST-EDITING MT | TRANSLATING WITH HIGH TM FUZZY | TRANSLATING FROM SCRATCH |
|---|---|---|
| Firstly, read the target segment (raw MT) | Firstly, read the source segment | Firstly, read the source segment |
| Now read the source segment | Read a TM suggestion (you work in the post-editing mode), or start translating in your head (you work in the standard translation mode) | Start translating in your head |
| Ask yourself whether the meaning is the same | Start translating in your head (post-editing mode), or read a TM suggestion (standard translation mode) | Translate |
| Ask yourself whether the existing mistakes really need to be fixed or if you are wasting your own time on preferential changes | Ask yourself whether the difference between the translation sitting in your head and TM fuzzy match is relevant | Check your work |
| Edit raw MT if required or start translating from scratch | Edit the TM suggestion if required or start translating from scratch | |
| Check your work | Check your work | |
Post-editing: Tips
- To post-edit or not to post-edit?
A good rule of thumb: if you spend 2 seconds looking at an MT segment and see that you cannot easily edit it to produce a well-flowing translation, discard it and either translate it from scratch or use a lower fuzzy match from the TM instead.
- How to post-edit long sentences:
Use the MT as a source of inspiration when looking for the correct translation and pick out bits of the sentence to reuse rather than trying to keep as much of the sentence as possible. This is particularly relevant for longer sentences. Even sentences that are largely incorrect can be useful so long as deleting the incorrect material is not time-consuming.
- Note that MT output from the statistical engines that we use at Sandberg may at first glance seem like a perfectly smooth translation, when in reality the target segments may not render the meaning of the source segments. This is the most common MT pitfall, and you should be constantly aware of it.
- Save time and do not introduce preferential changes.
- Check project-specific instructions to see what exactly needs to be fixed. Do not correct segments that do not require fixing.
- If the raw MT includes a repetitive issue that is easily fixable with the Find & Replace function, use it. If possible, inform the PM about it so that the MT engine can be refined.
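As an illustration of the Find & Replace tip, the Python sketch below fixes one hypothetical repetitive issue: a missing space between a number and its unit. The unit list, the function name `add_unit_spacing`, and the choice of a plain space are all assumptions; whether a space (or a non-breaking space) is required depends on the target language and the client's style guide. Most CAT tools offer an equivalent regex search.

```python
import re

# Hypothetical unit list; extend or replace it per project.
UNITS = ("mm", "cm", "kg", "km", "ml")

# A digit immediately followed by one of the units, with no space in between.
# The trailing \b stops the pattern from matching inside longer words.
UNIT_PATTERN = re.compile(r"(\d)(" + "|".join(UNITS) + r")\b")


def add_unit_spacing(segment: str) -> str:
    """Insert a space between a number and a directly attached unit."""
    return UNIT_PATTERN.sub(r"\1 \2", segment)


print(add_unit_spacing("Length: 25mm, weight: 3kg"))
# Length: 25 mm, weight: 3 kg
```

Segments that are already correctly spaced are left unchanged, so the replacement can be run over the whole file.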
Post-editing: MT flaws
MT output is seldom flawless and post-editors need to notice typical MT issues and evaluate which parts of MT output can be used for the final translation. Post-editing is most efficient when the translator knows what kind of issues to expect.
These are the most notorious MT flaws that you will be correcting in the MT output:
- Tags (incorrect or missing)
- Spacing (especially between numerical values and units of measurement)
- Capitalisation
- Hyphenation
- Spelling
- Words omitted
- Words added
- Words untranslated
- Awkward sentence structure
- Inconsistent terminology (especially if there is a glossary to be followed). For example, you may have one term in the source text and five different terms in the MT output in the target segments. It is your task as the post-editor to select the correct term and apply it consistently in the project file(s).
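One rough way to spot the terminology inconsistency described in the last item is to count how often each competing rendering occurs in the target segments. The sketch below is a simplified illustration: the function name, the sample segments, and the variant list are hypothetical, and it uses plain case-insensitive substring matching, so it will not handle inflected forms or variants that contain one another.

```python
from collections import Counter


def term_variant_counts(target_segments, variants):
    """Count case-insensitive occurrences of each candidate rendering."""
    counts = Counter()
    for segment in target_segments:
        lowered = segment.lower()
        for variant in variants:
            counts[variant] += lowered.count(variant.lower())
    return counts


# Hypothetical target segments in which one source term was rendered three ways.
segments = [
    "Open the dashboard.",
    "The control panel shows alerts.",
    "Return to the Dashboard.",
    "Use the console to log in.",
]
print(term_variant_counts(segments, ["dashboard", "control panel", "console"]))
```

The counts only show where the variants are; the glossary or the project's TB still decides which term is correct.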
Post-editing: Common errors made by post-editors
| MTPE error type | Explanation |
|---|---|
| Unedited TM fuzzy matches | Errors where different terms, opposite meanings, or different numbers have not been edited properly in TM fuzzy matches are unlikely to come from the MT engine; it is post-editors who make the error of accepting unedited TM fuzzy matches. |
| Inconsistently translated terms | The MT engine does not produce consistent translations of terms and it does not communicate with the project's TM or TB. Remember to check the segments surrounding the one you are editing, including locked content, for context, and to use concordance with the TM and TB to the extent that you normally would. |
| Translated Do Not Translate words (DNTs) | The MT engine does not recognise Do Not Translate words (unless it is specifically trained to do so). This is why any company or product names that include nouns or verbs become machine translated. It is a serious mistake if a post-editor does not revert DNTs to the source. |
| Unattended mistranslations and false friends | The MT engine does not know which sense of a polysemous word is correct for the context; it just uses the words and phrases that are statistically the most frequent in the engine corpus. Mistranslated segments may, at first glance, seem like perfectly smooth translations. It is your job to identify and correct target segments that do not render the meaning of the source segments. |
| Unnoticed untranslated words, omitted words, added words | If an MT engine comes across a word that is not part of its corpus or is too complex to machine translate, it leaves it untranslated, omits it entirely, or adds extra nonsensical wording. No linguist would do that. Be vigilant. |
| Acronyms incorrectly rendered in target | If an acronym is not part of the engine corpus, it will be incorrectly rendered during machine translation. You will need to take care of the acronyms in the target segments, i.e. spell out the meaning of an acronym or use a different one in your language. |
| Wrong spelling | The MT engine would rarely use a correctly spelt word in the wrong context, whereas post-editors would (e.g. from/form). Watch out for typos while post-editing. Run a spell check. |
| Grammar mistakes | Incorrect word order and gender incongruence tend to happen more frequently during MTPE. Pay particular attention to the grammar errors reported during the spell check. The MS Word spellchecker is recommended because it seems to be better at picking up grammatical errors. |
| Under-edited content | Always read through the translation in its entirety before submitting it. Machine translated content includes false friends and spacing issues; take this for granted and be vigilant. Set QA settings to pick up typos, duplicate words, and trailing spaces. Watch out for term consistency. Usually, the content you are post-editing should be of the same high quality as human translation. |
| Over-edited content | Avoid introducing preferential changes: you risk introducing inconsistent translations and wasting your time. Just follow client-specific instructions and consult the project's TM and TB. |
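Some of the errors in the table (trailing spaces, duplicate words, translated DNTs) lend themselves to automated checks. The Python sketch below shows roughly what such checks do. It is not how any particular CAT tool implements them, and the function name, the issue labels, and the sample segments are hypothetical.

```python
import re

# The same word twice in a row, separated only by whitespace.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", flags=re.IGNORECASE)


def qa_issues(source: str, target: str, dnt_terms=()) -> list:
    """Return a list of simple QA findings for one segment pair."""
    issues = []
    if target != target.rstrip():
        issues.append("trailing space")
    if DUPLICATE_WORD.search(target):
        issues.append("duplicate word")
    # A DNT term present in the source must appear unchanged in the target.
    for term in dnt_terms:
        if term in source and term not in target:
            issues.append(f"DNT term changed: {term}")
    return issues


print(qa_issues(
    "Open the Smart Hub app.",
    "Abra la la aplicación Centro Inteligente. ",
    dnt_terms=["Smart Hub"],
))
# ['trailing space', 'duplicate word', 'DNT term changed: Smart Hub']
```

Checks like these catch mechanical slips only. Mistranslations and false friends still require reading the target against the source.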
Post-editing: Feedback
All our in-house translators who work on projects that specifically involve Sandberg-internal MT should log repetitive MT issues and core terminology inconsistencies that slow down their post-editing in our online shared MT feedback form. They should also inform our MT specialist about poor performance of specific MT engines.
Our freelance translators are only asked to provide their subjective opinion and report any major technical or linguistic issues that slow down their work when MT is involved. However, some of our freelance translators may be asked to provide regular feedback on the quality of MT.
Tracking time spent on post-editing
Currently we do not track the time spent on projects involving MTPE, but some of our linguists may be asked to track their time.
Also, if you are working on a large MT project, you may be asked early on in the project to provide feedback on MT output, and the time already spent on post-editing, so that we have a chance to advise the client and discuss delivery times and rates again if it turns out the quality is poorer than expected.
Price & sliding scales
Compensation for MTPE can vary depending on the quality of the MT output, and the time it takes for a translator to post-edit it.
- We will usually propose a discount of around 20% (i.e. 80% of the full word rate) for MT processed words, depending on the quality of the engine and the nature of the text. A 5% discount (i.e. 95% of the full word rate) is generally applied for Finnish, owing to the relative difficulty of training Finnish MT engines.
- The discount will always be related to the potential productivity increase, but the potential productivity increase will always be considerably higher than the rate discount.
- Only words with no TM leverage will be MT processed, and consequently the discount will only be applied to such words (so-called "no-match words").
- We use sliding scales of charges to calculate a weighted wordcount for each project. We will use sliding scales with adjusted discount rates for machine translated projects.
- The compensation percentage level for post-editing does not mean that all MT segments are of equal quality. Some will be unusable and you will need to translate those segments from scratch, and some will be very good and you will need to do very little with them. It is the overall quality level and time spent that counts, and that is what the discount will be based on.
- If you find that the productivity increase is not at least on par with the MT discount, please report this with examples to your PM.
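The arithmetic behind the points above can be sketched as follows. Only the 80% figure for MT processed no-match words (a 20% discount) and the 2000-word baseline come from this document. The other match bands, their percentages, and the function names are hypothetical placeholders, because the actual sliding scales are not given here.

```python
# Percentage of the full word rate paid per match band.
# Only "no_match_mt" (80%) comes from the guideline; the other bands
# and their percentages are hypothetical placeholders.
SCALE_PERCENT = {
    "exact_match": 25,
    "fuzzy_match": 60,
    "no_match_mt": 80,
}


def weighted_wordcount(words_by_band: dict) -> float:
    """Weight each band's word count by its share of the full word rate."""
    total = sum(words * SCALE_PERCENT[band] for band, words in words_by_band.items())
    return total / 100


def earnings_ratio(mtpe_words_per_day: int, discount_percent: int = 20,
                   baseline_words_per_day: int = 2000) -> float:
    """Daily earnings with MTPE relative to translating at the full rate."""
    return (100 - discount_percent) * mtpe_words_per_day / baseline_words_per_day / 100


print(weighted_wordcount({"exact_match": 1000, "fuzzy_match": 2000, "no_match_mt": 5000}))
# 5450.0
print(earnings_ratio(2500), earnings_ratio(3000))
# 1.0 1.2
```

Under these figures, a 20% discount breaks even at 2500 words a day. A ratio below 1.0 on a real project would be the kind of case to report to your PM with examples.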
If you are working on a large MT project, typically where the MT has been applied by our client, you may be asked early on in the project to provide feedback on the quality of the raw MT and the time already spent on post-editing, so that we have a chance to advise the client and discuss delivery times and rates again if it turns out the quality is poorer than expected.