
memoQ 9: Match settings and case sensitivity

Every term in a memoQ term base has separate settings for Matching and Case Sensitivity in each language. These settings can dramatically affect the usefulness of a term, so it is important to be aware of them. Ideally, well-maintained TBs would have individual settings for each term, but as almost all glossary content we handle comes from outside sources, most terms in most TBs will use the same default settings. When manually adding terms to a project TB, however, it is important that the translator applies the correct settings. These settings can also be changed in the term base editor.

To set defaults for new terms, select a TB and click Properties, then click New term defaults. Select a language from the list, and change the case sensitivity setting using the drop-down menu. Do this for all the languages in your project.

Note that you need to set defaults for EACH language in the term base, not just the source and target languages.

Case sensitivity settings:  

  • Yes sets the case sensitivity to sensitive. Only terms with exactly the same capitalization as the word in the text will be suggested.
    Example: with case sensitivity set to Yes, "IT" in the term base will only be suggested if the source text has "IT" in uppercase, not for the word "it". Similarly, "Can" in the term base would only be suggested if the source text has exactly "Can", not for "CAN" or "can".
  • Permissive is the default setting. It is case-sensitive for capital letters in the term but not for lowercase ones: a capital letter in the term base term must also be a capital in the text, while a lowercase letter matches either case. If your term base contains a word with capitals, for example "IT", it will only be suggested when the word appears in the text with identical capitalization; the word "it" won't trigger a term suggestion. But the term "sample" would be suggested for the word "SAMPLE".  
    This is one of the main reasons for unproductive TB use: many imported glossaries capitalize the first letter of every term, and with Permissive case sensitivity these terms will NOT be suggested unless the word appears in the text with the same initial capital. You should ALWAYS set the case sensitivity to No for imported glossaries with initial capitals.
  • No sets the case sensitivity to insensitive. However the word is capitalized in the source text, terms that differ from it only in case will be suggested, whether they are written in lowercase, uppercase, or a mix of both. Case is simply not relevant with this setting.

The same information is summarised in a table. The green boxes indicate which TB hits you can see for different capitalisations of the same word in the text.
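The three case sensitivity settings can be sketched in Python to reproduce the kind of hit grid described above. This is a minimal sketch of the rules as described on this page, not memoQ's own code: `case_match` and the setting strings are hypothetical names, and Permissive is modelled character by character (a capital letter in the term must be the same capital in the text, any other letter matches either case).

```python
# Hypothetical helper names; memoQ's own implementation is not public.
# This only models the case rules described on this page.


def _permissive_equal(term: str, word: str) -> bool:
    """Capital letters in the term must match exactly; other letters
    match regardless of case."""
    if len(term) != len(word):
        return False
    for t, w in zip(term, word):
        if t.isupper():
            if t != w:  # a capital in the term must stay a capital
                return False
        elif t.lower() != w.lower():  # lowercase in the term matches either case
            return False
    return True


def case_match(term: str, word: str, setting: str) -> bool:
    """Return True if `term` would be suggested for `word` under `setting`."""
    if setting == "Yes":  # case-sensitive: identical capitalization only
        return term == word
    if setting == "No":  # case-insensitive: capitalization ignored
        return term.lower() == word.lower()
    if setting == "Permissive":  # the default setting
        return _permissive_equal(term, word)
    raise ValueError(f"unknown case sensitivity setting: {setting!r}")


if __name__ == "__main__":
    # Print a small grid: which capitalizations of each word get a TB hit?
    for term in ["IT", "Can", "sample"]:
        variants = [term.lower(), term.capitalize(), term.upper()]
        print(f"Term in TB: {term!r}")
        for setting in ["Yes", "Permissive", "No"]:
            hits = [v for v in variants if case_match(term, v, setting)]
            print(f"  {setting:<10} suggested for: {', '.join(hits) or '-'}")
```

Running the grid for a term with an initial capital such as "Can" shows why Permissive hurts imported glossaries: the lowercase "can", which is how the word usually appears in running text, gets no hit until the setting is changed to No.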

Matching settings:

  • Fuzzy allows fuzzy matching between terms and content. In short, a word in the source text matches a term as long as the two are at least 80% similar. For example, with this setting the term "superman" gives a match if the text has "superwoman". This setting will give a lot more hits than the other settings but will also flag loads of false positives when checking terms during QA, so do use it with caution.
  • 50% prefix suggests a term for a word in the text if the term matches the beginning of the word and covers at least half of it. For example, if your term base has "man", it will be suggested for "manner" (3 of 6 letters) but not for "mannequin" (3 of 9 letters). This is the default setting and works for most terms.
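  • Exact matching allows only exact, letter-by-letter matching between the text and the TB. It should be used for short terms, where prefix matching would give too many false hits (with 50% prefix, "man" is also suggested for "manner").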
  • Custom allows the asterisk (*) wildcard and the pipe (|) character in a term. Entering either of these in a term automatically changes its matching to Custom. They are highly useful for Germanic languages and Finnish, which tend to write compound words as one word. Using wildcards correctly will increase the accuracy of the term base and produce fewer false positives and false negatives during QA.

    Examples of Custom match settings (all using Swedish):

    The pipe character | is used to indicate the stem of the word:
    "flick|a" will give matches for "flicka", "flickor" etc. – in other words, for any word that begins with the stem "flick-", but not for "flick" itself.

    The asterisk is a wildcard that represents any number of characters (including none) except whitespace. It can be used at any position in a term, so use it with caution:
    - "tid*" will match "tid", "tider", "tidning" but not "kvällstidning"
    - "*tid" will match "kvällstid", "dagtid" but not "kvällstidning"
    - "*tid*" will match "tidning", "dagtid", "kvällstidning", "partidagar" etc.
    - "ti*d" will match "timid", "timjanbröd" etc.

    You can also combine pipe characters and asterisks, including in terms of several words:
    - "gul|t hårstrå*" will match "gult hårstrå" and "gula hårstrån"
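The four matching modes can be approximated in a short Python sketch. memoQ does not publish its matching algorithm, so everything here is an assumption based on the descriptions and examples on this page: fuzzy similarity is taken to be a Levenshtein ratio (1 minus edit distance divided by the length of the longer string) with an 80% threshold, 50% prefix is read as "the word starts with the term and the term covers at least half of the word", and Custom terms are translated to regular expressions. Because "flick|a" also matches "flickor", the letters after a pipe are treated as not matched literally. All function names are hypothetical, and case is compared exactly here since case sensitivity is a separate setting.

```python
import re

# Hypothetical helper names; an approximation of the matching modes
# described on this page, not memoQ's actual algorithm.


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, word: str, min_percent: int = 80) -> bool:
    """Assumed similarity: 1 - distance / length of the longer string."""
    longest = max(len(term), len(word))
    if longest == 0:
        return True
    dist = levenshtein(term, word)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return (longest - dist) * 100 >= min_percent * longest


def prefix50_match(term: str, word: str) -> bool:
    """The word must start with the term, and the term must cover
    at least half of the word."""
    return word.startswith(term) and 2 * len(term) >= len(word)


def exact_match(term: str, word: str) -> bool:
    """Letter-by-letter identity."""
    return term == word


def custom_pattern(term: str) -> re.Pattern:
    """Translate a Custom term into a regular expression.

    '*' -> any run of non-whitespace characters, possibly empty
    '|' -> end of stem: at least one more non-whitespace character must
           follow; the letters typed after the pipe are ignored
    """
    word_patterns = []
    for part in term.split():
        stem, pipe, _rest = part.partition("|")
        pattern = r"\S*".join(re.escape(piece) for piece in stem.split("*"))
        if pipe:
            pattern += r"\S+"
        word_patterns.append(pattern)
    # Anchor to whole words: no non-whitespace directly before or after.
    return re.compile(r"(?<!\S)" + r"\s+".join(word_patterns) + r"(?!\S)")


def custom_match(term: str, text: str) -> bool:
    """True if the Custom term occurs as whole word(s) in `text`."""
    return custom_pattern(term).search(text) is not None


if __name__ == "__main__":
    print(fuzzy_match("superman", "superwoman"))   # True
    print(prefix50_match("man", "manner"), prefix50_match("man", "mannequin"))  # True False
    for term, text in [("flick|a", "flickor"), ("tid*", "kvällstidning"),
                       ("gul|t hårstrå*", "gula hårstrån")]:
        print(f"{term!r} in {text!r}: {custom_match(term, text)}")
```

Under this assumed metric, "superman" and "superwoman" are exactly 80% similar (two inserted letters over ten), so the fuzzy example sits right at the threshold. The regex translation also shows why a leading or trailing asterisk widens a term so much: "*tid*" matches "tid" anywhere inside a word, including unrelated words such as "partidagar".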