Saturday, December 19, 2009

Terminology

AppleTrans allows you to track terminology in order to maintain high levels of consistency throughout your document. Set up your translation as shown in How to AppleTrans, with your source in a translation document and an open translation memory window for the translation.

Now create a second translation memory to serve as the glossary, and save it under a name that makes its purpose obvious, such as "Project Glossary".

In the main translation memory window, click the small colored Terminology button at the bottom of the window (in my setup it is a purple number 6, but the tooltip will tell you which button it is). A panel named Terminology should slide out of the window. Check the Auto Verification option and choose "Project Glossary" (or whatever name you gave your glossary memory file) from the pop-up menu.

To add a term as you translate, select it in both the source and the translation, then Option-click the Verify button on the translation panel. The term should appear as a new record in the glossary memory window. Click Verify again (this time without the Option key) and the term should appear in black in the Terminology panel. From then on, whenever a source sentence contains that source term, the term appears in red in the Terminology panel if its translation is missing from your translation, reminding you to translate it consistently.
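The red/black check boils down to a simple rule: if a glossary source term occurs in the source sentence, look for its target term in the translation. The following Python sketch mimics that visible behavior with a case-insensitive substring search. It is only an illustration, not AppleTrans's actual implementation; the function name check_terms and the glossary entries are hypothetical.

```python
# Toy illustration of the Terminology panel's red/black check.
# NOT AppleTrans's code: it only mimics the behavior described above.

def check_terms(glossary, source, translation):
    """Return (source_term, target_term, ok) for each glossary term found in source.

    ok is True ("black") when the target term also appears in the translation,
    and False ("red") when it is missing. Matching is a plain case-insensitive
    substring test, which also works for languages written without spaces.
    """
    src = source.lower()
    tgt = translation.lower()
    results = []
    for source_term, target_term in glossary.items():
        if source_term.lower() in src:
            results.append((source_term, target_term, target_term.lower() in tgt))
    return results


glossary = {
    "translation memory": "mémoire de traduction",
    "glossary": "glossaire",
}

for term, target, ok in check_terms(
    glossary,
    "Save the glossary with the translation memory.",
    "Enregistrez le glossaire.",
):
    print("black" if ok else "red", term, "->", target)
# red translation memory -> mémoire de traduction
# black glossary -> glossaire
```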

Don't forget to save your glossary file as you add terms to prevent data loss.

Quick Reminder Section:

Add term: select term in both source and target, then option-click Verify in the Terminology panel.

Thursday, December 17, 2009

Updates 2009/12/17

Added a preface, made a few edits to the main How To article.

More updates to come regarding WildCAT and working with office formats.

Wednesday, December 16, 2009

Preface - Who, What, Why, and Where

Who:

This list of articles is written and maintained primarily by Chris Moore of Fruit in Season with contributions from other people interested in AppleTrans.

What:

AppleTrans is a piece of software made available by Apple to assist translators. The software appears to be maintained by hiruneko, whose blog about AppleTrans has tutorials on advanced topics.

Why:

AppleTrans is an application intended to assist professional translators in the work of translation. Some of its main features are:

• Translation memory: tracks your translations and, as you work, shows previously translated material related to the portion you are currently working on, which helps keep the document consistent.
• Works with rich text without codes and tags. Similar software often forces translators to look at tags and codes that break up the flow of styled text. AppleTrans does not impose this extra burden on the translator.
• Native Mac application: works, looks, and feels the way a Mac application should. It uses Apple's built-in components and interacts well with the system, including AppleScript and Services. Similar apps for the Mac are often built in Java and don't meet most Mac users' expectations of software quality.
• Free. Not open source, but made freely available to use. Similar applications can cost over $200 for a single license.

On the other hand:

• Despite following a large number of "Mac-like" conventions, much of the interface is unlabeled, and many of the most important features are hidden behind key and click combinations that are not clearly documented in any single place. This set of pages therefore aims to be a guide for those who are interested in the feature set but get lost trying to actually use AppleTrans to do the work as advertised.

Where:

AppleTrans is available here.

Monday, August 25, 2008

Updates August 26

Updated Layout (Introduction, Table Of Contents, Subscription)

New Post:
Segmenting (1)
Working With Sources (5) PDF - the good

Updated Posts:
Working With Sources (2)
Working With Sources (3)
Working With Sources (4)

Segmenting (1) Microsoft Office OOXML

In previous posts on getting at source text, I outlined how to access Microsoft Office documents by saving them in the new OOXML formats (.docx, .xlsx, .pptx) and then unzipping the package.
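For readers who prefer a script to a zip utility, here is a minimal Python sketch of the unzipping step using the standard zipfile module. The function name unzip_ooxml is hypothetical. In a .docx most of the body text is in word/document.xml; in a .xlsx the cell text is mostly in xl/sharedStrings.xml; in a .pptx each slide is a separate ppt/slides/slideN.xml part.

```python
import zipfile


def unzip_ooxml(package_path, dest_dir):
    """Extract an OOXML package (.docx, .xlsx, .pptx) into dest_dir.

    Returns the sorted names of the XML parts, which is where the text lives.
    Relationship (.rels) files and media such as images are extracted too,
    but left out of the returned list.
    """
    with zipfile.ZipFile(package_path) as zf:
        zf.extractall(dest_dir)
        return sorted(name for name in zf.namelist() if name.endswith(".xml"))
```

For example, unzip_ooxml("report.docx", "report_unzipped") unpacks a (hypothetical) report.docx and returns the list of XML parts to open in AppleTrans.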

I'd like to expand on those articles by talking about segmentation. When you open the OOXML file parts (the XML files contained in the package), basically all of the text you will want to access is in <t></t> tags. Most of the time these tags have a single-letter namespace prefix, such as <w:t> in Word or <a:t> in PowerPoint, but not always. The HTML segmenting rules in AppleTrans can get at the text you want, but you may find that they also segment things you don't want. The HTML rule is also very complex, so segmenting with it takes longer than a simpler rule would.
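Before writing a segmentation rule, it can help to see exactly which text sits in those <t> elements. This Python sketch takes a different approach from AppleTrans's regular-expression rules: it parses the XML with the standard xml.etree.ElementTree module and collects every element whose local name is t, whatever its prefix. The function name text_runs and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_runs(xml_bytes):
    """Return the text of every element whose local name is "t".

    ElementTree expands prefixes like w: or a: into "{namespace-uri}t",
    so stripping everything up to "}" leaves just the local name.
    Attributes such as xml:space="preserve" do not affect the match.
    """
    root = ET.fromstring(xml_bytes)
    runs = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "t":
            runs.append(elem.text or "")
    return runs


# Hypothetical fragments shaped like a Word part and an Excel shared-strings part.
word_part = (
    b'<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    b'<w:body><w:p><w:r><w:t>Hello</w:t></w:r>'
    b'<w:r><w:tab/><w:t xml:space="preserve"> world</w:t></w:r></w:p></w:body>'
    b'</w:document>'
)
excel_part = (
    b'<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<si><t>Sales</t></si></sst>'
)

print(text_runs(word_part))   # ['Hello', ' world']
print(text_runs(excel_part))  # ['Sales']
```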

To address these issues we will add a segment rule specifically for Microsoft Office OOXML.

First, go find AppleTrans in the Finder and select Show Package Contents from the context menu. Navigate to Contents/Resources/English.lproj/SegmentRules.plist. This file contains the rules for segmenting. If you think you will edit this file fairly regularly, you might find it useful to make an alias and save it somewhere for easy access.

The SegmentRules.plist file is a standard Apple property list file. If you have the developer tools installed you can use the handy Property List Editor application to do the following steps, but a plain text editor will do fine as well.

Segmentation rules are basically just groups of regular expressions. I haven't tested every possibility thoroughly, but they appear to be non-greedy (meaning that when several substrings could match a pattern, the shortest one is taken).
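Python's re module makes the greedy/non-greedy difference easy to see. This is only an analogy, assuming AppleTrans's rules behave like Python's lazy *? quantifier; the regular-expression engine AppleTrans actually uses may differ in details.

```python
import re

xml = "<string>One</string><string>Two</string>"

# Greedy: .* runs to the LAST </string>, swallowing both strings in one match.
greedy = re.findall(r"<string>(.*)</string>", xml)

# Lazy (non-greedy): .*? stops at the FIRST </string>, giving one match per string.
lazy = re.findall(r"<string>(.*?)</string>", xml)

print(greedy)  # ['One</string><string>Two']
print(lazy)    # ['One', 'Two']
```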

To make our pattern, we can borrow the existing rule that most resembles the one we want. The XML Plist rule finds segments in an XML file by looking for text inside <string></string> tags. The rule is as follows:


    <key>XML Plist</key>
    <dict>
        <key>Prefix</key>
        <string>&lt;string&gt;</string>
        <key>Segment</key>
        <string></string>
        <key>Suffix</key>
        <string>&lt;\/string&gt;</string>
     </dict>


Copy that and change the values to reflect the OOXML code you want:


    <key>Microsoft Office XML</key>
    <dict>
        <key>Prefix</key>
        <string>&lt;(.:)?t&gt;</string>
        <key>Segment</key>
        <string></string>
        <key>Suffix</key>
        <string>&lt;\/(.:)?t&gt;</string>
     </dict>
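
Writing &lt; and &gt; by hand is easy to get wrong. As a sketch, Python's standard plistlib module can generate the entry with the escaping done for you. The rule contents are the ones from the block above; only the inner key and dict part of the printed output belongs in SegmentRules.plist, not the plist header.

```python
import plistlib

# The Microsoft Office XML rule from above, written as plain (unescaped) regexes.
rule = {
    "Microsoft Office XML": {
        "Prefix": r"<(.:)?t>",
        "Segment": "",
        "Suffix": r"<\/(.:)?t>",
    }
}

# plistlib escapes &, < and > in strings, producing e.g. <string>&lt;(.:)?t&gt;</string>.
xml = plistlib.dumps(rule).decode("utf-8")
print(xml)
```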


Save the file, and the next time you open AppleTrans you should be able to select the Microsoft Office XML rule and have it segment all of the text. This will still take a while with a lot of text, but it should be better than the HTML rule.
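One caveat worth testing: Word often writes runs with leading or trailing spaces as <w:t xml:space="preserve">, and the Prefix above requires > immediately after the t, so those runs would not be matched. The Python sketch below checks the rule's regexes against a few sample tags, assuming (a guess, not documented behavior) that AppleTrans joins Prefix and Suffix around a lazy wildcard. PREFIX_WITH_ATTRS is a hypothetical variant that also accepts attributes; whether AppleTrans's regex engine accepts the same syntax is untested, and if you try it in the plist, remember to escape < and > as &lt; and &gt;.

```python
import re

# Prefix and Suffix exactly as in the rule above, after plist unescaping.
PREFIX = r"<(.:)?t>"
SUFFIX = r"<\/(.:)?t>"
# Hypothetical variant: also allow attributes after the tag name.
PREFIX_WITH_ATTRS = r"<(.:)?t(\s[^>]*)?>"


def segments(xml, prefix, suffix):
    """Return the text between each prefix match and the next suffix match.

    Assumes Prefix and Suffix are combined around a lazy .*? wildcard;
    this is an illustration, not AppleTrans's documented behavior.
    """
    pattern = re.compile(prefix + r"(?P<segment>.*?)" + suffix, re.DOTALL)
    return [m.group("segment") for m in pattern.finditer(xml)]


sample = (
    "<w:t>Hello</w:t>"
    '<w:t xml:space="preserve"> world</w:t>'
    "<a:t>Slide title</a:t>"
    "<t>Cell text</t>"
)

print(segments(sample, PREFIX, SUFFIX))
# ['Hello', 'Slide title', 'Cell text']
print(segments(sample, PREFIX_WITH_ATTRS, SUFFIX))
# ['Hello', ' world', 'Slide title', 'Cell text']
```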

Monday, August 18, 2008

Working With Sources (5) PDF - the good

After documents that can be translated as plain or rich text and Microsoft Office documents, the file format I translate most often is the Portable Document Format (.pdf). For our purposes there are three kinds of PDF file: files containing useful text (the good), files containing useless text (the bad), and photo or scan files (the ugly). The one good thing about PDF documents is that there is generally no easy way to edit a PDF file, so the client is not expecting the translation to come back with the original formatting.

This time we will look at how to access the good. These documents are often generated from print-ready material such as posters, brochures, investor relations materials, and so on.

The simple way to tell whether you are dealing with this kind of document is to open it in Preview and try selecting the text with the text selection tool.



Being able to select the text doesn't yet qualify it as a "good" file. Try selecting an entire paragraph and pasting it into an AppleTrans document. If the text comes out in a generally readable form, you're most of the way there.

If you've come across a "good" file there are a number of ways to get the text out in a nice way. Let me introduce a few.

The first technique is simply to open the document in the latest version of Adobe Reader, which selects text in columns properly by default. With Reader installed, you can often get away with a simple Select All, Copy, and Paste.

This is also possible with Preview, for those who don't want to install extra software: copy and paste the text, in reading order, into a text file. Note that you can select multiple lines and then join them afterwards (for example, by running a script or a regular expression over them). The advantage of Adobe Reader is that it selects text in columns intelligently; a multi-column document selected in Preview will have the lines of its columns interwoven.
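Joining the lines can be scripted. Here is a minimal Python sketch, assuming the pasted text has a single line break at the end of each visual line and a blank line between paragraphs; the function name join_lines is hypothetical. For Japanese and other languages written without spaces between words, pass joiner="".

```python
import re


def join_lines(text, joiner=" "):
    """Join hard-wrapped lines pasted from a PDF, keeping blank-line paragraph breaks."""
    # Drop trailing spaces/tabs at the end of each line so they don't double up.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Replace a lone newline (not part of a blank line) with the joiner.
    return re.sub(r"(?<!\n)\n(?!\n)", joiner, text)
```

For example, join_lines("First line\nsecond line\n\nNext paragraph\nends here") returns "First line second line\n\nNext paragraph ends here". It does not rejoin words hyphenated at the end of a line, since those can't be reliably told apart from real hyphens.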

Tip: To make selecting text easier in Preview when a document has text boxes, columns, or other unusual layouts, hold down Option while selecting with the text selection tool.

Here are some methods suggested by Elmars:


Another fairly easy way to extract text from a "good" PDF file for translation with AppleTrans is to use the free Macintosh command-line utility pdftotext.

It comes with its own installer, which places the program executable in /usr/local/bin/.

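Open Terminal and type /usr/local/bin/pdftotext -htmlmeta myfile.pdf myfile.html

Replace myfile with the name of your PDF file. (Typing a literal * would not work as a placeholder, because the shell treats it as a wildcard.)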

Tip: Spaces in the filename must be preceded by a backslash (\), or you can put the whole filename in quotes.

The -htmlmeta flag helps retain the encoding in the resulting simple HTML file. Open that file with TextEdit and resave it as .txt or .rtf for use with AppleTrans. Of course, you will still have to check and arrange the text before translating.

The advantage of this method is that, because pdftotext is a command-line tool, it can easily be used for batch processing.
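As a sketch of that batch processing, the Python below runs pdftotext -htmlmeta on every file ending in .pdf in a folder, writing each .html file next to its PDF. It assumes pdftotext is installed at /usr/local/bin as described above; the function names pdftotext_commands and convert_folder are hypothetical.

```python
import subprocess
from pathlib import Path

# Install location from the instructions above; adjust if yours differs.
PDFTOTEXT = "/usr/local/bin/pdftotext"


def pdftotext_commands(folder):
    """Build one pdftotext command per .pdf file in folder (name.pdf -> name.html)."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        html = pdf.with_suffix(".html")
        commands.append([PDFTOTEXT, "-htmlmeta", str(pdf), str(html)])
    return commands


def convert_folder(folder):
    """Run pdftotext on every PDF in folder; raises CalledProcessError on failure."""
    for cmd in pdftotext_commands(folder):
        subprocess.run(cmd, check=True)
```

Because the arguments are passed to subprocess.run as a list, no shell is involved, so spaces in filenames need no backslashes or quotes.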

Furthermore, OpenOffice 3 for Mac (soon to be officially released; in the meantime, download the release candidate) has a plugin for converting PDF files to OpenOffice Draw documents: pdfimport

In the converted Draw file, each line of text is in a separate text box. If you open the Draw file (.odg) as a zip archive, you will find all of the text in the content.xml file. After translating with AppleTrans and putting the new content.xml back into the Draw file, you can open the .odg file and export it as PDF.
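Putting the translated content.xml back can also be scripted. Python's zipfile module cannot replace a single entry in place, so this sketch writes a new .odg that copies every entry from the original and swaps in the new content.xml. The function name replace_content_xml is hypothetical. ODF packages expect an uncompressed mimetype entry first in the archive, which is why the sketch keeps the original entry order and compression settings.

```python
import zipfile


def replace_content_xml(odg_in, new_content, odg_out):
    """Copy odg_in to odg_out, swapping in a translated content.xml.

    Every other entry is copied unchanged, in its original order and with its
    original compression type, so the stored (uncompressed) "mimetype" entry
    stays first. new_content may be str (written as UTF-8) or bytes.
    """
    with zipfile.ZipFile(odg_in) as src, zipfile.ZipFile(odg_out, "w") as dst:
        for info in src.infolist():
            if info.filename == "content.xml":
                data = new_content
            else:
                data = src.read(info.filename)
            # Reusing the source ZipInfo keeps the name and compress_type.
            dst.writestr(info, data)
```

For example, replace_content_xml("converted.odg", translated_xml, "translated.odg") would produce a new Draw file ready to open and export as PDF (the filenames here are placeholders).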


If you know of another good way, please write it down in the comments for everyone else, and I'll update this page as necessary.