Mike’s Technical Tip: Converting Scanned Text into Real Text in Acrobat

One of Adobe Acrobat’s lesser-known features (according to a survey done by… me) is its ability to do OCR, or optical character recognition. That’s just a fancy term for converting images of text to real text.

I’ve lost count of the number of times I’ve been sent a scan of some text and asked to edit it, or to bring its contents into a page layout program like InDesign or an Office application such as Word. Obviously, the scanned image itself can’t be imported into those programs if you want anything other than ridiculous results.

Luckily, that problem is exactly what Acrobat’s OCR capability was designed to address. And its basic functionality is very easy to use, too.

Let’s imagine someone sends you a PDF containing this scanned text:

You can tell this is just a scan (and not real text) by choosing Acrobat’s “Selection tool for text and images”:

If you click anywhere in the “text,” you’ll quickly see that Acrobat correctly identifies it as one big picture, not individual text characters. Notice how all the content gets selected at once, and how the cursor is arrow-shaped (for selecting images) rather than I-beam-shaped (for selecting text):

So here’s where the magic comes in. Under Tools, look for Recognize Text, and under that, choose “In This File.” The wording and location of this command may vary between different versions of Acrobat (and on different platforms, i.e. Mac vs. Windows) but in my copy of Acrobat (version 10 on a Mac) that command can be found here:

When you click on that “In This File” button, you’ll get a Recognize Text pop-up, where you can either just click “OK” or first specify which pages you’d like converted:

A progress bar will appear as the text recognition occurs. After it’s done, your page will look pretty much the same as before. But the crucial difference will be that you can now select the page content as text. Notice how Acrobat no longer treats the page as one big image, but instead as lines of text (and also note the I-beam cursor):

From this point, you can simply copy real text out of the PDF and paste it into any other application. Depending on the quality of the original image, you might need to clean up the resulting text a bit. But in almost every case, it’ll take less time than keying in the entire thing from scratch.
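
If you find yourself doing that cleanup often, some of it can be scripted. The following is only a rough sketch in Python, not anything Acrobat provides: the function name `clean_ocr_text` is made up for illustration, and it assumes the pasted text has hard line breaks at the end of every scanned line, words hyphenated across those breaks, and blank lines between paragraphs.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy text pasted out of an OCR'd PDF (a heuristic, hypothetical helper)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split by a hyphen at the end of a line:
    # "recog-\nnition" becomes "recognition".
    # Caveat: a genuine compound such as "well-\nknown" also loses its hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Collapse single line breaks and runs of spaces into one space.
        joined = " ".join(paragraph.split())
        if joined:
            cleaned.append(joined)

    # Separate paragraphs with exactly one blank line.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    pasted = "Optical character recog-\nnition turns   images\nof text into real text.\n\n\nSecond  paragraph\nhere.\n"
    print(clean_ocr_text(pasted))
```

This only handles layout debris. Misread characters (a lowercase “l” recognized as the digit “1,” for example) still need a human proofread.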
