Scanned documents come in many shapes and sizes and from many sources: whole documents, small sections (such as a signature), handwritten notes, and others. No matter the size of the scanned portion, it should be made accessible to all users.
Most of the time, a document scanned directly to PDF will not be accessible. Generally, the PDF produced from scanning is just an image of the text in that document. Unless that image is converted to real text or given a text alternative, assistive technologies (ATs) such as screen readers cannot identify its contents.
How to Tell if a Scanned Document is Accessible
It is fairly easy to tell if a PDF contains text or is just an image. Check by performing one of the following:
- Recommended Approach: Run the Accessibility Quick Check under the Document menu. If the results dialog reports that “The document appears to contain no text. It may be a scanned image,” the document is not tagged and structured for accessibility.
- Use Adobe Reader’s Read Out Loud feature, found under the View menu. If it says the document is empty, then the document is likely an image and not real text.
- Try to select the text with the mouse pointer. If individual words or lines cannot be highlighted on their own, the document is an image.
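For a batch of files, the same "is there any real text?" question can be asked in a script. The following is a minimal sketch, not part of Acrobat: the function names `find_image_only_pages` and `extract_page_texts` and the `min_chars` threshold are hypothetical, and the extraction helper assumes the third-party pypdf package is installed. A page that passes this check still needs tags and structure; the check only flags pages that have no text layer at all.

```python
def find_image_only_pages(page_texts, min_chars=1):
    """Return the 1-based numbers of pages with fewer than min_chars of text."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Text extractors may return None or only whitespace for image-only pages.
        if len((text or "").strip()) < min_chars:
            flagged.append(number)
    return flagged


def extract_page_texts(pdf_path):
    """Read the text layer of each page (assumes the third-party pypdf package)."""
    # Imported here so the check above can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

A typical call would be `find_image_only_pages(extract_page_texts("scan.pdf"))`, where `scan.pdf` is a placeholder file name. Any page number in the result is likely a scanned image that needs OCR or alternate text.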
What to Do with a Scanned Image
The first step for turning scanned documents into actual text is to go through an Optical Character Recognition (OCR) process, which turns the images of words into actual text on a page. Once a document has been run through the OCR process:
- Check the recognized text for accuracy and appropriate formatting, correcting scan errors as needed.
- Navigate to Tools > Accessibility and select the Add tags to document option.
Completing these steps gives the document's content a tag structure. Automatic tagging can only infer a left-to-right, top-to-bottom reading order; therefore, some manual remediation is needed to correct structural elements such as lists, tables, links, and headings.
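To see why a strict left-to-right, top-to-bottom order needs manual correction, consider a two-column page. The sketch below is only an illustration of that ordering rule, not Acrobat's actual tagging algorithm; the `Block` type, its coordinates, and `naive_reading_order` are hypothetical names invented for the example.

```python
from typing import NamedTuple


class Block(NamedTuple):
    x: float   # distance from the left edge of the page
    y: float   # distance from the top edge of the page
    text: str


def naive_reading_order(blocks):
    """Order blocks top to bottom, then left to right, ignoring columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


# A page laid out in two columns, listed in the order a person would read it.
two_column_page = [
    Block(50, 100, "Left column, paragraph 1"),
    Block(50, 200, "Left column, paragraph 2"),
    Block(300, 100, "Right column, paragraph 1"),
    Block(300, 200, "Right column, paragraph 2"),
]

for line in naive_reading_order(two_column_page):
    print(line)
```

The naive order jumps across the page after each paragraph, interleaving the two columns instead of finishing the left column first. This is the kind of reading-order problem that has to be fixed by hand after tags are added.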
If the OCR process cannot be performed for some reason, the alternative is to provide alternate text for the document image (as you would for any other complex image).
The text should be placed either in the alternate text field of the image’s properties dialog or within the document as text content (either near the scanned image or as an appendix).