FineReader Sprint Glossary

A

Active area is a selected area on an image that can be deleted, moved or modified. To make an area active, click it. The frame enclosing an active area is bold and has small squares that can be dragged to change the size of the area.

Automatic Document Feeder (ADF) is a device that automatically feeds documents to a scanner. A scanner with an ADF can scan multiple pages without manual intervention. FineReader Sprint supports multi-page documents.

ADRT® (Adaptive Document Recognition Technology) is a technology that increases the quality of conversion of multi-page documents. For example, it can recognize such structural elements as headings, headers and footers, footnotes, page numbering, and signatures.

Area is a section of an image enclosed by a frame and containing a certain type of data. Before performing OCR, FineReader Sprint detects text, picture, table, and barcode areas in order to determine which sections of the image should be recognized and in what order.

Area template is a template that contains information about the size and location of the areas for a set of similar-looking documents.
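An area template can be pictured as a list of typed rectangles that is reused across pages. In the sketch below, the `Area` class, the invoice layout, and the choice to store coordinates as fractions of the page size (so that one template fits scans made at different resolutions) are all assumptions for illustration. The glossary does not describe FineReader Sprint's actual template format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    """One area in a hypothetical template; coordinates are fractions of the page size."""
    kind: str      # e.g. "text", "picture", "table", "barcode"
    left: float    # distance from the left edge / page width
    top: float     # distance from the top edge / page height
    width: float   # area width / page width
    height: float  # area height / page height

def apply_template(template: list[Area], page_w_px: int, page_h_px: int) -> list[tuple]:
    """Place every template area on a page of the given pixel size."""
    return [
        (
            area.kind,
            round(area.left * page_w_px),
            round(area.top * page_h_px),
            round(area.width * page_w_px),
            round(area.height * page_h_px),
        )
        for area in template
    ]

# Hypothetical template for a batch of similar-looking invoices.
INVOICE_TEMPLATE = [
    Area("text", 0.05, 0.04, 0.90, 0.10),     # header block
    Area("table", 0.05, 0.20, 0.90, 0.55),    # line items
    Area("barcode", 0.70, 0.85, 0.25, 0.08),  # document barcode
]

# The same template can be applied to a 300 dpi Letter scan (2550 x 3300 px).
for placed in apply_template(INVOICE_TEMPLATE, 2550, 3300):
    print(placed)
```

Because the coordinates are relative, the same template would also place areas correctly on a 200 dpi scan of the same form.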

B

Background image area is an image area that contains a picture with text printed over it.

Barcode area is an image area that contains a barcode.

C

Code page is a table that establishes correspondences between characters and their codes. Users can select the characters they need from those available in a code page.
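As an illustration, the sketch below uses Python's built-in codecs for two standard Windows code pages, cp1252 (Western European) and cp1251 (Cyrillic). It shows that the same byte can stand for different characters depending on the code page. The helper names `code_of` and `char_at` are hypothetical, and the list of code pages FineReader Sprint offers is not assumed here.

```python
def code_of(char: str, code_page: str) -> int:
    """Return the byte value a character has in a single-byte code page."""
    encoded = char.encode(code_page)  # raises UnicodeEncodeError if the page lacks the char
    if len(encoded) != 1:
        raise ValueError(f"{code_page} does not map {char!r} to a single byte")
    return encoded[0]

def char_at(code: int, code_page: str) -> str:
    """Return the character that a byte value stands for in a code page."""
    return bytes([code]).decode(code_page)

print(hex(code_of("é", "cp1252")))  # 0xe9
print(hex(code_of("ж", "cp1251")))  # 0xe6
print(char_at(0xE6, "cp1252"))      # æ (same byte, different character)
```

Trying `code_of("ж", "cp1252")` raises an error, because that code page simply has no slot for Cyrillic letters.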

Color mode determines whether document colors are to be retained. Black-and-white images produce smaller FineReader Documents and are faster to process.
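A rough way to see why black-and-white images are smaller is to count raw bits per pixel. The sketch below assumes uncompressed storage with 1 bit per pixel for black-and-white, 8 for grayscale, and 24 for color (8 bits per channel), and ignores file headers and row padding. Real FineReader documents are compressed, so actual sizes will differ, but the ratio between the modes explains the trend.

```python
# Assumed bit depths for each mode (uncompressed).
BITS_PER_PIXEL = {"black-and-white": 1, "grayscale": 8, "color": 24}

def raw_size_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Uncompressed image size in bytes, ignoring headers and row padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * BITS_PER_PIXEL[mode] + 7) // 8  # round up to whole bytes

# A US Letter page (8.5 x 11 in) scanned at 300 dpi.
for mode in BITS_PER_PIXEL:
    print(mode, raw_size_bytes(8.5, 11, 300, mode))
# black-and-white 1051875
# grayscale 8415000
# color 25245000
```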

D

Document analysis is a process of identifying the elements of the logical structure of a document and areas with different types of data. Document analysis can be carried out automatically or manually.

Document Open Password is a password which prevents users from opening a PDF document unless they type the password specified by the author.

Dots per inch (dpi) is a measure of image resolution.

Driver is a software program that controls a computer peripheral (e.g. a scanner or a monitor).

F

FineReader document is an object created by FineReader Sprint to process a paper document. It contains the page images, the recognized text (if any), the recognition language, and the export settings.

I

Ignored characters are any non-letter characters found in words (e.g. syllable characters or stress marks). These characters are ignored during the spell check.
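A spell checker can honor ignored characters by removing them before the dictionary lookup. The sketch below assumes a hypothetical ignored set (combining stress accents plus the soft hyphen, which is one plausible reading of "syllable characters") and a toy dictionary. It also assumes text in NFC form, so precomposed letters such as é are single characters and are not stripped.

```python
# Hypothetical set of ignored characters; FineReader Sprint's actual list may differ.
IGNORED = {
    "\u0301",  # COMBINING ACUTE ACCENT, used as a stress mark (e.g. in Russian)
    "\u0300",  # COMBINING GRAVE ACCENT, another stress mark
    "\u00ad",  # SOFT HYPHEN, marks a syllable break
}

# Toy dictionary standing in for the spell checker's word list.
DICTIONARY = {"молоко", "autoformat", "caf\u00e9"}

def strip_ignored(word: str) -> str:
    """Remove ignored characters so they do not affect the dictionary lookup."""
    return "".join(ch for ch in word if ch not in IGNORED)

def is_spelled_correctly(word: str) -> bool:
    """Look the word up after dropping ignored characters."""
    return strip_ignored(word).lower() in DICTIONARY

print(is_spelled_correctly("молоко\u0301"))      # stress mark ignored
print(is_spelled_correctly("auto\u00adformat"))  # soft hyphen ignored
print(is_spelled_correctly("caf\u00e9"))         # precomposed é is a letter and is kept
```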

Inverted image is an image with white characters printed against a dark background.

L

Ligature is a combination of two or more characters which are stuck together (e.g. fi, fl, ffi). Such characters are difficult for FineReader Sprint to separate. Treating them as one compound character improves OCR accuracy.
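Unicode encodes the common Latin ligatures as single compatibility characters, and NFKC normalization expands each one into its component letters. The sketch below uses Python's standard `unicodedata` module to perform that expansion. It illustrates the relationship between a compound character and its letters, not how FineReader Sprint itself segments ligatures on an image; `expand_ligatures` is a hypothetical helper.

```python
import unicodedata

# Latin ligature code points from Unicode's Alphabetic Presentation Forms block.
LIGATURES = "\ufb00\ufb01\ufb02\ufb03\ufb04"  # ff, fi, fl, ffi, ffl

# NFKC normalization maps each ligature to its component letters.
EXPAND = str.maketrans({lig: unicodedata.normalize("NFKC", lig) for lig in LIGATURES})

def expand_ligatures(text: str) -> str:
    """Replace ligature characters with the letters they combine."""
    return text.translate(EXPAND)

for lig in LIGATURES:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)} -> {expand_ligatures(lig)}")
```

A targeted translation table is used instead of normalizing the whole text, because full NFKC would also rewrite unrelated characters such as superscripts and full-width forms.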

O

Optional hyphen is a hyphen (¬) that indicates exactly where a word or word combination should be split if it occurs at the end of a line (e.g. "autoformat" should be split into "auto-" and "format"). FineReader Sprint replaces all hyphens found in dictionary words with optional hyphens.
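In Unicode text, an optional hyphen corresponds to the soft hyphen character U+00AD, which stays invisible unless a line break falls on it. The sketch below assumes the word is stored with U+00AD at its permitted break points (FineReader Sprint's internal representation is not described here) and shows the resulting line-breaking decision. `layout_word` and its `room` parameter are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode's optional (soft) hyphen

def layout_word(word: str, room: int) -> list[str]:
    """Lay out a word when `room` characters remain on the current line.

    Returns [word] if it fits (soft hyphens stay invisible); otherwise breaks at the
    rightmost optional hyphen whose first part, plus a visible "-", still fits.
    Returns ["", word] if no break point fits, i.e. the whole word moves down a line.
    """
    visible = word.replace(SOFT_HYPHEN, "")
    if len(visible) <= room:
        return [visible]
    parts = word.split(SOFT_HYPHEN)
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"
        if len(head) <= room:
            return [head, "".join(parts[i:])]
    return ["", visible]

print(layout_word("auto\u00adformat", 20))  # ['autoformat']
print(layout_word("auto\u00adformat", 6))   # ['auto-', 'format']
```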

P

Page layout is the arrangement of text, tables, pictures, paragraphs, and columns on a page. The fonts, font sizes, font colors, text background, and text orientation are also part of the page layout.

Page layout analysis is the process of detecting areas on a page image. Areas can be of six types: text, picture, table, barcode, background image, and recognition areas. Page layout analysis can be performed automatically when you click the Read button, or manually by the user prior to OCR.

PDF security settings are restrictions that prevent a PDF document from being opened, edited, copied or printed. These settings include Document Open Passwords, Permissions Passwords, and encryption levels.

Permissions Password is a password which prevents other users from printing and editing a PDF document unless they type in the password specified by the author. If some security settings are selected for the document, other users will not be able to change these settings unless they type in the password.

Picture area is an image area that contains a picture. This type of area may enclose an actual picture or any other object that should be displayed as a picture (e.g. a section of text).

Primary form is the "dictionary" form of a word (headwords of dictionary entries are usually given in their primary forms).

Prohibited characters are characters that will never occur in the text to be recognized. Adding such characters to a list of prohibited characters increases the speed and quality of OCR.

R

Resolution is a scanning parameter measured in dots per inch (dpi). A resolution of 300 dpi should be used for text set in 10 pt fonts or larger; 400 to 600 dpi is preferable for text printed in smaller font sizes (9 pt or less).
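The resolution guideline can be expressed as a small function. Since 1 pt is 1/72 inch, a font's size in pixels is `pt / 72 * dpi`, which shows why smaller type needs a higher resolution: 8 pt text at 300 dpi is only about 33 px high, while 9 pt text at 400 dpi reaches 50 px. The function names are hypothetical, and the sketch treats sizes between 9 and 10 pt (which the guideline does not cover) like small fonts.

```python
def points_to_pixels(font_size_pt: float, dpi: int) -> float:
    """Height of the font's em size in pixels (1 pt = 1/72 inch)."""
    return font_size_pt / 72 * dpi

def recommended_dpi(font_size_pt: float) -> tuple[int, int]:
    """Return a (min, max) scanning resolution following the glossary's guideline."""
    if font_size_pt >= 10:
        return (300, 300)
    # 400 to 600 dpi for 9 pt and smaller; this sketch also applies it to 9-10 pt.
    return (400, 600)

def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scanned page."""
    return round(width_in * dpi), round(height_in * dpi)

print(recommended_dpi(12))        # (300, 300)
print(recommended_dpi(8))         # (400, 600)
print(points_to_pixels(9, 400))   # 50.0
print(page_pixels(8.5, 11, 300))  # (2550, 3300)
```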

Recognition area is an image area that FineReader Sprint should analyze and read automatically when you click the Read button.

S

Scanner is a device for inputting images into a computer.

Scanning mode is a scanning parameter that determines whether an image must be scanned in black and white, grayscale, or color.

Separators are symbols that can separate words (e.g. /, \, — ) and that are separated by spaces from the words themselves.
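A symbol counts as a separator only when spaces set it apart from the neighboring words. The sketch below labels space-delimited tokens using the glossary's example symbols; the `classify_tokens` helper and the idea of tokenizing on whitespace are assumptions for illustration, not FineReader Sprint's actual algorithm.

```python
SEPARATORS = {"/", "\\", "\u2014"}  # slash, backslash, em dash (the glossary's examples)

def classify_tokens(text: str) -> list[tuple[str, str]]:
    """Label each space-delimited token as a word or a separator."""
    return [(tok, "separator" if tok in SEPARATORS else "word") for tok in text.split()]

print(classify_tokens("input / output"))
# [('input', 'word'), ('/', 'separator'), ('output', 'word')]
print(classify_tokens("and/or"))
# [('and/or', 'word')]: without spaces, the slash stays part of the word
```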

Support ID is a unique identifier of a serial number. A support ID provides additional protection and is checked by the technical support staff before providing technical support.

T

Table area is an image area that contains data in tabular form. When the application reads this type of area, it draws vertical and horizontal separators inside the area to form a table. This area is then rendered as a table in the output text.
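Once the vertical and horizontal separators are in place, each pair of adjacent separators bounds one row or column, and their intersections define the cells. The sketch below assumes separator positions are given as pixel coordinates that include the area's outer edges, and it ignores merged cells; `cells_from_separators` is a hypothetical helper.

```python
def cells_from_separators(xs: list[int], ys: list[int]) -> list[list[tuple[int, int, int, int]]]:
    """Split a table area into cells.

    xs: x-coordinates of the vertical separators, including the area's left and right edges.
    ys: y-coordinates of the horizontal separators, including the top and bottom edges.
    Returns one list per row of (left, top, right, bottom) cell boxes.
    """
    xs, ys = sorted(xs), sorted(ys)
    return [
        [(left, top, right, bottom) for left, right in zip(xs, xs[1:])]
        for top, bottom in zip(ys, ys[1:])
    ]

# A 2 x 2 table area spanning x = 0..250 and y = 0..80 (hypothetical coordinates).
for row in cells_from_separators([0, 100, 250], [0, 40, 80]):
    print(row)
```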

Tagged PDF is a PDF document which contains information about the document structure, such as its logical parts, pictures, and tables. The structure of a document is encoded in PDF tags. A PDF file with such tags may be reflowed to fit different screen sizes and will display well on handheld devices.

Text area is an image area that contains text. Note that text areas should only contain single-column text.

U

Uncertain characters are characters that may have been recognized by the program incorrectly.
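Uncertain characters are typically flagged by comparing a recognition confidence against a threshold. The sketch below assumes a hypothetical per-character score between 0 and 1 and an arbitrary threshold of 0.6; FineReader Sprint's actual criterion is not stated in this glossary.

```python
from dataclasses import dataclass

@dataclass
class RecognizedChar:
    char: str
    confidence: float  # hypothetical score between 0.0 and 1.0

def uncertain_positions(chars: list[RecognizedChar], threshold: float = 0.6) -> list[int]:
    """Indices of characters whose confidence falls below the threshold."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]

def highlight(chars: list[RecognizedChar], threshold: float = 0.6) -> str:
    """Render the text with uncertain characters wrapped in brackets for review."""
    return "".join(f"[{c.char}]" if c.confidence < threshold else c.char for c in chars)

word = [RecognizedChar("H", 0.98), RecognizedChar("e", 0.95), RecognizedChar("l", 0.41),
        RecognizedChar("l", 0.93), RecognizedChar("o", 0.97)]
print(uncertain_positions(word))  # [2]
print(highlight(word))            # He[l]lo
```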

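Unicode is a standard developed by the Unicode Consortium (Unicode, Inc.). It is an international character encoding system for processing text that assigns a unique code point, in the range U+0000 to U+10FFFF, to every character. Early versions of the standard used 16-bit codes, but the current code space requires 21 bits, and text is stored using encoding forms such as UTF-8, UTF-16, and UTF-32. The standard defines character encoding, character properties, and procedures for processing text written in different languages.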
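The difference between a code point and the bytes that store it can be seen with Python's built-in encoders. The sketch prints each character's code point together with its UTF-8 and UTF-16 (big-endian) encodings; the `describe` helper is hypothetical. The emoji example has a code point above U+FFFF, so it does not fit in a single 16-bit unit and UTF-16 stores it as a surrogate pair.

```python
def describe(ch: str) -> tuple[str, str, str]:
    """Code point plus its UTF-8 and UTF-16 (big-endian) byte sequences."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("utf-8").hex(" "),
        ch.encode("utf-16-be").hex(" "),
    )

for ch in ["A", "\u20ac", "\U0001F600"]:  # Latin capital A, euro sign, grinning face emoji
    print(describe(ch))
# ('U+0041', '41', '00 41')
# ('U+20AC', 'e2 82 ac', '20 ac')
# ('U+1F600', 'f0 9f 98 80', 'd8 3d de 00')
```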