ATAPY Software - OCR, Document Imaging, Document Management, Data Capture, Data Conversion
Services and Solutions for Document Management



The problem this product addresses

Existing recognition systems are targeted at regular documents (non-forms) and ‘geometrically fixed’ forms. However, many documents do not fall into either of these two categories.

Some of those documents are supposed to follow certain standards; they are typically called ‘flexible forms’, or ‘flexiforms’. Those standards are not as rigid as the precise geometrical pattern of ‘fixed forms’, yet they must be definite enough to enable the OCR software to locate the fields. Many of the financial papers you can find in your bookkeeping archives can be considered flexiforms.

For other documents, the guidelines are too lax for the documents to be parsed automatically. For example, résumés can be regarded as forms because parsing them into individual fields is a common task at recruiting companies; also, there are certain traditions concerning the fields and their order. However, it is too early to put résumés into the flexiform category, as the diversity of their formats exceeds the capabilities of the automatic document analysis available in today’s flexiform software.

Various catalogues are another example of such documents. In the past, ATAPY completed an order to input a large number of catalogue pages into MS Excel.

For this project, ATAPY reviewed several different approaches and designed a set of specific software tools. To a certain degree, iOCR stemmed from that project and those tools.

Currently there is no widely accepted term for the documents we are discussing. For lack of a better term, we will call them freeforms. Our iOCR product is targeted at freeforms.

It is worth noting that the distinction between flexiforms and freeforms is vague. Flexiform software can easily locate some fields on a document but not others. Invoices are a good example. Automatic invoice input is a task in high demand, and a lot of the OCR community’s brainpower has been spent on it. The result, however, is far from perfect: existing invoice OCR software packages, when confronted with a new invoice format, miss more fields than they locate. We can, therefore, consider invoices a mix of flexiform and freeform (at least as long as we look at the entire class of documents rather than invoices of some particular type).