Feature Collection with ESP

Grooper’s ESP engine identifies the distinguishing features of each page and uses them to group page images into classified documents. ESP uses three key feature collection mechanisms:

Lexical

Natural language processing examines the language of the complete document to understand context.

Rules-Based

Rules find unique keywords or phrases that positively identify a document, such as a title or section heading.
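
As a rough illustration of the rules-based idea, the sketch below checks page text against a small table of identifying phrases. It is not Grooper's implementation or API; the RULES table, the document type names, and the classify_by_rules function are all hypothetical and exist only to show the technique.

```python
import re

# Hypothetical rule table: document type -> phrases that positively identify it.
RULES = {
    "Invoice": [r"\binvoice\b", r"\bamount due\b"],
    "Warranty Deed": [r"\bwarranty deed\b"],
}

def classify_by_rules(page_text):
    """Return the first document type with a matching phrase, or None."""
    for doc_type, patterns in RULES.items():
        for pattern in patterns:
            # Case-insensitive search so titles in capitals still match.
            if re.search(pattern, page_text, flags=re.IGNORECASE):
                return doc_type
    return None
```

A page whose title reads "WARRANTY DEED" would match the second rule, while a page containing none of the listed phrases returns None and would be left to the lexical or visual mechanisms.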

Visual

Computer vision identifies structured forms by their appearance, without relying on OCR text.

ESP Separation

Train on document examples and see how the ESP Separation engine interprets the content of each page, groups pages into documents, and simulates page breaking and classification.

  • A simple “train-by-example” interface lets you quickly teach ESP how to identify each document.
  • Real-time confidence scores show you both the document type and the estimated page number for each page in a batch.
  • Estimated Page Index (EPI) identifies page numbers on your documents. ESP uses this information to determine whether an unknown page is likely part of a surrounding document.
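
The page-number idea behind EPI can be sketched in a few lines. This is only an assumption about the general approach, not how Grooper computes EPI: the PAGE_MARK pattern, estimate_page_index, and likely_continues are hypothetical names, and the sketch handles only printed markers of the form "Page 2" or "Page 2 of 5".

```python
import re

# Hypothetical pattern for printed markers such as "Page 2" or "Page 2 of 5".
PAGE_MARK = re.compile(r"\bpage\s+(\d+)(?:\s+of\s+(\d+))?", re.IGNORECASE)

def estimate_page_index(page_text):
    """Return the printed page number found in the text, or None."""
    match = PAGE_MARK.search(page_text)
    return int(match.group(1)) if match else None

def likely_continues(previous_index, current_index):
    """True when the current page number directly follows the previous one."""
    if previous_index is None or current_index is None:
        return False
    return current_index == previous_index + 1
```

Under these assumptions, an unknown page marked "Page 2 of 5" that follows a page marked "Page 1 of 5" would be treated as part of the same document, while a page with no marker gives no evidence either way.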

Classification

Provide document examples and watch as Grooper begins to learn the correct Doc Type for each instrument provided. During batch testing, unclassified items (those with low confidence scores) can be flagged and sent to a queue for additional training.
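
The flagging step described above amounts to a confidence threshold. The sketch below shows one way that could look; it is not Grooper's API, and the Result class, the split_for_review function, and the 0.80 threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real deployment would tune this value.
REVIEW_THRESHOLD = 0.80

@dataclass
class Result:
    page_id: str
    doc_type: str
    confidence: float

def split_for_review(results, threshold=REVIEW_THRESHOLD):
    """Separate confident classifications from those needing more training."""
    accepted = [r for r in results if r.confidence >= threshold]
    review_queue = [r for r in results if r.confidence < threshold]
    return accepted, review_queue
```

Items that land in review_queue would be the ones sent back for additional training examples, and the rest keep their assigned Doc Type.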
