Grooper Consultant Training: Unstructured Data Extraction Training

November 16 - November 18


Course Overview

Unstructured documents present a unique challenge for most document processing platforms.  These documents convey information in natural language rather than placing data in tables or other geometric structures.  Contracts, for example, have clauses, party names, dates, and other information spread throughout the text, and each must be located by understanding the language around it.  While this may be easy for a human (depending on the complexity of the contract!), it is much harder for a machine, even one using a sophisticated algorithm, to understand the relationships between words in a paragraph.

(Contact your account representative to register)

Course Goals

This course teaches users how to apply Grooper’s natural language processing capabilities to extract data from unstructured documents.  Grooper’s approach to natural language processing is two-fold.  1) User-assisted machine learning:  Gauging the semantic importance of the text features surrounding a data value by weighting them with the TF/IDF algorithm.  2) Text Structuring:  Combining paragraph detection and flow-based collation with data extraction to simulate how humans break up text when reading a document.
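
To give a feel for the weighting idea, here is a minimal sketch of textbook TF/IDF (term frequency times inverse document frequency) in plain Python.  It is not Grooper’s implementation, which this page does not describe; the function name `tf_idf`, the whitespace tokenizer, and the sample contract snippets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up snippets of text surrounding a value in three contracts.
contexts = [
    "this agreement is effective as of the date below",
    "this agreement is governed by the laws of the state",
    "this agreement is terminated upon written notice",
]

weights = tf_idf(contexts)
```

In this sketch, words shared by every snippet ("this", "agreement", "is") receive a weight of zero, while a word unique to one snippet, such as "effective", scores higher.  That is the general sense in which TF/IDF highlights the text features that distinguish one context from another.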

Key Concepts

  • Natural language processing using the Field Class data extractor
  • User-assisted machine learning with the TF/IDF algorithm
  • How to train your data: When to stop training
  • Grooper’s paragraph detection

Adjacent Knowledge

  • Dealing with bad OCR: FuzzyRegEx
  • Dealing with too many names, parties, and other unstructured information: Lexicon training

Final Exam

  • Students will build data models that impose structure on unstructured data
  • Students will demonstrate an understanding of the TF/IDF algorithm by training document sets in order to extract data via the Field Class extractor
  • Students will workshop simple to complex document sets across multiple industries to successfully target and extract their data elements


Event Category: Remote online instructor-led training
