Natural Language Processing
Identify paragraphs, sentences, or other language elements which convey specific meaning. Supervised machine learning makes training easy, and powerful n-gram analysis ensures reliable interpretation – regardless of wording differences.
Capture information from sentence flows
Grooper can read documents in paragraphs and sentences just like a human. This allows it to understand and accurately recommend correct values from the body of documents by considering the surrounding flow of language.
- Language Element Recognition: Locate all paragraphs or sentences in a document.
- Language Element Classification: Determine if a contract contains a non-solicitation clause.
- Document Flow Detection: Extract ‘Monday, May 27, 2009’ across multiple lines.
- Context-Based Data Capture: Determine if a date value is the Maturity Date or the Loan Date.
- Powerful Language Parsing: Distinguish “SW ¼ of the NW ¼” from “SW ¼ and the NW ¼”.
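To make the parsing distinction concrete, here is a minimal Python sketch. It is not Grooper's implementation: the `parse_tracts` function and the `QUARTER` pattern are hypothetical names, and the rule it applies (treat "and" as a separator between tracts, treat quarters left within one part as nested by "of") is a simplifying assumption that real legal descriptions will often break.

```python
import re
from typing import List

# Hypothetical pattern: a compass quarter (NE, NW, SE, SW) followed by "¼" or "1/4".
QUARTER = re.compile(r"\b(NE|NW|SE|SW)\s*(?:¼|1/4)", re.IGNORECASE)


def parse_tracts(description: str) -> List[List[str]]:
    """Split a description into tracts.

    Assumption: "and" separates independent tracts, while quarters that
    remain inside one part (joined by "of") describe a single nested tract.
    """
    tracts: List[List[str]] = []
    for part in re.split(r"\band\b", description, flags=re.IGNORECASE):
        quarters = [m.group(1).upper() for m in QUARTER.finditer(part)]
        if quarters:
            tracts.append(quarters)
    return tracts


nested = parse_tracts("SW ¼ of the NW ¼")     # one tract: the SW quarter of the NW quarter
separate = parse_tracts("SW ¼ and the NW ¼")  # two tracts: the SW quarter, the NW quarter
```

Under these assumptions the first phrase yields a single tract with two nested quarters, and the second yields two separate tracts of one quarter each.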
This technique has made it possible for us to identify all the values that make up a legal description, then break the full description into the individual tracts of land contained within the lease. There was no way we could overcome this challenge without resorting to custom development, and even then, the results were not something we could trust in a production scenario.
Paragraph Detection & Analysis
Grooper’s paragraph ranking engine assesses a document’s structure, intelligently groups words into paragraphs, then compares them against training samples to find the “best match”. The user is then presented with a ranked list of recommendations.
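The idea of comparing a paragraph against training samples can be sketched in a few lines of Python. This is an illustration only, not Grooper's ranking engine: it assumes a simple bag-of-words cosine similarity, and the names `bag_of_words`, `cosine`, `recommend`, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(paragraph: str, samples: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs, best match first."""
    target = bag_of_words(paragraph)
    scored = [(label, cosine(target, bag_of_words(text))) for label, text in samples.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical training samples, one per clause type.
samples = {
    "non-solicitation": "Employee shall not solicit customers or employees of the Company",
    "governing law": "This agreement shall be governed by the laws of the State",
}
ranked = recommend("The contractor shall not solicit any employees of the client", samples)
```

Here `ranked` places the "non-solicitation" sample first, although its wording differs from the input paragraph. A production system would likely weight or remove common words such as "the" and "of".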
Indentation, double spacing, bullets, key phrases, line length, and many other factors must be considered to determine where each paragraph starts and stops. Grooper provides an easy-to-configure console for tuning paragraph detection settings for each project.
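As a rough illustration of how a few of these factors can mark paragraph boundaries, the following Python sketch starts a new paragraph at blank lines, indented lines, and bulleted lines. The `split_paragraphs` function, the `BULLET` pattern, and the `indent` threshold are hypothetical and far simpler than a configurable detection engine.

```python
import re
from typing import List

# Hypothetical bullet markers: "-", "*", "•", or a number followed by "." or ")".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def split_paragraphs(text: str, indent: int = 2) -> List[str]:
    """Group lines into paragraphs using blank lines, indentation, and bullets."""
    paragraphs: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Close the paragraph being built, if any.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line (double spacing) ends the paragraph
            continue
        leading = len(line) - len(line.lstrip(" "))  # counts spaces only, not tabs
        if BULLET.match(line) or leading >= indent:
            flush()  # a bullet or an indented first line starts a new paragraph
        current.append(stripped)
    flush()
    return paragraphs


sample = "First line\nsecond line\n\n  Indented start\ncontinues\n- bullet one\n- bullet two"
paragraphs = split_paragraphs(sample)
```

The `indent` parameter stands in for the kind of per-project tuning described above. Key phrases and line length are left out of this sketch.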
Use the full power of Grooper’s data types to collect features from within each paragraph. These can be n-grams, lexicon entries, or counts of non-value features grouped by data type, such as address, phone number, or name. The analysis spans lines of text to ensure accurate multi-word feature collection regardless of line wrap.
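The point about line wrap can be shown with a small Python sketch of n-gram collection. It assumes plain word bigrams over the whole paragraph; `tokenize` and `ngram_features` are hypothetical names, and lexicon and data-type features are not covered here.

```python
import re
from collections import Counter
from typing import List


def tokenize(paragraph: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens; line breaks act as plain whitespace."""
    return re.findall(r"[a-z0-9]+", paragraph.lower())


def ngram_features(paragraph: str, n: int = 2) -> Counter:
    """Count word n-grams across the whole paragraph, ignoring where lines wrap."""
    tokens = tokenize(paragraph)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# The phrase "not solicit" is split across two lines in this sample.
paragraph = "The Lessee shall not\nsolicit employees of the Lessor."
features = ngram_features(paragraph)
```

Because tokenization happens over the paragraph rather than line by line, the bigram "not solicit" is still collected as a feature.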
Lease and contract analysis is a major strength for Grooper. We have successfully built a working model that finds all of the key provisions throughout the body of the main document. Then it automatically searches for modifications to each within addendums/exhibits and brings them in-line with the main provision. This speeds up our analysis and ensures we are correctly interpreting the data.
Pattern-matching on its own is great for efficiently finding common values like dates, amounts, and phone numbers. But when multiple choices are found on a document, how will the system know which one is the best match?
The answer is spatial analysis. Each choice is ranked by analyzing the words and features nearby.
In the example above, we are easily able to differentiate information pertaining to the borrower vs. co-borrower through radial spatial analysis.
In structured documents, most information is recorded in label/value pairs, meaning each value has a corresponding label that indicates its meaning. Field labels are generally written above and/or to the left of the values they define. Grooper can rank possible values by looking spatially in one or more general directions to determine meaning.
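A directional label lookup of this kind can be sketched in Python as follows. This is an assumption-laden illustration, not Grooper's algorithm: the `Box` type, the tolerance values, and the inverse-distance scoring are all hypothetical, and coordinates are taken to be pixel positions with y growing downward.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    text: str
    x: float  # left edge
    y: float  # top edge; y grows downward, as in image coordinates


def label_score(value: Box, label: Box, line_tol: float = 5.0, col_tol: float = 20.0) -> float:
    """Score a label by how plausibly it sits to the left of or above the value."""
    dx, dy = value.x - label.x, value.y - label.y
    if abs(dy) <= line_tol and dx > 0:  # same line, label to the left
        return 1.0 / dx
    if abs(dx) <= col_tol and dy > 0:   # same column, label above
        return 1.0 / dy
    return 0.0                          # label is right of, below, or unaligned with the value


def best_label(value: Box, labels: List[Box]) -> Optional[Box]:
    """Return the highest-scoring label, or None if no label qualifies."""
    scored = [(label_score(value, lab), lab) for lab in labels]
    score, label = max(scored, key=lambda s: s[0], default=(0.0, None))
    return label if score > 0 else None


labels = [Box("Loan Date", 50, 100), Box("Maturity Date", 50, 130)]
value = Box("05/27/2009", 200, 130)
match = best_label(value, labels)
```

In this sample the date on the second row resolves to the "Maturity Date" label because that label sits on the same line, to the left of the value.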
Radial Analysis and Geotagging
Considering not only the words near a candidate but also the direction of each word relative to that candidate can lead to more accurate identification of field values. Geotagging adds this directional information, which makes it possible to filter features by direction: a simple way to remove features that are unlikely to define a value.
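The combination of radial distance and direction filtering can be sketched in Python like this. The function names `geotag` and `nearby_features`, the four-way direction scheme, and the `radius` and `allowed` parameters are hypothetical choices made for illustration; they are not taken from Grooper.

```python
import math
from typing import Iterable, List, Tuple

Word = Tuple[str, float, float]  # (text, x, y); y grows downward


def geotag(candidate: Tuple[float, float], word: Word) -> Tuple[str, str, float]:
    """Tag a word with its direction and distance relative to the candidate."""
    cx, cy = candidate
    text, x, y = word
    dx, dy = x - cx, y - cy
    if abs(dx) >= abs(dy):
        direction = "left" if dx < 0 else "right"
    else:
        direction = "above" if dy < 0 else "below"
    return text, direction, math.hypot(dx, dy)


def nearby_features(
    candidate: Tuple[float, float],
    words: Iterable[Word],
    radius: float = 150.0,
    allowed: Tuple[str, ...] = ("left", "above"),
) -> List[Tuple[str, str, float]]:
    """Keep words within the radius and in an allowed direction, nearest first."""
    tagged = (geotag(candidate, w) for w in words)
    kept = [(t, d, dist) for t, d, dist in tagged if dist <= radius and d in allowed]
    return sorted(kept, key=lambda f: f[2])


words = [
    ("Borrower", 80, 100),
    ("Co-Borrower", 80, 300),
    ("Signature", 320, 100),
    ("Name", 200, 70),
]
features = nearby_features((200, 100), words)
```

With these sample coordinates, "Co-Borrower" falls outside the radius and "Signature" is discarded for being to the right, leaving "Name" (above) and "Borrower" (left) as the features that describe the candidate.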