Natural Language Processing
With Grooper Natural Language Processing, find paragraphs, sentences, or other language elements in your documents that convey specific meaning.
Our supervised machine learning makes training easy. Powerful n-gram analysis ensures reliable interpretation – regardless of wording differences.
Capture information from sentence flows
Grooper can read paragraphs and sentences in documents just like a human. This allows it to understand and accurately recommend correct values from the body of documents by considering the surrounding flow of language.
Examples of Grooper’s Natural Language Processing (NLP):
- Language Element Recognition:
  - Find all paragraphs or sentences in a document.
- Language Element Classification:
  - Decide if a contract has a non-solicitation clause.
- Document Flow Detection:
  - Extract ‘Monday, May 27, 2009’ across multiple lines.
- Context-Based Data Capture:
  - Decide if a date value is the Maturity Date or the Loan Date.
- Powerful Language Parsing:
  - Distinguish “SW ¼ of the NW ¼” from “SW ¼ and the NW ¼”.
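Two of the examples in the list, document flow detection and language parsing, can be illustrated with a minimal Python sketch. This is not Grooper's implementation; `DATE_PATTERN`, `find_dates`, and `parse_tracts` are hypothetical names, and the sketch assumes plain text input where a line wrap is just another whitespace character.

```python
import re

# Hypothetical pattern: weekday, month name, day, year. "\s+" also matches
# line breaks, so a date that wraps across lines is still found.
DATE_PATTERN = re.compile(
    r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s+"
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+"
    r"\d{1,2},\s+\d{4}"
)


def find_dates(text: str) -> list[str]:
    """Return each matched date with internal whitespace collapsed."""
    return [" ".join(m.group(0).split()) for m in DATE_PATTERN.finditer(text)]


def parse_tracts(description: str) -> list[list[str]]:
    """Split a quarter description into tracts.

    "and" separates independent tracts; "of" nests one quarter inside
    another, so the nested parts stay together as one tract.
    """
    tracts = []
    for part in re.split(r"\s+and\s+(?:the\s+)?", description.strip()):
        tracts.append(re.split(r"\s+of\s+(?:the\s+)?", part))
    return tracts


wrapped = "Effective as of Monday,\nMay 27,\n2009 and remaining in force."
print(find_dates(wrapped))
print(parse_tracts("SW ¼ of the NW ¼"))   # one tract, two nesting levels
print(parse_tracts("SW ¼ and the NW ¼"))  # two separate tracts
```

The "of" form yields a single tract (a quarter of a quarter), while the "and" form yields two tracts, which is the distinction the last list item describes.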
Why This NLP Method?
This technique has made it possible for us to find all the values that make up a legal description, and then break the full description into the individual tracts of land contained within the lease.
There was no way we could overcome this challenge without resorting to custom development. Even then, the results were not something we could trust in a production scenario.
Paragraph Detection & Analysis
Grooper’s paragraph ranking engine looks at a document’s structure and intelligently groups words into paragraphs.
It then compares them against training samples to find the “best match”. The user is then presented with a recommendation list.
Indentation, double spacing, bullets, key phrases, line length, and many other factors must be considered to determine where each paragraph starts and stops.
Grooper provides an easy-to-configure console for tuning paragraph detection settings for each project.
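To give a feel for how a few of these factors interact, here is a minimal Python sketch that groups lines into paragraphs using only three signals: blank lines, leading indentation, and bullet markers. The function name `split_paragraphs` and the `BULLET` pattern are assumptions made for illustration; Grooper's own engine weighs many more factors and is configured through its console, not through code like this.

```python
import re

# Hypothetical bullet detector: "-", "*", "•", or a number like "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def split_paragraphs(lines: list[str]) -> list[str]:
    """Group raw text lines into paragraphs.

    A paragraph ends at a blank line, and a new one starts at any
    indented line or bullet. Wrapped lines are joined with a space.
    """
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        if not line.strip():
            # Blank line (double spacing) closes the open paragraph.
            if current:
                paragraphs.append(current)
                current = []
            continue
        starts_new = bool(BULLET.match(line)) or line.startswith((" ", "\t"))
        if starts_new and current:
            paragraphs.append(current)
            current = []
        current.append(line.strip())
    if current:
        paragraphs.append(current)
    return [" ".join(p) for p in paragraphs]


page = [
    "    The lessee shall pay rent",
    "on the first day of each month.",
    "",
    "- No subletting.",
    "- No pets.",
    "    Either party may terminate",
    "with thirty days notice.",
]
for paragraph in split_paragraphs(page):
    print(paragraph)
```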
Use the full power of Grooper’s data types to collect features from within each paragraph. These can be n-grams, entries from a lexicon, or a non-value feature count grouped by data types such as address, phone number, or name.
The analysis spans lines of text to ensure accurate multi-word feature collection regardless of line wrap.
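As a rough illustration of n-gram features that survive line wraps, the Python sketch below extracts word bigrams from the whole paragraph text (not line by line) and ranks candidate paragraphs against a training sample by Jaccard overlap. The names `ngrams`, `similarity`, and `rank` are hypothetical, and Jaccard overlap is a stand-in chosen for brevity; it is not a description of Grooper's actual ranking model.

```python
import re


def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Collect word n-grams, treating line breaks like any other space."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two texts' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank(candidates: list[str], sample: str) -> list[str]:
    """Order candidate paragraphs from best to worst match."""
    return sorted(candidates, key=lambda c: similarity(c, sample), reverse=True)


sample = "Employee shall not solicit any customers of the Company"
candidates = [
    "Rent is due on the first day of each month.",
    "The Employee shall not\nsolicit customers or employees of the Company.",
]
print(rank(candidates, sample)[0])
```

Because the words are tokenized across the full text, the bigram ("not", "solicit") is found even though the two words sit on different lines.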
How Does Grooper Analyze Leases & Contracts?
Lease and contract analysis is a major strength for Grooper. We have successfully built a working model that finds all of the key provisions throughout the body of the main document.
Then it automatically looks for modifications to each provision within addenda and exhibits and brings them in line with the main provision.
This speeds up our analysis and ensures we are correctly interpreting the data.
Pattern matching on its own is great for efficiently finding common values like dates, amounts, and phone numbers. But when multiple choices are found on a document, how will the system know which one is the best match?
The answer is spatial analysis. Each choice is ranked by analyzing the words and features nearby.
In the example above, we are easily able to find the separate data that pertains to the borrower vs. co-borrower through radial spatial analysis.
In structured documents, most data is recorded in label/value pairs. This means that a value has a corresponding label somewhere on the page that tells its meaning.
Field labels are generally written above or to the left of the value they define.
Grooper can rank possible values by looking spatially in one or more general directions to find meaning.
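The idea of ranking candidates by looking up and to the left for a label can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Region` type, the coordinate convention (x grows rightward, y grows downward), and the inverse-distance score are not taken from Grooper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A piece of text with the page coordinates of its top-left corner."""
    text: str
    x: float  # grows to the right
    y: float  # grows downward


def label_score(candidate: Region, label: Region) -> float:
    """Score a label for a candidate: closer is better, and labels that
    sit to the right of or below the candidate are ignored."""
    dx = candidate.x - label.x  # positive when the label is to the left
    dy = candidate.y - label.y  # positive when the label is above
    if dx < 0 or dy < 0:
        return 0.0
    return 1.0 / (1.0 + math.hypot(dx, dy))


def best_candidate(candidates: list[Region], regions: list[Region],
                   label_text: str) -> Region:
    """Pick the candidate with the strongest matching label nearby."""
    labels = [r for r in regions if r.text.lower() == label_text.lower()]

    def score(c: Region) -> float:
        return max((label_score(c, lab) for lab in labels), default=0.0)

    return max(candidates, key=score)


labels = [Region("Loan Date", 10, 10), Region("Maturity Date", 10, 30)]
dates = [Region("05/27/2009", 120, 10), Region("05/27/2039", 120, 30)]
print(best_candidate(dates, labels, "Maturity Date").text)
print(best_candidate(dates, labels, "Loan Date").text)
```

Both dates match the same pattern, so pattern matching alone cannot tell them apart; the label position is what separates them.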
Radial Analysis and Geotagging
Considering not only the nearby words but also the direction of each word relative to the candidate leads to more accurate identification of field values in your documents.
Geotagging improves accuracy further: it tags each nearby feature with its direction, so features can be filtered by direction as a simple way to remove those unlikely to define a value.
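One way to picture radial analysis with direction tags is the Python sketch below. It collects every word within a radius of the candidate, labels it with one of eight compass sectors, and optionally keeps only chosen directions. The `Word` type, the eight-sector scheme, and the function names are assumptions for illustration and do not describe Grooper's internals.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # grows to the right
    y: float  # grows downward, as in typical page coordinates


SECTORS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction(candidate: Word, word: Word) -> str:
    """Compass sector of `word` as seen from `candidate`.

    The y difference is flipped so that "N" means higher on the page.
    """
    angle = math.degrees(
        math.atan2(candidate.y - word.y, word.x - candidate.x)
    ) % 360
    return SECTORS[int((angle + 22.5) // 45) % 8]


def geotagged_features(candidate: Word, words: list[Word], radius: float,
                       keep: Optional[set[str]] = None) -> list[tuple[str, str]]:
    """Return (direction, text) pairs for words within `radius`,
    optionally keeping only the directions listed in `keep`."""
    features = []
    for w in words:
        dist = math.hypot(w.x - candidate.x, w.y - candidate.y)
        if w is candidate or dist > radius:
            continue
        tag = direction(candidate, w)
        if keep is None or tag in keep:
            features.append((tag, w.text))
    return features


name = Word("John Smith", 100, 100)
page = [
    Word("Borrower", 20, 100),
    Word("Co-Borrower", 20, 140),
    Word("Signature", 100, 60),
    Word("Page", 400, 400),
]
print(geotagged_features(name, page, radius=120))
print(geotagged_features(name, page, radius=120, keep={"W", "NW", "N"}))
```

With the direction filter applied, the "Co-Borrower" label below and to the left of the name is dropped, leaving only features positioned where a defining label is likely to be.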