Who said you can’t teach an old dog new tricks? We’ve brought dated Optical Character Recognition (OCR) technology into modern times with multi-pass OCR software.

Perfect OCR Isn’t So Difficult

Want to know the secret? Remove everything that isn’t text – it makes the OCR engine’s life so much easier.

Grooper does this through industry-first image processing tools and out-of-the-box configurations that are designed for the task.

The best part is that these tools won’t alter the original image you want to retain permanently. Whether you work with paper or electronic documents, Grooper provides the best results in image processing and OCR.

What’s the best OCR software? We get asked that all the time. OCR alone is far too inaccurate, so the answer is to combine Grooper’s image processing and recognition tools with one of many off-the-shelf OCR engines, such as Transym 4/5, Tesseract, Azure, ABBYY, or Prime.

(Not sure what Grooper is? No problem. Learn more about Grooper data integration.)

How Grooper Prepares a Document Image for Recognition:

  • Remove lines
  • Clean up document edges
  • Remove small specks
  • Remove large non-text objects
  • Invert white-on-black zones
  • Remove hole punches
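
Grooper’s own cleanup commands aren’t documented here, so the following is only a generic sketch of three of the steps listed above (line removal, speck removal, and white-on-black inversion) on a binary image stored as nested Python lists. The function names, thresholds, and the 1 = ink / 0 = background representation are assumptions for illustration. Each function returns a new image, in keeping with the idea that cleanup should never touch the original.

```python
from collections import deque

# Hypothetical representation: 1 = black ink, 0 = white background.
Image = list[list[int]]


def remove_horizontal_lines(img: Image, min_len: int) -> Image:
    """Blank out every horizontal run of ink at least min_len pixels long."""
    out = [row[:] for row in img]  # work on a copy; the original is untouched
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_len:  # long run: treat it as a ruled line
                    for i in range(start, x):
                        row[i] = 0
            else:
                x += 1
    return out


def remove_specks(img: Image, min_size: int) -> Image:
    """Erase 4-connected ink components with fewer than min_size pixels."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] == 1 and not seen[y][x]:
                # Breadth-first search collects one connected component.
                component = []
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    component.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(component) < min_size:  # too small to be text
                    for cy, cx in component:
                        out[cy][cx] = 0
    return out


def invert_if_dark(zone: Image, threshold: float = 0.5) -> Image:
    """Invert a zone whose ink coverage exceeds threshold (white-on-black text)."""
    total = sum(len(row) for row in zone)
    ink = sum(sum(row) for row in zone)
    if total and ink / total > threshold:
        return [[1 - p for p in row] for row in zone]
    return [row[:] for row in zone]
```

The line remover only handles horizontal rules; vertical lines would need the same scan down each column.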

Tips for Accurate OCR

No matter how clean and pristine document images appear, text recognition software still struggles to capture accurate text. Text embedded in images, text laid out in multiple columns, and text in mixed font sizes all contribute to poor character recognition.

Another cause of inaccurate capture is that recognition engines process an entire page in a single top-to-bottom pass. (Hint: A better approach is to OCR selected areas of a page and combine the results.)

Grooper’s patented OCR synthesis engine intelligently performs multiple passes on different portions of a document image. Results are grouped together as a single unit, providing highly accurate text results.
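
Grooper’s synthesis engine is patented and its internals aren’t described here. As a rough illustration of the general idea only, this sketch runs several recognition passes over separate page regions, keeps the most confident result for each region, and stitches the winners back together in reading order. `Region`, `PassResult`, `synthesize`, and the two fake passes are hypothetical stand-ins for real OCR engines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    top: int      # vertical position on the page, in pixels
    left: int     # horizontal position on the page, in pixels
    content: str  # stand-in for the region's image data


@dataclass
class PassResult:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the recognizer


# A "pass" is any function that recognizes one region, for example an OCR
# engine run with a particular cleanup setting. Purely illustrative.
OcrPass = Callable[[Region], PassResult]


def synthesize(regions: list[Region], passes: list[OcrPass]) -> str:
    """Run every pass on every region, keep the most confident result for
    each region, and join the winners in reading order."""
    ordered = sorted(regions, key=lambda r: (r.top, r.left))
    lines = []
    for region in ordered:
        results = [run(region) for run in passes]
        lines.append(max(results, key=lambda res: res.confidence).text)
    return "\n".join(lines)


# Two fake passes standing in for real engines or settings.
def fast_pass(region: Region) -> PassResult:
    return PassResult(f"fast:{region.content}", 0.6)


def careful_pass(region: Region) -> PassResult:
    return PassResult(f"careful:{region.content}", 0.9 if region.content == "table" else 0.5)


page = [Region(top=200, left=0, content="table"), Region(top=0, left=0, content="header")]
# Prints "fast:header" then "careful:table": each region keeps its most confident pass.
print(synthesize(page, [fast_pass, careful_pass]))
```

A production system would compare passes at a finer grain (words or characters) rather than choosing one result per region.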

In a lab test, Grooper accurately captured 99.91% of text. Using OCR alone on the same data set proved half as accurate.

5 Features That Guarantee Accurate OCR

Intelligent Spell Correction

Powered by Atomic RegEx, Grooper corrects some pretty ugly OCR output. And the secret to making this work? K-means clustering, text removal, and text correction engines.

What Spelling Errors Does Grooper Correct?

  • Simple OCR mistakes in strings that don’t match words in a standard dictionary
  • Human-generated typos on documents
  • Word splitting – insert spaces where OCR falsely jammed multiple words together
  • Junk strings – delete runs of characters that are neither letters nor numbers, such as strings that resemble censored profanity (“$#@! ^&*”)
  • Damaged numbers – repair values, such as prices, whose punctuation was mistakenly removed by overly aggressive image cleanup
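
Grooper’s correction is built on Atomic RegEx and its own engines, which aren’t shown here. The sketch below is a generic stand-in using Python’s `re` and `difflib` modules for three of the items listed above: deleting junk strings, splitting jammed words, and fixing near-miss misspellings. `DICTIONARY`, the 0.8 similarity cutoff, and the function names are assumptions; number repair is not attempted.

```python
from __future__ import annotations

import difflib
import re

# Tiny stand-in vocabulary (hypothetical); a real system would load a full dictionary.
DICTIONARY = {"amount", "date", "due", "invoice", "is", "the", "total"}

# A token made up entirely of characters that are neither letters nor digits.
JUNK = re.compile(r"[^A-Za-z0-9]+")


def split_jammed(token: str) -> list[str] | None:
    """Split a run-together token into dictionary words, or return None.

    Dynamic programming: best[i] holds one segmentation of token[:i].
    """
    lower = token.lower()
    best: list[list[str] | None] = [None] * (len(lower) + 1)
    best[0] = []
    for end in range(1, len(lower) + 1):
        for start in range(end):
            if best[start] is not None and lower[start:end] in DICTIONARY:
                best[end] = best[start] + [lower[start:end]]
                break
    return best[-1]


def correct_word(token: str, cutoff: float = 0.8) -> str:
    """Replace a token with its closest dictionary word, if one is close enough."""
    matches = difflib.get_close_matches(token.lower(), sorted(DICTIONARY), n=1, cutoff=cutoff)
    return matches[0] if matches else token


def clean_ocr_text(text: str) -> str:
    cleaned = []
    for token in text.split():
        if JUNK.fullmatch(token):
            continue  # drop strings such as "$#@!" that contain no letters or digits
        if any(ch.isdigit() for ch in token) or token.lower() in DICTIONARY:
            cleaned.append(token)  # leave numbers and known words alone
            continue
        pieces = split_jammed(token)
        if pieces is not None and len(pieces) > 1:
            cleaned.extend(pieces)  # insert the spaces OCR dropped
        else:
            cleaned.append(correct_word(token))
    return " ".join(cleaned)


print(clean_ocr_text("Invoice totl amountdue $#@! ^&* is 42"))
```

Corrected and split words come back in the dictionary’s lowercase form; a fuller version would restore the original capitalization.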

How to OCR a PDF Document

PDF is the most widely used document standard in the world. Because PDFs can be generated in many different ways, capturing text from them comes with varying levels of difficulty:

  • Some PDFs are purely text-based (easy to capture from)
  • Others are just document scans in PDF format (difficult)
  • Other PDFs mix the two, scattered throughout their pages (most difficult)

PDF documents present a fair number of text recognition challenges.

How to Get Text Out of PDFs:

Grooper looks at each page within a PDF and places the page into one of three categories: image-based, text-based, or mixed-content.

Because this sorting happens automatically, Grooper can apply rules and processing methods specific to each category, which makes text extraction easier.

Then, each page is handled accordingly:

  • Process pages that have a single image covering the entire page as image-based pages
  • If a page contains no images, extract only the native text behind the page
  • For mixed-content pages, extract each embedded image to a temporary image file, process it, and merge the recognized text with the page’s native text
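
How Grooper inspects each page isn’t specified. Assuming the page size and the bounding box of each embedded image have already been read with a PDF library (not shown), the three-way sort could be sketched like this. `PdfPage`, `ImageBox`, `classify_page`, and the 95% full-page threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageBox:
    """Bounding box of one embedded image, in PDF points (hypothetical input)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class PdfPage:
    width: float
    height: float
    images: list[ImageBox] = field(default_factory=list)


def classify_page(page: PdfPage, full_page_ratio: float = 0.95) -> str:
    """Return "text-based", "image-based", or "mixed-content"."""
    if not page.images:
        return "text-based"  # no images: extract the native text only
    if len(page.images) == 1:
        box = page.images[0]
        area = (box.x1 - box.x0) * (box.y1 - box.y0)
        if area / (page.width * page.height) >= full_page_ratio:
            return "image-based"  # one full-page image: clean up and OCR like a scan
    return "mixed-content"  # OCR each image, then merge with the native text


letter = (612, 792)  # US Letter size in points
print(classify_page(PdfPage(*letter, [ImageBox(0, 0, 612, 792)])))  # image-based
```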

Additional Tools

Trainable OCR

Grooper OCR is trainable: the engine can be trained to read custom and hard-to-read fonts.
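
How Grooper trains its engine isn’t described here. The simplest form of trainable recognition is nearest-neighbor matching: store labeled samples of each character in the new font, then give an unknown glyph the label of the closest sample. The sketch below does exactly that on tiny fixed-size bitmaps; `GlyphClassifier` and the 3x3 glyphs are purely illustrative, and real engines compare richer features than raw pixels.

```python
# A glyph flattened row by row; 1 = ink, 0 = background. All bitmaps must share one size.
Bitmap = tuple[int, ...]


class GlyphClassifier:
    """Nearest-neighbour glyph recognizer: training just stores labelled samples."""

    def __init__(self) -> None:
        self.samples: list[tuple[Bitmap, str]] = []

    def train(self, bitmap: Bitmap, label: str) -> None:
        self.samples.append((bitmap, label))

    def classify(self, bitmap: Bitmap) -> str:
        """Return the label of the stored sample with the fewest differing pixels."""
        if not self.samples:
            raise ValueError("no training samples yet")

        def differing_pixels(sample: tuple[Bitmap, str]) -> int:
            return sum(a != b for a, b in zip(sample[0], bitmap))

        return min(self.samples, key=differing_pixels)[1]


# Train on two 3x3 glyphs of a hypothetical custom font, then read a noisy "I".
clf = GlyphClassifier()
clf.train((0, 1, 0,
           0, 1, 0,
           0, 1, 0), "I")
clf.train((1, 0, 0,
           1, 0, 0,
           1, 1, 1), "L")
print(clf.classify((0, 1, 0,
                    0, 1, 0,
                    0, 1, 1)))  # prints I
```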

OCR Performance Balancing

Grooper’s “Run Speed” option gives you control over the balance between accuracy and performance.

Multi-Language Text Support

Grooper recognizes 268 distinct languages and 523 regional cultures. Language detection interprets dates, times, currency names, numeric formats, and more.
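
Why culture matters: the same characters can mean different numbers. This minimal sketch parses a number using a culture’s group and decimal separators. The `SEPARATORS` table and `parse_number` are hypothetical and cover only two cultures; a real implementation would draw on full locale data such as the Unicode CLDR.

```python
from decimal import Decimal

# Hypothetical, hand-written table: culture -> (group separator, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def parse_number(text: str, culture: str) -> Decimal:
    """Parse a number written in the given culture's format."""
    group, decimal_sep = SEPARATORS[culture]
    # Drop grouping marks, then convert the culture's decimal mark to ".".
    normalized = text.strip().replace(group, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# The same characters mean very different amounts in different cultures.
print(parse_number("1.234", "en-US"))  # 1.234
print(parse_number("1.234", "de-DE"))  # 1234
```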

Avoiding OCR on Electronic Text

Grooper avoids optical character recognition altogether when dealing with original text-based files like Word documents, Excel spreadsheets, and text-based PDFs. Instead, Grooper pulls the complete, exact text directly from the file.
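
As an illustration of reading text without OCR: a .docx file is a ZIP archive whose body text lives in `word/document.xml`, so the text can be read straight from the XML with Python’s standard library. This is a simplified sketch, not Grooper’s extractor; `docx_text` is a hypothetical name, and the function gives no special treatment to tabs, line breaks, or less common structures such as text boxes.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML, the XML vocabulary inside .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source: str | bytes) -> str:
    """Return a .docx file's text, one line per paragraph, read straight from its XML.

    source is either a file path or the raw bytes of the file.
    """
    archive_source = io.BytesIO(source) if isinstance(source, bytes) else source
    with zipfile.ZipFile(archive_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):  # w:p = paragraph
        # w:t elements hold the text of each run inside the paragraph.
        runs = [t.text or "" for t in para.iter(f"{W}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Because the characters come straight from the file, there are no recognition errors to correct.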

Supercharge Your OCR – Save Time & End Manual Data Entry

Getting accurate capture results from old or poor-quality scanned documents used to be almost impossible, and it was especially tough if you needed to keep a human-readable copy. With Grooper, you get both high OCR accuracy and a great-looking document image.

Watch this webinar to learn:

  • How different OCR tools compare
  • What to do when recognition software returns inaccurate results or, worse, no results at all
  • How much image correction is too much
  • How to get accurate data from forms and OMR / checkboxes

Save thousands of hours of work and get far better data!
