Near-Perfect OCR Shouldn’t Be So Difficult

The secret to high accuracy is removing all non-text elements.

Grooper does this through industry-first image processing tools and out-of-the-box configurations specifically designed for the task.

The best part is that these tools won’t alter the original version of the image that you want to permanently retain.

What’s the best OCR software? The best answer is to clean the image with Grooper’s image processing tools first, then run any OCR engine alongside Grooper’s OCR tools.

How to Prep a Document Image for Optical Character Recognition

  • Remove lines
  • Clean up document edges
  • Remove small specks
  • Remove large non-text objects
  • Invert white-on-black zones
  • Remove hole punches
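
Two of the steps above, line removal and speck removal, can be sketched on a tiny black-and-white image. This is a minimal illustration, not Grooper's implementation: the image is assumed to be a grid of 0 (background) and 1 (ink), and the function names and the `min_length` and `max_size` parameters are hypothetical. Both functions work on a copy, so the original image is left untouched.

```python
from collections import deque


def remove_horizontal_lines(image, min_length):
    """Return a copy with horizontal ink runs of at least min_length cleared."""
    cleaned = [row[:] for row in image]
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_length:  # long run: treat it as a ruled line
                    for i in range(start, x):
                        cleaned[y][i] = 0
            else:
                x += 1
    return cleaned


def remove_specks(image, max_size):
    """Return a copy with connected ink blobs of max_size pixels or fewer cleared."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect one 4-connected blob of ink pixels.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) <= max_size:
                for cy, cx in blob:
                    cleaned[cy][cx] = 0
    return cleaned


# Hypothetical 5x8 scan: two text strokes, one stray speck, one ruled line.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_specks(remove_horizontal_lines(page, min_length=6), max_size=2)
```

A run-length rule this simple would also erase ink where text crosses a line, so a real cleanup step needs to repair those intersections.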

Tips for More Accurate OCR

No matter how clean and pristine document images appear, text recognition software still struggles to capture accurate text. Text embedded in images, multi-column layouts, and mixed font sizes all contribute to poor character recognition.

Another cause of inaccurate capture is that OCR engines process whole pages, from top to bottom. A better approach is to OCR select areas of a page and combine the results.

Grooper’s patented OCR synthesis engine intelligently performs multiple passes on different portions of a document image. Results are grouped together as a single unit, providing highly accurate text results.
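
The general idea of recognizing select areas and combining the results can be sketched as follows. This is not Grooper's patented synthesis engine, only a minimal illustration under stated assumptions: the page is modeled as rows of characters instead of pixels, `Region`, `crop`, and `ocr_by_region` are hypothetical names, and `fake_engine` stands in for whatever OCR engine is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Region:
    """A rectangular area of the page; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def crop(page: Sequence[str], region: Region) -> List[str]:
    """Cut one region out of the page."""
    return [row[region.left:region.right] for row in page[region.top:region.bottom]]


def ocr_by_region(page: Sequence[str], regions: Sequence[Region],
                  engine: Callable[[List[str]], str]) -> str:
    """Run the engine once per region and join results in the given reading order."""
    return "\n".join(engine(crop(page, region)) for region in regions)


def fake_engine(rows: List[str]) -> str:
    """Stand-in for a real OCR engine: reads a cropped area top to bottom."""
    return " ".join(row.strip() for row in rows)


# Hypothetical two-column page; the left column is 12 characters wide.
page = [
    "Name: Ann".ljust(12) + "Total: 12",
    "City: Tulsa".ljust(12) + "Due: May",
]
columns = [Region(0, 0, 2, 12), Region(0, 12, 2, 21)]

whole_page = fake_engine(list(page))
by_region = ocr_by_region(page, columns, fake_engine)
```

Read as a whole page, the two columns are interleaved line by line. Read by region, each column stays together and the pieces are joined in the order the caller chose.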

In a lab test, Grooper accurately captured 99.91% of text. Using OCR alone on the same data set proved half as accurate.

5 Features to Guarantee Accurate OCR

Intelligent Spell Correction

Powered by Atomic RegEx, Grooper performs corrections to fix some pretty ugly stuff. And the secret to making this work? K-Means Clustering, text removal, and text correction engines.

What Spelling Errors Does Grooper Correct?

  • Simple OCR mistakes in strings that don’t match words in a standard dictionary
  • Human-generated typos on documents
  • Words that OCR falsely jammed together, fixed by inserting spaces (word splitting)
  • Strings of non-alphanumeric characters that resemble an attempt at censorship, like “$#@! ^&*”, which are deleted
  • Numeric values where overly aggressive image cleanup inadvertently removed punctuation, which are repaired
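
Two of the corrections above, word splitting and deleting symbol strings, can be sketched with a small dictionary and a regular expression. This is a minimal illustration, not Grooper's engine and not Atomic RegEx: the vocabulary, the function names, and the rule that a symbol string is three or more punctuation characters are all assumptions made for the example.

```python
import re

# Assumption: a "censorship-like" string is one or more whitespace-separated
# groups of 3+ characters that are neither letters, digits, nor spaces.
SYMBOL_RUN = re.compile(r"(?<!\S)[^\w\s]{3,}(?:\s+[^\w\s]{3,})*(?!\S)")


def split_jammed_word(token, vocabulary):
    """Split token into the fewest known words, or return None if it cannot be split."""
    # best[i] holds the shortest list of words covering token[:i], or None.
    best = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is None or piece.lower() not in vocabulary:
                continue
            candidate = best[start] + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[len(token)]


def correct_line(line, vocabulary):
    """Delete symbol strings, then split alphabetic tokens that are not known words."""
    line = SYMBOL_RUN.sub("", line)
    fixed = []
    for token in line.split():
        if token.lower() in vocabulary or not token.isalpha():
            fixed.append(token)  # known word, number, or punctuated token: keep
            continue
        parts = split_jammed_word(token, vocabulary)
        fixed.append(" ".join(parts) if parts else token)
    return " ".join(fixed)


# Hypothetical vocabulary; a real one would be a full dictionary.
vocabulary = {"invoice", "number", "total", "due", "in", "voice"}
corrected = correct_line("invoicenumber $#@! ^&* totaldue", vocabulary)
```

Preferring the fewest words keeps “invoicenumber” from being split into “in voice number” when “invoice number” is also possible.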

How to OCR a PDF Document

PDF is the most widely used document standard in the world. Because there’s no single way to generate a PDF, capturing text has varying levels of difficulty:

  • Some PDFs are purely text-based (easy to capture)
  • Others are just document scans in PDF format (difficult)
  • Other PDFs have combinations of the two scattered throughout pages (most difficult)

PDF documents come with a fair number of text recognition challenges.

How to Get Text off of PDFs:

Grooper looks at each page within a PDF and places the page into one of three categories: image-based, text-based, or mixed-content. Because this happens automatically, each page gets the specific rules and processing techniques that make its text easier to extract.

Each page is handled accordingly:

  • Process pages containing a single image that covers the entire page as image-based pages
  • If a page contains no images, extract only the raw text behind the page
  • For mixed-content pages, extract each image to a temporary image, process the image, and merge the results with the native text
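
The three-way classification above can be sketched as a small decision function. This is a minimal illustration, not Grooper's code: `PageInfo`, `PageKind`, and `classify_page` are hypothetical names, the page is assumed to be already summarized as its native text plus the fraction of the page each image covers, and the 0.95 cutoff for “covers the entire page” is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PageKind(Enum):
    IMAGE_BASED = "image-based"
    TEXT_BASED = "text-based"
    MIXED_CONTENT = "mixed-content"


@dataclass
class PageInfo:
    """Hypothetical summary of one PDF page."""
    native_text: str
    # Fraction of the page area covered by each image, from 0.0 to 1.0.
    image_coverages: List[float] = field(default_factory=list)


def classify_page(page: PageInfo, full_page_threshold: float = 0.95) -> PageKind:
    """Place a page into one of the three categories described above."""
    if not page.image_coverages:
        return PageKind.TEXT_BASED  # no images: use the native text only
    if len(page.image_coverages) == 1 and page.image_coverages[0] >= full_page_threshold:
        return PageKind.IMAGE_BASED  # a single full-page image: treat as a scan
    return PageKind.MIXED_CONTENT  # OCR each image, then merge with native text


kinds = [
    classify_page(PageInfo("Quarterly report", [])),
    classify_page(PageInfo("", [1.0])),
    classify_page(PageInfo("Figure 1: site map", [0.4])),
]
```

Each category then maps to one of the handling rules in the list: native text only, image processing only, or both merged.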

Additional Tools

Trainable OCR

Grooper OCR is trainable. The OCR engine can be trained on custom and difficult fonts.

OCR Performance Balancing

Performance Balancing

Grooper’s “Run Speed” option provides control to achieve the ideal balance between accuracy and performance.

Multi-Language Text Support

Language Support

Grooper recognizes 268 distinct languages and 523 regional cultures. Language detection interprets dates, times, currency names, numeric formats, and more.

Avoiding OCR on Electronic Text

Electronic Text

Grooper avoids optical character recognition altogether when dealing with original text-based files like Word, Excel, and text-based PDFs. Instead, Grooper pulls complete and perfect text directly from the file.

Get the Best OCR Results Possible – Save Time & Expenses Now

Dealing with old or poor quality scanned documents is a pain. With Grooper, you can transform your OCR results, even when it looks like there is no hope.

In this recorded webinar, you will learn:

  • How our OCR tools differ from our competitors’
  • What to do when OCR is returning inaccurate results or, worse, no results
  • How much image correction is too much?
  • How Grooper easily gets accurate data from forms and OMR/checkboxes

Save thousands of hours of work and get far better data!
