Thursday, April 04, 2019

Extracting Text from an Image Using Tesseract in C#

Tesseract engine optical character recognition (OCR) is a technology used to convert scanned paper documents, PDF files, and images to searchable text data. The OCR engine detects the characters present in the image and puts those characters into words, enabling developers to search and edit the content of the document. Tesseract optical character recognition engine is one of the most accurate OCR engines currently available for .NET. It's licensed under Apache 2.0 and has been supported by Google since 2006. Tesseract OCR library is available for various different operating systems. In this article, I will demonstrate extracting image text using Tesseract and writing C# code under Windows OS.

No comments:

Post a Comment

Mocking API Responses in Azure API Management Portal

A mock API imitates a real API call by providing a realistic JSON or XML response to the requester. Mock APIs can be designed on a developer...