OCR for Multipage TIFF Files
The OcrInput class automatically works with input TIFF files that conventional Tesseract cannot read. Every frame of your TIFF is imported, producing a multipage IronOcr.OcrResult document.
How to OCR TIFF Files
- Install an OCR library to read TIFF files.
- Construct an IronTesseract object.
- Create an OcrInput object.
- Add the TIFF file with the AddMultiFrameTiff method.
- Read text from the TIFF file with the Read method.
Here is sample C# code that runs OCR on a multipage TIFF file using IronTesseract:
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an IronTesseract instance, which runs the OCR process.
        var ocr = new IronTesseract();

        // OcrInput holds the images or documents to process; "using" disposes it when done.
        using var inputs = new OcrInput();

        // AddMultiFrameTiff imports every frame of a multipage TIFF file.
        inputs.AddMultiFrameTiff("example.tiff");

        // Read performs OCR and returns an OcrResult containing the recognized text.
        OcrResult result = ocr.Read(inputs);

        // OcrResult.Text contains the recognized text.
        System.Console.WriteLine(result.Text);
    }
}
The same example in VB.NET:
Imports IronOcr

Friend Class Program
    Shared Sub Main()
        ' Create an IronTesseract instance, which runs the OCR process.
        Dim ocr As New IronTesseract()

        ' OcrInput holds the images or documents to process; Using disposes it when done.
        Using inputs As New OcrInput()
            ' AddMultiFrameTiff imports every frame of a multipage TIFF file.
            inputs.AddMultiFrameTiff("example.tiff")

            ' Read performs OCR and returns an OcrResult containing the recognized text.
            Dim result As OcrResult = ocr.Read(inputs)

            ' OcrResult.Text contains the recognized text.
            System.Console.WriteLine(result.Text)
        End Using
    End Sub
End Class
Explanation of the Code:
- IronTesseract instance: An instance of the IronTesseract class is created to run the OCR operation. This object handles the recognition process.
- OcrInput object: This object stores the images or documents you want to run OCR on. It can handle multiple formats and pages.
- AddMultiFrameTiff method: This method adds a TIFF file to the OcrInput. It specifically supports multipage TIFFs, so every page is processed as part of a single operation.
- OCR operation: The Read method of the IronTesseract object is called with the OcrInput. It performs the OCR and stores the results in an OcrResult object.
- Display results: The recognized text from the TIFF file is written to the console using System.Console.WriteLine. The OcrResult.Text property contains the text extracted from the image.
By following these steps, you can perform OCR on TIFF files, including those with multiple pages, using the IronOCR library.
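Because each TIFF frame becomes one page of the OcrResult, it can help to know how many frames a file contains, for example to compare that count with the pages in the result. The sketch below is not part of IronOCR. It is a standalone helper that walks the chain of Image File Directories (IFDs) described in the TIFF 6.0 specification, where each IFD describes one frame. It assumes a classic TIFF (not BigTIFF), and the names TiffFrameCounter and CountFrames are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an IronOCR API): counts the frames in a classic
// TIFF file by following its linked list of Image File Directories (IFDs).
// Assumes baseline TIFF (magic number 42); BigTIFF (43) is rejected.
public static class TiffFrameCounter
{
    public static int CountFrames(string path)
    {
        using var stream = File.OpenRead(path);
        return CountFrames(stream);
    }

    public static int CountFrames(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        // Bytes 0-1: byte order, "II" = little-endian, "MM" = big-endian.
        stream.Position = 0;
        byte b0 = reader.ReadByte(), b1 = reader.ReadByte();
        bool littleEndian;
        if (b0 == 'I' && b1 == 'I') littleEndian = true;
        else if (b0 == 'M' && b1 == 'M') littleEndian = false;
        else throw new InvalidDataException("Not a TIFF file.");

        // Bytes 2-3: magic number 42 identifies a classic TIFF.
        if (ReadUInt16(reader, littleEndian) != 42)
            throw new NotSupportedException("Only classic (non-BigTIFF) files are handled.");

        // Bytes 4-7: offset of the first IFD; a next-IFD offset of 0 ends the chain.
        uint offset = ReadUInt32(reader, littleEndian);
        int frames = 0;
        while (offset != 0)
        {
            // Guard against truncated files and cyclic IFD chains.
            if (offset >= stream.Length || frames > 100_000)
                throw new InvalidDataException("Corrupt IFD chain.");

            stream.Position = offset;
            ushort entryCount = ReadUInt16(reader, littleEndian);

            // Each directory entry is 12 bytes; the next-IFD offset follows them.
            stream.Position = offset + 2L + entryCount * 12L;
            offset = ReadUInt32(reader, littleEndian);
            frames++;
        }
        return frames;
    }

    private static ushort ReadUInt16(BinaryReader r, bool littleEndian)
    {
        byte[] b = r.ReadBytes(2);
        if (b.Length < 2) throw new EndOfStreamException();
        return littleEndian ? (ushort)(b[0] | b[1] << 8) : (ushort)(b[0] << 8 | b[1]);
    }

    private static uint ReadUInt32(BinaryReader r, bool littleEndian)
    {
        byte[] b = r.ReadBytes(4);
        if (b.Length < 4) throw new EndOfStreamException();
        return littleEndian
            ? (uint)(b[0] | b[1] << 8 | b[2] << 16 | b[3] << 24)
            : (uint)(b[0] << 24 | b[1] << 16 | b[2] << 8 | b[3]);
    }
}
```

Usage: `int frames = TiffFrameCounter.CountFrames("example.tiff");`. The helper reads only the header and IFD offsets, so no pixel data is decoded, which keeps the check cheap compared with running OCR.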