How to Save Results as hOCR in an HTML File

hOCR, which stands for "HTML-based OCR," is a file format used to represent the results of Optical Character Recognition (OCR) in a structured manner. hOCR files are written in HTML (Hypertext Markup Language) and store the recognized text, layout information, and the coordinates (bounding boxes) of recognized elements such as lines and words within an image or document.

IronOCR performs optical character recognition on documents and exports the results as hOCR in HTML format. The result can be written to an HTML file or returned as a string.

Export Result as hOCR Example

To export the result as hOCR, first enable the Configuration.RenderHocr property by setting it to true. After obtaining the OCR result object from the Read method, call the SaveAsHocrFile method to export the result as HTML. This method writes an HTML file that contains the reading result of the input document. The code below uses a sample TIFF file named Potter.tiff.

:path=/static-assets/ocr/content-code-examples/how-to/html-export-export-html.cs
using IronOcr;

// Instantiate IronTesseract, the main OCR engine.
var ocrTesseract = new IronTesseract();

// Enable rendering the output as hOCR, which is an HTML format for OCR results.
ocrTesseract.Configuration.RenderHocr = true;

// Load the image to be processed. OcrInput is used to specify input images and options.
using (var imageInput = new OcrInput("Potter.tiff"))
{
    // Optionally, set a title for the HTML output.
    imageInput.InputTitle = "Html Title";

    // Perform OCR on the input image and obtain the result.
    OcrResult ocrResult = ocrTesseract.Read(imageInput);

    // Save the OCR result as an hOCR HTML file.
    ocrResult.SaveAsHocrFile("result.html");
}

Export Result as HTML String

Using the same sample TIFF image, call the SaveAsHocrString method to get the OCR result as an HTML string instead of writing it to a file.

:path=/static-assets/ocr/content-code-examples/how-to/html-export-export-html-string.cs
using IronOcr;

// Configure the engine and enable hOCR rendering, as in the previous example.
var ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.RenderHocr = true;

using (var imageInput = new OcrInput("Potter.tiff"))
{
    OcrResult ocrResult = ocrTesseract.Read(imageInput);

    // Return the OCR result as an hOCR (HTML) string instead of writing a file.
    string hocr = ocrResult.SaveAsHocrString();
}
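
Once the hOCR is available as a string, the stored coordinates can be read back out of it. The following is a minimal sketch, not part of IronOCR: HocrWordReader and ReadWords are hypothetical names introduced here for illustration. It assumes the output follows the common Tesseract-style hOCR layout, where each word is a span with class ocrx_word and a title attribute that starts with "bbox x0 y0 x1 y1". Actual output may differ, in which case the pattern would need adjusting.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not an IronOCR type): extracts words and their
// bounding boxes from an hOCR string.
public static class HocrWordReader
{
    // Assumption: the class attribute precedes the title attribute, as in
    // <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 96'>text</span>
    private static readonly Regex WordPattern = new Regex(
        @"<span[^>]*class=['""]ocrx_word['""][^>]*title=['""]bbox (\d+) (\d+) (\d+) (\d+)[^'""]*['""][^>]*>(.*?)</span>",
        RegexOptions.Singleline);

    public static List<(string Text, int X0, int Y0, int X1, int Y1)> ReadWords(string hocr)
    {
        var words = new List<(string Text, int X0, int Y0, int X1, int Y1)>();
        foreach (Match m in WordPattern.Matches(hocr))
        {
            // Drop inline markup such as <strong> inside the word span,
            // then decode HTML entities such as &amp;.
            string inner = Regex.Replace(m.Groups[5].Value, "<[^>]+>", "");
            string text = WebUtility.HtmlDecode(inner).Trim();

            words.Add((text,
                int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value),
                int.Parse(m.Groups[3].Value), int.Parse(m.Groups[4].Value)));
        }
        return words;
    }
}
```

With the hocr string from the example above, HocrWordReader.ReadWords(hocr) would return one entry per recognized word, each carrying its text and pixel coordinates. A regular expression keeps the sketch dependency-free; for production use, a proper HTML parser is likely a more robust choice.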

Frequently Asked Questions

What is hOCR?

hOCR stands for "HTML-based OCR," a file format used to represent the results of Optical Character Recognition in a structured manner. It is written in HTML and stores the recognized text, layout information, and the coordinates (bounding boxes) of recognized elements such as lines and words within an image or document.

How does IronOCR support hOCR?

IronOCR performs optical character recognition on documents and exports the results as hOCR in HTML format. The result can be written to an HTML file or returned as a string.

What steps are needed to save results as hOCR in an HTML file using IronOCR?

First, install the IronOCR library. Load your image or PDF document, set the Configuration.RenderHocr property to true, call the Read method, and then call SaveAsHocrFile on the result to write the HTML file.

How can I export OCR results as an HTML string?

To export OCR results as an HTML string, set the RenderHocr property to true, run the Read method, and call SaveAsHocrString on the result. The method returns the hOCR output as a string.

Can IronOCR export OCR results from both images and PDFs?

Yes, IronOCR can process both images and PDF documents to perform OCR and export the results as hOCR in HTML format.

What programming language is required to use IronOCR?

IronOCR is a .NET library. The examples in this article are written in C#.

Is there example code to save OCR results as hOCR using IronOCR?

Yes, the "Export Result as hOCR Example" section above contains C# code that reads text from an image file with IronOCR and saves the OCR results as an hOCR file.

What is the RenderHocr property used for?

The RenderHocr property is used to enable or disable the output of OCR results in the hOCR format. It should be set to true to export results as hOCR.

Chaknith
Software Engineer
Chaknith is the Sherlock Holmes of developers. It first occurred to him that he might have a future in software engineering when he was doing code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith draws on what he learns from talking directly with customers to help improve the products themselves. His anecdotal feedback goes beyond Jira tickets and supports product development, documentation, and marketing to improve customers' overall experience. When he isn't in the office, he can be found learning about machine learning, coding, and hiking.