OCR for PDF Stream

IronOCR can also read a PDF supplied as a Stream.

In this example, IronPDF is used to create a PDF Stream that can later be used for text recognition by IronOCR.

Note that IronOCR accepts a Stream only as input; it does not support exporting its output as a Stream.

// Import the necessary namespaces
using System; // Needed for Console
using System.IO;
using IronPdf;
using IronOcr;

class Program
{
    static void Main()
    {
        // Create a PDF document using IronPDF
        var pdfDocument = new HtmlToPdf().RenderHtmlAsPdf("<h1>Hello World</h1><p>This is a simple example of PDF generation.</p>");

        // Save the PDF as a stream
        using (MemoryStream pdfStream = new MemoryStream()) // Initialize a new memory stream
        {
            pdfDocument.Stream.CopyTo(pdfStream); // Copy the pdfDocument's stream to the memory stream
            pdfStream.Position = 0; // Reset the position of the memory stream to the beginning

            // Initialize IronOCR engine
            var Ocr = new IronTesseract();

            // Perform OCR on the PDF Stream
            using (var input = new OcrInput(pdfStream)) // Pass the PDF Stream to the OCR Input
            {
                var result = Ocr.Read(input); // Perform OCR and get the result
                Console.WriteLine(result.Text); // Output the text from the scanned document to the console
            }
        }
    }
}
' Import the necessary namespaces
Imports System ' Needed for Console
Imports System.IO
Imports IronPdf
Imports IronOcr

Friend Class Program
	Shared Sub Main()
		' Create a PDF document using IronPDF
		Dim pdfDocument = (New HtmlToPdf()).RenderHtmlAsPdf("<h1>Hello World</h1><p>This is a simple example of PDF generation.</p>")

		' Save the PDF as a stream
		Using pdfStream As New MemoryStream() ' Initialize a new memory stream
			pdfDocument.Stream.CopyTo(pdfStream) ' Copy the pdfDocument's stream to the memory stream
			pdfStream.Position = 0 ' Reset the position of the memory stream to the beginning

			' Initialize IronOCR engine
			Dim Ocr = New IronTesseract()

			' Perform OCR on the PDF Stream
			Using input = New OcrInput(pdfStream) ' Pass the PDF Stream to the OCR Input
				Dim result = Ocr.Read(input) ' Perform OCR and get the result
				Console.WriteLine(result.Text) ' Output the text from the scanned document to the console
			End Using
		End Using
	End Sub
End Class

Explanation:

  • We use the IronPdf namespace to generate a PDF document from a simple HTML string.
  • The PDF document's Stream is copied into a MemoryStream, which is then rewound to position 0 so that it is read from the beginning.
  • IronOcr's IronTesseract engine is used to read and perform OCR on the stream.
  • The MemoryStream and the OcrInput are each wrapped in a using block, so they are disposed of as soon as they are no longer needed.
  • The OCR output is printed to the console, showing the recognized text from the PDF.

This example shows how to pass a PDF generated by IronPDF to IronOCR as a Stream for text recognition within a .NET application.
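
The rewind step (pdfStream.Position = 0) is easy to overlook, so here is a minimal sketch of why it matters. It uses only System.IO, with no IronPDF or IronOCR calls. The byte array is a placeholder standing in for real PDF data, and CopyAndRewind is a hypothetical helper name introduced for illustration; it is not part of either library.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamRewindDemo
{
    // Hypothetical helper: copies 'source' into a new MemoryStream and
    // rewinds the copy so the next reader starts at byte 0.
    public static MemoryStream CopyAndRewind(Stream source)
    {
        var copy = new MemoryStream();
        source.CopyTo(copy);   // Leaves copy.Position at the end of the data
        copy.Position = 0;     // Rewind so a consumer sees the whole content
        return copy;
    }

    static void Main()
    {
        // Placeholder bytes, assumed here only to stand in for a real PDF
        byte[] placeholder = Encoding.ASCII.GetBytes("%PDF placeholder");

        using (var source = new MemoryStream(placeholder))
        using (var notRewound = new MemoryStream())
        {
            source.CopyTo(notRewound);
            // Without a rewind the stream is positioned at its end,
            // so ReadByte reports end-of-stream (-1).
            Console.WriteLine(notRewound.ReadByte()); // -1
        }

        using (var source = new MemoryStream(placeholder))
        using (var rewound = CopyAndRewind(source))
        {
            // After the rewind the first byte of the data is available.
            Console.WriteLine((char)rewound.ReadByte()); // %
        }
    }
}
```

A consumer handed the non-rewound stream would most likely see no data at all, which is why the example resets the position before passing the stream on for OCR.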