at.knowcenter.wag.egov.egiz.pdf
Class TextualSignature

java.lang.Object
  extended by at.knowcenter.wag.egov.egiz.pdf.TextualSignature

public class TextualSignature
extends Object

Contains helper functions for textual signatures.

Author:
wprinz

Constructor Summary
TextualSignature()
           
 
Method Summary
static String extractTextTextual(InputStream pdf_stream)
          Extracts the document text from a given PDF.
static byte[] normalizePDF(InputStream input_pdf)
          Normalizes a given binary PDF into a form that PDFBox can handle correctly.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextualSignature

public TextualSignature()
Method Detail

extractTextTextual

public static String extractTextTextual(InputStream pdf_stream)
                                 throws PresentableException
Extracts the document text from a given PDF.

Parameters:
pdf_stream - The PDF input stream.
Returns:
The extracted document text.
Throws:
PresentableException - Forwarded exception.

normalizePDF

public static byte[] normalizePDF(InputStream input_pdf)
                           throws IOException,
                                  DocumentException
Normalizes a given binary PDF into a form that PDFBox can handle correctly.

PDFBox has serious problems with documents that use incremental updates or XObject forms. Therefore, use this method to remove incremental updates and create a streamlined document.

Note that this has nothing to do with text normalization. It only unifies the PDF documents that are fed into PDFBox for text extraction and page length determination.
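
To illustrate what "incremental updates" means at the byte level, the following is a minimal sketch of a heuristic check. It is not part of this class and is not how normalizePDF works; the class name IncrementalUpdateCheck and its methods are hypothetical. It relies on the fact that each incremental update appends a new trailer ending in a %%EOF marker, so more than one marker suggests the file may have been updated incrementally. The check can give false positives (linearized files, or the marker bytes occurring inside stream data).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented API.
final class IncrementalUpdateCheck {
    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Counts non-overlapping occurrences of the %%EOF marker in the raw bytes.
    static int countEofMarkers(byte[] pdf) {
        int count = 0;
        int i = 0;
        while (i <= pdf.length - EOF_MARKER.length) {
            if (matchesAt(pdf, i)) {
                count++;
                i += EOF_MARKER.length;
            } else {
                i++;
            }
        }
        return count;
    }

    // True if the marker bytes start at the given offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    // Heuristic only: more than one marker hints at incremental updates.
    static boolean looksIncrementallyUpdated(byte[] pdf) {
        return countEofMarkers(pdf) > 1;
    }
}
```

A caller could use such a check for logging or diagnostics; since normalizePDF is described as producing a streamlined document in any case, the check is not needed to decide whether to call it.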

Parameters:
input_pdf - The input PDF to be normalized.
Returns:
The normalized PDF.
Throws:
IOException
DocumentException
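
Because normalizePDF returns a byte[] while extractTextTextual expects an InputStream, chaining the two requires wrapping the normalized bytes in a stream. The following is a minimal sketch of that bridging step. The class NormalizeThenExtract and its two nested interfaces are hypothetical stand-ins so that the example compiles without this library; whether extractTextTextual already normalizes its input internally is not stated here, so the chaining is an assumption about usage, not documented behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper, not part of the documented API.
final class NormalizeThenExtract {
    // Stand-in for a method shaped like normalizePDF(InputStream).
    interface Normalizer {
        byte[] normalize(InputStream in) throws Exception;
    }

    // Stand-in for a method shaped like extractTextTextual(InputStream).
    interface Extractor {
        String extract(InputStream in) throws Exception;
    }

    // Normalizes the raw PDF, then feeds the normalized bytes to the extractor.
    static String run(InputStream rawPdf, Normalizer normalizer, Extractor extractor)
            throws Exception {
        byte[] normalized = normalizer.normalize(rawPdf);
        try (InputStream in = new ByteArrayInputStream(normalized)) {
            return extractor.extract(in);
        }
    }
}
```

With this library on the classpath, and assuming PresentableException and DocumentException are subclasses of Exception, the call could look like NormalizeThenExtract.run(stream, TextualSignature::normalizePDF, TextualSignature::extractTextTextual).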


Copyright © 2006-2007 EGIZ - E-Government Innovationszentrum. All Rights Reserved.