java.lang.Object
  at.knowcenter.wag.egov.egiz.pdf.TextualSignature
public class TextualSignature
Contains helper functions for textual signatures.
Constructor Summary | |
---|---|
TextualSignature() | |
Method Summary | |
---|---|
static String | extractTextTextual(InputStream pdf_stream): Extracts the document text from a given pdf. |
static byte[] | normalizePDF(InputStream input_pdf): Normalizes a given binary PDF to a version PDFbox can handle correctly. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TextualSignature()
Method Detail |
---|
public static String extractTextTextual(InputStream pdf_stream) throws PresentableException
Parameters:
pdf_stream - The PDF input stream.
Throws:
PresentableException - Forwarded exception.

public static byte[] normalizePDF(InputStream input_pdf) throws IOException, DocumentException
PDFbox has serious problems with documents that use incremental updates or XObject forms. Therefore, use this method to remove incremental updates and create a streamlined document.
Note that this has nothing to do with text normalization. It just unifies the PDF documents that are fed into PDFbox for text extraction and page length determination.
Parameters:
input_pdf - The input pdf to be normalized.
Throws:
IOException
DocumentException
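
The two methods above can plausibly be chained: normalize the raw PDF first, then pass the normalized bytes to text extraction. The following is a minimal sketch of that call pattern, not the library's documented usage. `StubTextualSignature` is a hypothetical stand-in that only mirrors the two signatures so the example compiles without the library; its bodies simply copy bytes and do no real PDF processing. Whether `extractTextTextual` already normalizes internally is not stated here, so the explicit normalization step is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class TextualSignatureUsage {

    // HYPOTHETICAL stand-in for at.knowcenter.wag.egov.egiz.pdf.TextualSignature.
    // It mirrors only the method names and parameter types documented above.
    // The bodies are placeholders: they copy bytes and do no PDF processing.
    static final class StubTextualSignature {

        static byte[] normalizePDF(InputStream input_pdf) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = input_pdf.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }

        static String extractTextTextual(InputStream pdf_stream) throws IOException {
            return new String(normalizePDF(pdf_stream), StandardCharsets.UTF_8);
        }
    }

    // Call pattern: normalize first, then wrap the resulting byte[] in a fresh
    // stream, because the original stream has already been consumed.
    static String extractNormalizedText(InputStream rawPdf) throws Exception {
        byte[] normalized = StubTextualSignature.normalizePDF(rawPdf);
        try (InputStream in = new ByteArrayInputStream(normalized)) {
            return StubTextualSignature.extractTextTextual(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder bytes, not a real PDF; the stub just echoes them back.
        InputStream fake = new ByteArrayInputStream("demo".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractNormalizedText(fake));
    }
}
```

With the real library on the classpath, the stub would be replaced by calls to `TextualSignature.normalizePDF` and `TextualSignature.extractTextTextual`, and the caller would need to handle `PresentableException`, `IOException`, and `DocumentException` as declared above.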