java.lang.Object
org.pdfbox.util.PDFStreamEngine
org.pdfbox.util.PDFTextStripper
org.pdfbox.util.PDFText2HTML
public class PDFText2HTML extends PDFTextStripper
Wraps stripped text in simple HTML and tries to form HTML paragraphs. Paragraphs split across pages, columns, or figures are not rejoined.
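To make the idea of wrapping stripped text in simple HTML concrete, here is a minimal standalone sketch. It is not the PDFBox implementation: the class `HtmlParagraphSketch`, its methods, and the rule that a blank line marks a paragraph break are all assumptions made for illustration.

```java
// Hypothetical sketch; none of these names belong to org.pdfbox.
class HtmlParagraphSketch {

    // Escape the characters that are significant in HTML text content.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: a blank line in the stripped text marks a paragraph break.
    static String toHtml(String title, String strippedText) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n");
        for (String block : strippedText.split("\\n\\s*\\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                html.append("<p>").append(escape(trimmed)).append("</p>\n");
            }
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("Demo", "First paragraph.\n\nSecond: a < b & c."));
    }
}
```

Because the sketch only sees text that has already been extracted, a paragraph that continues on the next page or column would come out as two separate `<p>` elements, which matches the limitation stated above.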
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.pdfbox.util.PDFStreamEngine |
---|
PDFStreamEngine.StreamResources |
Field Summary |
---|
Fields inherited from class org.pdfbox.util.PDFTextStripper |
---|
charactersByArticle, output |
Fields inherited from class org.pdfbox.util.PDFStreamEngine |
---|
fontToAverageWidths, graphicsStack, operators, page, SPACE_BYTES, streamResourcesStack, textLineMatrix, textMatrix |
Constructor Summary |
---|
PDFText2HTML(): Constructor. |
Method Summary | |
---|---|
void | endDocument(PDDocument pdf): This method is available for subclasses of this class. |
protected void | endParagraph(): Write out the paragraph separator. |
protected void | flushText(): This will print the text to the output stream. |
protected String | getTitleGuess(): The guess at the document title. |
protected TextPosition | guessTitle(Iterator textIter): This method will attempt to guess the title of the document. |
boolean | isSuppressParagraphs() |
void | setSuppressParagraphs(boolean shouldSuppressParagraphs) |
protected void | startParagraph(): Write out the paragraph separator. |
protected void | writeCharacters(TextPosition position): Write the string to the output stream. |
protected void | writeHeader(): Write the header to the output document. |
Methods inherited from class org.pdfbox.util.PDFStreamEngine |
---|
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PDFText2HTML() throws IOException
Throws:
IOException - If there is an error during initialization.
Method Detail |
---|
protected void writeHeader() throws IOException
Throws:
IOException - If there is a problem writing out the header to the document.

protected String getTitleGuess()

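protected void flushText() throws IOException
Description copied from class: PDFTextStripper
This will print the text to the output stream.
Overrides:
flushText in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.
See Also:
PDFTextStripper.flushText()
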
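public void endDocument(PDDocument pdf) throws IOException
Description copied from class: PDFTextStripper
This method is available for subclasses of this class.
Overrides:
endDocument in class PDFTextStripper
Parameters:
pdf - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.
See Also:
PDFTextStripper.endDocument( PDDocument )
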
protected TextPosition guessTitle(Iterator textIter)
Parameters:
textIter - The characters on the first page.
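
The source does not say how the title is guessed. As one plausible heuristic only, the sketch below picks the first fragment with the largest font size among the text of the first page. `TitleGuessSketch` and `Fragment` are hypothetical names. `Fragment` is a simplified stand-in and not the PDFBox `TextPosition` class.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch of a "largest font wins" title heuristic.
class TitleGuessSketch {

    // Simplified stand-in for a positioned piece of text (assumption).
    static final class Fragment {
        final String text;
        final float fontSize;

        Fragment(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Returns the first fragment with the largest font size,
    // or null when the iterator yields nothing.
    static Fragment guessTitle(Iterator<Fragment> textIter) {
        Fragment best = null;
        while (textIter.hasNext()) {
            Fragment candidate = textIter.next();
            if (best == null || candidate.fontSize > best.fontSize) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Fragment title = guessTitle(Arrays.asList(
                new Fragment("Technical Report", 10f),
                new Fragment("A Study of Text Extraction", 24f),
                new Fragment("Abstract", 12f)).iterator());
        System.out.println(title.text);
    }
}
```

A heuristic like this can misfire on pages where a logo caption or a drop cap uses the largest font, so it should be treated as a guess, as the method name suggests.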

protected void startParagraph() throws IOException
Overrides:
startParagraph in class PDFTextStripper
Throws:
IOException - If there is an error writing to the stream.

protected void endParagraph() throws IOException
Overrides:
endParagraph in class PDFTextStripper
Throws:
IOException - If there is an error writing to the stream.

protected void writeCharacters(TextPosition position) throws IOException
Description copied from class: PDFTextStripper
Write the string to the output stream.
Overrides:
writeCharacters in class PDFTextStripper
Parameters:
position - The text to write to the stream.
Throws:
IOException - If there is an error when writing the text.
See Also:
PDFTextStripper.writeCharacters( TextPosition )

public boolean isSuppressParagraphs()

public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
Parameters:
shouldSuppressParagraphs - The suppressParagraphs to set.
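
The page does not describe what suppressing paragraphs changes in the output. The sketch below assumes the simplest reading: when the flag is set, the paragraph hooks write nothing, so text comes out without `<p>` and `</p>` tags. `ParagraphHookSketch` and its `render` helper are hypothetical and only mirror the method names listed above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch; mirrors the hook names above, not the PDFBox source.
class ParagraphHookSketch {
    private final Writer output;
    private boolean suppressParagraphs = false;

    ParagraphHookSketch(Writer output) {
        this.output = output;
    }

    public boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    public void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    // Assumption: the opening tag is skipped when paragraphs are suppressed.
    protected void startParagraph() throws IOException {
        if (!suppressParagraphs) {
            output.write("<p>");
        }
    }

    // Assumption: the closing tag is skipped when paragraphs are suppressed.
    protected void endParagraph() throws IOException {
        if (!suppressParagraphs) {
            output.write("</p>");
        }
    }

    void writeParagraph(String text) throws IOException {
        startParagraph();
        output.write(text);
        endParagraph();
    }

    // Convenience helper for trying both settings without handling IOException.
    static String render(boolean suppress, String text) {
        StringWriter out = new StringWriter();
        ParagraphHookSketch sketch = new ParagraphHookSketch(out);
        sketch.setSuppressParagraphs(suppress);
        try {
            sketch.writeParagraph(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(false, "Hello"));
        System.out.println(render(true, "Hello"));
    }
}
```

Keeping the flag check inside the two hooks means the code that emits characters does not need to know whether paragraph markup is enabled.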