java.lang.Object
  at.knowcenter.wag.exactparser.parsing.PDFUtils
public abstract class PDFUtils
Abstract class that contains several static utility methods for parsing and analyzing PDF documents on the lowest level.
Most operations require random access to the PDF data (mostly to verify the syntax), so the whole PDF document has to be provided as a byte array. The term "pdf+index" denotes a specific position index within this byte array.
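As a rough illustration of the "pdf+index" convention, the sketch below keeps a tiny hand-written document fragment in a byte array and addresses a position in it with a plain int index. The class `PdfIndexSketch`, its `startsWith` helper and the sample bytes are assumptions made for this example and are not part of PDFUtils; a real document would typically be loaded with `java.nio.file.Files.readAllBytes`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFUtils.
final class PdfIndexSketch {
    /** Returns true if the ASCII bytes of keyword appear in pdf starting at index. */
    static boolean startsWith(byte[] pdf, int index, String keyword) {
        byte[] k = keyword.getBytes(StandardCharsets.US_ASCII);
        if (index < 0 || index + k.length > pdf.length) {
            return false;
        }
        for (int i = 0; i < k.length; i++) {
            if (pdf[index + i] != k[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n1 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        // The header line occupies 9 bytes, so "pdf+9" is the first byte after it.
        System.out.println(startsWith(pdf, 9, "1 0 obj"));
    }
}
```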
Field Summary | |
---|---|
protected static byte[] |
LINE_TERMINATOR_CRALONE
|
protected static byte[] |
LINE_TERMINATOR_CRLF
|
protected static byte[] |
LINE_TERMINATOR_LF
|
Constructor Summary | |
---|---|
PDFUtils()
|
Method Summary | |
---|---|
static int |
findLastStartXRef(byte[] pdf)
Searches for the last occurrence of the "startxref" entry ... in other words, starts the search from the end of the document and works backwards. |
static int |
getObjectOffsetFromXRefByIndirectObjectReference(XRefSectionParseResult xpr,
IndirectObjectReference ior)
|
static int |
indexOfName(byte[] pdf,
List names,
byte[] sought)
|
static boolean |
isDelimiter(byte data)
|
protected static boolean |
isHex(byte data)
|
static boolean |
isIndirectObjectReference(byte[] pdf,
int index)
|
static boolean |
isNewline(byte[] data,
int index)
|
static boolean |
isNumeric(byte data)
|
protected static boolean |
isRegular(byte data)
|
static boolean |
isSign(byte data)
|
static boolean |
isWhitespace(byte data)
|
static ArrayParseResult |
parseArray(byte[] pdf,
int index)
|
static BooleanParseResult |
parseBoolean(byte[] pdf,
int index)
Parses a boolean value. |
static DictionaryParseResult |
parseDictionary(byte[] pdf,
int index)
|
static EOFParseResult |
parseEOF(byte[] pdf,
int index)
Parses the End Of File (EOF) marker at pdf+index. |
static FooterParseResult |
parseFooter(byte[] pdf,
int index)
Parses a PDF footer. |
static HeaderParseResult |
parseHeader(byte[] pdf,
int index)
|
static HexStringParseResult |
parseHexString(byte[] pdf,
int index)
Parses a hexadecimal string. |
static IndirectObjectReferenceParseResult |
parseIndirectObjectReference(byte[] pdf,
int index)
Parses an indirect object reference. |
static IntegerParseResult |
parseInteger(byte[] pdf,
int index)
Parses a (potentially) signed integer. |
static LiteralStringParseResult |
parseLiteralString(byte[] pdf,
int index)
Parses a literal string. |
static NameParseResult |
parseName(byte[] pdf,
int index)
Parses a PDF Name. |
static NullParseResult |
parseNull(byte[] pdf,
int index)
|
static NumberParseResult |
parseNumberFromByteArray(byte[] pdf,
int index)
Parses an arbitrary number. |
static ObjectParseResult |
parseObject(byte[] pdf,
int index)
|
static ObjectHeaderParseResult |
parseObjectHeader(byte[] pdf,
int index)
Parses the object header at pdf+index. |
static StartXRefParseResult |
parseStartXRef(byte[] pdf,
int index)
Parses the startxref section at pdf+index. |
static StreamParseResult |
parseStream(byte[] pdf,
int index,
DictionaryParseResult dpr)
Parses a stream. |
static TrailerParseResult |
parseTrailer(byte[] pdf,
int index)
|
static ParseResult |
parseUnknownObject(byte[] pdf,
int index)
|
static IntegerParseResult |
parseUnsignedInteger(byte[] pdf,
int index)
Parses an unsigned integer. |
static XRefLineParseResult |
parseXrefLine(byte[] pdf,
int index)
Parses a single 20-byte xref line at pdf+index. |
static XRefSectionParseResult |
parseXRefSection(byte[] pdf,
int index)
Parses the xref section at pdf+index. |
static XRefSubSectionParseResult |
parseXRefSubSection(byte[] pdf,
int index)
Parses an xref sub-section. |
static int |
readNumberFromByteArray(byte[] data,
int index)
Reads the (positive integer) number from the data. |
static int |
skipNewline(byte[] data,
int index)
|
static int |
skipToNewline(byte[] data,
int index)
|
static int |
skipToWhitespace(byte[] data,
int index)
Skips bytes until whitespace is reached. |
static int |
skipWhitespace(byte[] data,
int index)
Skips whitespace. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final byte[] LINE_TERMINATOR_CRLF
protected static final byte[] LINE_TERMINATOR_CRALONE
protected static final byte[] LINE_TERMINATOR_LF
Constructor Detail |
---|
public PDFUtils()
Method Detail |
---|
public static boolean isWhitespace(byte data)
public static boolean isDelimiter(byte data)
protected static boolean isRegular(byte data)
public static int skipWhitespace(byte[] data, int index)
Skips all whitespace, which may be none, one or multiple whitespace characters.
Note that this also skips newline characters (which belong to whitespace as well).
data - The PDF data.
index - The index.
public static int skipToWhitespace(byte[] data, int index)
Skips all non-whitespace characters, which may be none at all.
data - The PDF data.
index - The index.
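The two skipping methods above can be pictured with the following sketch. It assumes the white-space set listed in the PDF specification (NUL, HT, LF, FF, CR, SP) and assumes that each method returns the index at which skipping stopped; the class `WhitespaceSketch` is hypothetical and the actual PDFUtils implementation may differ.

```java
// Hypothetical re-implementation for illustration only.
final class WhitespaceSketch {
    /** White-space bytes as listed in the PDF specification: NUL, HT, LF, FF, CR, SP. */
    static boolean isWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Returns the index of the first non-whitespace byte at or after index. */
    static int skipWhitespace(byte[] data, int index) {
        while (index < data.length && isWhitespace(data[index])) {
            index++;
        }
        return index;
    }

    /** Returns the index of the first whitespace byte at or after index. */
    static int skipToWhitespace(byte[] data, int index) {
        while (index < data.length && !isWhitespace(data[index])) {
            index++;
        }
        return index;
    }
}
```

Both loops stop at `data.length` when no matching byte is found, so a caller can compare the result with the array length to detect the end of the data.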
public static boolean isNewline(byte[] data, int index)
public static int skipNewline(byte[] data, int index)
public static int skipToNewline(byte[] data, int index)
public static BooleanParseResult parseBoolean(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static boolean isSign(byte data)
public static boolean isNumeric(byte data)
public static int readNumberFromByteArray(byte[] data, int index)
data - The data.
index - The index.
public static IntegerParseResult parseUnsignedInteger(byte[] pdf, int index)
The integer must be a block of successive number characters. It must not be preceded by a sign (not even '+').
pdf - The PDF data.
index - The index.
public static IntegerParseResult parseInteger(byte[] pdf, int index)
The integer must be a block of successive number characters. It may be preceded by a sign character ('+' or '-').
pdf - The PDF data.
index - The index.
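The integer rule described above (an optional sign followed by a block of successive digits) might be sketched as follows. The class `IntegerSketch` and its `Result` holder are hypothetical stand-ins for illustration; the fields of the real IntegerParseResult are not documented here, and overflow handling is deliberately left out.

```java
// Hypothetical sketch; not the PDFUtils implementation.
final class IntegerSketch {
    /** Assumed result shape: the parsed value and the index just past the last digit. */
    static final class Result {
        final int value;
        final int nextIndex;

        Result(int value, int nextIndex) {
            this.value = value;
            this.nextIndex = nextIndex;
        }
    }

    /** Parses an optionally signed decimal integer starting at index. */
    static Result parseInteger(byte[] pdf, int index) {
        int i = index;
        boolean negative = false;
        if (i < pdf.length && (pdf[i] == '+' || pdf[i] == '-')) {
            negative = pdf[i] == '-';
            i++;
        }
        int firstDigit = i;
        int value = 0;
        // Consume the block of successive digits (no overflow check in this sketch).
        while (i < pdf.length && pdf[i] >= '0' && pdf[i] <= '9') {
            value = value * 10 + (pdf[i] - '0');
            i++;
        }
        if (i == firstDigit) {
            throw new IllegalArgumentException("no digits at index " + index);
        }
        return new Result(negative ? -value : value, i);
    }
}
```

An unsigned variant would simply omit the sign check and reject a leading '+' or '-'.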
public static NumberParseResult parseNumberFromByteArray(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static int findLastStartXRef(byte[] pdf)
pdf - The complete PDF file data.
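A backward search of the kind described for findLastStartXRef could look like the sketch below. The class `StartXRefSketch` is hypothetical, and returning -1 when the keyword is absent is an assumption; the documentation does not say how the real method reports a missing entry.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the PDFUtils implementation.
final class StartXRefSketch {
    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the last "startxref" in pdf, or -1 if there is none (assumed convention). */
    static int findLast(byte[] pdf) {
        // Start at the last position where the keyword could still fit and walk backwards.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < KEYWORD.length; j++) {
                if (pdf[i + j] != KEYWORD[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

Searching from the end matters for incrementally updated documents, which can contain several startxref entries; the last one belongs to the most recent update.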
public static XRefSectionParseResult parseXRefSection(byte[] pdf, int index)
An xref section starts with 'xref' and contains one or more xref sub-sections.
pdf - The PDF data.
index - The start index of the xref table.
public static XRefSubSectionParseResult parseXRefSubSection(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static XRefLineParseResult parseXrefLine(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
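For orientation, the fixed 20-byte layout of an xref line in the PDF specification is a 10-digit byte offset, a space, a 5-digit generation number, a space, the letter 'n' (in use) or 'f' (free), and a 2-byte end-of-line sequence. The sketch below decodes that layout; the class `XRefLineSketch` and its `Entry` holder are hypothetical, and the fields of the real XRefLineParseResult may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the PDFUtils implementation.
final class XRefLineSketch {
    static final int LINE_LENGTH = 20;

    /** Assumed result shape for one xref entry. */
    static final class Entry {
        final long offset;
        final int generation;
        final boolean inUse;

        Entry(long offset, int generation, boolean inUse) {
            this.offset = offset;
            this.generation = generation;
            this.inUse = inUse;
        }
    }

    /** Decodes the 20-byte xref line that starts at index. */
    static Entry parse(byte[] pdf, int index) {
        if (index < 0 || index + LINE_LENGTH > pdf.length) {
            throw new IllegalArgumentException("not enough data for an xref line at " + index);
        }
        String line = new String(pdf, index, LINE_LENGTH, StandardCharsets.US_ASCII);
        char type = line.charAt(17);
        if (line.charAt(10) != ' ' || line.charAt(16) != ' ' || (type != 'n' && type != 'f')) {
            throw new IllegalArgumentException("malformed xref line at " + index);
        }
        long offset = Long.parseLong(line.substring(0, 10));      // 10-digit byte offset
        int generation = Integer.parseInt(line.substring(11, 16)); // 5-digit generation number
        return new Entry(offset, generation, type == 'n');
    }
}
```

Because every line has the same length, entry number k of a sub-section can be located at the sub-section's first line plus k times 20 bytes without scanning.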
public static int indexOfName(byte[] pdf, List names, byte[] sought)
public static TrailerParseResult parseTrailer(byte[] pdf, int index)
public static StartXRefParseResult parseStartXRef(byte[] pdf, int index)
pdf - The complete PDF file data.
index - The index of the startxref section.
public static EOFParseResult parseEOF(byte[] pdf, int index)
pdf - The PDF data.
index - The index at which parsing starts.
public static boolean isIndirectObjectReference(byte[] pdf, int index)
public static IndirectObjectReferenceParseResult parseIndirectObjectReference(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static ObjectHeaderParseResult parseObjectHeader(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static ObjectParseResult parseObject(byte[] pdf, int index)
public static ParseResult parseUnknownObject(byte[] pdf, int index)
public static LiteralStringParseResult parseLiteralString(byte[] pdf, int index)
A literal string is a string of ASCII characters enclosed by '(' and ')'. Balanced pairs of '(' and ')' are allowed within the string. Unbalanced '(' or ')' must be escaped as '\(' or '\)'.
pdf - The PDF data.
index - The index.
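The balancing rule for literal strings can be sketched with a simple depth counter. This is only an illustration under assumptions: the class `LiteralStringSketch` is hypothetical, it only locates the end of the string rather than decoding its content, and it treats any byte after a backslash as escaped.

```java
// Hypothetical sketch; not the PDFUtils implementation.
final class LiteralStringSketch {
    /** Returns the index just past the ')' that closes the literal string starting at index. */
    static int endOfLiteralString(byte[] pdf, int index) {
        if (index < 0 || index >= pdf.length || pdf[index] != '(') {
            throw new IllegalArgumentException("no literal string at index " + index);
        }
        int depth = 0;
        for (int i = index; i < pdf.length; i++) {
            byte b = pdf[i];
            if (b == '\\') {
                i++; // skip the escaped byte, e.g. the parenthesis in \( or \)
            } else if (b == '(') {
                depth++;
            } else if (b == ')') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
        }
        throw new IllegalArgumentException("unterminated literal string at index " + index);
    }
}
```

For the input `(a(b)c) x` the nested pair raises the depth to 2 and back to 1, so only the second ')' ends the string.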
protected static boolean isHex(byte data)
public static HexStringParseResult parseHexString(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static ArrayParseResult parseArray(byte[] pdf, int index)
public static NameParseResult parseName(byte[] pdf, int index)
pdf - The PDF data.
index - The index.
public static DictionaryParseResult parseDictionary(byte[] pdf, int index)
public static StreamParseResult parseStream(byte[] pdf, int index, DictionaryParseResult dpr)
pdf - The PDF data.
index - The index.
dpr - The DictionaryParseResult of the stream's dictionary. This dictionary must precede the stream keyword. Usually this is provided in the stream object's dictionary via the /Length field.
public static NullParseResult parseNull(byte[] pdf, int index)
public static int getObjectOffsetFromXRefByIndirectObjectReference(XRefSectionParseResult xpr, IndirectObjectReference ior)
public static HeaderParseResult parseHeader(byte[] pdf, int index)
public static FooterParseResult parseFooter(byte[] pdf, int index)
A PDF footer starts with the xref, followed by the trailer, the startxref and the EOF marker.
pdf - The PDF data.
index - The index.
FooterParseResult
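The footer layout described above (xref, then trailer, then startxref, then the EOF marker) can be checked roughly by looking for the four keywords in that order. The sketch below is a plausibility check only, not a parser; the class `FooterSketch` and both of its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the PDFUtils implementation.
final class FooterSketch {
    /** Returns the first index >= from at which keyword occurs in data, or -1. */
    static int indexOf(byte[] data, String keyword, int from) {
        byte[] k = keyword.getBytes(StandardCharsets.US_ASCII);
        for (int i = Math.max(from, 0); i + k.length <= data.length; i++) {
            int j = 0;
            while (j < k.length && data[i + j] == k[j]) {
                j++;
            }
            if (j == k.length) {
                return i;
            }
        }
        return -1;
    }

    /** True if "xref" starts at index and is followed, in order, by trailer, startxref and %%EOF. */
    static boolean looksLikeFooter(byte[] pdf, int index) {
        if (indexOf(pdf, "xref", index) != index) {
            return false; // the footer must begin with the xref keyword
        }
        int trailer = indexOf(pdf, "trailer", index + 4);
        if (trailer < 0) {
            return false;
        }
        int startxref = indexOf(pdf, "startxref", trailer + 7);
        if (startxref < 0) {
            return false;
        }
        return indexOf(pdf, "%%EOF", startxref + 9) >= 0;
    }
}
```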