at.knowcenter.wag.egov.egiz.pdf
Class AbsoluteTextSignature

java.lang.Object
  extended by at.knowcenter.wag.egov.egiz.pdf.AbsoluteTextSignature

public class AbsoluteTextSignature
extends Object

Contains methods and helpers that implement the absolute text signature.

Author:
wprinz

Constructor Summary
AbsoluteTextSignature()
           
 
Method Summary
static void checkBlockIntegrity(String text, FoundBlock found_block)
          Checks the integrity of a found block.
static FoundBlock chooseMostPossibleBlock(List found_blocks)
          Chooses the most likely (best-choice) block from the list of blocks.
static SignatureObject createSignatureObjectFromFoundBlock(String text, FoundBlock found_block)
          Creates a SignatureObject from a found block by extracting the corresponding values.
static String cutOutBlock(String text, FoundBlock block)
          Cuts out the given found block from the text.
static SignatureHolder extractLatestBlock(String text)
          Extracts the latest signature block from the given text and creates a SignatureHolder object that can be verified.
static List extractSignatureHoldersFromText(String text)
          Extracts all signature holders from a given text.
static List filterHorizontallyLargestBlocks(List found_blocks)
          Filters out all blocks but the horizontally largest ones.
static List filterLastDateEqualBlocks(String text, List found_blocks)
          Given a List of FoundBlock objects, this method returns the last blocks of this list that have the same date.
static List filterVerticallyLargestBlocks(List found_blocks)
          Filters out all blocks but the vertically largest ones.
static int findEndOfValue(String text, int start_index)
          Finds the end of the value in the text.
static List findIndicesWithStartingNL(String text, String subtext)
          Finds all indices of the given subtext (starting at a new line) within a given text.
static FoundBlock findLatestBlock(String text)
          Finds the latest signature block for a given text.
static List findPotentialSignaturesForProfile(String text, SignatureTypeDefinition block_type)
          Finds the List of potential blocks within the given text for the given profile.
static List findRestKeys(String text, List keys, List captions, int last_caption_index)
          Finds the other keys/captions in order, searching upwards from last_caption_index.
static EGIZDate getDateFromFoundBlock(String text, FoundBlock found_block)
          Parses the EGIZDate from a found block and the given text.
static String getDateValue(String text, FoundBlock block)
          Returns the value of the date field as String.
protected static boolean isHorizontallyEqual(FoundBlock fb0, FoundBlock fb1)
           
protected static boolean isHorizontallyLarger(FoundBlock fb0, FoundBlock fb1)
           
static boolean reverseCheckFoundKeys(String text, List found_keys)
          Performs a reverse (top to bottom) search for the found keys and checks that these indices are the same as those that were found during the regular (bottom up) search.
static void sortFoundBlocksByDate(String text, List found_blocks)
          Sorts the List of found blocks by date.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbsoluteTextSignature

public AbsoluteTextSignature()
Method Detail

extractSignatureHoldersFromText

public static List extractSignatureHoldersFromText(String text)
                                            throws SignatureException,
                                                   SignatureTypesException
Extracts all signature holders from a given text.

First the latest signature holder is extracted. Then the latest signature holder in the rest text, which is the second latest one, is extracted. Then the third latest signature holder is extracted and so forth until no more signature holders are found.

Parameters:
text - The text.
Returns:
Returns the List of extracted signature holders in ascending date order (the earliest date first, the latest date last). An empty list is returned if no signature holders were found.
Throws:
SignatureException - F.e.
SignatureTypesException - F.e.
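The iterative extraction described above (latest block first, then the latest block of the remaining text, and so on) can be sketched as a loop. This is a hypothetical illustration, not the class's implementation: a block is modeled as a [start, end) index range, the finder function stands in for findLatestBlock, and "latest" is approximated by text position rather than by the parsed date.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: blocks are [start, end) ranges instead of FoundBlock /
// SignatureHolder objects.
class ExtractAllSketch {

    /**
     * Repeatedly finds the latest block, records it, cuts it out and continues
     * on the rest text until no further block is found. findLatest is a
     * hypothetical stand-in for findLatestBlock: it returns {start, end} or null.
     */
    static List<String> extractAll(String text, Function<String, int[]> findLatest) {
        List<String> holders = new ArrayList<>();
        String rest = text;
        int[] block;
        while ((block = findLatest.apply(rest)) != null) {
            holders.add(rest.substring(block[0], block[1]));               // latest first
            rest = rest.substring(0, block[0]) + rest.substring(block[1]); // cut it out
        }
        Collections.reverse(holders); // earliest first, latest last
        return holders;
    }
}
```

Collecting latest-first and reversing at the end yields the ascending order the method documents.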

extractLatestBlock

public static SignatureHolder extractLatestBlock(String text)
                                          throws SignatureException,
                                                 SignatureTypesException
Extracts the latest signature block from the given text and creates a SignatureHolder object that can be verified.

Parameters:
text - The text.
Returns:
Returns the SignatureHolder extracted from the text, or null if no latest block was found.
Throws:
SignatureException - F.e.
SignatureTypesException - F.e.

findLatestBlock

public static FoundBlock findLatestBlock(String text)
                                  throws SignatureException,
                                         SignatureTypesException
Finds the latest signature block for a given text.

The latest block is the one with the highest, most recent date. Usually this block is extracted (cut out) of the text; the remaining text is the originally signed text of this signature, which can then be verified using the cut-out data.

Parameters:
text - The text to be analyzed.
Returns:
Returns the latest found block or null, if there was none.
Throws:
SignatureException - F.e.
SignatureTypesException - F.e.

findPotentialSignaturesForProfile

public static List findPotentialSignaturesForProfile(String text,
                                                     SignatureTypeDefinition block_type)
Finds the List of potential blocks within the given text for the given profile.

Parameters:
text - The text in which potential blocks are to be sought.
block_type - The profile for which blocks are to be sought.
Returns:
Returns the List of potential FoundBlocks or an empty List if none could be found.

findIndicesWithStartingNL

public static List findIndicesWithStartingNL(String text,
                                             String subtext)
Finds all indices of the given subtext (starting at a new line) within a given text.

This is usually used to find the indices of the last captions.

Parameters:
text - The text to be searched.
subtext - The subtext to be sought.
Returns:
Returns the List of found indices.
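A minimal sketch of the search described above, assuming that "starting at a new line" means the match is either at index 0 or directly preceded by '\n'. The class name is hypothetical; the real method may treat the start of the text or '\r' differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a match counts only if it is at index 0 or directly
// preceded by '\n' (assumption about the "starting NL" rule).
class StartingNLSketch {

    static List<Integer> findIndicesWithStartingNL(String text, String subtext) {
        List<Integer> indices = new ArrayList<>();
        if (subtext.isEmpty()) {
            return indices; // guard: an empty subtext would match everywhere
        }
        int from = 0;
        int idx;
        while ((idx = text.indexOf(subtext, from)) >= 0) {
            // Keep the match only if it starts a line.
            if (idx == 0 || text.charAt(idx - 1) == '\n') {
                indices.add(idx);
            }
            from = idx + 1; // continue searching after this match
        }
        return indices;
    }
}
```

For "Name: A\nDate: x\nxDate: y\nDate: z" and the subtext "Date:", only the two occurrences at line starts (indices 8 and 25) are returned; the one inside "xDate:" is skipped.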

findRestKeys

public static List findRestKeys(String text,
                                List keys,
                                List captions,
                                int last_caption_index)
Finds the other keys/captions in order, searching upwards from last_caption_index.

Parameters:
text - The text.
keys - The list of keys.
captions - The list of captions.
last_caption_index - The index of the last caption.
Returns:
Returns the List of found keys, if all keys could be found, or null if not all keys could be found.
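The upward search can be sketched as follows. This is a hypothetical simplification: it works only on caption strings (the real method also takes the keys), treats every caption as required, and assumes captions must start a line. Each preceding caption is looked up backwards from just above the caption found below it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: captions are given top to bottom and the last caption
// was already found at lastCaptionIndex.
class RestKeysSketch {

    static List<Integer> findRestKeys(String text, List<String> captions, int lastCaptionIndex) {
        List<Integer> found = new ArrayList<>();
        found.add(lastCaptionIndex);
        int searchFrom = lastCaptionIndex - 1;
        for (int i = captions.size() - 2; i >= 0; i--) {
            int idx = lastIndexOfAtLineStart(text, captions.get(i), searchFrom);
            if (idx < 0) {
                return null; // not all captions could be found
            }
            found.add(idx);
            searchFrom = idx - 1; // keep searching upwards
        }
        Collections.reverse(found); // report indices top to bottom
        return found;
    }

    // Last occurrence of caption at or before fromIndex that starts a line.
    static int lastIndexOfAtLineStart(String text, String caption, int fromIndex) {
        int idx = text.lastIndexOf(caption, fromIndex);
        while (idx > 0 && text.charAt(idx - 1) != '\n') {
            idx = text.lastIndexOf(caption, idx - 1);
        }
        return idx;
    }
}
```

Returning null as soon as one caption is missing mirrors the documented contract ("null if not all keys could be found").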

reverseCheckFoundKeys

public static boolean reverseCheckFoundKeys(String text,
                                            List found_keys)
Performs a reverse (top to bottom) search for the found keys and checks that these indices are the same as those that were found during the regular (bottom up) search.

If a reverse check proves that the found keys are not at the same positions as during regular search, this list of found keys should be discarded.

Parameters:
text - The text.
found_keys - The found keys to be reversely checked.
Returns:
Returns true if all captions (including the non-required ones) are found at the same indices as during the regular search, false otherwise.
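A hypothetical sketch of such a reverse check. The starting point of the forward search is an assumption: here it begins at the topmost found caption, and each following caption is searched downwards from the end of the previous one. If an earlier occurrence of a caption appears between two found captions, the indices differ and the candidate is rejected.

```java
import java.util.List;

// Hypothetical sketch of a top-to-bottom re-check of bottom-up search results.
class ReverseCheckSketch {

    static boolean reverseCheck(String text, List<String> captions, List<Integer> foundIndices) {
        int from = foundIndices.get(0); // assumption: start at the topmost found caption
        for (int i = 0; i < captions.size(); i++) {
            int idx = indexOfAtLineStart(text, captions.get(i), from);
            if (idx != foundIndices.get(i)) {
                return false; // position differs from the bottom-up search: discard
            }
            from = idx + captions.get(i).length();
        }
        return true;
    }

    // First occurrence of caption at or after fromIndex that starts a line.
    static int indexOfAtLineStart(String text, String caption, int fromIndex) {
        int idx = text.indexOf(caption, fromIndex);
        while (idx > 0 && text.charAt(idx - 1) != '\n') {
            idx = text.indexOf(caption, idx + 1);
        }
        return idx;
    }
}
```

With "Signer: A\nDate: 1\nDate: 2\nHash: x", a bottom-up search finds the second "Date:" (index 18), while the top-down search finds the first one (index 10), so the check returns false.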

findEndOfValue

public static int findEndOfValue(String text,
                                 int start_index)
Finds the end of the value in the text.

This simply scans for a '\n' starting at the given start index. Everything up to and including the '\n' is considered to be the value.

Note that this method does NOT find the accurate value if the value spans multiple lines! This can be a serious problem. Usually this method is only used to find the end of the last value in a found block, because intermediate values are exactly determined by their start index and the start of the next caption. Nevertheless, if the last value spans multiple lines, this method will not retrieve it completely.

Parameters:
text - The text.
start_index - The start index from where the end of the value is sought.
Returns:
Returns the end index of the value, which is the index of the first character not belonging to the value anymore (the character after the '\n').
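The documented behavior fits in a few lines. This is a hypothetical re-implementation; in particular, returning the text length when no further '\n' exists is an assumption, since the documentation does not cover that case.

```java
// Hypothetical sketch: scan for '\n' from startIndex and return the index of
// the character after it.
class EndOfValueSketch {

    static int findEndOfValue(String text, int startIndex) {
        int nl = text.indexOf('\n', startIndex);
        // Assumption: with no further '\n', the value runs to the end of the text.
        return nl < 0 ? text.length() : nl + 1;
    }
}
```

The multi-line limitation is easy to see: for "Note: a\nb\n" and start index 6, the method returns 8, so the continuation line "b" is not part of the value.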

checkBlockIntegrity

public static void checkBlockIntegrity(String text,
                                       FoundBlock found_block)
Checks the integrity of a found block.

This is an assertive function.

Parameters:
text - The text.
found_block - The found block.

cutOutBlock

public static String cutOutBlock(String text,
                                 FoundBlock block)
Cuts out the given found block from the text.

Parameters:
text - The text.
block - The found block.
Returns:
Returns the rest text without the block.
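Assuming a found block can be reduced to a half-open range [start, end) of the text (a hypothetical simplification of FoundBlock), cutting it out is a concatenation of the parts before and after the range. The result is the rest text, i.e. the text that was originally signed.

```java
// Hypothetical sketch: the found block is modeled as the range [start, end).
class CutOutSketch {

    static String cutOutBlock(String text, int start, int end) {
        // Keep everything before the block and everything after it.
        return text.substring(0, start) + text.substring(end);
    }
}
```

For example, cutting the range [10, 31) out of "Body text\nSigner: A\nDate: 2007\n" leaves "Body text\n".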

getDateValue

public static String getDateValue(String text,
                                  FoundBlock block)
Returns the value of the date field as String.

Parameters:
text - The text.
block - The found block.
Returns:
Returns the date value.

createSignatureObjectFromFoundBlock

public static SignatureObject createSignatureObjectFromFoundBlock(String text,
                                                                  FoundBlock found_block)
                                                           throws SignatureTypesException,
                                                                  SignatureException
Creates a SignatureObject from a found block by extracting the corresponding values.

Parameters:
text - The text.
found_block - The found block.
Returns:
Returns the created SignatureObject.
Throws:
SignatureTypesException - F.e.
SignatureException - F.e.

getDateFromFoundBlock

public static EGIZDate getDateFromFoundBlock(String text,
                                             FoundBlock found_block)
Parses the EGIZDate from a found block and the given text.

Parameters:
text - The text.
found_block - The found block.
Returns:
Returns the parsed EGIZDate.

sortFoundBlocksByDate

public static void sortFoundBlocksByDate(String text,
                                         List found_blocks)
Sorts the List of found blocks by date.

Parameters:
text - The text.
found_blocks - The List of found blocks.

filterLastDateEqualBlocks

public static List filterLastDateEqualBlocks(String text,
                                             List found_blocks)
Given a List of FoundBlock objects, this method returns the last blocks of this list that have the same date.

Usually a date-sorted list (earliest first, latest last) is passed to this method. In that case, the returned blocks are those at the end of the list that share the latest date.

Parameters:
text - The text to retrieve the values of the fields from.
found_blocks - The List of FoundBlock objects.
Returns:
Returns the List of the last date equal blocks.
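Sorting by date and then taking the trailing run of equal dates can be sketched as follows. This is a hypothetical illustration: DatedBlock stands in for FoundBlock, and java.time.LocalDate stands in for the parsed EGIZDate.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sortFoundBlocksByDate followed by filterLastDateEqualBlocks.
class LastDateEqualSketch {

    // Hypothetical stand-in for FoundBlock: a label plus its parsed date.
    static final class DatedBlock {
        final String label;
        final LocalDate date;

        DatedBlock(String label, LocalDate date) {
            this.label = label;
            this.date = date;
        }
    }

    // Sorts in place, earliest first and latest last (List.sort is stable).
    static void sortByDate(List<DatedBlock> blocks) {
        blocks.sort(Comparator.comparing((DatedBlock b) -> b.date));
    }

    // Walks backwards from the end, collecting blocks that share the last date.
    static List<DatedBlock> filterLastDateEqual(List<DatedBlock> blocks) {
        List<DatedBlock> result = new ArrayList<>();
        if (blocks.isEmpty()) {
            return result;
        }
        LocalDate last = blocks.get(blocks.size() - 1).date;
        for (int i = blocks.size() - 1; i >= 0 && blocks.get(i).date.equals(last); i--) {
            result.add(0, blocks.get(i)); // keep the original (ascending) order
        }
        return result;
    }
}
```

If several blocks share the latest date, the filter returns all of them, and a later step (such as chooseMostPossibleBlock) has to pick one.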

chooseMostPossibleBlock

public static FoundBlock chooseMostPossibleBlock(List found_blocks)
                                          throws SignatureException
Chooses the most likely (best-choice) block from the list of blocks.

The strategy is to choose the block with the maximum number of captions, since this block has extracted the most information from the text.

If multiple blocks still have the same number of captions, they are compared caption by caption. The block whose captions are each at least as long as the corresponding captions of all other blocks wins.

Parameters:
found_blocks - The List of semantically equal blocks.
Returns:
Returns the best choice FoundBlock.
Throws:
SignatureException
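The two-step strategy (most captions first, then caption-wise length comparison) can be sketched as below. This is a hypothetical illustration: a candidate block is modeled as the list of caption strings it matched, and IllegalStateException stands in for the SignatureException that the real method declares; that the exception is raised exactly when no block dominates is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the chooseMostPossibleBlock strategy.
class ChooseBlockSketch {

    static List<String> choose(List<List<String>> candidates) {
        // Step 1 (vertical): keep only the candidates with the most captions.
        int max = 0;
        for (List<String> c : candidates) {
            max = Math.max(max, c.size());
        }
        List<List<String>> tallest = new ArrayList<>();
        for (List<String> c : candidates) {
            if (c.size() == max) {
                tallest.add(c);
            }
        }
        // Step 2 (horizontal): pick a candidate whose every caption is at least
        // as long as the corresponding caption of every other candidate.
        for (List<String> c : tallest) {
            boolean dominates = true;
            for (List<String> other : tallest) {
                for (int i = 0; i < max && dominates; i++) {
                    dominates = c.get(i).length() >= other.get(i).length();
                }
            }
            if (dominates) {
                return c;
            }
        }
        // Assumption: the real method reports this case with a SignatureException.
        throw new IllegalStateException("no unambiguous best block");
    }
}
```

Note that the caption-wise comparison is a partial order: two blocks where each has one longer caption than the other cannot be ranked, which is why a failure case is needed.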

filterVerticallyLargestBlocks

public static List filterVerticallyLargestBlocks(List found_blocks)
Filters out all blocks but the vertically largest ones.

A vertically largest block has the most found keys.

Parameters:
found_blocks - The List of FoundBlock objects to be filtered.
Returns:
Returns the List of the vertically largest FoundBlock objects.

filterHorizontallyLargestBlocks

public static List filterHorizontallyLargestBlocks(List found_blocks)
                                            throws SignatureException
Filters out all blocks but the horizontally largest ones.

A horizontally largest block is one whose captions are each at least as long as the corresponding captions of the other blocks.

Parameters:
found_blocks - The List of FoundBlock objects to be filtered. All of these FoundBlock objects must have the same number of found keys.
Returns:
Returns the List of the horizontally largest FoundBlock objects.
Throws:
SignatureException

isHorizontallyEqual

protected static boolean isHorizontallyEqual(FoundBlock fb0,
                                             FoundBlock fb1)

isHorizontallyLarger

protected static boolean isHorizontallyLarger(FoundBlock fb0,
                                              FoundBlock fb1)


Copyright © 2006-2007 EGIZ - E-Government Innovationszentrum. All Rights Reserved.