java.lang.Object
  org.pdfbox.searchengine.lucene.LucenePDFDocument
public final class LucenePDFDocument
This class is used to create a document for the Lucene search engine. It should plug easily into the IndexHTML or IndexFiles demo programs that come with the Lucene project. The class populates the following fields.
Lucene Field Name | Description |
---|---|
path | File system path if loaded from a file |
url | URL to PDF document |
contents | Entire contents of PDF document, indexed but not stored |
summary | First 500 characters of content |
modified | The modified date/time according to the url or path |
uid | A unique identifier for the Lucene document |
CreationDate | From PDF meta-data if available |
Creator | From PDF meta-data if available |
Keywords | From PDF meta-data if available |
ModificationDate | From PDF meta-data if available |
Producer | From PDF meta-data if available |
Subject | From PDF meta-data if available |
Trapped | From PDF meta-data if available |
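
The `summary` row describes a simple truncation rule. A minimal sketch of that rule in plain Java follows, assuming the cut is made on raw characters with no trimming or word-boundary handling (the table does not say either way). `SummarySketch` and `summarize` are hypothetical names used for illustration and are not part of PDFBox or Lucene.

```java
public class SummarySketch {
    // Length taken from the field table: "First 500 characters of content".
    public static final int SUMMARY_LENGTH = 500;

    // Hypothetical helper: returns at most the first 500 characters of the text.
    // Assumption: plain character truncation, no whitespace or word handling.
    public static String summarize(String contents) {
        int end = Math.min(contents.length(), SUMMARY_LENGTH);
        return contents.substring(0, end);
    }

    public static void main(String[] args) {
        // Build a text longer than the summary limit (120 * 5 = 600 characters).
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 120; i++) {
            longText.append("word ");
        }
        // Short input is returned unchanged; long input is cut at the limit.
        System.out.println(summarize("short text").length());
        System.out.println(summarize(longText.toString()).length());
    }
}
```

Because `contents` is indexed but not stored, a short stored field like `summary` is what a search front end would typically display alongside a hit.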
Method Summary | |
---|---|
static org.apache.lucene.document.Document | getDocument(File file): This will get a Lucene document from a PDF file. |
static org.apache.lucene.document.Document | getDocument(InputStream is): This will get a Lucene document from a PDF read from an input stream. |
static org.apache.lucene.document.Document | getDocument(URL url): This will get a Lucene document from a PDF located at a URL. |
static void | main(String[] args): This will test creating a document. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static org.apache.lucene.document.Document getDocument(InputStream is) throws IOException
Parameters:
is - The stream to read the PDF from.
Throws:
IOException - If there is an error parsing or indexing the document.

public static org.apache.lucene.document.Document getDocument(File file) throws IOException
Parameters:
file - The file to get the document for.
Throws:
IOException - If there is an error parsing or indexing the document.

public static org.apache.lucene.document.Document getDocument(URL url) throws IOException
Parameters:
url - The URL of the PDF to get the document for.
Throws:
IOException - If there is an error parsing or indexing the document.

public static void main(String[] args) throws IOException
Parameters:
args - command line arguments.
Throws:
IOException - If there is an error.
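
The three `getDocument` overloads accept a `File`, a `URL`, or an `InputStream`. As a rough sketch of how a caller might obtain a stream from either of the other two sources before using the `InputStream` overload, the plain-Java example below uses only the standard library. `PdfSourceSketch`, `open`, and `readAll` are hypothetical names, and the actual call to `LucenePDFDocument.getDocument` appears only in a comment because it needs the PDFBox and Lucene jars on the classpath. Whether `getDocument` closes the stream is not stated in this page, so the sketch closes it itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class PdfSourceSketch {
    // Opens a stream on a local file.
    public static InputStream open(File file) throws IOException {
        return new FileInputStream(file);
    }

    // Opens a stream on a URL (for example http: or file:).
    public static InputStream open(URL url) throws IOException {
        return url.openStream();
    }

    // Reads a stream to the end; a stand-in for the point where the
    // stream would be handed to an InputStream-based API.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSourceSketch <pdf-file>");
            return;
        }
        File file = new File(args[0]);
        // The caller owns the stream here and closes it when done.
        try (InputStream in = open(file)) {
            // With PDFBox and Lucene available, this is where
            // LucenePDFDocument.getDocument(in) would be called instead.
            System.out.println(readAll(in).length + " bytes read from " + file);
        }
    }
}
```

When the source is already a `File` or `URL`, passing it directly to the matching overload is simpler; the stream form is mainly useful when the PDF bytes come from somewhere else.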