Package org.apache.pdfbox.cos
Class COSDocument
- java.lang.Object
  - org.apache.pdfbox.cos.COSBase
    - org.apache.pdfbox.cos.COSDocument
- All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, COSObjectable
public class COSDocument extends COSBase implements java.io.Closeable
This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.
-
-
Field Summary
Fields (modifier and type, field name, description):
- private boolean closed
- private long highestXRefObjectNumber: Used for incremental saving, to prevent XRef object numbers from being reused.
- private boolean isDecrypted: Signals that the document is already decrypted.
- private boolean isXRefStream
- private static org.apache.commons.logging.Log LOG: Log instance.
- private java.util.Map<COSObjectKey,COSObject> objectPool: Maps ObjectKeys to a COSObject.
- private ScratchFile scratchFile
- private long startXref
- private java.util.List<COSStream> streams: List containing all streams which are created when creating a new PDF.
- private COSDictionary trailer: Document trailer dictionary.
- private float version
- private boolean warnMissingClose
- private java.util.Map<COSObjectKey,java.lang.Long> xrefTable: Maps object and generation id to object byte offsets.
-
Constructor Summary
Constructors (constructor, description):
- COSDocument(): Constructor.
- COSDocument(ScratchFile scratchFile): Constructor that will use the provided memory handler for storage of the PDF streams.
-
Method Summary
Methods (modifier and type, method, description):
- java.lang.Object accept(ICOSVisitor visitor): Visitor pattern double dispatch method.
- void addXRefTable(java.util.Map<COSObjectKey,java.lang.Long> xrefTableValues): Populate XRef HashMap with given values.
- void close(): This will close all storage and delete the tmp files.
- COSStream createCOSStream(): Creates a new COSStream using the current configuration for scratch files.
- COSStream createCOSStream(COSDictionary dictionary): Creates a new COSStream using the current configuration for scratch files.
- void dereferenceObjectStreams(): This method will search the list of objects for types of ObjStm.
- protected void finalize(): Warns the user in the finalizer if the PDF document was not closed.
- COSObject getCatalog(): Deprecated. Use PDDocument.getDocumentCatalog() instead.
- COSArray getDocumentID(): This will get the document ID.
- COSDictionary getEncryptionDictionary(): This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
- long getHighestXRefObjectNumber(): Internal PDFBox use only.
- COSObjectKey getKey(COSBase object): Returns the COSObjectKey for a given COS object, or null if there is none.
- COSObject getObjectByType(COSName type): This will get the first dictionary object by type.
- COSObject getObjectFromPool(COSObjectKey key): This will get an object from the pool.
- java.util.List<COSObject> getObjects(): This will get a list of all available objects.
- java.util.List<COSObject> getObjectsByType(java.lang.String type): This will get all dictionary objects by type.
- java.util.List<COSObject> getObjectsByType(COSName type): This will get all dictionary objects by type.
- long getStartXref(): Returns the startXref position of the parsed document.
- COSDictionary getTrailer(): This will get the document trailer.
- float getVersion(): This will get the version extracted from the header of this PDF document.
- java.util.Map<COSObjectKey,java.lang.Long> getXrefTable(): Returns the xrefTable, which is a mapping of ObjectKeys to byte offsets in the file.
- boolean isClosed(): Returns true if this document has been closed.
- boolean isDecrypted(): Indicates if an encrypted PDF is already decrypted after parsing.
- boolean isEncrypted(): This will tell if this is an encrypted document.
- boolean isXRefStream(): Determines if the trailer is an XRef stream or not.
- void print(): This will print contents to stdout.
- COSObject removeObject(COSObjectKey key): Removes an object from the object pool.
- void setDecrypted(): Signals that the document is decrypted completely.
- void setDocumentID(COSArray id): This will set the document ID.
- void setEncryptionDictionary(COSDictionary encDictionary): This will set the encryption dictionary. It should only be called when encrypting the document.
- void setHighestXRefObjectNumber(long highestXRefObjectNumber): Internal PDFBox use only.
- void setIsXRefStream(boolean isXRefStreamValue): Sets isXRefStream to the given value.
- void setStartXref(long startXrefValue): This method sets the startxref value of the document.
- void setTrailer(COSDictionary newTrailer): This will set the document trailer. Source note: MIT added; maybe this should not be supported, as the trailer is a persistence construct.
- void setVersion(float versionValue): This will set the header version of this PDF document.
- void setWarnMissingClose(boolean warn): Controls whether this instance shall issue a warning if the PDF document wasn't closed properly through a call to the close() method.
-
Methods inherited from class org.apache.pdfbox.cos.COSBase
getCOSObject, isDirect, setDirect
-
-
-
-
Field Detail
-
LOG
private static final org.apache.commons.logging.Log LOG
Log instance.
-
version
private float version
-
objectPool
private final java.util.Map<COSObjectKey,COSObject> objectPool
Maps ObjectKeys to a COSObject. Note that references to these objects are also stored in COSDictionary objects that map a name to a specific object.
-
xrefTable
private final java.util.Map<COSObjectKey,java.lang.Long> xrefTable
Maps object and generation id to object byte offsets.
-
streams
private final java.util.List<COSStream> streams
List containing all streams which are created when creating a new pdf.
-
trailer
private COSDictionary trailer
Document trailer dictionary.
-
warnMissingClose
private boolean warnMissingClose
-
isDecrypted
private boolean isDecrypted
Signals that the document is already decrypted.
-
startXref
private long startXref
-
closed
private boolean closed
-
isXRefStream
private boolean isXRefStream
-
scratchFile
private ScratchFile scratchFile
-
highestXRefObjectNumber
private long highestXRefObjectNumber
Used for incremental saving, to prevent XRef object numbers from being reused.
-
-
Constructor Detail
-
COSDocument
public COSDocument()
Constructor. Uses main memory to buffer PDF streams.
-
COSDocument
public COSDocument(ScratchFile scratchFile)
Constructor that will use the provided memory handler for storage of the PDF streams.
- Parameters:
scratchFile
- memory handler for buffering of PDF streams
-
-
Method Detail
-
createCOSStream
public COSStream createCOSStream()
Creates a new COSStream using the current configuration for scratch files.
- Returns:
- the new COSStream
-
createCOSStream
public COSStream createCOSStream(COSDictionary dictionary)
Creates a new COSStream using the current configuration for scratch files. Not for public use. Only COSParser should call this method.
- Parameters:
dictionary
- the corresponding dictionary
- Returns:
- the new COSStream
-
getObjectByType
public COSObject getObjectByType(COSName type) throws java.io.IOException
This will get the first dictionary object by type.
- Parameters:
type
- The type of the object.
- Returns:
- This will return an object with the specified type.
- Throws:
java.io.IOException
- If there is an error getting the object
-
getObjectsByType
public java.util.List<COSObject> getObjectsByType(java.lang.String type) throws java.io.IOException
This will get all dictionary objects by type.
- Parameters:
type
- The type of the object.
- Returns:
- A list of all objects with the specified type.
- Throws:
java.io.IOException
- If there is an error getting the object
-
getObjectsByType
public java.util.List<COSObject> getObjectsByType(COSName type) throws java.io.IOException
This will get all dictionary objects by type.
- Parameters:
type
- The type of the object.
- Returns:
- A list of all objects with the specified type.
- Throws:
java.io.IOException
- If there is an error getting the object
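The difference between the single-object and list variants above can be sketched with plain JDK collections. This is only an illustration: the dictionaries are modeled as `Map<String, String>`, the class `TypeFilterDemo` is hypothetical and not part of PDFBox, and it assumes that "type" means the value stored under a dictionary's `Type` entry and that the single-object variant yields null when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not PDFBox code.
class TypeFilterDemo {
    // Returns every dictionary whose "Type" entry equals the requested type, in input order.
    static List<Map<String, String>> objectsByType(List<Map<String, String>> objects, String type) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> dict : objects) {
            if (type.equals(dict.get("Type"))) {
                result.add(dict);
            }
        }
        return result;
    }

    // Returns the first match, or null when nothing matches (an assumption of this sketch).
    static Map<String, String> firstByType(List<Map<String, String>> objects, String type) {
        List<Map<String, String>> all = objectsByType(objects, type);
        return all.isEmpty() ? null : all.get(0);
    }
}
```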
-
getKey
public COSObjectKey getKey(COSBase object)
Returns the COSObjectKey for a given COS object, or null if there is none. This lookup iterates over all objects in a PDF, which may be slow for large files.
- Parameters:
object
- COS object
- Returns:
- key
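To make the cost of such a lookup concrete, here is a minimal sketch of a linear reverse lookup over a key-to-object map, next to a one-time reverse index that could serve repeated lookups. The class `KeyLookupDemo` and its methods are hypothetical and use only the JDK; the sketch assumes objects are matched by identity, which the description above does not state.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch; not PDFBox code.
class KeyLookupDemo {
    // Linear scan: returns the first key whose value is the same instance as target, or null.
    static <K, V> K findKey(Map<K, V> pool, V target) {
        for (Map.Entry<K, V> entry : pool.entrySet()) {
            if (entry.getValue() == target) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Builds an identity-based reverse index once, so later lookups do not rescan the pool.
    static <K, V> Map<V, K> reverseIndex(Map<K, V> pool) {
        Map<V, K> index = new IdentityHashMap<>();
        for (Map.Entry<K, V> entry : pool.entrySet()) {
            index.putIfAbsent(entry.getValue(), entry.getKey());
        }
        return index;
    }
}
```

The scan costs time proportional to the pool size on every call, while the index costs that once and then answers each lookup in roughly constant time, at the price of going stale if the pool changes.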
-
print
public void print()
This will print contents to stdout.
-
setVersion
public void setVersion(float versionValue)
This will set the header version of this PDF document.
- Parameters:
versionValue
- The version of the PDF document.
-
getVersion
public float getVersion()
This will get the version extracted from the header of this PDF document.
- Returns:
- The header version.
-
setDecrypted
public void setDecrypted()
Signals that the document is decrypted completely.
-
isDecrypted
public boolean isDecrypted()
Indicates if an encrypted PDF is already decrypted after parsing.
- Returns:
- true indicates that the pdf is decrypted.
-
isEncrypted
public boolean isEncrypted()
This will tell if this is an encrypted document.
- Returns:
- true if this document is encrypted.
-
getEncryptionDictionary
public COSDictionary getEncryptionDictionary()
This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
- Returns:
- The encryption dictionary.
-
setEncryptionDictionary
public void setEncryptionDictionary(COSDictionary encDictionary)
This will set the encryption dictionary. It should only be called when encrypting the document.
- Parameters:
encDictionary
- The encryption dictionary.
-
getDocumentID
public COSArray getDocumentID()
This will get the document ID.
- Returns:
- The document id.
-
setDocumentID
public void setDocumentID(COSArray id)
This will set the document ID.
- Parameters:
id
- The document id.
-
getCatalog
public COSObject getCatalog() throws java.io.IOException
Deprecated. Use PDDocument.getDocumentCatalog() instead.
This will get the document catalog.
- Returns:
- The catalog is the root of the document; never null.
- Throws:
java.io.IOException
- If no catalog can be found.
-
getObjects
public java.util.List<COSObject> getObjects()
This will get a list of all available objects. This method works only for loaded PDFs. It will return an empty list for PDFs created from scratch (this includes PDFs generated within PDFBox, e.g. by Splitter). This method will be removed in 3.0.
- Returns:
- A list of all objects, never null.
-
getTrailer
public COSDictionary getTrailer()
This will get the document trailer.
- Returns:
- the document trailer dict
-
setTrailer
public void setTrailer(COSDictionary newTrailer)
This will set the document trailer. Source note: MIT added; maybe this should not be supported, as the trailer is a persistence construct.
- Parameters:
newTrailer
- the document trailer dictionary
-
getHighestXRefObjectNumber
public long getHighestXRefObjectNumber()
Internal PDFBox use only. Gets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
- Returns:
- The object number of the highest XRef stream, or 0 if there was no XRef stream.
-
setHighestXRefObjectNumber
public void setHighestXRefObjectNumber(long highestXRefObjectNumber)
Internal PDFBox use only. Sets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
- Parameters:
highestXRefObjectNumber
- The object number of the highest XRef stream.
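As an illustration of why this number matters, the sketch below picks an object number for a newly appended object that collides neither with the numbers already in use nor with the highest XRef stream number. It is a hypothetical helper (`ObjectNumberDemo`, JDK only) that shows one plausible use under stated assumptions; it is not taken from PDFBox's writer.

```java
import java.util.Collection;

// Hypothetical sketch; not PDFBox code.
class ObjectNumberDemo {
    // Returns one more than the largest of all used object numbers and the
    // highest XRef stream object number (0 meaning there was no XRef stream).
    static long nextFreeObjectNumber(Collection<Long> usedNumbers, long highestXRefObjectNumber) {
        long highest = highestXRefObjectNumber;
        for (long number : usedNumbers) {
            highest = Math.max(highest, number);
        }
        return highest + 1;
    }
}
```

Without the second argument, a document whose XRef stream object carries the highest number could have that number handed out again during an incremental save.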
-
accept
public java.lang.Object accept(ICOSVisitor visitor) throws java.io.IOException
Visitor pattern double dispatch method.
-
close
public void close() throws java.io.IOException
This will close all storage and delete the tmp files.
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
- Throws:
java.io.IOException
- If there is an error closing resources.
-
isClosed
public boolean isClosed()
Returns true if this document has been closed.
- Returns:
- true if the document has been closed.
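Because the class implements java.io.Closeable, a try-with-resources block is the usual way to guarantee the close() call. The sketch below uses a hypothetical stand-in class (`TrackedResource`, not part of PDFBox) so that it runs with only the JDK; it illustrates the close()/isClosed() contract described above rather than PDFBox's implementation.

```java
import java.io.Closeable;

// Hypothetical stand-in for a closeable document; this is not a PDFBox class.
class TrackedResource implements Closeable {
    private boolean closed;

    // Declares no checked exception to keep the sketch short; COSDocument.close()
    // declares java.io.IOException, so real callers must catch or propagate it.
    @Override
    public void close() {
        closed = true; // safe to call more than once
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseDemo {
    // Uses the resource inside try-with-resources and returns it afterwards,
    // so the caller can observe that close() has run.
    static TrackedResource useAndClose() {
        TrackedResource doc = new TrackedResource();
        try (TrackedResource resource = doc) {
            // ... work with the resource here ...
            if (resource.isClosed()) {
                throw new IllegalStateException("closed too early");
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed());
    }
}
```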
-
finalize
protected void finalize() throws java.io.IOException
Warns the user in the finalizer if the PDF document was not closed. The method also closes the document just in case, to avoid abandoned temporary files. It is still a good idea to close the PDF document as early as possible to conserve resources.
- Overrides:
finalize
in class java.lang.Object
- Throws:
java.io.IOException
- if an error occurs while closing the temporary files
-
setWarnMissingClose
public void setWarnMissingClose(boolean warn)
Controls whether this instance shall issue a warning if the PDF document wasn't closed properly through a call to the close() method. If the PDF document is held in a cache governed by soft references it is impossible to reliably close the document before the warning is raised. By default, the warning is enabled.
- Parameters:
warn
- true enables the warning, false disables it.
-
dereferenceObjectStreams
public void dereferenceObjectStreams() throws java.io.IOException
This method will search the list of objects for types of ObjStm. If it finds them then it will parse out all of the objects from the stream that it contains.
- Throws:
java.io.IOException
- If there is an error parsing the stream.
-
getObjectFromPool
public COSObject getObjectFromPool(COSObjectKey key) throws java.io.IOException
This will get an object from the pool.
- Parameters:
key
- The object key.
- Returns:
- The object in the pool or a new one if it has not been parsed yet.
- Throws:
java.io.IOException
- If there is an error getting the proxy object.
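The "existing object or a new one" behavior in the Returns note is a get-or-create lookup. A minimal sketch with JDK collections follows; `PoolDemo` and `Placeholder` are hypothetical names, keys are plain strings, and the idea that a placeholder is stored in the pool when first requested is an assumption of this sketch, not a statement about PDFBox internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not PDFBox code.
class PoolDemo {
    // Placeholder for an indirect object whose content may not be parsed yet.
    static final class Placeholder {
        final String key;
        Object resolved; // stays null until the object has been parsed

        Placeholder(String key) {
            this.key = key;
        }
    }

    private final Map<String, Placeholder> pool = new HashMap<>();

    // Returns the pooled entry for the key, creating and storing an empty one on first use.
    Placeholder getFromPool(String key) {
        return pool.computeIfAbsent(key, Placeholder::new);
    }

    int size() {
        return pool.size();
    }
}
```

Handing out the same placeholder for repeated requests lets every reference to an object share one instance that can be filled in later.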
-
removeObject
public COSObject removeObject(COSObjectKey key)
Removes an object from the object pool.
- Parameters:
key
- the object key
- Returns:
- the object that was removed or null if the object was not found
-
addXRefTable
public void addXRefTable(java.util.Map<COSObjectKey,java.lang.Long> xrefTableValues)
Populate XRef HashMap with given values. Each entry maps ObjectKeys to byte offsets in the file.
- Parameters:
xrefTableValues
- xref table entries to be added
-
getXrefTable
public java.util.Map<COSObjectKey,java.lang.Long> getXrefTable()
Returns the xrefTable, which is a mapping of ObjectKeys to byte offsets in the file.
- Returns:
- mapping of ObjectKeys to byte offsets
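The shape of this table can be modeled with a small key class and a HashMap. In the sketch below, `ObjKey` is a hypothetical stand-in for COSObjectKey (object number plus generation), the offsets are made-up values, and merging with putAll, so that a later entry replaces an earlier one for the same key, is an assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an object key: object number plus generation.
final class ObjKey {
    final long number;
    final int generation;

    ObjKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ObjKey)) {
            return false;
        }
        ObjKey other = (ObjKey) o;
        return number == other.number && generation == other.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

// Hypothetical sketch; not PDFBox code.
class XrefTableDemo {
    private final Map<ObjKey, Long> xrefTable = new HashMap<>();

    // Merges a batch of entries into the table.
    void addEntries(Map<ObjKey, Long> entries) {
        xrefTable.putAll(entries);
    }

    // Returns the byte offset recorded for the key, or null if the key is unknown.
    Long offsetOf(long number, int generation) {
        return xrefTable.get(new ObjKey(number, generation));
    }

    public static void main(String[] args) {
        XrefTableDemo table = new XrefTableDemo();
        Map<ObjKey, Long> batch = new HashMap<>();
        batch.put(new ObjKey(1, 0), 15L);   // made-up offsets
        batch.put(new ObjKey(2, 0), 482L);
        table.addEntries(batch);
        System.out.println(table.offsetOf(2, 0));
    }
}
```

The key class needs value-based equals() and hashCode(); otherwise a freshly constructed key would never find an entry stored under an equal key.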
-
setStartXref
public void setStartXref(long startXrefValue)
This method sets the startxref value of the document. This will only be needed for incremental updates.
- Parameters:
startXrefValue
- the value for startXref
-
getStartXref
public long getStartXref()
Returns the startXref position of the parsed document. This will only be needed for incremental updates.
- Returns:
- a long with the old position of the startxref
-
isXRefStream
public boolean isXRefStream()
Determines if the trailer is an XRef stream or not.
- Returns:
- true if the trailer is an XRef stream
-
setIsXRefStream
public void setIsXRefStream(boolean isXRefStreamValue)
Sets isXRefStream to the given value. You need to take care that the version of your PDF is 1.5 or higher.
- Parameters:
isXRefStreamValue
- the new value for isXRefStream
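The version requirement above can be checked before the flag is set. The helper below is a hypothetical sketch (`XRefStreamGuardDemo`, JDK only) that treats the version as a float, as getVersion() and setVersion(float) do; it does not call PDFBox.

```java
// Hypothetical sketch; not PDFBox code.
class XRefStreamGuardDemo {
    // Lowest header version for which an XRef stream is allowed, per the note above.
    static final float MIN_XREF_STREAM_VERSION = 1.5f;

    // True when a document with this header version may use an XRef stream.
    static boolean canUseXRefStream(float version) {
        return version >= MIN_XREF_STREAM_VERSION;
    }

    // Returns the version to write: the current one, raised to 1.5 if it is lower.
    static float ensureMinimumVersion(float version) {
        return Math.max(version, MIN_XREF_STREAM_VERSION);
    }
}
```

A caller could raise the header version with the result of ensureMinimumVersion before enabling the XRef stream flag, or refuse to enable it when canUseXRefStream returns false.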
-
-