class PDF::Reader::Parser
An internal PDF::Reader class that reads objects from the PDF file and converts them into usable Ruby objects (hashes, arrays, true, false, etc.)
Constants
- MAPPING
- STRATEGIES
- TOKEN_STRATEGY
Public Class Methods
Source
# File lib/pdf/reader/parser.rb, line 65
def initialize(buffer, objects=nil)
  @buffer = buffer
  @objects = objects
end
Create a new parser around a PDF::Reader::Buffer object
buffer - a PDF::Reader::Buffer object that contains PDF data
objects - a PDF::Reader::ObjectHash object that can return objects from the PDF file
Public Instance Methods
Source
# File lib/pdf/reader/parser.rb, line 98
def object(id, gen)
  idCheck = parse_token

  # Sometimes the xref table is corrupt and points to an offset slightly too early in the file.
  # check the next token, maybe we can find the start of the object we're looking for
  if idCheck != id
    Error.assert_equal(parse_token, id)
  end
  Error.assert_equal(parse_token, gen)
  Error.str_assert(parse_token, "obj")

  obj = parse_token
  post_obj = parse_token

  if obj.is_a?(Hash) && post_obj == "stream"
    stream(obj)
  else
    obj
  end
end
Reads an entire PDF object from the buffer and returns it as a Ruby object. If the object is a stream, returns a PDF::Reader::Stream that holds both the stream data and the dictionary that describes it.
id - the object ID to return
gen - the object revision number to return
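The header check in object can be illustrated without a real buffer. The following is a minimal sketch, assuming the tokens have already been converted and are held in a plain Ruby Array; read_object and its ArgumentError messages are hypothetical stand-ins, not part of PDF::Reader.

```ruby
# Hypothetical helper (not a PDF::Reader method): consumes "<id> <gen> obj <body>"
# from an Array of tokens, tolerating one stray token before the ID.
def read_object(tokens, id, gen)
  id_check = tokens.shift

  # If the first token is not the ID, the next one must be.
  if id_check != id
    raise ArgumentError, "expected object ID #{id}" unless tokens.shift == id
  end
  raise ArgumentError, "expected generation #{gen}" unless tokens.shift == gen
  raise ArgumentError, "expected 'obj' keyword" unless tokens.shift == "obj"

  tokens.shift # the object body
end

# Well-formed header
read_object([7, 0, "obj", { Type: :Page }], 7, 0)
# Offset pointed one token too early
read_object(["junk", 7, 0, "obj", { Type: :Page }], 7, 0)
```

Both calls return the same Hash. Only a single unexpected token before the ID is tolerated; anything further raises.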
Source
# File lib/pdf/reader/parser.rb, line 74
def parse_token(operators={})
  token = @buffer.token

  if STRATEGIES.has_key? token
    STRATEGIES[token].call(self, token)
  elsif token.is_a? PDF::Reader::Reference
    token
  elsif operators.has_key? token
    Token.new(token)
  elsif token.frozen?
    token
  elsif token =~ /\d*\.\d/
    token.to_f
  else
    token.to_i
  end
end
Reads the next token from the underlying buffer and converts it to an appropriate object.
operators - a hash of supported operators to read from the underlying buffer.
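The last two branches of parse_token decide whether a token becomes a Float or an Integer. A minimal sketch of just that step follows; numeric_from_token is a hypothetical name used for illustration and does not exist in PDF::Reader.

```ruby
# Hypothetical helper: mirrors only the numeric fallback of parse_token.
def numeric_from_token(token)
  if token =~ /\d*\.\d/
    token.to_f # a decimal point followed by at least one digit
  else
    token.to_i # everything else is treated as an integer
  end
end

numeric_from_token("42")    # => 42
numeric_from_token("-3.14") # => -3.14
numeric_from_token(".5")    # => 0.5
numeric_from_token("abc")   # => 0, because String#to_i returns 0 for non-numeric text
```

The last call shows that this fallback never raises: a token that is not numeric silently becomes 0.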
Private Instance Methods
Source
# File lib/pdf/reader/parser.rb, line 150
def array
  a = []

  loop do
    item = parse_token
    break if item.kind_of?(Token) and item == "]"
    raise MalformedPDFError, "unterminated array" if @buffer.empty?
    a << item
  end

  a
end
Reads a PDF array from the buffer and converts it to a Ruby Array.
Source
# File lib/pdf/reader/parser.rb, line 123
def dictionary
  dict = {}

  loop do
    key = parse_token
    break if key.kind_of?(Token) and key == ">>"
    raise MalformedPDFError, "unterminated dict" if @buffer.empty?
    PDF::Reader::Error.validate_type_as_malformed(key, "Dictionary key", Symbol)

    value = parse_token
    value.kind_of?(Token) and Error.str_assert_not(value, ">>")
    dict[key] = value
  end

  dict
end
Reads a PDF dict from the buffer and converts it to a Ruby Hash.
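The key/value pairing can be sketched against a plain Array of tokens. This is an illustration under stated assumptions: read_dictionary is a hypothetical helper, the closing delimiter is represented by the String ">>" rather than a Token, and ArgumentError stands in for MalformedPDFError.

```ruby
# Hypothetical helper: builds a Hash from alternating key/value tokens,
# stopping at the ">>" delimiter.
def read_dictionary(tokens)
  dict = {}

  loop do
    raise ArgumentError, "unterminated dict" if tokens.empty?
    key = tokens.shift
    break if key == ">>"
    raise ArgumentError, "dictionary key must be a Symbol" unless key.is_a?(Symbol)

    raise ArgumentError, "unterminated dict" if tokens.empty?
    value = tokens.shift
    raise ArgumentError, "missing value for #{key}" if value == ">>"
    dict[key] = value
  end

  dict
end

read_dictionary([:Type, :Page, :Count, 3, ">>"]) # => { Type: :Page, Count: 3 }
```

Keys must be Symbols because PDF names are converted to Symbols before they reach this loop.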
Source
# File lib/pdf/reader/parser.rb, line 164
def hex_string
  str = "".dup

  loop do
    token = @buffer.token
    break if token == ">"
    raise MalformedPDFError, "unterminated hex string" if @buffer.empty?
    str << token
  end

  # add a missing digit if required, as required by the spec
  str << "0" unless str.size % 2 == 0
  [str].pack('H*')
end
Reads a PDF hex string from the buffer and converts it to a Ruby String.
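The padding and decoding step can be tried in isolation. A minimal sketch, assuming the hex digits have already been collected into one String; decode_hex is a hypothetical name, not a PDF::Reader method.

```ruby
# Hypothetical helper: pads an odd-length hex string with a trailing "0"
# and packs the digit pairs into bytes.
def decode_hex(hex)
  str = hex.dup
  str << "0" unless str.size % 2 == 0
  [str].pack("H*")
end

decode_hex("48656C6C6F") # => "Hello"
decode_hex("414")        # => "A@", because "414" is padded to "4140"
```

Array#pack with "H*" returns a binary (ASCII-8BIT) String, which matches the parser treating string contents as raw bytes.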
Source
# File lib/pdf/reader/parser.rb, line 141
def pdf_name
  tok = @buffer.token
  tok = tok.dup.gsub(/#([A-Fa-f0-9]{2})/) do |match|
    match[1, 2].hex.chr
  end
  tok.to_sym
end
Reads a PDF name from the buffer and converts it to a Ruby Symbol.
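PDF names may escape characters as "#" followed by two hex digits, and pdf_name expands those before building the Symbol. A minimal sketch of that expansion on a plain String; decode_name is a hypothetical helper.

```ruby
# Hypothetical helper: expands #XX escapes and converts the result to a Symbol.
def decode_name(tok)
  expanded = tok.gsub(/#([A-Fa-f0-9]{2})/) do |match|
    match[1, 2].hex.chr # "#20" -> "20" -> 32 -> " "
  end
  expanded.to_sym
end

decode_name("Lime#20Green") # => :"Lime Green"
decode_name("A#42")         # => :AB
decode_name("Type")         # => :Type
```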
Source
# File lib/pdf/reader/parser.rb, line 213
def stream(dict)
  raise MalformedPDFError, "PDF malformed, missing stream length" unless dict.has_key?(:Length)
  if @objects
    length = @objects.deref_integer(dict[:Length])
    if dict[:Filter]
      dict[:Filter] = @objects.deref_name_or_array(dict[:Filter])
    end
  else
    length = dict[:Length] || 0
  end

  PDF::Reader::Error.validate_type_as_malformed(length, "length", Numeric)

  data = @buffer.read(length, :skip_eol => true)

  Error.str_assert(parse_token, "endstream")

  # We used to assert that the stream had the correct closing token, but it doesn't *really*
  # matter if it's missing, and other readers seems to handle its absence just fine
  # Error.str_assert(parse_token, "endobj")

  PDF::Reader::Stream.new(dict, data)
end
Reads the contents of a PDF stream from the buffer and returns a PDF::Reader::Stream built from the describing dictionary and the raw stream data.
Source
# File lib/pdf/reader/parser.rb, line 180
def string
  str = @buffer.token
  return "".dup.force_encoding("binary") if str == ")"
  Error.assert_equal(parse_token, ")")

  str.gsub!(/\\(\r\n|[nrtbf()\\\n\r]|([0-7]{1,3}))?|\r\n?/m) do |match|
    if $2.nil? # not octal digits
      MAPPING[match] || "".dup
    else # must be octal digits
      ($2.oct & 0xff).chr # ignore high level overflow
    end
  end
  str.force_encoding("binary")
end
Reads a PDF String from the buffer and converts it to a Ruby String.
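The escape handling can be sketched on its own. The contents of MAPPING are not shown on this page, so the ESCAPES table below is an assumed, reduced stand-in covering only a few backslash sequences, and unescape_pdf_string is a hypothetical helper. Line-ending normalisation from the original pattern is left out.

```ruby
# Assumed, reduced stand-in for MAPPING (the real constant is not shown here).
ESCAPES = {
  "\\n"  => "\n",
  "\\t"  => "\t",
  "\\("  => "(",
  "\\)"  => ")",
  "\\\\" => "\\"
}.freeze

# Hypothetical helper: expands backslash escapes and octal character codes.
def unescape_pdf_string(raw)
  raw.gsub(/\\([nt()\\]|([0-7]{1,3}))?/) do |match|
    if $2.nil?
      ESCAPES[match] || "" # known escape, or drop a stray backslash
    else
      ($2.oct & 0xff).chr  # octal code, keeping only the low byte
    end
  end
end

unescape_pdf_string('a\(b\)')   # => "a(b)"
unescape_pdf_string('\101\102') # => "AB"
unescape_pdf_string('x\qy')     # => "xqy", the unknown escape loses its backslash
```

The single-quoted literals keep the backslashes intact, so the helper receives the same characters that would appear between the parentheses in a PDF file.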