Convert PDF file to JSON format

Output Data (see Note 1)
Formatting
Page Origin (see Note 2)

Notes

  1. "Runs of Text" are strings on a single line that share common display properties: font, font size, bold, italic, rotation and color. They are useful for composing formatted output (e.g. rendering with HTML/CSS). A word may be split across two runs if one of its letters has different properties. A typical run of text looks as follows: { "leftX" : 323.999, "bottomY" : 292.9298, "rightX" : 557.9752, "topY" : 283.9631, "baseLineY" : 290.7679, "fontId" : 2, "text" : "scanned document pages is a process of partitioning a docu-" } where leftX, bottomY, rightX and topY define a box that encloses the text, and "baseLineY" is the Y coordinate of the font baseline. (Here bottomY is greater than topY because Y increases down the page; see Note 2.) Rotated runs of text also have a "rotation" field that gives the point of rotation (pivotX, pivotY) and the angle, in radians measured anti-clockwise: "rotation" : { "pivotX" : 119.4425, "pivotY" : 24.96, "angle" : -0.9147 }
  2. Most applications use the top-left corner of the page as the (0,0) origin, with Y increasing down the page; the PDF specification, however, uses the bottom-left corner, with Y increasing up the page.
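As a rough illustration of the HTML/CSS use mentioned in Note 1, the Python sketch below turns one run of text into an absolutely positioned span. It assumes the top-left origin from Note 2 (as in the sample, where topY < bottomY), maps one coordinate unit to one CSS pixel, and treats the angle as anti-clockwise on screen. The helper name run_to_html is hypothetical, and fontId is only copied into a data-font-id attribute because the font table it refers to is not described here.

```python
import html
import json


def run_to_html(run: dict) -> str:
    """Render one run of text as an absolutely positioned <span>.

    Assumptions (not from the converter docs): top-left origin with Y
    increasing downward, and 1 coordinate unit rendered as 1 CSS px.
    """
    width = run["rightX"] - run["leftX"]
    height = run["bottomY"] - run["topY"]  # positive when Y grows downward
    styles = [
        "position:absolute",
        f"left:{run['leftX']:.2f}px",
        f"top:{run['topY']:.2f}px",
        f"width:{width:.2f}px",
        f"height:{height:.2f}px",
        "white-space:pre",
    ]
    rotation = run.get("rotation")
    if rotation:
        # Express the pivot relative to the span's own top-left corner.
        ox = rotation["pivotX"] - run["leftX"]
        oy = rotation["pivotY"] - run["topY"]
        # CSS rotate() turns clockwise on screen for positive angles, so the
        # anti-clockwise angle is negated (assumed sense of the angle).
        styles.append(f"transform-origin:{ox:.2f}px {oy:.2f}px")
        styles.append(f"transform:rotate({-rotation['angle']:.4f}rad)")
    # fontId is kept as a data attribute; resolving it to a CSS font would
    # need the font table from the full output (not shown here).
    return (
        f'<span data-font-id="{run["fontId"]}" style="{";".join(styles)}">'
        f"{html.escape(run['text'])}</span>"
    )


# Usage with the sample run from Note 1.
sample = json.loads(
    '{ "leftX" : 323.999, "bottomY" : 292.9298, "rightX" : 557.9752, '
    '"topY" : 283.9631, "baseLineY" : 290.7679, "fontId" : 2, '
    '"text" : "scanned document pages is a process of partitioning a docu-" }'
)
print(run_to_html(sample))
```

Absolute positioning is used because each run already carries its own bounding box, so no line layout has to be reconstructed.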
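Converting a Y coordinate between the two conventions in Note 2 only needs the page height H: y_PDF = H - y_top-left, and the same mapping converts back. The Python sketch below applies it to one run of text. It assumes the caller knows the page height (it is not a field of the run; 792 points, US Letter, is only an example value) and that X coordinates are unaffected; flip_origin is a hypothetical helper name.

```python
from copy import deepcopy

# Y-valued fields of a run of text, as listed in Note 1.
Y_FIELDS = ("bottomY", "topY", "baseLineY")


def flip_origin(run: dict, page_height: float) -> dict:
    """Convert a run's Y coordinates between top-left and bottom-left origins.

    y -> page_height - y is its own inverse, so the same function converts
    in either direction. page_height is supplied by the caller (assumption).
    """
    out = deepcopy(run)  # leave the caller's run untouched
    for key in Y_FIELDS:
        if key in out:
            out[key] = page_height - out[key]
    if "rotation" in out:
        # Only the pivot moves; the angle is copied unchanged. Check whether
        # your consumer expects its sign to flip along with the Y axis.
        out["rotation"]["pivotY"] = page_height - out["rotation"]["pivotY"]
    return out


run = {"leftX": 323.999, "bottomY": 292.9298, "rightX": 557.9752,
       "topY": 283.9631, "baseLineY": 290.7679, "fontId": 2, "text": "docu-"}
print(flip_origin(run, 792.0))
```

After flipping, topY is larger than bottomY, as expected when Y increases up the page.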