Convert PDF file to XML format
Notes
Please keep this window open until your document is ready for download. For your security, your converted document cannot be downloaded from our server after this page is closed. Your documents are not retained on our servers; they are removed within 30 minutes.
"Runs of Text" are strings that are on a single line that share common display properties. They share font, font-size,
bold, italic, rotation and color. They are useful for composing formatted output (e.g. rendering using html/css). They may split words if one of the letters in the word has different properties.
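To illustrate why a run can split a word, here is a minimal sketch that groups characters, given in reading order, into runs whenever the line or any display property changes. The `Glyph` record, its field names and the `make_line` helper are hypothetical and only illustrate the idea; they are not the converter's actual schema or algorithm.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical per-character record: the field names are illustrative,
# not the converter's actual output schema.
@dataclass(frozen=True)
class Glyph:
    char: str
    line: int        # index of the text line the character sits on
    font: str
    size: float
    bold: bool
    italic: bool
    rotation: float  # radians
    color: str

def style_key(g: Glyph):
    """Properties that must all match for two characters to share a run."""
    return (g.line, g.font, g.size, g.bold, g.italic, g.rotation, g.color)

def group_runs(glyphs):
    """Split characters (in reading order) into runs of consecutive, identically styled glyphs."""
    return ["".join(g.char for g in group)
            for _, group in groupby(glyphs, key=style_key)]

def make_line(text, bold_positions=(), line=0):
    """Hypothetical helper: plain 10 pt Times glyphs, bolding the characters at bold_positions."""
    return [Glyph(c, line, "Times", 10.0, i in bold_positions, False, 0.0, "#000000")
            for i, c in enumerate(text)]

# Bolding two letters in the middle of "layout" splits the word across three runs.
print(group_runs(make_line("page layout", bold_positions={5, 6})))
# ['page ', 'la', 'yout']
```

`itertools.groupby` only merges adjacent items, which matches the idea of a run: two stretches with the same style that are separated by a differently styled stretch stay separate runs.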
A typical run of text will look as follows:
leftX = 323.999, bottomY = 292.9298, rightX = 557.9752, topY = 283.9631, baseLineY = 290.76792, text = "scanned document pages is a process of partitioning a docu-"
Here leftX, bottomY, rightX and topY define the box that encloses the text, and baseLineY is the y-coordinate of the font baseline.
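The box fields are enough to derive a run's width and height and to hit-test a point against it. The sketch below uses the field names from the text; the `RunBox` class is a hypothetical illustration, and the sample values assume the fused digits of the example run split as leftX 323.999, bottomY 292.9298, rightX 557.9752, topY 283.9631, baseLineY 290.76792.

```python
from dataclasses import dataclass

@dataclass
class RunBox:
    # Field names follow the text; the class itself is only an illustration.
    leftX: float
    bottomY: float
    rightX: float
    topY: float
    baseLineY: float
    text: str

    @property
    def width(self) -> float:
        return self.rightX - self.leftX

    @property
    def height(self) -> float:
        # abs() so this works whether Y grows down the page or up it
        return abs(self.bottomY - self.topY)

    def contains(self, x: float, y: float) -> bool:
        """True if (x, y) lies inside the box, whichever Y direction is used."""
        lo, hi = sorted((self.bottomY, self.topY))
        return self.leftX <= x <= self.rightX and lo <= y <= hi

# Values assume the split of the example run described above.
run = RunBox(323.999, 292.9298, 557.9752, 283.9631, 290.76792,
             "scanned document pages is a process of partitioning a docu-")
print(round(run.width, 4), round(run.height, 4))  # 233.9762 8.9667
```

With these values bottomY is larger than topY, which is what a top-left origin with Y increasing down the page would produce; that is an inference from the numbers, not something the notes state for this output.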
Rotated runs of text also have a rotation field, which gives the point of rotation and the rotation angle in radians, measured anti-clockwise:
119.442524.96-0.9147
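The trailing -0.9147 in the example appears to be the angle; the digits of the rotation point are fused and cannot be split reliably, so the sketch below uses a hypothetical rotation point. It applies the standard 2D rotation about a point, assuming "anti-clockwise" refers to a Y-up coordinate system; `rotate_about` is an illustrative name, not part of any converter API.

```python
import math

def rotate_about(px, py, cx, cy, angle):
    """Rotate (px, py) anti-clockwise by `angle` radians about (cx, cy).

    "Anti-clockwise" holds for a Y-up system; with Y pointing down the page
    the same formula turns points clockwise on screen.
    """
    dx, dy = px - cx, py - cy
    c, s = math.cos(angle), math.sin(angle)
    return (cx + dx * c - dy * s, cy + dx * s + dy * c)

# Hypothetical rotation point; only the angle comes from the example above.
cx, cy = 100.0, 500.0
angle = -0.9147                       # negative, so this turns clockwise
print(round(math.degrees(angle), 2))  # -52.41

# Where a point 50 units to the right of the rotation point ends up.
x, y = rotate_about(cx + 50.0, cy, cx, cy, angle)
print(round(x, 3), round(y, 3))
```

Rotating each corner of a run's box this way gives the rotated outline of the text.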
Most applications use the top-left corner of the page as the (0,0) origin, with Y increasing down the page; the PDF specification, however, uses the bottom-left corner, with Y increasing up the page.
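Converting between the two conventions only needs the page height: y_bottom_left = page_height - y_top_left, and the same formula converts back. The sketch below assumes a US Letter page (11 in x 72 pt/in = 792 pt tall) and a box given in top-left coordinates; the function names are illustrative.

```python
def flip_y(y: float, page_height: float) -> float:
    """Map a y-coordinate between a top-left (Y down) and a bottom-left (Y up) origin.

    The mapping is its own inverse, so it converts in either direction.
    """
    return page_height - y

def box_to_pdf(leftX, bottomY, rightX, topY, page_height):
    """Convert a top-left-origin box to PDF (bottom-left) coordinates.

    Returns (x0, y0, x1, y1) ordered lower-left then upper-right.
    """
    y_a, y_b = flip_y(bottomY, page_height), flip_y(topY, page_height)
    return (leftX, min(y_a, y_b), rightX, max(y_a, y_b))

# Assumes a US Letter page: 11 in x 72 pt/in = 792 pt tall.
PAGE_HEIGHT = 792.0
print(box_to_pdf(323.999, 292.9298, 557.9752, 283.9631, PAGE_HEIGHT))
```

Flipping the Y axis also reverses the sense of rotation: an angle that is anti-clockwise in one convention must be negated to describe the same visual rotation in the other when the standard rotation formula is used.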