We got a new example for Xojo on how to extract text from a page and know each position of each text chunk. We can even keep track of current state, so we have the fonts ready and can even draw the same text on top of the existing PDF with Xojo. So DynaPDF renders page with here with text in black and we draw in paint event the rectangles for text portions and than draw the same text with Xojo. New is the latest version of the example, where we can now split text per character to get the position of each character. As you can see we now draw each box in green and each character in blue on top of the PDF drawing:
This example uses the DynaPDFParseInterfaceMBS class from DynaPDF Pro to walk over all PDF commands drawing a page, so we get all the text output as well as matrix changes, font setting and state saving & restoring. We keep track of the current matrix with DynaPDFMatrixMBS class and collect the text records with DynapdfTextRecordWMBS class for Unicode or DynapdfTextRecordAMBS for font specific encodings. In latter case we convert the characters using current font to unicode text.
The example is included in 18.1pr4 pre-release.
To use this code in a shipping application, you need to order the DynaPDF Pro license from us for use with the MBS Xojo DynaPDF Plugin.
Older blog post: DynaPDF Text Position Examples