Frequently Asked Question:

Text extraction functions - why so many?

Question

I notice there are several functions for extracting text and I'm wondering what the difference is.

ExtractFilePageText, GetPageText and DAExtractPageText

Is there any difference in these functions? The prototypes and descriptions seem to be the same...

Answer

There are differences.

You will find that many functions come in pairs: the same name with and without the "DA" prefix. The difference is in the parameters. DA functions work through file handles, while their counterparts take a filename or operate on the file that was loaded beforehand. For very large files it can be useful if the function reads only the needed part of the PDF into memory.

For these three functions it works as follows:

GetPageText and DAExtractPageText both require that the page of interest has been selected beforehand, but the two differ in their parameters.

ExtractFilePageText uses the page number to identify the page of interest.
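
To make the three calling styles concrete, here is a small toy model in Python. It is only a sketch and does not call the library: the ToyPdf class, the snake_case method names, the parameter lists, the helper methods (load_from_file, select_page, da_open_file, da_find_page) and the PAGES_ON_DISK data are all assumptions made up for illustration, so check the function reference for the real signatures.

```python
# Toy model only: class, method names and parameter lists are hypothetical
# and are NOT the library's real signatures.

# Stand-in for PDF files on disk: filename -> list of page texts.
PAGES_ON_DISK = {"demo.pdf": ["first page text", "second page text"]}


class ToyPdf:
    def __init__(self):
        self.loaded_pages = None   # pages of the document loaded into memory
        self.selected_page = None  # 1-based number of the selected page
        self.handles = {}          # open handle -> filename
        self.next_handle = 1

    # Style 1: filename plus page number, no prior setup needed.
    def extract_file_page_text(self, filename, page_number):
        return PAGES_ON_DISK[filename][page_number - 1]

    # Style 2: works on the previously loaded file and the selected page.
    def load_from_file(self, filename):
        self.loaded_pages = list(PAGES_ON_DISK[filename])
        self.selected_page = 1

    def select_page(self, page_number):
        self.selected_page = page_number

    def get_page_text(self):
        return self.loaded_pages[self.selected_page - 1]

    # Style 3: works through a file handle; the page is located first.
    def da_open_file(self, filename):
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = filename
        return handle

    def da_find_page(self, handle, page_number):
        # A toy "page reference": just remembers the handle and page number.
        return (handle, page_number)

    def da_extract_page_text(self, handle, page_ref):
        ref_handle, page_number = page_ref
        if ref_handle != handle:
            raise ValueError("page reference belongs to another handle")
        return PAGES_ON_DISK[self.handles[handle]][page_number - 1]


def demo():
    pdf = ToyPdf()

    # Style 1: one call, page identified by its number.
    by_filename = pdf.extract_file_page_text("demo.pdf", 2)

    # Style 2: load the file, select the page, then extract.
    pdf.load_from_file("demo.pdf")
    pdf.select_page(2)
    by_selection = pdf.get_page_text()

    # Style 3: open a handle, locate the page, then extract.
    handle = pdf.da_open_file("demo.pdf")
    page_ref = pdf.da_find_page(handle, 2)
    by_handle = pdf.da_extract_page_text(handle, page_ref)

    return by_filename, by_selection, by_handle


if __name__ == "__main__":
    print(demo())
```

In this sketch all three routes return the same text for page 2; what changes is how the file and the page are identified before the extraction call.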

So it is a good idea to look at all the functions that do the same core job and choose the one that best fits your needs.