Foxit Quick PDF Library

Frequently Asked Question:

Extract text from PDF without using GetPageText or other extraction functions

Question

How can I extract text from a PDF without using GetPageText or the other text extraction functions? I want to write my own extraction function.

Answer

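You can retrieve text from a PDF file yourself with Quick PDF Library's GetPageContent function. It gives you the content stream of the selected page (the operators that draw its text and graphics), and you then parse the text out of that stream. Here is a Delphi code sample that demonstrates how to do this. It uses a simple heuristic: for each TL (set text leading) operator it takes the literal string written as (text) that starts within a few characters after it, plus any further strings shown with the ' (move to next line and show text) operator. Text that is not drawn this way, for example strings that do not follow a TL operator or hex strings such as <48656C6C6F>, is not picked up.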

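// TStrings is declared in the Classes unit
procedure GetTextFromPageContent(PageContent: AnsiString; TextList: TStrings);

  //Position of Value in the remaining page content (0 if not found)
  function P(const Value: AnsiString): Integer;
  begin
    Result := Pos(Value, PageContent);
  end;

  //Extracting 'text' from the next '(text)' and deleting everything up to
  //and including its ')'. Returns False if no complete '(text)' is left.
  function ExtractText: Boolean;
  var
    OpenPos, ClosePos: Integer;
  begin
    OpenPos := P('(');
    ClosePos := P(')');
    Result := (OpenPos > 0) and (ClosePos > OpenPos);
    if Result then
    begin
      TextList.Add(string(Copy(PageContent, OpenPos + 1, ClosePos - OpenPos - 1)));
      Delete(PageContent, 1, ClosePos);
    end;
  end;

begin
  TextList.Clear;
  while P('TL') <> 0 do
  begin
    //Searching for 'TL' and dropping it together with everything before it,
    //so the loop always moves forward (no endless loop when no '(' follows)
    PageContent := Copy(PageContent, P('TL') + 2, Length(PageContent));
    //Only use a '(text)' that starts within a few characters of the 'TL'
    if (P('(') > 0) and (P('(') < 9) and ExtractText then
      //Checking for next '(text)' in '(text)''(text)'...
      while (Length(PageContent) > 0) and (PageContent[1] = '''') do
        if not ExtractText then
          Break;
  end;
end;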

// Sample of how to use
// PDFLibrary: TQuickPDF;
// PageTextMemo: TMemo;
begin
  //Merging the selected page's layers into a single layer first
  PDFLibrary.CombineLayers;
  GetTextFromPageContent(AnsiString(PDFLibrary.GetPageContent), PageTextMemo.Lines);
end;
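The sample above copies the raw bytes between a '(' and the first ')' that follows it, so an escaped parenthesis such as \( or an octal escape such as \101 comes out wrong. As a minimal sketch (not part of Quick PDF Library; ReadLiteralString and CollectLiteralStrings are hypothetical helper names), the console program below reads literal strings using the PDF rules for balanced parentheses and backslash escapes, and collects every literal string in a content stream. It is still not a full content stream parser: hex strings (<...>) and font encodings are not handled, and each string in a TJ array becomes a separate entry.

```pascal
program LiteralStrings;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Hypothetical helper: reads the PDF literal string whose '(' is at
// Content[Start], keeping balanced parentheses and decoding backslash
// escapes. NextPos receives the index just after the closing ')'.
function ReadLiteralString(const Content: AnsiString; Start: Integer;
  out NextPos: Integer): AnsiString;
var
  I, Depth, Digits, Code: Integer;
  C: AnsiChar;
begin
  Result := '';
  Depth := 1;
  I := Start + 1;
  while (I <= Length(Content)) and (Depth > 0) do
  begin
    C := Content[I];
    if C = '\' then
    begin
      Inc(I);
      if I > Length(Content) then
        Break;
      case Content[I] of
        'n': Result := Result + #10;
        'r': Result := Result + #13;
        't': Result := Result + #9;
        'b': Result := Result + #8;
        'f': Result := Result + #12;
        '0'..'7':
          begin
            // One to three octal digits give a single byte
            Code := 0;
            Digits := 0;
            while (I <= Length(Content)) and (Digits < 3) and
              (Content[I] in ['0'..'7']) do
            begin
              Code := Code * 8 + Ord(Content[I]) - Ord('0');
              Inc(I);
              Inc(Digits);
            end;
            Result := Result + AnsiChar(Code and $FF);
            Continue; // I already points past the digits
          end;
        #13: // backslash + end of line is a line continuation
          if (I < Length(Content)) and (Content[I + 1] = #10) then
            Inc(I);
        #10: ; // line continuation, nothing is added
      else
        // \( \) \\ and any other escaped character: keep the character
        Result := Result + Content[I];
      end;
    end
    else
    begin
      if C = '(' then
        Inc(Depth)
      else if C = ')' then
        Dec(Depth);
      // The final ')' closes the string and is not part of the text
      if Depth > 0 then
        Result := Result + C;
    end;
    Inc(I);
  end;
  NextPos := I;
end;

// Hypothetical helper: adds every literal string in Content to TextList
procedure CollectLiteralStrings(const Content: AnsiString; TextList: TStrings);
var
  I, NextPos: Integer;
begin
  TextList.Clear;
  I := 1;
  while I <= Length(Content) do
    if Content[I] = '(' then
    begin
      TextList.Add(string(ReadLiteralString(Content, I, NextPos)));
      I := NextPos;
    end
    else
      Inc(I);
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    // Hand-written sample stream, not taken from a real PDF file
    CollectLiteralStrings(
      'BT /F1 12 Tf 14 TL (Price \(USD\)) Tj (Line two)'' ET', Lines);
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

Run as is, the program prints Price (USD) and Line two on separate lines. To try it on a real page, you could pass AnsiString(PDFLibrary.GetPageContent) to CollectLiteralStrings instead of the sample string; the bytes come out as stored in the stream, so they only read as plain text when the font uses a simple single-byte encoding.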

© 2015 Debenu & Foxit. All rights reserved.