Frequently Asked Question:
Extract text from PDF without using GetPageText or other extraction functions
Question
How can I extract text from a PDF without using the GetPageText and other extraction functions? I basically want to create my own extraction function.
Answer
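Use Quick PDF Library's GetPageContent function to retrieve the raw content of the page, then pick the text out of it yourself. The Delphi code sample below demonstrates one way to do this. It is a simple heuristic rather than a full parser: it looks for the TL operator and collects the strings in parentheses that follow it, so it will miss text that is written to the page in other ways.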
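procedure GetTextFromPageContent(PageContent: AnsiString; TextList: TStrings);

  // Position of Value in the remaining page content (0 if not found)
  function P(const Value: AnsiString): Integer;
  begin
    Result := Pos(Value, PageContent);
  end;

  // Adds the next '(text)' to TextList and removes it from PageContent.
  // Returns False, changing nothing, if no complete '(text)' remains.
  function ExtractNext: Boolean;
  var
    OpenPos, ClosePos: Integer;
  begin
    OpenPos := P('(');
    ClosePos := P(')');
    Result := (OpenPos > 0) and (ClosePos > OpenPos);
    if Result then
    begin
      TextList.Add(Copy(PageContent, OpenPos + 1, ClosePos - OpenPos - 1));
      Delete(PageContent, 1, ClosePos);
    end;
  end;

begin
  TextList.Clear;
  while P('TL') <> 0 do
  begin
    // Drop everything before the next 'TL', so that P('TL') is now 1
    PageContent := Copy(PageContent, P('TL'), Length(PageContent));
    // Treat a '(' fewer than 10 characters after 'TL' as the start of text
    if (P('(') > 0) and (P('(') - P('TL') < 10) and ExtractNext then
    begin
      // Checking for next '(text)' in '(text)''(text)'...
      while (PageContent <> '') and (PageContent[1] = '''') and ExtractNext do
        { ExtractNext does the work };
    end
    else
      // No usable text after this 'TL': step past it so the loop cannot stall
      Delete(PageContent, 1, 2);
  end;
end;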
// Sample usage, assuming these are declared elsewhere:
//   PDFLibrary: TQuickPDF;
//   PageTextMemo: TMemo;
begin
  // Combine the page's layers into one before reading its content
  PDFLibrary.CombineLayers;
  GetTextFromPageContent(PDFLibrary.GetPageContent, PageTextMemo.Lines);
end;
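
The strings collected this way are still in their raw PDF form. Inside a parenthesised PDF string, a backslash introduces an escape such as \n, \( or a one-to-three-digit octal code, so the extracted text may need decoding before it is displayed. The following is a minimal sketch of such a decoder. The name UnescapePdfLiteral is hypothetical and is not part of Quick PDF Library, and the sketch ignores the backslash-plus-line-break continuation rule. Note also that the procedure above stops at the first ')', so a string containing '\)' will already have been cut short before a helper like this sees it.

```pascal
// Hypothetical helper, not part of Quick PDF Library.
// Decodes backslash escapes in the text taken from between '(' and ')'.
function UnescapePdfLiteral(const S: AnsiString): AnsiString;
var
  I, Digits, Code: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    if (S[I] = '\') and (I < Length(S)) then
    begin
      Inc(I); // look at the character after the backslash
      case S[I] of
        'n': Result := Result + #10;
        'r': Result := Result + #13;
        't': Result := Result + #9;
        'b': Result := Result + #8;
        'f': Result := Result + #12;
        '0'..'7':
          begin
            // Up to three octal digits form one character code
            Code := 0;
            Digits := 0;
            while (Digits < 3) and (I <= Length(S)) and (S[I] in ['0'..'7']) do
            begin
              Code := Code * 8 + (Ord(S[I]) - Ord('0'));
              Inc(I);
              Inc(Digits);
            end;
            Result := Result + AnsiChar(Code and $FF);
            Dec(I); // the shared Inc(I) below then moves past the last digit
          end;
      else
        // '\(', '\)', '\\' and unknown escapes: keep the character itself
        Result := Result + S[I];
      end;
    end
    else
      Result := Result + S[I]; // ordinary character, or a lone trailing backslash
    Inc(I);
  end;
end;
```

One way to use it would be to pass each extracted string through UnescapePdfLiteral before adding it to the list. The result is still a sequence of byte codes, so text drawn with fonts that use a custom encoding may not come out readable.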