Author Topic: Extracting Text & Images from PDF Files  (Read 4001 times)

August 06, 2010, 06:33:06 pm

PDFMiner is a PDF parsing library written in Python by Yusuke Shinyama.

In addition to its command line tools, it offers a way of analyzing the content tree of each page programmatically.
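
To give a rough idea of what walking that content tree can look like, here is a minimal sketch. It is not the library's documented example: the helper names (walk, outline, collect_text, dump_pdf) are made up for illustration, and dump_pdf assumes the maintained pdfminer.six fork and its extract_pages() helper, which is newer than the PDFMiner release current when this was posted. The walkers only rely on containers being iterable and text objects having a get_text() method.

```python
def walk(node, depth=0):
    """Yield (depth, node) for a layout node and all of its descendants."""
    yield depth, node
    # Containers (pages, figures, text boxes, text lines) are iterable;
    # leaf objects such as images or single characters are not.
    try:
        children = iter(node)
    except TypeError:
        return
    for child in children:
        yield from walk(child, depth + 1)


def outline(node):
    """Return one indented class name per node, as a list of strings."""
    return ["  " * depth + type(n).__name__ for depth, n in walk(node)]


def collect_text(node):
    """Collect text from the outermost objects that offer get_text()."""
    if hasattr(node, "get_text"):
        # Stop here: the children would only repeat the same text.
        return [node.get_text()]
    parts = []
    try:
        children = iter(node)
    except TypeError:
        return parts
    for child in children:
        parts.extend(collect_text(child))
    return parts


def dump_pdf(path):
    """Print the layout outline and text of every page (assumes pdfminer.six)."""
    # Imported lazily so the helpers above work without the library installed.
    from pdfminer.high_level import extract_pages

    for number, page in enumerate(extract_pages(path), start=1):
        print("--- page %d ---" % number)
        print("\n".join(outline(page)))
        print("".join(collect_text(page)))
```

Calling dump_pdf("some_file.pdf") (the file name is a placeholder) prints the tree of layout objects for each page, followed by whatever text they contain. Image objects show up in the outline as leaves; saving their data is a separate step not covered by this sketch.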

Since that's exactly the kind of programmatic parsing I wanted to use PDFMiner for, this is a more complete example, which continues where the default documentation stops.