Author Topic: Extracting Text & Images from PDF Files  (Read 2732 times)


August 06, 2010, 06:33:06 pm

SysAdMini

http://denis.papathanasiou.org/?p=343

Quote
PDFMiner is a pdf parsing library written in Python by Yusuke Shinyama.

In addition to the pdf2txt.py and dumppdf.py command line tools, there is a way of analyzing the content tree of each page.

Since that's exactly the kind of programmatic parsing I wanted to use PDFMiner for, this is a more complete example, which continues where the default documentation stops.
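
For anyone who wants a feel for what "analyzing the content tree of each page" might look like, here is a minimal sketch. It is not the code from the linked article. It assumes the pdfminer.six fork and its `extract_pages` helper, which is newer than the PDFMiner release the quote refers to. The names `walk_layout`, `summarize_page`, and `summarize_pdf` are made up for illustration.

```python
from collections import Counter


def walk_layout(node, depth=0):
    """Yield (depth, node) for a layout node and all of its descendants."""
    yield depth, node
    # Plain strings are iterable but are never layout containers.
    if isinstance(node, (str, bytes)):
        return
    # Containers (pages, text boxes, figures) are iterable; leaf objects
    # such as single characters or images are not.
    try:
        children = iter(node)
    except TypeError:
        return
    for child in children:
        yield from walk_layout(child, depth + 1)


def summarize_page(page):
    """Count the layout object types found on one page."""
    return Counter(type(node).__name__ for _, node in walk_layout(page))


def summarize_pdf(path):
    """Return one Counter per page of the PDF at `path`."""
    # Assumption: pdfminer.six is installed. Imported lazily so the
    # helpers above can be used and tested without it.
    from pdfminer.high_level import extract_pages

    return [summarize_page(page) for page in extract_pages(path)]
```

Calling `summarize_pdf("some.pdf")` (the file name is a placeholder) should give a per-page tally of layout object types, which is a quick way to see whether a page holds mostly text containers, figures, or images before writing more specific extraction code.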