Malware Domain List

Malware Related => Malware Analysis => Topic started by: SysAdMini on August 06, 2010, 06:33:06 pm

Title: Extracting Text & Images from PDF Files
Post by: SysAdMini on August 06, 2010, 06:33:06 pm
http://denis.papathanasiou.org/?p=343

Quote
PDFMiner is a PDF parsing library written in Python by Yusuke Shinyama.

In addition to the pdf2txt.py and dumppdf.py command line tools, there is a way of analyzing the content tree of each page.

Since that's exactly the kind of programmatic parsing I wanted to use PDFMiner for, this is a more complete example, which continues where the default documentation stops.
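
For readers who want a feel for what "analyzing the content tree of each page" can look like, here is a minimal sketch. It is not the code from the linked post. It assumes the maintained pdfminer.six fork, whose `extract_pages` helper is newer than this 2010 thread; the original PDFMiner release used a different, lower-level API. The names `walk_layout` and `summarize_pdf` are made up for illustration. The walker relies only on duck typing: anything with a `get_text()` method is treated as text, and anything iterable is treated as a container.

```python
from collections import Counter


def walk_layout(node, depth=0):
    """Yield (depth, class_name, text_or_None) for a layout node and its children.

    Hypothetical helper: it makes no pdfminer-specific calls, so it works on
    any tree whose containers are iterable and whose text nodes expose get_text().
    """
    get_text = getattr(node, "get_text", None)
    if callable(get_text):
        # Text-bearing node (e.g. a text box): report its text, do not descend.
        yield depth, type(node).__name__, get_text().strip()
        return

    yield depth, type(node).__name__, None
    try:
        children = iter(node)
    except TypeError:
        return  # leaf node such as an image or a line: nothing to descend into
    for child in children:
        yield from walk_layout(child, depth + 1)


def summarize_pdf(path):
    """Print a per-page count of layout object types (assumes pdfminer.six)."""
    # Imported lazily so walk_layout stays usable without pdfminer installed.
    from pdfminer.high_level import extract_pages

    for page_number, page in enumerate(extract_pages(path), start=1):
        counts = Counter(name for _, name, _ in walk_layout(page))
        print(page_number, dict(counts))
```

Calling `summarize_pdf("sample.pdf")` (the file name is a placeholder) prints one line per page showing how many text boxes, figures, images and so on the layout analysis found. That is a quick first look before writing code that pulls out specific text or image objects.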