Well, I am not commercial, despite the http://www.SecureMecca.com domain name. All of my stuff is GNU licensed. I also own SecureMecca.net, SecureMecca.biz, SecureMecca.org (what I probably should have used), SecureMecca.us, and SecureMecca.info. As soon as more of the translation into French is done (which is very difficult, as opposed to just thinking in French), I hope to have either SecureMecca.fr (most likely) or SecureMecca.com.fr (least likely). I am paying for all of them out of my own pocket. I don't intend to ever host on most of those names - I just don't want somebody else pretending to be me.
Instead of money (I am really poor) I can give you the following pieces of code that may help. But before I do that, would you prefer it if I just stripped out all of the dead / parked hosts and put the result up in the folder? Assuming you had a file today (18 Nov) and I did that to it, the file name would be:
http://www.securemecca.com/MalwareDomainList/hosts_2008_11_18.txt
If you don't want that, the code I am giving you is all designed to work on Unix / Linux, since that is what I work on. Here is where the files are (you will have to do a make):
http://www.securemecca.com/MalwareDomainList/QuickRemove.7z
http://www.securemecca.com/MalwareDomainList/QuickRemove.zip
ckaaa.c:
======
Checks that ALL of my merged database is in strict ascending order. Used with duplin to make sure I have something for a program that follows.
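For illustration, a check of this kind could be sketched roughly as follows. This is not the actual ckaaa.c source; the function name first_out_of_order is hypothetical, and plain byte-wise strcmp order is an assumption about what "strict ascending" means here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from ckaaa.c.
 * Returns 0 when lines[0..n-1] are in strict ascending strcmp order,
 * otherwise the 1-based number of the first line that is less than or
 * equal to the line before it (so a duplicate also counts as an error). */
size_t first_out_of_order(const char *const lines[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(lines[i - 1], lines[i]) >= 0)
            return i + 1;
    }
    return 0;
}
```

A real tool would read the lines from the merged database with fgets and compare each one against the previous line, so only two lines need to be held in memory at a time.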
ckdupe.c:
=======
Checks for duplicates in a hosts file, or optionally spits out the names. I do not store my database as a finished hosts file. Instead the entries look like this (I merge and ssort them into the order I want when I build the hosts files with automated programs):
10000hits.net
10000hits.net (WWW)
10006.hittail.com
100webads.com
100webads.com (WWW)
100webads.com (WWW2)
Et cetera. I wish I hadn't done the www2.* that way, but I am stuck with it now.
Several files go into making up my hosts files: add.Casino, add.Dead, add.Header, add.PacProxy, add.Porn, add.Proxy, add.WinRisk, and main. The add.Dead hosts are old dead proxies. I have to block proxies because they effectively turn all filtration off. The Unix file consists of add.Header (always at the start), main, add.Casino, add.Porn, and add.Proxy. The Windows file only adds add.WinRisk to that.
I hesitate to give you the rest of the programs that massage them (disorder.c, 2lnx.c, 2win.c, ctrlm, addm and two scripts named newadds.sh and pushhosts), since you can probably come up with something in Perl that will do those things nicely. It is just that I sometimes work with some pretty big files (Airelle's files), so I can't use an interpreted language. It is just too slow.
My scripts that do DNS checks just bite off 100 hosts at a time and rest between queries (using head, tail and sleep). I should probably write a program for OpenDNS dead hosts, but for now I just use MicroEMACS macros to move hosts that are mapped to those addresses out of the Alive file into the Dead file. Here are the files:
ferret.c:
======
This is used in conjunction with some of the other files. It basically splits a set of hosts you are looking at against the set of hosts you already have, so you can see which of the new ones you already have and which you do not. There will be an example of using this at the end.
hexcmp.c:
========
A bonus. I use it with scripts to make sure whatever I uploaded went up safely. I also used it to construct my own hand-crafted GRUB boot files and patches for the starts of disks (with dd), et cetera.
isparked.c:
========
This contains all of the known parked addresses I have. The problem is, I keep adding several every month. Be careful with GoDaddy and some others - they mix both live and parked hosts on the same web server (same IP address). It drives you nuts, but what can you do? You just have to check them manually (I use wget for speed - no time for browsers most of the time).
mytmp.c/mytmp.h:
==============
Just comes up with a temporary file name in the current directory. If you want to be a peach, modify it so the temporary file is put in the folder where the modified file is (you cannot link across file systems). It is used by several of these programs and some more. I haven't had time to modify it ever since I started working at a good-will store 40 hours per week while also working on my web sites / filter 50+ hours per week.
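The suggested change (putting the temporary file next to the file being modified) might look something like the sketch below. It is not based on the real mytmp.c; the function name tmp_template_for and the "qr" prefix are made up for illustration, and it assumes Unix-style paths with '/' separators.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from mytmp.c.
 * Builds a mkstemp-style template ("...XXXXXX") in the same directory as
 * `path`, so the temporary file ends up on the same file system.
 * Returns 0 on success, -1 if `out` is too small. */
int tmp_template_for(const char *path, char *out, size_t outsz)
{
    const char *slash = strrchr(path, '/');   /* last directory separator */
    int n;

    if (slash == NULL)                        /* no directory part: use cwd */
        n = snprintf(out, outsz, "qrXXXXXX");
    else                                      /* copy the directory prefix */
        n = snprintf(out, outsz, "%.*s/qrXXXXXX", (int)(slash - path), path);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

On a POSIX system the resulting template can be passed to mkstemp(3) to create the file, and the finished file can then be moved over the original with rename(2), which works because both names are on the same file system.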
ssort.c:
======
This is a strict ascending sort using the heapsort algorithm. It exists because I use its output for a known list as input to ferret, which can then use a binary look-up for speed.
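To show why the sorted list matters, here is a minimal sketch of the two pieces: a heapsort over host names and a binary look-up against the sorted result. It is not the ssort.c or ferret.c source. The names heap_sort and in_sorted are hypothetical, the comparison is plain strcmp (an assumption), and the whole list is assumed to fit in memory as an array of strings.

```c
#include <stddef.h>
#include <string.h>

/* Restore the max-heap property for the subtree rooted at `root`. */
static void sift_down(const char **a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n)
            return;
        if (child + 1 < n && strcmp(a[child], a[child + 1]) < 0)
            child++;                          /* pick the larger child */
        if (strcmp(a[root], a[child]) >= 0)
            return;                           /* heap property holds */
        const char *t = a[root]; a[root] = a[child]; a[child] = t;
        root = child;
    }
}

/* In-place heapsort of n strings into ascending strcmp order.
 * Duplicates are kept; a separate check has to find them. */
void heap_sort(const char **a, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n / 2; i-- > 0; )         /* build the heap */
        sift_down(a, i, n);
    for (size_t end = n - 1; end > 0; end--) {
        const char *t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, 0, end);                 /* shrink the heap by one */
    }
}

/* Binary look-up: returns 1 if key is in sorted[0..n-1], else 0. */
int in_sorted(const char *const *sorted, size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(key, sorted[mid]);
        if (c == 0)
            return 1;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

Each look-up against a sorted list of n hosts takes about log2(n) comparisons, so checking every candidate host against even a very large known list stays fast.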
HOW TO USE THEM:
================
Okay, say I gave you the list of dead hosts in a file named 2008_10_23_rmlist.txt. That file is sorted so that the hosts are in the order shown above for my files. I just used MicroEMACS (thanks Dan Lawrence) to massage them into a format you may prefer. So here is what I would do, using your hosts file (I am assuming you compiled the programs and put them in your own home bin directory or some other place in your path):
$ ssort < 2008_10_23_rmlist.txt > AAA
$ duplin AAA
# no output means you are okay. If you aren't then
$ duplin -s AAA
# you can use uniq but at the time I wrote this program I was using Windows and programming for the Hobbit chip ON DOS
$ ckaaa
$ ckdupe -p hosts.txt > tmp
$ ferret
Okay, now you have two new files, "found" and "okay". The ones in the "found" file were dead (or maybe parked - I can give you those in a list as well, since I retain a Host <---> IP database with IPs in the format ###.###.###.###, left padded with zeros if necessary), and the ones in the "okay" file are ready to go back into a new hosts file as long as you prepend a "127.0.0.1" in front of them. That is what my 2lnx and 2win programs do. Despite how simple it looks, ferret is a very powerful program!
You also need to be careful with the IP addresses 0.0.0.0 (000.000.000.000), 127.0.0.1 (127.000.000.001), and 255.255.255.255. Beats me what the last one means coming back from a DNS server. It is certainly unroutable by every protocol I know (RIP, RIPv2, EIGRP, OSPF)! Maybe that is the point. If some idiot is setting up some routers, the 127.0.0.1 IP address is NOT turned off by default as are the three NRIP address spaces. You have to turn all of them off in the configuration on some routers (actually most for the NRIPs).
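As a rough illustration of the zero-padded ###.###.###.### format and the three addresses to watch for, something like the following could be used. It is only a sketch under stated assumptions: the function names pad_ipv4 and is_suspect_answer are hypothetical and do not come from any of the programs in the archive.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites a dotted quad such as "127.0.0.1" as
 * "127.000.000.001". `out` must hold at least 16 bytes.
 * Returns 0 on success, -1 if the input is not a valid dotted quad. */
int pad_ipv4(const char *ip, char out[16])
{
    unsigned a, b, c, d;
    char extra;

    /* Exactly four numbers must match; a fifth item means trailing junk. */
    if (sscanf(ip, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    snprintf(out, 16, "%03u.%03u.%03u.%03u", a, b, c, d);
    return 0;
}

/* Hypothetical helper: returns 1 for the three padded addresses that
 * deserve a second look when they come back from a DNS query. */
int is_suspect_answer(const char *padded)
{
    return strcmp(padded, "000.000.000.000") == 0
        || strcmp(padded, "127.000.000.001") == 0
        || strcmp(padded, "255.255.255.255") == 0;
}
```

One benefit of the padded form is that plain string comparison then sorts the addresses in numeric order, which fits a database that is kept in strict ascending order.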
But you can also turn that around: put everything you already have in the AAA file and the new hosts you are considering adding in the tmp file, and ferret will spit out the host names you don't have into the "okay" file. Like I said, ferret is a very powerful program. It spit the stuff out so darn quick when I first wrote it that I began to wonder if it was functioning okay.
It is functioning okay! It is just blisteringly fast!
I can't give you any money, since I only make a minimal amount at the charity store, and after seeing an idiot (a customer) yesterday pulling apart a VCR / DVD combo with the power plugged in, I figure I better get out of there. In fact, I left for the day when the managers said I was wrong to yell at him for destroying merchandise. I also have a school loan to pay off, so money for me is scarce. I will help however I can though. The problem is I am swamped with what I have to do.