I’ve been busy writing the new version of the Statistical hit counter, and in doing so I wrote helper scripts to show me what new browser strings, screen resolutions and so on are out there now that perhaps should be counted. Along the way I noticed an odd new trend, and I believe I have an explanation for it. The beta counter that has been running behind htmlfixit has been tracking JS and non-JS hits (including bots) for about two years now, and of the past 6781 unique hits, 2470 have apparently had JavaScript turned off.
Think about that: of 6781 hits, 2470 of them didn’t have JavaScript (and that’s not counting known search engine spiders either). I didn’t believe the number could be that high, but I’ve manually reviewed the stats data myself and it’s legit. If my maths is correct, that’s over 36% of all traffic having no JavaScript. Everyone else I’ve read claims figures of more than 80% of users having JS turned on, so why are my figures so far off?
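For what it’s worth, the maths does check out. A quick sanity check in Python, using the figures quoted above:

```python
# Sanity check on the percentage quoted above.
total_hits = 6781   # unique hits tracked by the beta counter
no_js_hits = 2470   # hits that reported no JavaScript (known spiders excluded)

no_js_pct = no_js_hits / total_hits * 100
print(f"{no_js_pct:.1f}% of unique hits had no JavaScript")  # → 36.4%
```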
After some thought, I have at least a theory as to the cause, and the answer is something I imagine other CMS or forum systems (systems in which strangers can post or comment) could testify to: automated comment-spamming programs. I wrote a quick program to print out the details of those browsers that did not support JavaScript (or had it turned off) and group them by popularity. The program is here, and the list is long and varied, which is to be expected, because spam-submission bots can pretend to be anything at all, so copying popular browser strings would be common sense for them. I don’t believe all of these are faked; this is a techie-oriented site and many of us are security-conscious enough to turn off JavaScript. I just don’t believe anywhere near 36% of us are.
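The grouping logic is simple enough to sketch. This is a minimal Python approximation, not the actual program; the log format (a tab-separated JS flag and User-Agent string per hit) and the sample records are invented for illustration:

```python
import collections

# Hypothetical stats log: one tab-separated record per hit,
# a "js" flag ("1" = JavaScript reported, "0" = not) then the User-Agent.
sample_log = [
    "0\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "1\tMozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.0.6",
    "0\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "0\tSome-Fake-Bot-String/1.0",
]

# Count only the hits that reported no JavaScript.
counts = collections.Counter(
    ua
    for line in sample_log
    for js, ua in [line.split("\t", 1)]
    if js == "0"
)

# Print the non-JS browser strings, most popular first.
for ua, n in counts.most_common():
    print(f"{n:5d}  {ua}")
```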
We have some systems in place here in our CMS: a blacklist based on IP or domain, and a system that works in much the same way SpamAssassin does (except for HTTP rather than email), grading comments on known criteria and blocking those that fail a certain threshold. These systems stop a lot of bad hits (in fact we’ve had none at all get through in ages; I wish they’d stop trying), but most of the site is not in the CMS and therefore still gets bad hits (like the forum, which had 80-odd spiders in it at once last week). I’m now looking at the possibility of using mod_security (which we already have) in conjunction with a blacklist of known spammer IPs to block them before they ever get a full connection to Apache. My concern with this is the risk of blocking normal, nice, non-scummy people as well, so it has to be done carefully.
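The SpamAssassin-style grading works roughly like this: each matching criterion adds points to a comment’s score, and anything that accumulates too many is blocked. A minimal sketch of the idea; the rules, weights and threshold here are invented for illustration and a real system would use many more:

```python
import re

# Illustrative spam criteria: (pattern, points per match).
RULES = [
    (re.compile(r"https?://", re.I), 1.0),            # each link costs a point
    (re.compile(r"\b(?:casino|viagra)\b", re.I), 5.0),  # known spam words
    (re.compile(r"\[url=", re.I), 3.0),               # BBCode link markup
]

THRESHOLD = 5.0  # comments scoring this much or more are blocked


def spam_score(comment: str) -> float:
    """Sum the points for every rule match in the comment."""
    return sum(points * len(rule.findall(comment)) for rule, points in RULES)


def is_blocked(comment: str) -> bool:
    return spam_score(comment) >= THRESHOLD


print(is_blocked("Nice post, thanks!"))                     # False
print(is_blocked("cheap viagra http://spam.example/ now"))  # True
```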
Interestingly enough, a bit of Googling shows me that I am not alone in this thinking, and even better, I’ve found a mod_security rules filter that looks like it will make a real difference on the pattern checks alone. I am going to try this now, and I’ll know how successful the measure is by how many comment blocks are listed in our spam log. I’ll let you know how it goes and what exactly I did. (Assuming we haven’t accidentally blocked you from accessing our server.)
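For the curious, rules of this sort look something like the following. These are illustrative examples in the mod_security 1.x syntax, not the actual ruleset I’m installing, and the patterns and IP (a documentation address) are made up:

```apache
# Turn the filtering engine on and scan POST bodies.
SecFilterEngine On
SecFilterScanPOST On

# Reject POSTed comments containing a known spam pattern.
SecFilterSelective POST_PAYLOAD "(?:viagra|casino)" "deny,log,status:403"

# Drop requests from a blacklisted spammer IP before the app ever sees them.
SecFilterSelective REMOTE_ADDR "^192\.0\.2\.1$" "deny,log,status:403"
```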
Update: We are now running the latest mod_security, with the latest gotroot rules and comment spam blacklists. In the past few days we’ve seen a 50% drop in comment spamming attempts. I’d imagine that blocking these hordes of comment-spamming bots will show a significant drop in bandwidth usage too. I had to write a couple of exclusion rules so that all our various programs work as planned without being blocked by mod_security, but other than that it has been a straightforward, easy and worthwhile experience. The more people that use this fantastic tool, the more effective and up to date the blacklists will be. If you have control over your own server, you could do a lot worse than to install mod_security.