I’ve been busy writing the new version of the Statistical hit counter, and in doing so I wrote helper scripts to show me what new browser strings, screen resolutions and so on are out there now that perhaps should be counted. Along the way I noticed an odd new trend, and I believe I have an explanation for it. The beta counter that has been running behind htmlfixit has been tracking JS and non-JS hits (including bots) for about two years now, and of the past 6781 unique hits, 2470 have apparently had JavaScript turned off.
Think about that: of 6781 hits, 2470 of them didn’t have JavaScript (and that’s not counting known search engine spiders either). I didn’t believe the number could be that high, but I’ve manually reviewed the stats data myself and it’s legit. If my maths is correct, that’s over 36% of all traffic having no JavaScript. Everyone else I’ve read claims figures of more than 80% of users having JS turned on, so why are my figures so far off?
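For what it’s worth, the maths does check out. A quick sanity check in Python, using the figures quoted above:

```python
# Sanity check on the percentage quoted above.
total_hits = 6781   # unique hits tracked by the beta counter
no_js_hits = 2470   # hits that reported no JavaScript (known spiders excluded)

no_js_pct = no_js_hits / total_hits * 100
print(f"{no_js_pct:.1f}% of unique hits had no JavaScript")  # → 36.4%
```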
After some thought, I have at least a theory as to the cause, and the answer is something I imagine other CMS or forum systems (systems in which strangers can post or comment) could testify to: automated comment-spamming programs. I wrote a quick program to print out the details of those browsers that did not support JavaScript (or had it turned off) and group them by popularity. The program is here, and the list is long and varied, which is to be expected, because spam-submission bots can pretend to be anything at all, so copying popular browser strings would be common sense for them. I don’t believe all of these are faked; this is a techie-oriented site and many of us are security-conscious enough to turn off JavaScript. I just don’t believe anywhere near 36% of us are.
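The grouping logic is simple enough to sketch. This is a minimal Python approximation, not the actual program; the log format (a tab-separated JS flag and User-Agent string per hit) and the sample records are invented for illustration:

```python
import collections

# Hypothetical stats log: one tab-separated record per hit,
# a "js" flag ("1" = JavaScript reported, "0" = not) then the User-Agent.
sample_log = [
    "0\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "1\tMozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.0.6",
    "0\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "0\tSome-Fake-Bot-String/1.0",
]

# Count only the hits that reported no JavaScript.
counts = collections.Counter(
    ua
    for line in sample_log
    for js, ua in [line.split("\t", 1)]
    if js == "0"
)

# Print the non-JS browser strings, most popular first.
for ua, n in counts.most_common():
    print(f"{n:5d}  {ua}")
```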
We have some systems in place here in our CMS: a blacklist based on IP or domain, and a system that works in much the same way SpamAssassin does (except for HTTP rather than email), grading comments on known criteria and blocking those that fail a certain threshold. These systems stop a lot of bad hits (in fact we’ve had none at all get through in ages; I wish they’d stop trying), but most of the site is not in the CMS and therefore still gets bad hits (like the forum, which had 80-odd spiders in it at once last week). I’m now looking at the possibility of using mod_security (which we already have) in conjunction with a blacklist of known spammer IPs to block them before they ever get a full connection to Apache. My concern with this is the risk of blocking normal, nice, non-scummy people as well, so it has to be done carefully.
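The SpamAssassin-style grading works roughly like this: each matching criterion adds points to a comment’s score, and anything that accumulates too many is blocked. A minimal sketch of the idea; the rules, weights and threshold here are invented for illustration and a real system would use many more:

```python
import re

# Illustrative spam criteria: (pattern, points per match).
RULES = [
    (re.compile(r"https?://", re.I), 1.0),            # each link costs a point
    (re.compile(r"\b(?:casino|viagra)\b", re.I), 5.0),  # known spam words
    (re.compile(r"\[url=", re.I), 3.0),               # BBCode link markup
]

THRESHOLD = 5.0  # comments scoring this much or more are blocked


def spam_score(comment: str) -> float:
    """Sum the points for every rule match in the comment."""
    return sum(points * len(rule.findall(comment)) for rule, points in RULES)


def is_blocked(comment: str) -> bool:
    return spam_score(comment) >= THRESHOLD


print(is_blocked("Nice post, thanks!"))                     # False
print(is_blocked("cheap viagra http://spam.example/ now"))  # True
```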
Interestingly enough, a bit of Googling shows me that I am not alone in this thinking, and even better, I’ve found a mod_security rules filter that looks like it will make a real difference on the pattern checks alone. I am going to try this now, and I’ll know how successful the measure is by how many comment blocks are listed in our spam log. I’ll let you know how it goes and what exactly I did. (Assuming we haven’t accidentally blocked you from accessing our server.)
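For the curious, rules of this sort look something like the following. These are illustrative examples in the mod_security 1.x syntax, not the actual ruleset I’m installing, and the patterns and IP (a documentation address) are made up:

```apache
# Turn the filtering engine on and scan POST bodies.
SecFilterEngine On
SecFilterScanPOST On

# Reject POSTed comments containing a known spam pattern.
SecFilterSelective POST_PAYLOAD "(?:viagra|casino)" "deny,log,status:403"

# Drop requests from a blacklisted spammer IP before the app ever sees them.
SecFilterSelective REMOTE_ADDR "^192\.0\.2\.1$" "deny,log,status:403"
```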
Update: We are now running the latest mod_security, with the latest gotroot rules and comment spam blacklists. In the past few days we’ve seen a 50% drop in comment spamming attempts. I’d imagine that blocking these hordes of comment-spamming bots will show a significant drop in bandwidth usage too. I had to write a couple of exclusion rules so that all our various programs work as planned without being blocked by mod_security, but other than that it has been a straightforward, easy and worthwhile experience. The more people that use this fantastic tool, the more effective and up to date the blacklists will be. If you have control over your own server, you could do a lot worse than to install mod_security.