Clip to Strip Search Terms From Server Logs

A list user had referrer logs from his server and wanted to extract the search terms from them. I wrote this clip:

;effort by don at htmlfixit.com
;02/04/05
; to take query terms from lines of stats
;one long example line
;http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q="We are what we know"
;becomes
;"We are what we know"
;turn off wordwrap
^!SetWordWrap Off
;^!SetDebug On
;go to start of document
^!Jump Doc_Start
;loop for cleaning lines
:Loop
;highlight just this line
^!Select Eol
^!Find "?q=" TIHS
^!IfError TryAgain ELSE KillStart

:TryAgain
;sometimes it is &q=
^!Find "&q=" TIHS
^!IfError KillLine ELSE KillStart

:KillLine
;KillLine (not a search with a ?q=)
;delete highlighted line
^!DeleteLine
;repeat til done - if done go to done subroutine
^!If ^$GetRow$ = ^$GetLinecount$ DONE ELSE Next
; next line
^!Goto Loop

:KillStart
;KillStart (get rid of ?q= and everything before it)
;jump to select end
^!Jump Select_End
;select to line beginning
^!Select Bol
;delete highlighted piece
^!Keyboard DELETE

;now get rid of post search terms by finding &
;highlight just this line
^!Select Eol
^!Find "&" TIHS
^!IfError SKIP_3
;jump to select start (the position of the & that was found)
^!Jump Select_Start
^!Select Eol
;delete highlighted piece
^!Keyboard DELETE

;repeat til done - if done go to done subroutine
^!If ^$GetRow$ = ^$GetLinecount$ DONE ELSE Next

;advance to next line
^!Jump +1
;repeat
^!Goto Loop

:DONE
^!Info Done
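
For anyone who wants to try the same idea outside the editor, here is a rough Python sketch of what the clip does: keep only lines that contain ?q= or &q=, drop everything up to and including that marker, then drop everything from the next & onward. The names QUERY_RE and extract_search_terms are made up for this example, and trimming the surrounding spaces is my own assumption; the clip itself does not do that.

```python
import re

# Matches "?q=" or "&q=" (ignoring case, like the clip's I option) and
# captures everything after it up to the next "&" or the end of the line.
QUERY_RE = re.compile(r"[?&]q=([^&]*)", re.IGNORECASE)


def extract_search_terms(lines):
    """Return the q= search terms, skipping lines that have no q= parameter."""
    terms = []
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            # strip() is an assumption of this sketch, not something the clip does
            terms.append(match.group(1).strip())
    return terms


if __name__ == "__main__":
    sample = [
        "http://cgi.resourceindex.com/detail/09039.html",
        "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=rgb color chooser",
    ]
    for term in extract_search_terms(sample):
        print(term)
```

Like the clip, this only looks at the q parameter, so referrers that use p= or query= are dropped.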

On this test list:

http://cgi.resourceindex.com/detail/09039.html
http://www.google.com/search?q="Invalid query: Column count doesn't match value count at row"&sourceid=opera&num=0&ie=utf-8&oe=utf-8
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=rgb color chooser
http://www.google.com.au/search?hl=en&ie=UTF-8&oe=UTF-8&q=regular expression remove path from file&btnG=Google Search&meta=cr=countryAU
http://www.cgi-resources.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.ch/search?q=script download manager php&hl=de&lr=&ie=UTF-8&start=10&sa=N
http://www.google.se/search?hl=sv&ie=UTF-8&oe=UTF-8&q=html scripts +stats&meta=
http://www.google.ca/search?q=PHP string manipulation&ie=UTF-8&oe=UTF-8&hl=en&meta=
http://www.google.com/search?q=perl loops continue&ie=UTF-8&oe=UTF-8
http://www.google.com.hk/search?hl=zh-TW&ie=UTF-8&oe=UTF-8&q=HTML Fix the background&spell=1
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=php splitting value by space
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q="top == self" document.write javascript
http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q="file names" complete list fix
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q="We are what we know"
http://www.dogpile.com/info.dogpl/search/web/can+a+virus+overtake+your+homepage%3F
http://www.google.com.au/search?hl=en&ie=UTF-8&oe=UTF-8&q=javascript validator &btnG=Google Search&meta=cr=countryAU
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.com/search?q=PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" allowed tags&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
http://www.google.ca/search?q=Regex links jpg&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/File_Management/File_Downloading/
http://www.google.ca/search?q=Column count doesn't match value count at row 1&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google Search&meta=
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=hello world perl
http://www.google.co.uk/search?q=HTML FIX IT.COM&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google Search&meta=
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=hello world perl script
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=perl find number of items in hash&btnG=Google Search
http://cgidir.com/Scripts/Counters/
http://www.google.com/search?q=php string manipulation&sourceid=opera&num=0&ie=utf-8&oe=utf-8
http://htmlfixit.com/tutes/tute_menu.shtml
http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=php string manipulation
http://www.google.com.au/search?hl=en&ie=UTF-8&oe=UTF-8&q=perl if elsif&meta=
http://www.gigablast.com/search?k3j=754263&s=70&q=cgi irc login
http://www.google.com.au/search?q=how to stop people from looking at your HTML source&hl=en&ie=UTF-8&oe=UTF-8
http://www.cgiresourceindex.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.co.hu/search?q=list perl hash array example&btnG=Google keresés&hl=hu&ie=UTF-8&oe=UTF-8
http://www.cgi-resources.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.com/search?q=fix html code&hl=iw&lr=&ie=UTF-8&inlang=iw&start=120&sa=N
http://www.cgi-resources.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://htmlfixit.com/cgi-bin/yabb/YaBB.cgi?board=stats
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=tutorials php string manipulation&btnG=Google Search
http://www.google.com.gr/search?hl=el&ie=UTF-8&oe=UTF-8&q=spliti functions tutorial&btnG=Αναζήτηση στο Google&meta=
http://www.freecgicode.com/cgi-bin/dir/search.cgi?query=counter
http://www.google.com.au/search?hl=en&ie=UTF-8&oe=UTF-8&q=mysql insert into matching columns&meta=
http://www.optima-system.com/pagespinner/spinnertalk/viewtopic.php?t=60
http://search.yahoo.com/search?p=examples of while loops in perl&ei=UTF-8&fr=fp-tab-web-t&n=20&fl=0&x=wrt
http://www.google.co.uk/search?hl=en&ie=UTF-8&oe=UTF-8&q=splitting strings in php&spell=1
http://www.optima-system.com/pagespinner/spinnertalk/viewtopic.php?t=60
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=javascript counter free download&btnG=Google Search
http://www.google.co.uk/search?q=xhtml validation tools&ie=UTF-8&oe=UTF-8&hl=en&meta=
http://www.google.com/search?q=PageSpinner&hl=ja&lr=&ie=UTF-8&inlang=ja&start=40&sa=N
http://www.google.ro/search?q=download fix-it utilities 2004 free full&hl=ro&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
http://www.google.com.au/search?q=fix mydoom.f&ie=UTF-8&oe=UTF-8&hl=en&meta=cr=countryAU
http://www.cgi-resources.com/Programs_and_Scripts/Perl/Access_Counters/Combination/
http://www.google.com/search?q=shtml vs html&hl=en&lr=&ie=UTF-8&start=10&sa=N
http://search.yahoo.com/search?p=free scripts download&ei=UTF-8&cop=mss&fr=fp-tab-web-t&b=21
http://www.google.co.uk/search?q=php security issues&hl=en&ie=UTF-8&oe=UTF-8
http://www.google.ca/search?q=xhtml validate&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
http://groups.yahoo.com/group/Anti-spam/message/2151
http://search.yahoo.com/search?p=xhtml vs. html&ei=UTF-8&fr=fp-tab-web-t&n=20&fl=0&x=wrt
http://www.google.co.in/search?q=payment gateway tutorial&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
http://www.google.com/search?q=Column count doesn't match value count at row 1&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
http://www.hotscripts.com/Detailed/24085.html

That rendered the following:

"Invalid query: Column count doesn't match value count at row"
rgb color chooser
regular expression remove path from file
script download manager php
html scripts +stats
PHP string manipulation
perl loops continue
HTML Fix the background
php splitting value by space
"top == self" document.write javascript
"file names" complete list fix
"We are what we know"
javascript validator
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" allowed tags
Regex links jpg
Column count doesn't match value count at row 1
hello world perl
HTML FIX IT.COM
hello world perl script
perl find number of items in hash
php string manipulation
php string manipulation
perl if elsif
cgi irc login
how to stop people from looking at your HTML source
list perl hash array example
fix html code
tutorials php string manipulation
spliti functions tutorial
mysql insert into matching columns
splitting strings in php
javascript counter free download
xhtml validation tools
PageSpinner
download fix-it utilities 2004 free full
fix mydoom.f
shtml vs html
php security issues
xhtml validate
payment gateway tutorial
Column count doesn't match value count at row 1

I then used Tools -> Text Statistics from the toolbar to get this information:

Word Frequency %

- 1 0.51
" 10 5.08
. 4 2.03
: 1 0.51
0 1 0.51
1 3 1.52
2004 1 0.51
allowed 1 0.51
are 1 0.51
array 1 0.51
at 4 2.03
background 1 0.51
by 1 0.51
cgi 1 0.51
chooser 1 0.51
code 1 0.51
color 1 0.51
Column 3 1.52
columns 1 0.51
COM 1 0.51
complete 1 0.51
continue 1 0.51
count 6 3.05
counter 1 0.51
document 1 0.51
doesn't 3 1.52
download 3 1.52
DTD 1 0.51
elsif 1 0.51
EN 1 0.51
example 1 0.51
expression 1 0.51
f 1 0.51
file 2 1.02
find 1 0.51
fix 5 2.54
fix-it 1 0.51
free 2 1.02
from 2 1.02
full 1 0.51
functions 1 0.51
gateway 1 0.51
hash 2 1.02
hello 2 1.02
how 1 0.51
html 6 3.05
if 1 0.51
in 2 1.02
insert 1 0.51
into 1 0.51
Invalid 1 0.51
irc 1 0.51
issues 1 0.51
IT 1 0.51
items 1 0.51
javascript 3 1.52
jpg 1 0.51
know 1 0.51
links 1 0.51
list 2 1.02
login 1 0.51
looking 1 0.51
loops 1 0.51
manager 1 0.51
manipulation 4 2.03
match 3 1.52
matching 1 0.51
mydoom 1 0.51
mysql 1 0.51
names 1 0.51
number 1 0.51
of 1 0.51
PageSpinner 1 0.51
path 1 0.51
payment 1 0.51
people 1 0.51
perl 6 3.05
php 8 4.06
PUBLIC 1 0.51
query 1 0.51
Regex 1 0.51
regular 1 0.51
remove 1 0.51
rgb 1 0.51
row 3 1.52
script 2 1.02
scripts 1 0.51
security 1 0.51
self 1 0.51
shtml 1 0.51
source 1 0.51
space 1 0.51
spliti 1 0.51
splitting 2 1.02
stats 1 0.51
stop 1 0.51
Strict 1 0.51
string 4 2.03
strings 1 0.51
tags 1 0.51
the 1 0.51
to 1 0.51
tools 1 0.51
top 1 0.51
tutorial 2 1.02
tutorials 1 0.51
utilities 1 0.51
validate 1 0.51
validation 1 0.51
validator 1 0.51
value 4 2.03
vs 1 0.51
W3C 1 0.51
we 2 1.02
what 1 0.51
world 2 1.02
write 1 0.51
xhtml 3 1.52
your 1 0.51

Different words/items counted: 119
Total Words: 176
Total Punctuation: 5
Total Other Text: 6
Total Characters: 1086
Total Paragraphs: 41

I could do lots more with this, like sort on incidence, etc. This is just a start.
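
As one possible next step, sorting on incidence could be sketched in Python along these lines. The function name word_frequencies and the rule for what counts as a word are assumptions of this example; the editor's text statistics split the text differently (they also count punctuation as items), so the numbers will not match the table above exactly.

```python
import re
from collections import Counter

# A "word" here is a run of letters or digits, optionally joined by an
# apostrophe or hyphen (so "fix-it" and "doesn't" stay whole). This rule
# is an assumption of the sketch, not the editor's own definition.
WORD_RE = re.compile(r"[a-z0-9]+(?:['’-][a-z0-9]+)*")


def word_frequencies(lines):
    """Return (word, count, percent) rows, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    total = sum(counts.values())
    rows = [(word, n, round(100.0 * n / total, 2)) for word, n in counts.items()]
    # Sort by descending count, then alphabetically to keep ties stable
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    terms = ["php string manipulation", "PHP string manipulation", "perl if elsif"]
    for word, count, percent in word_frequencies(terms):
        print(word, count, percent)
```

Lowercasing before counting merges entries such as PHP and php, which seems to be what the statistics above do as well.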
