The last and possibly most useful thing I wish to show you is regular
expressions (henceforth referred to as "regex"). I'll also call it "Perl string manipulation" for the benefit of the search engines spidering this page. :-)
Regex is basically "pattern matching": the ability to look through vast tomes
of data to find patterns, and do stuff with the data that matches.
There are two main forms of regex, "match" and "substitute",
and they look like this:
# Just match on a pattern.
m/pattern/;
and
# Match on pattern, and replace any matches found with 'replacement'
s/pattern/replacement/;
(Strangely enough, the "m" stands for "match" and the "s" for "substitute". :-)
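To see both forms side by side, here is a minimal sketch on a made-up string. On their own, m/pattern/ and s/pattern/replacement/ work on Perl's default variable $_; the =~ operator points them at a different variable instead.

```perl
use strict;
use warnings;

# A made-up string to work on.
my $text = 'This road is dangerous at night.';

# m/pattern/ is true if the pattern appears anywhere in $text.
my $found = ($text =~ m/dangerous/) ? 1 : 0;
print "Match found\n" if $found;

# s/pattern/replacement/ changes $text itself and returns
# the number of replacements it made.
my $count = ($text =~ s/dangerous/safe/);
print "After substitution: $text\n";   # This road is safe at night.
```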
A real-world example.
Here you are going to see not only the use of file "open", but also
an example of a "foreach" loop, an "if" statement, and some "regex" to
boot, all in one little code sample. (Told you the boring stuff was useful.)
Say you opened the file as detailed in
the previous primer, and you wanted to look through every line of the file's data for
some sort of pattern. Here is what you would do:
# Set the file path and name.
my $data_file = '/var/www/cgi-bin/mydata/data.txt';
# Open the file for reading.
open DATA, '<', $data_file or die "can't open $data_file $!";
my @array_of_data = <DATA>;
close (DATA);
Now @array_of_data contains all the text in that file, one line per element, and you can
manipulate it as you please.
So $array_of_data[0] would be the first line of text in the file,
$array_of_data[1] would be the second line, and so on.
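Individual array elements use square brackets, and the index starts at 0. Here is a small sketch, with a hard-coded array standing in for the lines read from $data_file. Note that each line still ends in its newline character until you chomp it.

```perl
use strict;
use warnings;

# Stand-in for the lines read from the file (made-up content).
my @array_of_data = ("first line\n", "second line\n", "third line\n");

my $first = $array_of_data[0];      # square brackets for one element
my $last  = $array_of_data[-1];     # negative index counts from the end
my $count = scalar @array_of_data;  # number of lines

# chomp removes the trailing "\n" from every element of the copy.
chomp(my @clean = @array_of_data);
print "The file has $count lines; the first is '$clean[0]'\n";
```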
NOTE: If the file in question is really big, it's not a good idea to pull it all into an array.
Say, for example, you did this with a server logfile 50,000 lines long. The chances are good that you would either time out the CGI process,
or chew up the server's available memory,
or both. None of which is a good idea. :-) In cases like this, it's better to open the file and loop over the filehandle, reading one line at a time.
For example:
my $data_file = '/var/www/cgi-bin/mydata/data.txt';
open DATA, '<', $data_file or die "can't open $data_file $!";
while (<DATA>)
{
    # Any action here will be applied to each line of the file (the current line is in $_).
}
close (DATA);
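As a sketch of what could go inside that while loop, the version below searches for "dangerous" one line at a time and records the numbers of the matching lines using Perl's special $. variable. The temporary sample file (made with the core File::Temp module) is only there to make the example runnable; in a real script you would use your own $data_file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file to a temporary location.
my ($tmp_fh, $data_file) = tempfile();
print $tmp_fh "all quiet\nDangerous bend ahead\nall quiet again\n";
close $tmp_fh;

my @hits;
open DATA, '<', $data_file or die "can't open $data_file $!";
while (my $line = <DATA>)
{
    # Only the current line is held in memory at any one time.
    # $. holds the number of the line just read.
    push @hits, $. if $line =~ m/dangerous/i;
}
close (DATA);

print "Matched on line(s): @hits\n";
```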
There is also something called "slurping", which tells Perl to grab the whole contents of a file in one hit, so you can put the entire content of a file into a single scalar variable.
As with the array, though, this shouldn't be done on big files, as it will chew up a whole heap of memory.
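A minimal sketch of slurping, assuming a small file. The trick is Perl's input record separator variable $/: when it is undef, reading from a filehandle returns everything up to the end of the file as one string. Using local means the change is undone when the enclosing block ends. A temporary sample file keeps the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so the sketch runs anywhere.
my ($tmp_fh, $data_file) = tempfile();
print $tmp_fh "line one\nline two\n";
close $tmp_fh;

my $content;
{
    # $/ is undef only inside this block, so <DATA> returns
    # the whole file at once instead of one line at a time.
    local $/;
    open DATA, '<', $data_file or die "can't open $data_file $!";
    $content = <DATA>;
    close (DATA);
}

print "Read ", length($content), " characters\n";
```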
I've also not discussed file locking, which is an important consideration for scripts that may be accessed by more than one person simultaneously.
File locking is discussed reasonably well here:
About.com Perl file locking.
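For a taste of what that looks like, here is a minimal sketch using Perl's built-in flock with the LOCK_EX constant from the core Fcntl module, assuming a Unix-like server. It appends to a temporary file rather than a real data file. These locks are advisory: they only protect you from other scripts that also call flock on the same file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);           # provides LOCK_SH, LOCK_EX, LOCK_UN
use File::Temp qw(tempfile);

# A temporary file stands in for your real data file.
my (undef, $data_file) = tempfile();

# Open for appending, then wait until we hold an exclusive lock,
# so two copies of the script can't write at the same moment.
open DATAOUT, '>>', $data_file or die "can't open $data_file $!";
flock(DATAOUT, LOCK_EX) or die "can't lock $data_file $!";
print DATAOUT "a new entry\n";
close (DATAOUT);                # closing the file releases the lock
```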
Getting back to the initial example, where the content is pulled into an array:
if we wanted to search that data for the word "dangerous", we would use
a foreach loop and some regex to do it,
like so:
# Start a foreach loop, and assign each line,
# one at a time, to the variable $line.
foreach my $line (@array_of_data)
{
    # Start an if statement, the condition of which is
    # "If this particular line contains the word dangerous."
    if ($line =~ m/dangerous/i)
    {
        # If the line contains "dangerous", print the line out.
        # ($line still ends with its own newline.)
        print "This line contains the word dangerous: $line";
    } # End the if condition here.
} # End the foreach loop here.
It reads just like English, and it says: open the file, and make each line of
the file an element in the array.
Then loop through the array, picking out each item (a line from the file) one
at a time, and assign it to $line.
Then check whether $line contains a match for the word "dangerous", and if it
does, print out that line.
Then move on to the next line of the file, again assign it to $line, and
check it for "dangerous" as well,
and keep doing that until you reach the last line of the file, then exit the
loop.
Not too hard, is it?
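For the record, Perl's built-in grep can do the same search in a single statement: it runs its block once per array element, with the element in $_, and keeps the elements for which the block is true. A small sketch, with a hard-coded array standing in for the file's lines:

```perl
use strict;
use warnings;

# Stand-in data; in the tutorial this array came from the file.
my @array_of_data = ("a safe line\n", "a DANGEROUS line\n", "another dangerous one\n");

# A bare m/.../ inside the grep block matches against $_.
my @dangerous_lines = grep { m/dangerous/i } @array_of_data;

print "This line contains the word dangerous: $_" for @dangerous_lines;
```

The foreach version is easier to extend when you want to do several things with each line; grep is handy when all you want is the matching lines.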
Now let's do something even more useful: let's manipulate the string contained in $line
so that all instances of the word "dangerous" are replaced with the word "safe".
By doing that in the foreach loop, we can do it for the contents of the entire file
(since it's stored line by line in the array).
We do this in much the same way as the previous example, except we don't need the "if" statement
anymore.
foreach my $line (@array_of_data)
{
    # Use substitute regex to replace "dangerous"
    # with the word "safe".
    $line =~ s/dangerous/safe/gi;
}
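Now you have that entire file's contents in the array, except the word
"dangerous" has been replaced with the word "safe", and the actual work was
all done with one short line of code ($line =~ s/dangerous/safe/gi;).
This works because inside a foreach loop $line is an alias for each array
element rather than a copy, so changing $line changes the array itself.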
If you noticed the gi at the end of the pattern, you might be
wondering what they are for.
The "g" stands for "global": in other words,
don't stop at the first match, but keep looking through the rest of the line
in case there are more matches.
The "i" tells the pattern to ignore character case, so it
will match DANGEROUS, dangerous, DANgerous, etc.
The g isn't necessary in the first example, as we only need to know whether the
word is in the line at all, and we print the line if it is.
The second example needs the g because "dangerous" might be on a line more
than once, and we want to change all of them to "safe"; by adding the g,
Perl will keep looking for more matches on that line (i.e. Perl will
replace them all, not just the first one).
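Here is a small sketch of the difference the g makes, on a made-up line. It also shows a handy detail: an s/// used in scalar context returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $line = "Dangerous roads, dangerous times.\n";

# Copy $line into $once, then substitute on the copy.
# Without g, only the first match on the line is replaced.
(my $once = $line) =~ s/dangerous/safe/i;

# With g, every match on the line is replaced, and the
# return value is the number of substitutions made.
my $all = $line;
my $changes = ($all =~ s/dangerous/safe/gi);

print $once;                   # safe roads, dangerous times.
print $all;                    # safe roads, safe times.
print "$changes substitutions\n";
```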
So, we have opened the file, read it into an array, looped over it, and changed the word
dangerous to the word safe.
How about we now write the revised text back to the old file, replacing
the old version? We open the file (for writing this time), and we use
another foreach loop to loop through the array and write each line back to
the file.
# Open the file for writing. (This empties the file first.)
open DATAOUT, '>', $data_file or die "can't open $data_file $!";
# Start a foreach loop, assigning
# each line to $line in turn.
foreach my $line (@array_of_data)
{
    # Print each line in turn to the filehandle DATAOUT.
    print DATAOUT $line;
}
# Close the file.
close (DATAOUT);
Now the file $data_file contains the same text it did before, except that
any instance of "dangerous" is now "safe".
Cool, huh?
(I could have written that entire script in less than half the size by
combining the various sections together, but that would have been less
instructive, so I did it the long way.)
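For comparison, here is a sketch of one possible shorter version of the whole read, substitute and write-back job. To keep it self-contained it works on a temporary file created with the core File::Temp module instead of the /var/www/... path used above; the sample text is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file; in a real script $data_file would be your own path.
my ($tmp_fh, $data_file) = tempfile();
print $tmp_fh "Dangerous work.\nNothing to see.\nvery DANGEROUS indeed\n";
close $tmp_fh;

# Read every line into the array.
open DATA, '<', $data_file or die "can't open $data_file $!";
my @array_of_data = <DATA>;
close (DATA);

# $_ is aliased to each element in turn, so this edits the array itself.
s/dangerous/safe/gi for @array_of_data;

# Write the whole array back over the old file.
open DATAOUT, '>', $data_file or die "can't open $data_file $!";
print DATAOUT @array_of_data;
close (DATAOUT) or die "can't close $data_file $!";
```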
You might be wondering what the use of this stuff is. Here are a couple of
examples, keeping HTML in mind:
- Opening an HTML file and stripping out all the HTML tags, so you end up
with just the text of a web page. (You can use a substitute regex that tells
Perl to replace the tags, < > and anything in between them, with nothing,
i.e. remove them.)
- You want to change the background color in 3 dozen web pages (or 10,000).
You would use opendir ("open directory") to list all the
files ending in .htm or .html, open each one, and replace bgcolor='white' with
bgcolor='black' (or whatever your preference is) in all of them.
See? This is the reason that Unix administrators love Perl: they can do
stuff in seconds that would take hours to do by hand. (Perl could get through
hundreds of these files in just seconds.)
- You had a system crash, and all your directories are full of *.chk files
(created by ScanDisk when it tries to save any potential data). You want to
search your entire hard drive for these .chk files, looking for a specific
bit of data you lost inside one of them, and delete each file if it
doesn't contain what you wanted.
(Here you would use "opendir" to go through each directory, then "open" to
open each file, pattern matching regex to look for the data you are missing,
and "unlink" to delete the file if it doesn't have what you want in it.)
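Here is a sketch of the first idea (stripping tags), on a made-up line of HTML. Note the character class [^>]*, meaning "anything except >". The more obvious <.*> fails because .* is greedy (this is what "greedy" really means in regex: a quantifier that grabs as much text as it can), so it runs from the first < on the line to the last >. Regex tag stripping is rough and ready; it will trip over things like HTML comments or a > inside an attribute value, which is where a parsing module such as HTML::Parser from CPAN comes in.

```perl
use strict;
use warnings;

# A made-up line of HTML.
my $html = '<p>Hello <b>world</b>, <a href="/x">click here</a>.</p>';

# <[^>]*> matches "<", then any run of characters other than ">",
# then ">". The g modifier removes every tag on the line.
(my $text = $html) =~ s/<[^>]*>//g;

# The greedy version: .* runs to the LAST ">" on the line,
# so the whole string is matched and removed.
(my $greedy = $html) =~ s/<.*>//g;

print "$text\n";       # Hello world, click here.
print "[$greedy]\n";   # []
```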
The scripts to do any of those examples would probably be less than 20
lines long as it is, but if you were to use modules (which I will show you
by example in the next primers), most of the hard work is done for you, and
the scripts would probably be halved in length again. (Using modules,
you could write a script to do any of the above in roughly 10 lines or less.)
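As an illustration of a module doing the hard work, here is a sketch of the .chk clean-up using the core File::Find module, which walks a whole directory tree for you instead of hand-written opendir loops. Everything about the setup is hypothetical: a temporary directory tree stands in for your hard drive, and $wanted_text is a made-up stand-in for the data you lost. Since this deletes files, try it with the unlink line replaced by a print first.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical setup: a small temporary tree with three .chk files.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "can't mkdir $root/sub $!";
my %samples = (
    "$root/FILE0001.CHK"     => "random junk\n",
    "$root/sub/FILE0002.CHK" => "my lost thesis chapter\n",
    "$root/sub/FILE0003.chk" => "more junk\n",
);
for my $path (keys %samples) {
    open SETUP, '>', $path or die "can't create $path $!";
    print SETUP $samples{$path};
    close (SETUP);
}

my $wanted_text = 'thesis chapter';   # hypothetical: the data you lost
my @kept;

# find() visits every file and directory under $root. Inside the sub,
# $_ is the current entry's name (find has already changed into its
# directory) and $File::Find::name is its full path.
find(sub {
    return unless -f $_ && /\.chk$/i;

    open CHK, '<', $_ or die "can't open $File::Find::name $!";
    my $found = 0;
    while (my $line = <CHK>) {
        # \Q...\E treats the search text literally, not as a pattern.
        if ($line =~ m/\Q$wanted_text\E/) { $found = 1; last; }
    }
    close (CHK);

    if ($found) { push @kept, $File::Find::name; }
    else        { unlink $_ or warn "can't delete $File::Find::name $!"; }
}, $root);

print "Kept: @kept\n";
```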
OK, now you know the very basics. Regex can get much more complicated
than what I have shown you thus far, but for simple stuff it's not too bad, and
most of the time Perl string manipulation involves fairly simple usage like
the examples mentioned above. There are a ton of resources on the internet for those looking for
more information on Perl string manipulation and regex.
Useful links for regex info:
- gatech.edu regex tute
- troubleshooters.com regex tute
- perlarchive.com regex tute
We've covered most of the basics now, so all tutes after this will cover
more advanced stuff and, hopefully, some usable examples.
That will be the point where everything I have shown you thus far starts
to make sense (assuming it doesn't already).
Again, keep in mind that we have only touched the basics thus far; things can get
much, much more complicated if you want them to.
But the great thing about Perl is that you now know enough to write usable
scripts (as I soon hope to show you).
You can now learn new stuff only when you find something you can't do with
what you know.
(I should tell you, however, that once you grasp the fundamentals,
it gets addictive and you want MORE!!!! :-) But that's a good thing.)
<Cool Tip>
Modules are the single biggest reason to use Perl for a particular task.
There are hundreds of modules included as standard with any Perl installation,
and literally thousands more you can download and use (at no cost) if the
urge takes you. (See http://search.cpan.org
for a search engine that finds modules for any particular purpose.)
Here are just a few of the things modules can do for you:
- Interact very easily with databases: Access, MySQL, PostgreSQL, Oracle
and more.
- Create, resize, crop and modify images (jpg, gif, png, etc.).
- Write an entire web page template in one line of Perl code.
- Set and retrieve cookies, or any other type of header.
- Upload files to your server using just your browser and a Perl script.
- Run other programs on your web server, and display their results in a
browser.
- Send email in a dozen different ways, attach files to it, and make it
pretty with images and HTML.
That's just a tiny sample of the more common ones; if you want more, I
could keep listing stuff you can do for hours.
Modules are really just snippets of prewritten code (a little bit like
subroutines) that you can call and use in any way you like; the vast
majority of the work is already done for you. (Some modules run into
hundreds of lines of code, all prewritten and ready for you to use at
the drop of a hat, and quite easily as well.) And since many of the most
useful ones are already on your Perl web server, all it takes is one line of
code to call them. (In fact, all the full examples I have shown you thus far
have "use"d modules. That's a pun: the lines that start with "use" were
calling a module of some sort.)
The best bit is that you don't have to understand a module's insides to use it. I use
heaps of modules whose source I have never even looked at. Many of them offer a
standard object-oriented interface to do their magic, while others simply give you functions to call.
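Here is a small sketch using three modules that ship with standard Perl: List::Util, File::Basename and IO::File. It shows both styles: the first two simply hand you functions, while IO::File gives you objects with methods. The numbers and the path are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(max sum);   # functions imported by name
use File::Basename;           # exports basename() and dirname()
use IO::File;                 # object-oriented filehandles

my @sizes   = (120, 45, 300);
my $total   = sum(@sizes);
my $biggest = max(@sizes);

my $name = basename('/var/www/cgi-bin/mydata/data.txt');   # 'data.txt'

# OO style: create an object, then call methods on it.
my $fh = IO::File->new_tmpfile or die "can't create temp file $!";
$fh->print("hello from IO::File\n");
$fh->seek(0, 0);              # rewind to the start of the file
my $first_line = $fh->getline;
$fh->close;

print "total=$total biggest=$biggest name=$name line=$first_line";
```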
Modules don't even have to be written entirely in Perl; some modules have C code in
them to get that little extra speed (but you don't need to know C to use
them).
Since Perl is an "open source" language, there are hundreds of thousands of
developers using it all the time, and many of them contribute
back to the community in some way (often by writing modules), so new
features are being added all the time.
So "where will you want to go tomorrow" would be a good slogan for Perl.