Storing data in a Perl hash (associative array to some people)

Last time I spoke of arrays, and how a basic array is just a list of items, like so:

my @array_of_animals = ( 'Dog', 'cat', 'horse', 'mouse', 'sheep', 'cow');

Each item is retrievable by the number of its position in the list, like so:

my @array_of_animals = ( 'Dog', 'cat', 'horse', 'mouse', 'sheep', 'cow');
                           0      1       2        3        4       5

The "key" to an item is the number of its position in the list (starting from zero).
$array_of_animals[2] is equal to horse. (Note the square brackets: curly braces are for hashes, which we are about to meet.)
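
As a quick illustration of looking items up by position, here is a minimal sketch. Array elements are read with square brackets; the variable names $third, $last and $count are just for this example.

```perl
use strict;
use warnings;

my @array_of_animals = ('Dog', 'cat', 'horse', 'mouse', 'sheep', 'cow');

# Square brackets index an array; positions start at zero.
my $third = $array_of_animals[2];       # 'horse'
my $last  = $array_of_animals[-1];      # 'cow' (negative indexes count from the end)
my $count = scalar @array_of_animals;   # 6 items in the list

print "$third $last $count\n";          # prints "horse cow 6"
```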

Now I want to talk about hashes, which are known as associative arrays in other languages.
(But in Perl they are called hashes because Perl doesn't like big words that are hard to spell; too hard on the programmers :-)

A hash is basically an array where, instead of numerical keys, the keys can be any string you want. Here is an example that equates an abbreviated weekday name with its full name:

Creating a hash.

my %weekdays = (
'Sun' => 'Sunday',
'Mon' => 'Monday',
'Tue' => 'Tuesday',
'Wed' => 'Wednesday',
'Thu' => 'Thursday',
'Fri' => 'Friday',
'Sat' => 'Saturday',
   );

Retrieving a value from a hash.

As you can see, it's much the same thing as a normal array; the only difference is that YOU specify the keys that retrieve the data. So to get the value of the "Wed" key, we do this:

my $day_of_the_week = $weekdays{'Wed'};
   # $day_of_the_week variable
   # is set to the string (of text): Wednesday
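
One thing worth knowing when retrieving values is what happens if the key is not there. The sketch below is only an illustration (the 'Xyz' key and the variable names $has_xyz and $fallback are made up for it): a missing key gives back undef, and exists lets you check for the key first.

```perl
use strict;
use warnings;

my %weekdays = (
    'Sun' => 'Sunday',
    'Wed' => 'Wednesday',
);

# A key that is present returns its value.
my $day_of_the_week = $weekdays{'Wed'};          # 'Wednesday'

# A key that is absent returns undef; exists() tells you whether it is there.
my $has_xyz = exists $weekdays{'Xyz'} ? 1 : 0;   # 0

# The // operator (Perl 5.10 and later) supplies a fallback for a missing key.
my $fallback = $weekdays{'Xyz'} // 'unknown';    # 'unknown'

print "$day_of_the_week $has_xyz $fallback\n";   # prints "Wednesday 0 unknown"
```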

Adding new key/values to a hash.

You can just as easily add new items to the hash, like this:

$weekdays{'some'} = 'someday';

And that will basically add: 'some' => 'someday' to the hash.

Changing the value of an existing hash key.

You can change the value of an item in the hash the same way you set it. For example, if we want to change "someday" to "some day" in the example above, then we would do it like this:

$weekdays{'some'} = 'some day';

And the old value of "someday" is overwritten with "some day".

Deleting a key/value from a hash.

Deleting a key/value from a hash is likewise simple: we use "delete" to do that. Here is an example that removes the new key we just added above.

delete $weekdays{'some'};
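
Putting the add, change and delete steps together, a small sketch like the following lets you watch the hash change at each step. The shortened two-day hash and the variable names are only for illustration.

```perl
use strict;
use warnings;

my %weekdays = ('Sat' => 'Saturday', 'Sun' => 'Sunday');

$weekdays{'some'} = 'someday';              # add a new key/value pair
my $after_add = scalar keys %weekdays;      # 3 keys now

$weekdays{'some'} = 'some day';             # overwrite the value of an existing key
my $changed = $weekdays{'some'};            # 'some day'

my $removed = delete $weekdays{'some'};     # delete returns the value it removed
my $still_there = exists $weekdays{'some'} ? 'yes' : 'no';   # 'no'

print "$after_add, $changed, $removed, $still_there\n";
# prints "3, some day, some day, no"
```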

All pretty simple really, but it can get much more complicated when we start dealing with hashes of hashes, hash slices, or hash references. That's more advanced stuff for another time.

You might be wondering what the use of all this is.
Firstly, you can't really do anything with a language until you can store and manipulate the data the program has to work with. The scalar variable ($variable), the array (@array) and the hash (%hash) are the containers you store and manipulate your bits of data in.
Without those containers, a language is mostly useless.
For a small example:
Say you have an HTML form, and you want any data submitted in that form to be written to a file or emailed to you. A hash is the key to doing that (so to speak).
The data from a form is usually supplied in pairs: the key (or name in HTML) and the value. For example: name=franki, email=me@somewhere.com

Would be submitted by a HTML/XHTML form with the following inputs:

<input type='text' name='name' value='franki' />
<input type='text' name='email' value='me@somewhere.com' />

That's the perfect use for a hash, and for many years the popular method of accessing form data was to split the pairs up and put them into a hash.
cgi-lib.pl is an older example of this.
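
To give a feel for what "splitting the pairs up" means, here is a rough sketch. It is not the code from cgi-lib.pl, just an illustration under simple assumptions: the $query string is a made-up example of how a browser would encode the two inputs above, and a repeated field name would simply overwrite the earlier value. Real scripts should leave this job to a proper library.

```perl
use strict;
use warnings;

# Hypothetical raw form data, as a browser would encode the two inputs.
my $query = 'name=franki&email=me%40somewhere.com';

my %formdata;
foreach my $pair (split /&/, $query) {
    # Split each pair into key and value at the first "=" only.
    my ($key, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;

    # Undo the URL encoding on both the key and the value.
    for ($key, $value) {
        tr/+/ /;                               # "+" encodes a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
    }

    $formdata{$key} = $value;
}

print "$formdata{'name'} <$formdata{'email'}>\n";
# prints "franki <me@somewhere.com>"
```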

When you use cgi-lib, it basically asks you for the name of a hash to put the form data into. Once cgi-lib has put the data in the hash, you can access it in the script like so (this is assuming you set the hash to %formdata):

my $name  = $formdata{'name'};
my $email = $formdata{'email'};

NOTE: cgi-lib.pl is part of Perl's history now; it shouldn't really be used in modern scripts as there are much better libraries available. (The CGI module is one example. It shipped as a standard part of Perl for many years, though since Perl 5.22 it has to be installed from CPAN, and it does much more than just collect form data and manipulate cookies.)

Example: Sorting a hash by its value.

Many times tutorials tell you how to do something without telling you why you should do it. Having been guilty of this myself from time to time, I've decided to remedy it by giving you a real-world example of a problem whose solution involved two hashes.

Let's start with the scenario:

You have a number of files in a directory, and the file names are in this format:

Friday_June29_2007at20-02-10.txt

A program collects all these file names and assembles them into web addresses. The program needs to be modified so that it lists the files from newest to oldest rather than in the random order they are listed in at present.
Attempts to base the sorting on the files' creation time or last modified time fail because the files have been moved from server to server (and Windows to *nix) several times.

After realising I couldn't sort based on "stat"ing the files, I took the only other logical option: split up the filename and create a timestamp from it in numeric form that could be used to sort the files into order. So with some creative use of regex and split I got the day, month and year separated. (We don't need to sort on time for this purpose; the date is just fine.)


# 	The data is in this format:
#	year:	2007
#	month:	May
#	Day:	26
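
The splitting code itself is not shown in this tutorial, so the following is only a guess at one way to do it, assuming every file name follows the Weekday_MonthDD_YYYYatHH-MM-SS.txt pattern shown earlier. The regex and the $weekday variable are my own illustration, not the original script.

```perl
use strict;
use warnings;

my $file = 'Friday_June29_2007at20-02-10.txt';

# Weekday, month name, day of month, four-digit year, then the literal "at".
my ($weekday, $word_month, $day, $year) =
    $file =~ /^([A-Za-z]+)_([A-Za-z]+)(\d{1,2})_(\d{4})at/
    or die "Unexpected file name format: $file\n";

print "year: $year, month: $word_month, day: $day\n";
# prints "year: 2007, month: June, day: 29"
```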

Which is perfect with the exception of the month. Now keep in mind that we are talking about hundreds of files here. So it's not something that could be done easily by hand.

The answer is to use a hash to convert a month string (like 'May') into a month number (like 05).

And here is the hash in question.

my %month = (
	'January' => '01',
	'February' => '02',
	'March' => '03',
	'April' => '04',
	'May' => '05',
	'June' => '06',
	'July' => '07',
	'August' => '08',
	'September' => '09',
	'October' => '10',
	'November' => '11',
	'December' => '12',
	    );

Now, for every file in the directory I want to sort, I can look its month up in this hash and convert it to a numeric two-digit month, which is just what I need to sort with. So while in the midst of my foreach loop, I run each file's month of creation through the month hash to get its numeric month of creation.

  
foreach my $file (@files)
{
	# A big block of code to split the filename up
	# into year, month and day has been removed to
	# shorten and simplify.

	# The end result is that we now have $day,
	# $word_month and $year to work with. The first
	# step is to get them all into number format.

	# Convert the string month to a number month.
	my $number_month = $month{$word_month};
	# That line passes the written word as the key
	# to the hash and asks for the value of that key,
	# which it assigns to $number_month.
	# An example would be $month{'July'}, which would
	# return 07. (Keys are case sensitive, so
	# $month{'july'} would return undef.)

	# Now we want to assemble the numbers into a
	# single timestamp number. sprintf pads a
	# single-digit day with a leading zero, so every
	# timestamp is eight digits long.
	my $timestamp = sprintf('%04d%02d%02d', $year, $number_month, $day);
	# The end result is a number that looks like
	# 20070629, exactly what we need to do our sort.
	# And a hash made it not only possible but pretty
	# easy to boot.

	# RAM use is not really an issue here as there are
	# only a few hundred files, not a few thousand, so I
	# want to put the whole lot into a second hash with
	# the filename as the key and the timestamp as the
	# value (so we can sort through them by timestamp
	# and build the list of URLs to the files in order
	# of date).
	# So we need a new hash called %archivehash. In the
	# actual script I created it above the foreach loop
	# so its scope is valid beyond the loop; here I
	# declare it with "our" instead of "my", which makes
	# it a package variable that survives the loop.
	# Note there is no "= ()" on this line: assigning an
	# empty list here would wipe the hash on every pass.
	# (Under "use strict", repeat the "our" declaration
	# after the loop to keep using the short name.)
	our %archivehash;

	# Now we need to put the new data in the hash.
	# Remember we are still in the loop that performs
	# these tasks once for every file in the group, so
	# every file's name and timestamp will be inserted
	# into the new hash. $file is the variable that
	# contains the file being processed at each turn
	# of the loop.
	$archivehash{$file} = $timestamp;
	# OK, we have all the info we need now,
	# so we can end the foreach loop.
} # End of foreach loop.
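
If you want to run the loop from start to finish, here is a minimal self-contained sketch. It makes a few assumptions of its own: the regex stands in for the splitting code that was removed, the three sample file names stand in for a real directory listing, %archivehash is declared with "my" before the loop (as in the actual script), and sprintf is used so that a single-digit day is zero-padded.

```perl
use strict;
use warnings;

my %month = (
    'January' => '01', 'February' => '02', 'March'     => '03',
    'April'   => '04', 'May'      => '05', 'June'      => '06',
    'July'    => '07', 'August'   => '08', 'September' => '09',
    'October' => '10', 'November' => '11', 'December'  => '12',
);

# Sample file names in the tutorial's format (stand-ins for a directory listing).
my @files = (
    'Friday_June29_2007at20-02-10.txt',
    'Sunday_October1_2006at11-41-53.txt',
    'Monday_March10_2003at06-03-45.txt',
);

my %archivehash;    # declared before the loop so it outlives it

foreach my $file (@files) {
    # Assumed parsing step; the original splitting code is not shown.
    my ($word_month, $day, $year) =
        $file =~ /^[A-Za-z]+_([A-Za-z]+)(\d{1,2})_(\d{4})at/
        or next;    # skip names that do not fit the pattern

    my $number_month = $month{$word_month};
    next unless defined $number_month;    # skip unknown month names

    # Zero-pad so that every timestamp is eight digits wide.
    $archivehash{$file} = sprintf '%04d%02d%02d', $year, $number_month, $day;
}

print "$_ => $archivehash{$_}\n" for sort keys %archivehash;
```

Declaring the hash once, outside the loop, avoids the trap of re-initialising it on every pass.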

FYI: If we had created this hash and its content manually, it might look like this:

my %archivehash = (
  'Friday_June29_2007at20-02-10.txt' => '20070629',
  'Thursday_October19_2006at09-35-51.txt' => '20061019',
  'Wednesday_October25_2006at15-17-38.txt' => '20061025',
  'Friday_October20_2006at02-46-20.txt' => '20061020',
  'Tuesday_February20_2007at03-17-43.txt' => '20070220',
  'Sunday_October1_2006at11-41-53.txt' => '20061001',
  'Monday_March10_2003at06-03-45.txt' => '20030310',
	);

Only much longer of course.

So now the fun starts. We want to take %archivehash, which contains all the files and timestamps as key => value pairs, and assemble them into a URL list ordered from newest to oldest.

Ironically it requires another loop to do so. It could probably all have been done in the one loop, but in this case I just needed it to work, and efficiency was a lesser priority than time. It would also be much more complicated to follow as one loop.

Here's the final bit of code which does the job:

foreach my $file (sort { $archivehash{$b} cmp $archivehash{$a} }
           keys %archivehash)
{
	print '<tr><td colspan="2"><a href="'.$download_url;
	print '/'.$file.'">'.$file.'</a></td></tr>';
}

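Basically, "sort" takes all the keys of the hash and orders them by comparing their values with the string comparison operator 'cmp'; the loop then goes through the sorted keys one at a time. Because $b comes before $a in the comparison, this sorts in descending rather than ascending order. If you swapped $b and $a around, it would be in ascending order instead. (Comparing as strings works here because every timestamp has the same number of digits.)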
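
To see the effect of swapping $a and $b, here is a small sketch using three of the sample entries. The array names are only for illustration. It also shows the numeric comparison operator <=>, which would be the safer choice if the values could ever differ in length.

```perl
use strict;
use warnings;

my %archivehash = (
    'Friday_June29_2007at20-02-10.txt'      => '20070629',
    'Thursday_October19_2006at09-35-51.txt' => '20061019',
    'Monday_March10_2003at06-03-45.txt'     => '20030310',
);

# Newest first: $b before $a gives descending order.
my @newest_first = sort { $archivehash{$b} cmp $archivehash{$a} } keys %archivehash;

# Oldest first: $a before $b gives ascending order.
my @oldest_first = sort { $archivehash{$a} cmp $archivehash{$b} } keys %archivehash;

# <=> compares the values as numbers instead of as strings.
my @numeric = sort { $archivehash{$b} <=> $archivehash{$a} } keys %archivehash;

print "$newest_first[0]\n";   # prints "Friday_June29_2007at20-02-10.txt"
print "$oldest_first[0]\n";   # prints "Monday_March10_2003at06-03-45.txt"
```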

Lastly, we print the sorted results out one at a time (because we're in a loop, remember) in a lovely table format. ($download_url is the variable that contains the web address of the directory containing the files.)

The end result would be several hundred lines like this:

<tr> <td colspan="2"> <a href="http://mydomain.com/directory1/archive3/Friday_June29_2007at20-02-10.txt"> Friday_June29_2007at20-02-10.txt</a></td></tr>

Perfect!

For some more advanced constructs based on arrays and hashes please see:
An array of hashes or A hash of hashes