Processing and Modifying Array Elements

Using Perl Arrays – Part 3

Perl Course

Foreword: In this part of the series, I talk about scanning an array, removing duplicate elements from an array, and the effect of the defined() function on arrays.

By: Chrysanthus Date Published: 30 Oct 2015

Introduction

This is part 3 of my series, Using Perl Arrays. In this part of the series, I talk about scanning an array, removing duplicate elements from an array, and the effect of the defined() function on arrays. You should have read the previous parts of the series before coming here, as this is a continuation.

Scanning an Array
An array is often processed with foreach, which comes in two forms. In the simple form, foreach is a statement modifier: a single statement is written first, followed by foreach and the array, and inside that statement $_ refers to each element of the array in turn. In the compound form, foreach comes first, followed by the array in parentheses and then a block; inside the block, $_ again refers to each element in turn. The following code prints all the elements of an array, separated by spaces:

    use strict;

    my @animals = ("camel", "donkey", "cow", "horse", "pig");

    print "$_ " foreach @animals;

The output is:

    camel donkey cow horse pig

The following code will also have the same printout:

    use strict;

    my @animals = ("camel", "donkey", "cow", "horse", "pig");

    foreach (@animals)
        {
            print "$_ ";
        }

The first code uses the simple foreach statement, and the second code uses the compound foreach statement. Prefer the simple form when the block of the compound statement would hold only one statement.

Note: with the compound statement, the array name is in parentheses. However, with the simple statement, the array name does not necessarily have to be in parentheses.

You can access an array element using indices ([0], [1], [2], [3], [4], etc.). You must have seen that before, if you have been reading this volume in the order given.
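Scanning by index can be sketched as follows. This is only an illustration: the @lines array and the loop variable $i are names chosen for this example. It relies on $#animals, which holds the index of the last element of @animals.

```perl
use strict;
use warnings;

my @animals = ("camel", "donkey", "cow", "horse", "pig");

# $#animals is the index of the last element (here 4).
my @lines;
foreach my $i (0 .. $#animals)
    {
        # Pair each index with the element stored at that index.
        push @lines, "[$i] $animals[$i]";
    }

print "$_\n" foreach @lines;
```

Use the index form when you need the position of each element; otherwise the plain foreach forms shown earlier are shorter.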

Removing Duplicate Elements from an Array
To remove duplicate elements from an array, use a hash and the map function. The map function turns each array element into a key/value pair, with the array element as the key and 1 as the value, and these pairs are stored in the hash. A hash never allows duplicate keys. So, in copying the array into the hash, duplicate keys are removed. You then use the hash keys() function to get the elements back as an array. In the following code, "donkey" and "horse" are duplicated in the array. The code removes the duplicates. Try the code:

    use strict;

    my @animals = ("camel", "donkey", "cow", "horse", "pig", "donkey", "horse");

    my %ha   = map { $_, 1 } @animals;

    @animals = keys(%ha);    # reuse the existing array; a second "my" would redeclare it

    print "$_ " foreach @animals;

On my computer, the output is (the order may differ on yours):

    donkey cow horse pig camel

The map function here takes a block as its first argument. Remember, when the map or grep function uses a block for the first argument, there is no comma between the first argument and the second argument. Parentheses around the arguments of a Perl function are usually optional. However, when operator precedence would otherwise group the arguments wrongly, you have to use parentheses.

For each element of the array, the block, { $_, 1 }, returns a two-item list: the element and 1. The map function joins these into one flat list, and assigning that list to a hash makes each pair of items a key and its value. The map function itself does not produce a hash; the hash comes from the assignment.
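To see what map hands over before the hash is formed, the result can first be stored in an ordinary array. This is a minimal sketch: @pairs and @sorted are names introduced for this example only.

```perl
use strict;
use warnings;

my @animals = ("camel", "donkey", "cow", "horse", "pig", "donkey", "horse");

# map returns one flat list: ("camel", 1, "donkey", 1, ...).
my @pairs = map { $_, 1 } @animals;
print scalar(@pairs), "\n";      # 14: two items for each of the 7 elements

# Assigning that list to a hash pairs the items up as key, value;
# a repeated key simply overwrites the earlier entry.
my %ha = @pairs;

# Hash order is not predictable, so sort the keys for a stable printout.
my @sorted = sort keys %ha;
print "@sorted\n";               # camel cow donkey horse pig
```

Sorting gives a repeatable order, though it is alphabetical order and not the order of the original array.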

The above code does not preserve the original order, because a hash does not keep its keys in the order in which they were added. If you want to maintain the original order, then you have to do that yourself, as the following code illustrates (see explanation below):

    use strict;

    my @animals = ("camel", "donkey", "cow", "horse", "pig", "donkey", "horse");

    my %seen   = ();
    my @unique = ();

    foreach my $elem (@animals)
        {
            next if $seen{$elem}++;
            push @unique, $elem;
        }

    @animals = ();    # empty the array; "= undef" would leave one undef element
    @animals = @unique;
    undef %seen;
    undef @unique;

    print "$_ " foreach @animals;

The output with the original order maintained is:

    camel donkey cow horse pig

At the top of the code, you have the array with duplicates. You then have an empty hash, %seen, and an empty array, @unique, declared. Next comes a compound foreach statement.

The compound foreach statement looks at each element of the array with duplicates in turn, beginning with the first. If the element is not yet a key in the %seen hash, it is recorded there as a key and is also appended to the @unique array. If the element is already a key in the %seen hash, it is skipped. The @unique array ends up as the array with no duplicates, in the original order.

The hash is in the code only to record which elements have already been met, so that a duplicate can be recognized.

What is really going on in the block of the compound statement? The first statement in this block is:

            next if $seen{$elem}++;

This statement does a lot. It begins with the command, next, which is used here without a label. The foreach loop skips to the next iteration if the current element of the array with duplicates is already a key in the %seen hash. “if $seen{ $elem }++” is not an argument to the next command. Here, if, is a simple statement modifier. The condition for if is “$seen{ $elem }++”, which is evaluated as true or false.

++ is the increment operator, used here in its postfix form on a hash value. The expression, $seen{$elem}++, returns the value stored under the key before the increment, and then adds 1 to the stored value. The first time an element is met, its key does not exist in the %seen hash, so the expression returns undef, which is false; the key is created and its value becomes 1. When a duplicate is met, the expression returns the current count (1 or more), which is true, and the stored count goes up to 2, and so on. If the condition is true, the next command skips to the next iteration, without allowing the second statement in the block to copy the element to the @unique array. If the condition is false, the second statement copies the element to the @unique array.

The foreach block records each new element of the array with duplicates in the %seen hash and copies it to the @unique array. A duplicate element only increments the count stored under its key in the hash; it is not copied again to the @unique array.
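The behaviour of the postfix ++ on a hash value can be checked in isolation. The following is a small sketch, with @results and $before being names made up for this illustration:

```perl
use strict;
use warnings;

my %seen = ();
my @results;

foreach my $elem ("donkey", "horse", "donkey")
    {
        # $seen{$elem}++ returns the value held before the increment:
        # undef (false) the first time, 1 or more (true) afterwards.
        my $before = $seen{$elem}++;
        push @results, $before ? "$elem: seen before" : "$elem: first time";
    }

print "$_\n" foreach @results;
print "donkey count: $seen{donkey}\n";    # donkey count: 2
```

The third iteration reports that "donkey" was seen before, and the value stored under that key is then 2.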

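After the foreach loop, the @unique array has no duplicates but it has the original order of the original array, which had duplicates. The @unique array is then copied to the original array variable, replacing its old contents, so the original array does not strictly need to be emptied first. Note that assigning undef to an array does not shrink it to zero length: it leaves an array of one element, whose value is undef. To empty an array, assign the empty list, (), to it.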
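The different ways of clearing an array can be compared with a short test. This is a sketch only; the arrays @x, @y and @z and the three counter variables are hypothetical names used just for this comparison.

```perl
use strict;
use warnings;

my @x = (1, 2, 3);
@x = undef;                  # list assignment of a single undef value
my $after_undef_assign = scalar(@x);

my @y = (1, 2, 3);
@y = ();                     # assignment of the empty list
my $after_empty_list = scalar(@y);

my @z = (1, 2, 3);
undef @z;                    # undef used as a function on the array
my $after_undef_func = scalar(@z);

print "$after_undef_assign $after_empty_list $after_undef_func\n";   # 1 0 0
```

Only the first form leaves an element behind, which is why assigning the empty list is the usual way to empty an array.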

The aim of the hash is to record, as keys, the elements already met, so that duplicates can be detected. The aim of the @unique array is to hold the values of the original array without duplicates, while maintaining the order of elements in the original array. At this point these two entities are no longer needed, so their contents are released by undefining them.

The above code uses the temporary %seen hash to remove duplicates and uses the temporary @unique array to maintain order. A lot of the work is done in the foreach block.

The foreach loop checks whether each element of the original array is a duplicate. If it is, the loop does not copy the element to the @unique array; otherwise, it records the element as a key in the %seen hash and copies the same element to the @unique array.

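The foreach loop can be coded more compactly with the grep() function, using NOT logic. The grep() function returns the elements for which its block is true; with the ! operator in front, the block is true only the first time an element is met. By so doing, the size of the code is reduced. You can use the following code instead of the above: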

    use strict;

    my @animals = ("camel", "donkey", "cow", "horse", "pig", "donkey", "horse");

    my %seen   = ();

    my @unique = grep { ! $seen{ $_ }++ } @animals;

    @animals = ();    # empty the array; "= undef" would leave one undef element
    @animals = @unique;
    undef %seen;
    undef @unique;

    print "$_ " foreach @animals;

The arguments to the grep() function are the block, “{ ! $seen{ $_ }++ }” and the list, @animals. Remember, when the first argument to the grep or map function is a block, there is no comma separating the first argument and the second argument.
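If duplicates have to be removed in more than one place, the grep() approach can be wrapped in a subroutine. The following is a minimal sketch; the subroutine name, unique_in_order, is hypothetical and not part of Perl. Because %seen is declared inside the subroutine, it disappears by itself when the subroutine returns.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: returns the arguments
# with duplicates removed, keeping the order of first appearance.
sub unique_in_order
    {
        my %seen;
        return grep { !$seen{$_}++ } @_;
    }

my @animals = ("camel", "donkey", "cow", "horse", "pig", "donkey", "horse");
@animals = unique_in_order(@animals);

print "@animals\n";    # camel donkey cow horse pig
```

Recent versions of the List::Util module also offer a uniq function for the same job; check whether your installation has it before relying on it.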

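Effect of defined() on Empty Arrays and Hashes
The aim of the defined() function is to test whether a scalar has a defined value, or whether a function has been defined. It is not meant for testing whole arrays or hashes, so do not use it with them. In older versions of Perl, defined() applied to an array or hash only reported whether memory had ever been allocated for it, so an array or hash that had been emptied could still give true, which is misleading. Since Perl 5.22, using defined() on a whole array or hash is a fatal error. To find out whether an array or hash is empty, use it directly as a condition, for example, if (@animals).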
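To test whether an array or hash holds anything, it can be evaluated in boolean or scalar context, leaving defined() for scalars. The following is a sketch with made-up variable names (%ages, $name and the result variables are for illustration only):

```perl
use strict;
use warnings;

my @animals = ();
my %ages    = ();
my $name;                        # declared but not yet assigned

my $array_empty = @animals ? 0 : 1;          # array in boolean context
my $hash_empty  = %ages    ? 0 : 1;          # hash in boolean context
my $count       = scalar(@animals);          # number of elements
my $name_set    = defined($name) ? 1 : 0;    # defined() on a scalar

print "array empty: $array_empty, hash empty: $hash_empty, ",
      "count: $count, name defined: $name_set\n";
```

An empty array or hash is false in a condition, and scalar(@animals) gives the number of elements.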

That is it for this part of the series. We stop here and continue in the next part.

Chrys
