Basics of Perl Reference

Perl Basics – Part 20

Perl Course

Foreword: In this part of the series, I talk about Perl Reference Basics.

By: Chrysanthus Date Published: 29 Mar 2015

Introduction

This is part 20 of my series, Perl Basics. A region in memory is a set of consecutive memory cells. Values (variable contents) are kept in memory regions, and a variable identifies a memory region. A reference is a pointer to a memory region. You use it when you are interested in what is in the region rather than in the variable that names it. Think of a reference as the address of a memory region that holds a value. In this part of the series, I talk about Perl Reference Basics. Everything outlined in this tutorial applies to traditional Perl. You should have read the previous part of the series before reaching here, as this is a continuation.

Memory Region
A memory region is an area in computer memory that holds the value of a variable. By value here, I am referring to a scalar value, array or hash.

Different Memory Regions with Different Variables
Consider the following two consecutive statements:

  my $myVar = "I am the content of a large text file from the hard disk, now in memory.";
  my $aVar  = "I am the content of a large text file from the hard disk, now in memory.";

You have two different variables with different names but with the same string value. A variable identifies a memory region, and two different variables with two different names identify two different memory regions, all else being equal. In the above case, the two values, even though they are the same, are in two different memory regions.
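If you want to check this yourself, here is a minimal sketch. It relies on one fact worth stating: a reference used as a number gives the address of the region it points to, so comparing two references with == tells you whether they point to the same region. The \ operator used here is explained in the next section; the names sameValue and sameRegion are illustrative only.

```perl
use strict;
use warnings;

my $myVar = "I am the content of a large text file from the hard disk, now in memory.";
my $aVar  = "I am the content of a large text file from the hard disk, now in memory.";

# Same string value in both variables ...
my $sameValue = ($myVar eq $aVar) ? 1 : 0;

# ... but two different memory regions: the addresses differ.
my $sameRegion = (\$myVar == \$aVar) ? 1 : 0;

print "same value: $sameValue\n";    # prints "same value: 1"
print "same region: $sameRegion\n";  # prints "same region: 0"
```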

Same Memory Region for Two Different Variables
In Perl you can make the same memory region reachable through two different variables, so that both lead to the same value. Consider the following two consecutive statements:

  my $myVar = "I am the content of a large text file from the hard disk, now in memory.";
  my $hisVar  = \$myVar;

In the first statement, a value is assigned to the variable $myVar. In the second statement, $myVar is preceded by the \ sign, and the result is assigned to a new variable, $hisVar. \ is an operator: placed in front of a variable, it returns the address (reference) of the memory region that the variable identifies. So after the second statement, $hisVar holds the address of the region identified by $myVar, and through $hisVar you can reach the same value. An important thing to note here is that $myVar refers to a memory region.

In the second statement, \$myVar is a reference (the address of the memory region identified by $myVar). $hisVar is an ordinary scalar variable that holds that reference. We say $myVar holds a value (the string), while $hisVar holds the reference (address).
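The following sketch makes the difference between the two variables visible. It uses Perl's built-in ref function, which returns the kind of value a reference points to (SCALAR here), and compares addresses numerically with ==. The names kind and sameRegion are illustrative only.

```perl
use strict;
use warnings;

my $myVar  = "I am the content of a large text file from the hard disk, now in memory.";
my $hisVar = \$myVar;    # $hisVar holds the address of $myVar's region

# Printed as a string, a reference shows its kind and address,
# something like SCALAR(0x...); the exact address varies from run to run.
print "$hisVar\n";

# ref() names the kind of value the reference points to.
my $kind = ref($hisVar);

# A fresh \$myVar gives the same address that $hisVar already holds.
my $sameRegion = ($hisVar == \$myVar) ? 1 : 0;

print "kind: $kind, same region: $sameRegion\n";   # prints "kind: SCALAR, same region: 1"
```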

Using a Reference
Now that you have a reference, how can you get the value in the memory region it points to? In other words, if a variable holds the reference of some memory region, how can you get the value of that region using the variable? In the above case, $hisVar holds the reference of the region identified by $myVar. Getting the value through $myVar is no problem: you just use $myVar in place of the value. To obtain the value through $hisVar, which holds the reference, you can use the {} braces as follows:

    ${$hisVar}

Here, we are dealing with a scalar, so you begin with the scalar sign. This is followed by a pair of braces. Inside the braces, you have the variable that holds the reference. The following code illustrates this:

use strict;

my $myVar = "I am the content of a large text file from the hard disk, now in memory.";
my $hisVar  = \$myVar;

print ${$hisVar};

That is, where you would write $myVar, you replace the name myVar with {$hisVar}.
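Two details are worth seeing in code. When the reference is held in a simple scalar variable, the braces may be dropped, so $$hisVar means the same as ${$hisVar}. And the same notation can be used on the left of an assignment, to change the value in the shared region. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $myVar  = "old value";
my $hisVar = \$myVar;

# ${$hisVar} and the shorter $$hisVar both read the same region.
my $longForm  = ${$hisVar};
my $shortForm = $$hisVar;

# Assigning through the reference changes $myVar, because both
# names lead to the one memory region.
${$hisVar} = "new value";

print "$longForm $shortForm\n";   # prints "old value old value"
print "$myVar\n";                 # prints "new value"
```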

Scalar and Reference
The above explanation is applicable to scalars. A similar thing is applicable to arrays and hashes. However, with arrays and hashes, there are two ways of creating a reference and two ways of using the reference.

Array and Reference
Consider the following array creation:

    my @arr = ("one", "two", 3, 4);

To make a reference out of @arr, you precede the variable with the \ sign, as the following statement illustrates:

    my $aRef = \@arr;

Anonymous Array
In the above section, you need two statements to come up with an array reference. The first statement gives the array a name, @arr; the second creates a reference and assigns it to a variable. It is also possible to use only one statement to come up with an array reference. In this case, the array does not have a name; there is just a reference to the region in memory that holds the array. The following statement illustrates this:

    my $aRef = ["one", "two", 3, 4];

Note here that we have square brackets to delimit the array elements, not the parentheses used for a list before. Under this condition, the square brackets create the array and return a reference (memory address) to it. This returned reference is assigned to the scalar variable, $aRef. You do not need the \ sign here, since the square brackets already return a reference; \ is used in front of a name to obtain a reference to it.

Note: Any variable that holds a reference is a scalar variable. So the reference of an array or hash is held by a scalar variable.
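As a quick check, the sketch below builds an array reference both ways and asks ref what each points to; for an array reference it returns ARRAY. It also compares addresses with == to show that \@arr points to the region of @arr, while the anonymous array lives in a region of its own. Variable names other than @arr and $aRef are illustrative.

```perl
use strict;
use warnings;

my @arr  = ("one", "two", 3, 4);
my $aRef = \@arr;                  # reference to the named array @arr
my $anon = ["one", "two", 3, 4];   # reference to an anonymous array

# Both scalars hold array references.
my $kind1 = ref($aRef);
my $kind2 = ref($anon);

# $aRef holds the address of @arr's region; the anonymous array is
# in a different region, even though its elements are the same.
my $aRefIsArr = ($aRef == \@arr) ? 1 : 0;
my $anonIsArr = ($anon == \@arr) ? 1 : 0;

print "$kind1 $kind2 $aRefIsArr $anonIsArr\n";   # prints "ARRAY ARRAY 1 0"
```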

Using an Array Reference
One way to get the array from the array reference (variable holding the reference) is to use the braces. For the above reference, you would type,

    @{$aRef}

You begin with the array sign, @, since you are dealing with an array. This is followed by braces. Inside the braces, you have the variable that holds the reference.

You usually do not use the array as a whole (as indicated above). You usually use an element from the array. For an array that has a name, if you want to use the array name to get an element, you would type something like,

    $arr[2]

where the name of the array (variable) is @arr. When you have a reference to the array, you do a similar thing but with the braces as follows:

    ${$aRef}[2]

That is, you replace arr with {$aRef}.

The other way of accessing an array applies when you want an element from the array (which is what you do most of the time). In this form, you do not add the extra leading scalar sign, $, and you omit the braces. Instead, you follow the array reference variable with an arrow, -> (a minus sign followed by a greater-than sign), as in the following example:

    $aRef->[2]

This form works whether the array reference came from a named array or from an anonymous array.
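The sketch below puts both ways side by side and also uses the whole array through @{$aRef}, to count its elements and to loop over it. It assumes the anonymous array from above; the names copy, count, viaBraces and viaArrow are illustrative.

```perl
use strict;
use warnings;

my $aRef = ["one", "two", 3, 4];

# The whole array through the reference: @{$aRef}.
my @copy  = @{$aRef};          # @copy is a separate, ordinary array
my $count = scalar @{$aRef};   # number of elements

# One element, two equivalent spellings.
my $viaBraces = ${$aRef}[2];
my $viaArrow  = $aRef->[2];

# Changing an element through the arrow changes the referenced array,
# but not @copy, which was copied earlier.
$aRef->[0] = "ONE";

for my $item (@{$aRef}) {
    print "$item\n";           # prints ONE, two, 3 and 4, one per line
}
print "$count $viaBraces $viaArrow $copy[0]\n";   # prints "4 3 3 one"
```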

Read and try the following code:

use strict;

my @arr = ("one", "two", 3, 4);
my $aRef = \@arr;
print ${$aRef}[1];

print "\n";

my $arRef = ["one", "two", 3, 4];
print $arRef->[3];

Hash and Reference
Consider the following hash creation:

    my %ha = (Apple => "purple", Banana => "yellow", Pear => "green", Lemon => "green");

To make a reference out of %ha, you have to precede it with the sign, \ as the following statement illustrates:

    my $hRef = \%ha;

Anonymous Hash
In the above section, you need two statements to come up with a hash reference. The first statement gives the hash a name, %ha; the second obtains the hash reference and assigns it to a scalar variable. It is also possible to use only one statement to come up with a hash reference. In this case, the hash does not have a name; there is just a reference to the region in memory that holds the hash. The following statement illustrates this:

    my $hRef = {Apple => "purple", Banana => "yellow", Pear => "green", Lemon => "green"};

Note here that we have braces to delimit the hash elements, not the parentheses used for a list before. Under this condition, the braces create the hash and return a reference (memory address) to it. This returned reference is assigned to the scalar variable, $hRef. You do not need the \ sign here, since the braces already return a reference; \ is used in front of a name (variable) to obtain a reference.

Note: Any variable that holds a reference is a scalar variable. So the reference of an array or hash is held by a scalar variable. The reference of a scalar is still held by a scalar variable.
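To illustrate the note above, the following sketch stores a reference to a scalar, an array, a named hash and an anonymous hash in four scalar variables, then uses the built-in ref function to report the kind of each. Variable names not used elsewhere in this tutorial are illustrative.

```perl
use strict;
use warnings;

my $str = "text";
my @arr = ("one", "two", 3, 4);
my %ha  = (Apple => "purple", Banana => "yellow");

# Every reference, whatever it points to, is held in a scalar variable.
my $sRef = \$str;
my $aRef = \@arr;
my $hRef = \%ha;
my $anon = {Pear => "green", Lemon => "green"};   # anonymous hash

# ref() names the kind of region each reference points to.
my @kinds = map { ref($_) } ($sRef, $aRef, $hRef, $anon);
print "@kinds\n";   # prints "SCALAR ARRAY HASH HASH"
```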

Using a Hash Reference
One way to get the hash from a hash reference (variable holding the reference) is to use the braces. For the above reference, you would type,

    %{$hRef}

You begin with the hash sign, %, since we are dealing with a hash. This is followed by braces. Inside the braces, you have the variable that holds the reference.

You usually do not use the hash as a whole. You usually use a value from the hash. For a hash that has a name, if you want to use the hash name to get a value, you would type something like,

    $ha{'key'}

where the name of the hash (variable) is %ha. When you have a reference to the hash, you do a similar thing but with the braces, as follows:

    ${$hRef}{'key'}

That is, you replace ha with {$hRef}.

The other way of accessing a hash applies when you want a value from the hash (which is what you do most of the time). In this form, you do not add the extra leading scalar sign, $, and you omit the braces. Instead, you follow the hash reference variable with an arrow, ->, as in the following example:

    $hRef->{'key'}

This form works whether the hash reference came from a named hash or from an anonymous hash.
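Since the whole hash is available through %{$hRef}, you can combine it with keys to visit every pair, and use the arrow form to add a pair. A minimal sketch; the key Cherry and the variable names are illustrative, and sort is used only because Perl does not keep hash pairs in the order they were typed.

```perl
use strict;
use warnings;

my $hRef = {Apple => "purple", Banana => "yellow", Pear => "green", Lemon => "green"};

# keys %{$hRef} lists the keys; sort makes the order predictable.
my @keys = sort keys %{$hRef};
for my $k (@keys) {
    print "$k => $hRef->{$k}\n";   # Apple, Banana, Lemon, Pear, one per line
}

# Adding a pair through the arrow form changes the referenced hash.
$hRef->{Cherry} = "red";
my $count = scalar keys %{$hRef};
print "$count\n";   # prints 5
```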

Read and try the following code:

use strict;

my %ha = (Apple => "purple", Banana => "yellow", Pear => "green", Lemon => "green");
my $hRef = \%ha;
print ${$hRef}{'Apple'};

print "\n";

my $haRef = {Apple => "purple", Banana => "yellow", Pear => "green", Lemon => "green"};
print $haRef->{'Banana'};

Passing Argument by Reference to a Subroutine
Read and try the following code:

use strict;

my %ha = (Apple => "purple", Banana => "yellow");

sub mySub
    {
        print $_[0], " ", $_[1], " ", $_[2], " ", $_[3], " ", $_[4], " ", $_[5], " ";

    }

mySub("one", "two", %ha);

In the function call, the first argument is "one", the second argument is "two" and the third argument is %ha. As soon as the subroutine begins execution, "one" becomes the first value of @_ and "two" becomes the second value of @_. Then the items of the hash are flattened out into the rest of @_, as separate keys and values. Each key is still followed by its value, but the order of the pairs is not guaranteed: on my computer, Banana became the third value of @_, yellow the fourth, Apple the fifth and purple the sixth, even though Apple was typed first when the hash was created.
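The flattening can be demonstrated directly. In the sketch below, countArgs and twoArrays are hypothetical subroutines of my own: the first reports how many values arrive in @_, and the second shows a related consequence for arrays, namely that a subroutine cannot tell two flattened arrays apart, so the first my array receives everything.

```perl
use strict;
use warnings;

my %ha   = (Apple => "purple", Banana => "yellow");
my @arr1 = (1, 2);
my @arr2 = (3, 4);

# Hypothetical: returns how many values arrived in @_.
sub countArgs { return scalar @_; }

# Hypothetical: tries to receive two arrays.
sub twoArrays {
    my (@first, @second) = @_;   # @first swallows every value
    return (scalar @first, scalar @second);
}

my $n = countArgs("one", "two", %ha);   # two scalars + 2 keys + 2 values
my ($firstLen, $secondLen) = twoArrays(@arr1, @arr2);

print "$n\n";                     # prints 6
print "$firstLen $secondLen\n";   # prints "4 0"
```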

Note that the hash argument (%ha) in the function call is not a reference. This gives rise to two problems. The items of the hash are flattened out in the @_ array, and the hash pairs in @_ are not necessarily in the order in which they were typed when the hash was created. For an array, only the flattening problem occurs; the order of the elements is maintained. To solve the flattening problem, that is, to keep the structure of a hash or array when it is passed to a function, you pass the hash or array by reference. The following code illustrates this.

use strict;

my %ha = (Apple => "purple", Banana => "yellow");

sub mySub
    {
        print $_[0], " ", $_[1], " ", $_[2]->{'Apple'}, " ", $_[2]->{'Banana'};

    }

mySub("one", "two", \%ha);

The third argument in the function call is a reference to the hash. This was achieved by preceding the hash variable name with \ in the parentheses of the arguments. When the subroutine executes, there are now only three values in the @_ array (two scalars and one reference). The first value of @_ is the first argument in the function call, and the second value is the second argument. The third value is a reference to the hash, and from this reference you can get all the values of the hash. Note how the two values of the hash were obtained inside the above subroutine definition, with $_[2]->{'Apple'} and $_[2]->{'Banana'}.

When you pass an array or a hash to a function in the ordinary way, its items are spread out one by one in @_. The usual next step inside the subroutine is to copy @_ into my variables (for example, my (%h) = @_;), and that produces a second copy of all the items: one copy remains in the array or hash variable outside the subroutine, and the other is in the subroutine's variables. In the above code, the structure of the hash is maintained and there is only one copy of the hash items: those of the hash created outside the subroutine definition. Only the reference, a single scalar, is passed. The original hash keeps its structure, of course, although the order of its pairs is not guaranteed.
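Because only the reference is passed, a subroutine can also change the original hash through it. In this sketch, addPear is a hypothetical subroutine; it unpacks @_ into a my variable, which copies the reference (a single address) rather than the hash items.

```perl
use strict;
use warnings;

my %ha = (Apple => "purple", Banana => "yellow");

# Hypothetical subroutine: adds a pair through the reference it receives.
sub addPear {
    my ($h) = @_;          # $h is a copy of the address, not of the hash
    $h->{Pear} = "green";  # writes into the original hash's memory region
    return;
}

addPear(\%ha);

my @pairs;
for my $k (sort keys %ha) {
    push @pairs, "$k => $ha{$k}";
}
print join(", ", @pairs), "\n";
# prints "Apple => purple, Banana => yellow, Pear => green"
```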

If you have assigned the hash reference to a (scalar) variable, you can still use that variable as the argument in the function call. The following code illustrates this:

use strict;

my %ha = (Apple => "purple", Banana => "yellow");
my $hRef = \%ha;

sub mySub
    {
        print $_[0], " ", $_[1], " ", $_[2]->{'Apple'}, " ", $_[2]->{'Banana'};

    }

mySub("one", "two", $hRef);
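Passing by reference also solves the two-arrays problem, since each array arrives as one scalar in @_. A minimal sketch, where twoArrayRefs is a hypothetical subroutine that returns the sizes of the two arrays it receives:

```perl
use strict;
use warnings;

my @arr1 = (1, 2);
my @arr2 = (3, 4, 5);

# Hypothetical: receives two array references and reports their sizes.
sub twoArrayRefs {
    my ($first, $second) = @_;   # two scalars, each holding a reference
    return (scalar @{$first}, scalar @{$second});
}

my ($len1, $len2) = twoArrayRefs(\@arr1, \@arr2);
print "$len1 $len2\n";   # prints "2 3"
```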

Why Use References
Imagine that you copy the content of a large file from the hard disk to memory. You may want to use different variables to refer to this content. If those variables hold ordinary copies rather than references, you would have many copies of the same content in memory, which is not an economical use of memory. Memory to the computer is like money to people; it is always scarce. I hope you see why references can be useful.
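The sketch below shows the difference in behaviour between a copy and a reference. The string is a short stand-in for a large file's content (an assumption, to keep the example small). Perl 5.20 and later may postpone the physical copy of a plain string until one side changes, so the sketch shows that the copy is independent rather than measuring memory. Variable names are illustrative.

```perl
use strict;
use warnings;

# Stand-in for the content of a large file.
my $content = "I am the content of a large text file from the hard disk, now in memory.";

my $copy = $content;    # a separate variable with its own copy of the string
my $ref  = \$content;   # only the address of $content's region

$content .= " (edited)";

# The copy does not see the edit; the reference does, because it shares the region.
my $copySeesEdit = ($copy =~ /edited/) ? 1 : 0;
my $refSeesEdit  = (${$ref} =~ /edited/) ? 1 : 0;

print "$copySeesEdit $refSeesEdit\n";   # prints "0 1"
```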

We have covered a lot for this part of the series. Let us stop here and continue in the next part, with a different topic.

Chrys

Broad Network
