Broad Network


Reading Text File Content in Perl

Handling Files and Directories in Perl – Part 1


Foreword: In this part of the series I explain the different ways you can read part or all of the content of a text file.

By: Chrysanthus, Date Published: 27 Aug 2015

Introduction

This is part 1 of my series, Handling Files and Directories in Perl. In this part of the series I explain the different ways you can read part or all of the content of a text file. For simplicity, the file is assumed to be in the working directory, and the working directory is assumed to be the directory that has the Perl script. I give you some of the knowledge directly, and for the rest, I answer frequently asked questions in an optimum way. If you are not using ActivePerl, then begin every code sample with a line like #!/usr/bin/perl .

Pre-Knowledge
At the bottom of this page, you have links to the different series you should have read before coming here. This series is part of the volume, Perl Course.

Filehandle
A filehandle is a name that Perl uses for a connection to a file (or to another input/output channel). You associate a filehandle with a file on disk by opening the file, and you end that association by closing it. While the file is open, Perl keeps a buffer for it in memory: data read from the disk passes through the buffer, and data to be written is collected there. When the file is closed, any data in the buffer that has not yet been written is flushed to disk, and the relationship between the filehandle and the file ends. You access a file and its content through the filehandle. To open a file for reading (from disk) you would type something like:

        open(fHandle, "<", "input.txt") or die "cannot open input.txt : $!";

This is an example of a Perl simple statement. It has the or operator in the middle instead of a statement modifier. (Statement modifiers are: if, unless, while, until, foreach, for and when.) The meaning of “or” here is the same as in English: open the file, or else die. The right operand of or is evaluated only if the left operand returns false, that is, only if open fails. Note that the open() function has 3 arguments.

On the left of the or operator, you have the call to the predefined open function. The first argument is the filehandle. In Perl, you do not have to declare a filehandle before you use it. The second argument, "<", means open the file for reading. The last argument is the name of the file (which is in the working directory).

On the right of the or operator, you have die followed by a string. die is not an operator; it is a predefined function. If nothing catches it, die prints its argument (the error message) to standard error, usually the console, and stops the script. The string after die is the argument of die; it has the error message that will be displayed to the user. The special variable $!, at this point, holds the system's error message explaining why open failed (for example, "No such file or directory"). Remember, a variable in double quotes is expanded (replaced by its value). If the message given to die does not end with a newline, Perl also appends the current script line number and, if a file has been read from, the input line number.

The open function returns a nonzero value on success and undef otherwise. In Perl a nonzero value is equivalent to true and undef is equivalent to false.

To close a file, you would type something like:

    close fHandle;

Here, fHandle is the same handle you used in opening the file. The close function returns true on success and false on failure.
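The return values of open and close can also be tested directly, instead of always calling die. The following is a minimal sketch: the file name noSuchFile.txt is hypothetical and is assumed not to exist (the script deletes it first to be sure). It reports the error from a failed open, then checks what close returns for a file that does open.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the working directory.
my $missing = "noSuchFile.txt";
unlink $missing;    # make sure it really does not exist

# open returns a false value (undef) on failure and sets $!.
my $ok = open(fHandle, "<", $missing);
if ($ok) {
    print "opened $missing\n";
    close fHandle;
}
else {
    print "could not open $missing: $!\n";
}

# Create a real file, then check the value returned by close.
open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
my $closed = close fHandle;
print "close returned true\n" if $closed;
```

Because the script continues after the failed open, this style suits cases where a missing file is not fatal; when it is, the open ... or die form above is shorter.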

The getc, tell and seek Functions
The syntaxes for the getc function are:

    getc FILEHANDLE
    getc

This function returns the next character from the input file attached to FILEHANDLE, or undef at end of file or when an error is encountered (in the latter case $! is set). If FILEHANDLE is omitted, it reads from STDIN (standard input, usually the keyboard).

Assume that you have the following as content of a text file:

    I like the Star Wars series.
    Many people appreciate it.

If the filehandle is fHandle, then right after the file has been opened, the function call,

    getc fHandle

will return I, which is at the first position. The character pointer then points to the next character, which is a space. If you call the function again, the space is returned, and the character pointer moves to the next character, which is l. The next call returns l, and the pointer moves to i; the call after that returns i, and so on. Note that at the end of the first line there is a newline character, \n. getc returns it like any other character, although it is not displayed as a visible character.
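The character-by-character walk described above can be turned into a loop that stops when getc returns undef. This is a sketch that recreates the two-line sample file first; it prints \n where the invisible newline is, and counts the characters and newlines it sees.

```perl
use strict;
use warnings;

# Recreate the two-line sample file used in this part.
open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
my $count    = 0;
my $newlines = 0;
# getc returns undef at end of file, so test with defined
# (a plain truth test would also stop on a "0" character).
while (defined(my $ch = getc fHandle)) {
    $count++;
    if ($ch eq "\n") {
        $newlines++;
        print "\\n\n";    # show the newline as \n, then start a new line
    }
    else {
        print $ch;
    }
}
close fHandle;
print "\n$count characters, $newlines newline(s)\n";
```

For this file the count is 55: 28 visible characters and one \n on the first line, and 26 characters on the second line, which has no \n at the end.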

The syntaxes for the tell function are:

    tell FILEHANDLE
    tell

If FILEHANDLE is omitted, tell() assumes the file last read (and not STDIN). In this tutorial, the character pointer is called the seekpointer (it is also known as the file position). When a file is opened for reading, the seekpointer points to the first character. After some handling of the file, it may be pointing at any character in the file. The tell function returns the current position of the seekpointer for FILEHANDLE, counted in bytes from the beginning of the file, or -1 on error. The first character is at position 0.

The value returned by tell is a byte offset. You usually store it in a variable, so that you can later give it to the seek function to go back to that position.
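As a small sketch of how tell moves as characters are read, the following records the position right after opening and again after each of three getc calls. It assumes the same two-line sample file, which it recreates.

```perl
use strict;
use warnings;

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
my @positions;
push @positions, tell fHandle;        # right after open: position 0
for my $no (1..3) {
    getc fHandle;                     # consume one character
    push @positions, tell fHandle;    # the position moves forward one byte
}
close fHandle;
print "@positions\n";
```

The printed positions are 0 1 2 3, since each of these characters occupies one byte.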

The syntax for the seek function is:

    seek FILEHANDLE,POSITION,WHENCE

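Here, FILEHANDLE is a filehandle such as fHandle. POSITION is a byte offset, such as a value stored earlier from the tell function. WHENCE says where POSITION is counted from: 0 means from the beginning of the file, 1 means from the current position, and 2 means from the end of the file. Based on what I have said so far, use the value 0.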
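The three WHENCE values can be compared with a short sketch. It uses the constants SEEK_SET (0), SEEK_CUR (1) and SEEK_END (2) exported by the standard Fcntl module, which this part does not otherwise use; nextChars is a hypothetical helper that collects the next few characters with getc.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # SEEK_SET (0), SEEK_CUR (1) and SEEK_END (2)

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

# Hypothetical helper: returns up to the next $n characters using getc.
sub nextChars {
    my ($n) = @_;
    my $s = "";
    for (1..$n) {
        my $ch = getc fHandle;
        last unless defined $ch;
        $s .= $ch;
    }
    return $s;
}

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";

seek fHandle, 7, SEEK_SET;     # 7 bytes from the beginning of the file
my $fromStart = nextChars(3);

seek fHandle, 1, SEEK_CUR;     # skip 1 byte past the current position
my $fromCurrent = nextChars(4);

seek fHandle, -3, SEEK_END;    # 3 bytes before the end of the file
my $fromEnd = nextChars(3);

close fHandle;
print "$fromStart|$fromCurrent|$fromEnd\n";
```

After reading "the", the seekpointer is on the space at byte 10, so skipping one byte with SEEK_CUR lands on "Star". A negative POSITION with SEEK_END counts back from the end, giving "it.".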

Read and try the following code:

use strict;

    open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
    print fHandle "I like the Star Wars series.\nMany people appreciate it.";
    close fHandle;

    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    for my $no (0..6)
        {
            print getc fHandle;
        }
    print "\n";
    my $pos = tell fHandle;
    close fHandle;

    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    seek fHandle,$pos,0;
    for my $no (0,1,2)
        {
            print getc fHandle;
        }
    close fHandle;

The first code segment begins by opening a file called myFile.txt for writing. To open a file for writing, you use >; to open a file for reading, you use <. If the file does not exist, it is created in the current directory; if it already exists, its old content is erased. The next line in the segment uses the filehandle to print the following two lines into the file (not to the console):

    I like the Star Wars series.
    Many people appreciate it.

The last line in the segment closes the file.

The next code segment begins by opening the same file for reading. It uses the getc function to get the first 7 characters (“I like ”, which ends with a space) and print them to the console. It then obtains the position of the next character to be read, which is 7, and assigns it to the variable $pos. Finally, it closes the file.

The third code segment reopens the file for reading. At this point, the seekpointer is at (has gone back to) the beginning of the file. After the file has been opened, the seek function moves the seekpointer back to the position it had before the file was closed. The next 3 characters are read and printed to the console. Do not confuse printing to the console with printing to a file.

The output of the code is:

    I like
    the

The read Function
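The syntaxes for the read function are:

    read FILEHANDLE,SCALAR,LENGTH,OFFSET
    read FILEHANDLE,SCALAR,LENGTH

The last argument, OFFSET, is optional. This function attempts to read LENGTH characters of data into the variable SCALAR from the specified FILEHANDLE. LENGTH is an integer. The reading starts from the current position of the seekpointer, which is the beginning of the file just after the file has been opened. read returns the number of characters actually read, 0 at end of file, or undef if there was an error. Try the following code for the above file: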

use strict;

    my $str;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    read (fHandle, $str, 40);
    print $str;
    close fHandle;

Note that in Perl, parentheses around the arguments of a function call are optional. Also note that the \n character is counted as one of the 40 characters and becomes part of $str. The output is:

    I like the Star Wars series.
    Many people
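Because read returns the number of characters it actually read (0 at end of file), it can be called in a loop to process a file in fixed-size pieces. This is a sketch with an arbitrarily chosen piece size of 20, again on the two-line sample file.

```perl
use strict;
use warnings;

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
my $whole = "";
my @sizes;
# read returns the number of characters read, 0 at end of file,
# and undef on error.
while (1) {
    my $got = read(fHandle, my $chunk, 20);
    die "read failed: $!" unless defined $got;
    last if $got == 0;
    push @sizes, $got;
    $whole .= $chunk;
}
close fHandle;
print "piece sizes: @sizes\n";
print $whole;
```

The 55 characters arrive as pieces of 20, 20 and 15; the last call returns 0 and ends the loop.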

If you do not want the reading to start from the beginning of the file, then you have to use the seek function, similar to the way it was used above.
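Combining seek with read looks like the following sketch. It assumes a Unix-style system, where \n is stored as one byte, so the second line starts at byte 29 (on Windows, Perl's default text handling stores \n as two bytes and the offset would differ). It also shows the optional OFFSET argument, which says where in the variable the new data goes.

```perl
use strict;
use warnings;

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";

# Move to byte 29, the first character of the second line
# (28 characters plus one \n come before it).
seek fHandle, 29, 0;
my $str;
read(fHandle, $str, 4);                 # $str is now "Many"

# OFFSET = length($str) appends the new data instead of overwriting.
read(fHandle, $str, 7, length $str);    # appends " people"
close fHandle;
print "$str\n";
```

The second read continues from where the first one stopped, so the result is "Many people".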

Reading one Line at a Time
In a file, a line ends with \n. To read one line at a time, you use a while loop, the <> operator and the filehandle. The following code illustrates this for the above text file:

use strict;

    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";

    while (<fHandle>)
        {
            print $_;
        }
    close fHandle;

The output is:

    I like the Star Wars series.
    Many people appreciate it.

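In the code, <fHandle> reads one line on each iteration and, because it is the only thing in the while condition, assigns the line to the special $_ variable. The loop stops at end of file. Note that each line read includes the \n character (except the last line, if the file does not end with \n). To go back to a particular line or group of lines later, you can record the position of a line with the tell function and return to it with the seek function.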
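A sketch of that tell-and-seek approach: it records the byte position where each line starts, then jumps back to the second line and reads two lines from there. The file fourLines.txt and its content are hypothetical; the positions in the comments assume a Unix-style system where \n is one byte.

```perl
use strict;
use warnings;

# Hypothetical four-line file.
open(fHandle, ">", "fourLines.txt") or die "cannot open fourLines.txt : $!";
print fHandle "First line\nSecond line\nThird line\nFourth line\n";
close fHandle;

open(fHandle, "<", "fourLines.txt") or die "cannot open fourLines.txt : $!";

# Record the byte position at which each line starts.
my @lineStart = (tell fHandle);
while (<fHandle>) {
    push @lineStart, tell fHandle;
}
pop @lineStart;    # the last entry is end of file, not a line start

# Jump straight back to the 2nd line (index 1) and read two lines.
seek fHandle, $lineStart[1], 0;
my @group;
for (1..2) {
    my $line = <fHandle>;
    last unless defined $line;
    push @group, $line;
}
close fHandle;
print "line starts: @lineStart\n";    # 0 11 23 34
print @group;
```

Once the positions are recorded, any line can be revisited without reading the file again from the beginning.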

The readline Function
The syntax for this function is:

    readline EXPR

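In scalar context, each call reads and returns the next line, and returns undef at end of file. When readline is the only thing in the condition of a while loop, as in the code below, Perl assigns each line to the $_ variable, just as it does for <fHandle>. In list context, it returns all the remaining lines of the file; when you assign the result to an array, each line goes into an array cell. Each line read includes the \n character. Try the following code for the above file, which first reads in scalar context and then in list context.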

use strict;

    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    while (readline fHandle)
        {
            print $_;
        }
    print "\n\n";
    close fHandle;

    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    my @arr = readline fHandle;
    print $_ foreach @arr;
    close fHandle;

The output is:

    I like the Star Wars series.
    Many people appreciate it.

    I like the Star Wars series.
    Many people appreciate it.

For the scalar context the main code is:

    while (readline fHandle)
        {
            print $_;
        }

If you do not want to use $_, then you can assign each line to a variable of your own, declared with my as use strict requires:

    while (my $str = readline fHandle)
        {
            print $str;
        }

In list context, the main code is:

    my @arr = readline fHandle;
    print $_ foreach @arr;

The second line in the code is an example of a Perl simple statement with the foreach statement modifier. Here, $_ is the content of each cell of the list (array) in turn. The first line in the code reads all the lines in the file and puts each one into a cell of the array, in the order read.
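When the \n at the end of each line is not wanted, the built-in chomp function (not otherwise covered in this part) can remove it from every cell at once. A minimal sketch on the same two-line file:

```perl
use strict;
use warnings;

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
# chomp removes a trailing $/ (normally \n) from every element;
# the last line here has no \n, so it is left as it is.
chomp(my @lines = readline fHandle);
close fHandle;

print "[$_]\n" foreach @lines;
```

The square brackets in the output make it visible that no newline remains inside either cell.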

Solutions to Frequently Asked Questions
For the rest of this tutorial, I offer solutions to frequently asked questions concerning the reading of a text file.

Counting the Number of Lines in a File
To count the number of lines in a file, open the file for reading and read it line by line. As you read, the special variable $. keeps the number of lines read so far: $. is the current line number for the last filehandle read. The following code prints the number of lines in the above file:

use strict;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    while (<fHandle>)
        {
        }
    print $. ;
    close fHandle;

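You can add other code in the while block. If you try this code with the above file, 2 will be printed. Note that $. is printed before the file is closed, because $. is reset when the filehandle is closed.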
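The same idea can be wrapped in a reusable sub. In this sketch the name countLines is hypothetical; it uses a lexical filehandle, $fh, which is private to the sub (unlike a bareword such as fHandle), and the statement 1 while <$fh>; which reads and discards every line. $. is copied before close, for the reason given above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of lines in the named file.
sub countLines {
    my ($name) = @_;
    open(my $fh, "<", $name) or die "cannot open $name : $!";
    1 while <$fh>;      # read and discard every line
    my $lines = $.;     # last line number read from $fh
    close $fh;          # closing resets $.
    return $lines;
}

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

my $n = countLines("myFile.txt");
print "$n\n";    # the last line has no \n, but it is still counted
```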

Reading a File by Paragraphs
Text in a file is considered a paragraph if it ends with \n\n. A single \n ends a line and begins a new line, while \n\n ends a paragraph and begins a new one (the first \n in \n\n ends the last line of the paragraph, and the second \n produces an empty line). There must be nothing, not even a space, between the two \n characters; that is, \n \n does not separate paragraphs.

The special variable $/ holds the input record separator, which is normally \n; that is why <fHandle> returns one line at a time. To read a file paragraph by paragraph, you assign "\n\n" to $/. In that case, $_ will hold a paragraph instead of a line. When you are done reading the file, you change $/ back to "\n".

The following code will read a file paragraph-by-paragraph:

use strict;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    $/ = "\n\n";
    while (<fHandle>)
        {
            print $_;
        }
    close fHandle;
    $/ = "\n";

The following code will send one paragraph to a cell of the array:

use strict;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    $/ = "\n\n";
    my @arr = <fHandle>;
    close fHandle;
    $/ = "\n";
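Perl also has a separate paragraph mode, selected by assigning the empty string "" to $/, which treats two or more consecutive newlines as one separator. With "\n\n", each extra empty line produces an extra, empty record. The sketch below contrasts the two on a hypothetical file, paras.txt, whose paragraphs are separated by two empty lines. It sets and resets $/ by hand, as the code above does.

```perl
use strict;
use warnings;

# Hypothetical file: two paragraphs separated by TWO empty lines.
open(fHandle, ">", "paras.txt") or die "cannot open paras.txt : $!";
print fHandle "Line one.\nLine two.\n\n\n\nSecond paragraph.\n";
close fHandle;

# Separator of exactly two newlines.
$/ = "\n\n";
open(fHandle, "<", "paras.txt") or die "cannot open paras.txt : $!";
my @withTwoNewlines = <fHandle>;
close fHandle;

# Paragraph mode: two or more newlines count as one separator.
$/ = "";
open(fHandle, "<", "paras.txt") or die "cannot open paras.txt : $!";
my @paragraphMode = <fHandle>;
close fHandle;

$/ = "\n";    # back to the normal line separator

print scalar(@withTwoNewlines), " records with \$/ = \"\\n\\n\"\n";
print scalar(@paragraphMode), " records with \$/ = \"\"\n";
```

With "\n\n" the second record is just the two extra newlines, giving 3 records; paragraph mode skips them and returns 2.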

Note that instead of using,

    my @arr = readline fHandle;

you can use,

    my @arr = <fHandle>;

Now, will you always remember to set $/ to "\n\n" and then set it back to "\n"? Probably not. So use the following code:

use strict;

    my @arr;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    {
        local $/ = "\n\n";
        @arr = <fHandle>;
    }
    close fHandle;

Now, we have an anonymous (unnamed) block, where the $/ variable is preceded by the reserved word local. This makes the change to $/ last only until the end of the block; outside the block, $/ has its normal value again. Note that @arr is declared before the block: a variable declared with my inside the block would no longer exist after the closing brace. As I said in one of the previous series, Perl is an easygoing language, but you must learn the optimum way of coding it.
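One more reason to prefer local: the old value is restored whenever the block is left, even when it is left because of an error. The sketch below simulates such an error with die inside an eval block (eval is not covered in this part; here it only catches the die so the script can continue). A reset of $/ written by hand at the end of the block would never run.

```perl
use strict;
use warnings;

eval {
    local $/ = undef;
    die "simulated read error\n";    # hypothetical failure inside the block
    $/ = "\n";                       # a reset by hand here would never run
};
my $errorMessage   = $@;
my $separatorAfter = $/;
print "eval caught: $errorMessage";
print "\$/ is back to a newline\n" if $separatorAfter eq "\n";
```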

Reading the Entire file at Once
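You can also read all the content of the file into one string, independent of the lines. If $/ is undef, there is no record separator at all, so a single read returns everything from the current position to the end of the file (this is often called “slurp mode”). So, to read the whole file into a string, use the following code: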

use strict;

    my $str;
    open(fHandle, "<", "myFile.txt") or die "cannot open myFile.txt : $!";
    {
        local $/ = undef;
        $str = <fHandle>;
    }
    print $str;
    close fHandle;

Now, in the anonymous block, $/, which is the line (record) separator, is temporarily given the value undef. Within the block there is no separator to end a line, so the whole file is treated as one line, and <fHandle> reads that one line, which is the whole file. Since $/ was preceded by local, the change applies only within the block. Outside the block, $/ has its normal value. With “local $/ = undef;” in the block, instead of

    $str = <fHandle>;

you can use,

        $str = readline fHandle;

since in the block, the whole file is now one line. Under the same condition, you can also replace that line with the following while loop, which iterates only once:

        while (<fHandle>)
            {
                $str = $_ ;
            }

So, the secret to reading the whole file at once is to make the whole file appear as a single line.
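The technique can be packaged as a small sub. In this sketch the name slurpFile is hypothetical; local $/; with no value is a common shorthand for local $/ = undef;. The result is compared with the file size reported by the -s file test. The sizes match on a Unix-style system; on Windows, where \n is stored as two bytes in text files, the byte count would be larger than the number of characters read.

```perl
use strict;
use warnings;

open(fHandle, ">", "myFile.txt") or die "cannot open myFile.txt : $!";
print fHandle "I like the Star Wars series.\nMany people appreciate it.";
close fHandle;

# Hypothetical helper: returns the whole content of the named file.
sub slurpFile {
    my ($name) = @_;
    open(my $fh, "<", $name) or die "cannot open $name : $!";
    local $/;                    # same as: local $/ = undef;
    my $content = <$fh>;         # one "line": the whole file
    close $fh;
    return $content;
}

my $str  = slurpFile("myFile.txt");
my $size = -s "myFile.txt";      # file size in bytes
print "read ", length($str), " characters; the file has $size bytes\n";
```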

Well, it has been a long ride. Let us take a break here and continue in the next part of the series.

Chrys

Related Links

Perl Basics
Perl Data Types
Perl Syntax
Perl References Optimized
Handling Files and Directories in Perl
Perl Function
Perl Package
Perl Object Oriented Programming
Perl Regular Expressions
Perl Operators
Perl Core Number Basics and Testing
Commonly Used Perl Predefined Functions
Line Oriented Operator and Here-doc
Handling Strings in Perl
Using Perl Arrays
Using Perl Hashes
Perl Multi-Dimensional Array
Date and Time in Perl
Perl Scoping
Namespace in Perl
Perl Eval Function
Writing a Perl Command Line Tool
Perl Insecurities and Prevention
Sending Email with Perl
Advanced Course
Miscellaneous Features in Perl
Perl Two-Dimensional Structures
Advanced Perl Regular Expressions
Designing and Using a Perl Module
More Related Links
Perl Mailsend
PurePerl MySQL API
Perl Course - Professional and Advanced
Major in Website Design
Web Development Course
Producing a Pure Perl Library
MySQL Course
