Overview of Internet Sockets
Internet Sockets and Perl - Part 1
Writing a Perl Module
Foreword: In this part of the series, I give you an overview of Internet sockets.
By: Chrysanthus Date Published: 18 Oct 2014
You should have completed a professional and advanced course in Perl. The link, “Perl Course” below, has the two courses. You should also have completed a database and a MySQL (professional) course as these are used in this volume to explain how to write a Perl module. Links to these courses are given below.
A network socket is an endpoint of an inter-program communication flow across a computer network. Assume that you have two programs on two different computers in a network. For communication to take place between these two programs, there must be a channel. Data passes from one program, through the channel, to the other program (and vice versa). A socket is like a door through which data leaves a program, enters the channel, and travels to the other program. So there is a door at each end of the channel; that is, there is a socket at each end of the channel. On one side of a socket you have the channel, and on the other side you have a program.
The two programs do not need to be written in the same computer language. They can be in two different languages, which is all the more reason why you need the two sockets and a channel. The two programs also do not have to be on two different computers; they can be on the same computer. However, they are often on two different computers.
An IP address is a number. Each computer on an Internet network is identified by an IP address. Examples of IP addresses are 172.16.254.1 and 1021:458:0:1234:0:567:8:1. There are two types of IP address. The first type has 4 decimal numbers separated by dots; the second type has 8 groups of hexadecimal digits separated by colons. The first example here is of the first type, and the second example is of the second type. The first type is known as IP version 4, abbreviated IPv4, and the second type is known as IP version 6, abbreviated IPv6. IP stands for Internet Protocol.
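To make the two address types concrete, here is a minimal sketch that asks Perl's core Socket module to parse a string first as IPv4 and then as IPv6. The function name ip_version is made up for this illustration; inet_pton, AF_INET and AF_INET6 come from the Socket module.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# Hypothetical helper: returns 'IPv4', 'IPv6', or undef when the text
# is not a valid address of either type.
sub ip_version {
    my ($text) = @_;
    return 'IPv4' if defined inet_pton(AF_INET,  $text);
    return 'IPv6' if defined inet_pton(AF_INET6, $text);
    return;
}

for my $address ('172.16.254.1', '1021:458:0:1234:0:567:8:1', 'not-an-address') {
    my $version = ip_version($address) // 'not an IP address';
    print "$address: $version\n";
}
```

inet_pton converts the text form into the packed binary form used by the operating system, so a defined result means the text was a well-formed address of that type.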
For data to move from one computer to another, the data has to carry the IP address of the destination computer.
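IP addresses are difficult to remember, so names such as yahoo.com, google.com or localhost are used in place of IP addresses. Each name is translated to an IP address before any data is sent. (A name can correspond to more than one IP address, and several names can share one IP address.)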
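The translation from a name to an IP address can be sketched with getaddrinfo from the core Socket module. The helper name resolve_ipv4 is an assumption of this example, and only IPv4 results are requested to keep it short. I use localhost so that no Internet access is needed; on most systems it gives 127.0.0.1, but that depends on how the system is configured.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo unpack_sockaddr_in inet_ntoa AF_INET SOCK_STREAM);

# Hypothetical helper: returns the IPv4 addresses (as text) for a name,
# or an empty list when the name cannot be resolved.
sub resolve_ipv4 {
    my ($name) = @_;
    my ($error, @results) = getaddrinfo($name, undef,
        { family => AF_INET, socktype => SOCK_STREAM });
    return () if $error;

    my (%seen, @addresses);
    for my $result (@results) {
        # Each result carries a packed socket address: port plus IP address.
        my (undef, $packed_ip) = unpack_sockaddr_in($result->{addr});
        my $text = inet_ntoa($packed_ip);
        push @addresses, $text unless $seen{$text}++;
    }
    return @addresses;
}

my @addresses = resolve_ipv4('localhost');
print "localhost: @addresses\n";
```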
When data reaches a computer, having been delivered by IP address, it then needs to reach the right program. Even if the data is to be saved on a disk in the destination computer, it first has to be received by a program before it can be saved. The program is identified by a number called a port number. The main program that handles a website (e.g. yahoo.com) is at port number 80. Note: the computer that hosts yahoo.com has an IP address, and yahoo.com is the domain name that corresponds to that address. The domain name identifies the computer, not the program; the program that handles the website is identified by its port number, which by default is 80.
You might have noticed that some websites are secure (the transmitted information is encrypted and so difficult to intercept) and others are not. For the websites that are secure, the URL (website address) begins with https, while for those that are not secure, the URL begins with http. The default port for non-secure websites is 80, while the default port for secure websites is 443. A website that does not use https is therefore normally at port 80.
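As a small illustration of the two default ports, the sketch below works out which port a web address points at. The names %DEFAULT_PORT and port_for_url are made up for this example, and the pattern is deliberately simple; it is not a full URL parser (for instance, it does not handle IPv6 addresses written inside a URL).

```perl
use strict;
use warnings;

# Default ports as given in the text: 80 for http, 443 for https.
my %DEFAULT_PORT = (http => 80, https => 443);

# Hypothetical helper: returns the explicit ":port" of a URL if there is
# one, otherwise the default port for its scheme. Returns undef for any
# URL that does not begin with http:// or https://.
sub port_for_url {
    my ($url) = @_;
    my ($scheme, $explicit) = $url =~ m{^(https?)://[^/:]+(?::(\d+))?}i
        or return;
    return $explicit // $DEFAULT_PORT{ lc $scheme };
}

for my $url ('http://yahoo.com/', 'https://yahoo.com/', 'http://localhost:8080/test') {
    printf "%-28s port %d\n", $url, port_for_url($url);
}
```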
Strictly speaking, it is a web page that is secure and not the website. Many web pages of the website of a financial institution, such as a bank, are secure, and so you can consider the whole website of a bank as secure.
So, to reach a socket at either end of a channel, you need an IP address and a port. These two parameters on their own identify the destination program. You also need a socket type and a protocol; see below.
Socket Type and Protocol
For data to be transmitted through the Internet, it has to respect sets of rules. One such set of rules is called UDP, which stands for User Datagram Protocol. Another such set of rules is called TCP, which stands for Transmission Control Protocol.
UDP is normally used for the transmission of data that is of a live nature, such as television and radio broadcasts. You might have heard of streaming video. This is video that is either live or is expected to be viewed as if it were live. TCP is normally used for data that is of a non-live nature, like web pages and email.
Two socket types you should know are datagram sockets and stream sockets. Datagram sockets are for UDP and stream sockets are for TCP. Note that live streaming video is traditionally associated with UDP and not TCP, even though “streaming video” and “stream sockets” resemble each other in name.
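To give a feel for datagram sockets, here is a minimal sketch that sends two UDP datagrams from one socket to another on the same computer, using Perl's built-in socket functions. The variable names are my own. Port 0 is used so that the operating system picks a free port, and the loopback address (127.0.0.1) is used so that no network is needed. There is no connection: each send names its destination, and each receive returns one whole datagram.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_DGRAM INADDR_LOOPBACK pack_sockaddr_in);

# Receiving datagram socket, bound to the loopback address.
# Port 0 asks the operating system to choose any free port.
socket(my $receiver, AF_INET, SOCK_DGRAM, 0) or die "socket: $!";
bind($receiver, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
my $destination = getsockname($receiver);   # packed IP address and port

# Sending datagram socket: each send() names the destination address.
socket(my $sender, AF_INET, SOCK_DGRAM, 0) or die "socket: $!";
for my $message ('first datagram', 'second datagram') {
    defined(send($sender, $message, 0, $destination)) or die "send: $!";
}

# Each recv() hands back exactly one datagram.
my @received;
for (1 .. 2) {
    my $buffer = '';
    defined(recv($receiver, $buffer, 1024, 0)) or die "recv: $!";
    push @received, $buffer;
}
print "$_\n" for @received;

close $sender;
close $receiver;
```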
I hope you now appreciate the fact that to use a socket, you need to specify the destination IP address, a port number, a socket type and a protocol. In some cases, if the protocol is specified, the socket type is assumed (as both go together).
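The socket type and the protocol appear as separate arguments when a socket is created with Perl's built-in socket function. The following is a minimal sketch, assuming a Socket module recent enough to export IPPROTO_TCP and IPPROTO_UDP; the helper socket_type_name is made up for this example. No IP address or port is involved yet, because creating a socket does not give it an address.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_STREAM SOCK_DGRAM IPPROTO_TCP IPPROTO_UDP
              SOL_SOCKET SO_TYPE);

# Stream socket with the TCP protocol; datagram socket with the UDP protocol.
socket(my $tcp, AF_INET, SOCK_STREAM, IPPROTO_TCP) or die "tcp socket: $!";
socket(my $udp, AF_INET, SOCK_DGRAM,  IPPROTO_UDP) or die "udp socket: $!";

# Hypothetical helper: asks the operating system which type a socket has.
sub socket_type_name {
    my ($socket) = @_;
    my $packed = getsockopt($socket, SOL_SOCKET, SO_TYPE)
        or die "getsockopt: $!";
    my $type = unpack 'i', $packed;
    return $type == SOCK_STREAM ? 'stream'
         : $type == SOCK_DGRAM  ? 'datagram'
         :                        'other';
}

print "tcp socket type: ", socket_type_name($tcp), "\n";
print "udp socket type: ", socket_type_name($udp), "\n";

close $tcp;
close $udp;
```

Passing 0 as the last argument of socket instead of a protocol constant lets the operating system choose the usual protocol for the given socket type.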
For a communication channel, there are always two sockets. Together they are known as a socket pair. One socket may send and receive data; the other socket may also send and receive data. Each socket can be identified by its IP address and port number. In the operating system of each computer, each socket is also represented by an integer, which is neither the IP address nor the port number. This integer is called a socket descriptor.
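In Perl, the socket descriptor can be seen with the built-in fileno function. The sketch below creates two sockets and prints the integer the operating system has given to each; the actual numbers vary from run to run and from system to system.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_STREAM);

# Two sockets that are open at the same time get two different descriptors.
socket(my $first,  AF_INET, SOCK_STREAM, 0) or die "socket: $!";
socket(my $second, AF_INET, SOCK_STREAM, 0) or die "socket: $!";

my $first_descriptor  = fileno($first);
my $second_descriptor = fileno($second);

print "first socket descriptor:  $first_descriptor\n";
print "second socket descriptor: $second_descriptor\n";

close $first;
close $second;
```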
Local and Remote Sockets
Assume that there are two computers, each with a socket. For the computer you are using, your socket is the local socket, and the socket on the computer at the other end is the remote socket. For the person at the other end, his socket is the local socket and your socket is the remote socket.
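The local and remote view can be sketched with the core IO::Socket::INET module, here with both ends in one program on the loopback address so that it runs without a network. The variable names are my own, and port 0 lets the operating system choose a free port. Each end reports its own address with sockhost and sockport, and the other end's address with peerhost and peerport.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A listening socket on the loopback address.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen: $@";

# "Your" end of the channel.
my $mine = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect: $@";

# The other end of the same channel.
my $theirs = $listener->accept or die "accept: $!";

printf "my end:    local %s:%d  remote %s:%d\n",
    $mine->sockhost, $mine->sockport, $mine->peerhost, $mine->peerport;
printf "other end: local %s:%d  remote %s:%d\n",
    $theirs->sockhost, $theirs->sockport, $theirs->peerhost, $theirs->peerport;
```

What one end calls local is exactly what the other end calls remote.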
A socket address consists of the IP address and the port (number). A socket pair is characterised by a unique combination of a local socket address, a remote socket address and a protocol (e.g. TCP or UDP).
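In Perl's core Socket module, a socket address is a single packed value built from a port and an IP address. The short sketch below builds one from the example address and port used earlier in this article, and then takes it apart again.

```perl
use strict;
use warnings;
use Socket qw(pack_sockaddr_in unpack_sockaddr_in inet_aton inet_ntoa);

# Combine a port number and an IP address into one socket address.
my $socket_address = pack_sockaddr_in(80, inet_aton('172.16.254.1'));

# Split the socket address back into its two parts.
my ($port, $packed_ip) = unpack_sockaddr_in($socket_address);
my $ip = inet_ntoa($packed_ip);

print "IP address: $ip\n";
print "port:       $port\n";
```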
Client – Server
The Internet is a network of computers. Consider a computer on the Internet that has a web page, which people (users) in their different houses can view on their home computers. The home computers are called clients or client computers. The computer that has the web page is called the server or server computer. The software (program) in the server computer that sends the web page to the different home computers is also called the server. This software can more specifically be called the web server.
The software programming library used to create a socket at the client can also be used to create a socket at the server. However, the client and server sockets have special features as follows:
Connect: In a client-server communication, the client (e.g. a browser) socket is the one that normally connects to the server socket.
Bind: This is normally used at the server to associate a socket with a socket address (IP address and port). As you will see later, a socket itself does not really have the socket address coded into it; that is why binding is necessary.
Listen: This is normally used at the server. It places the server socket in a listening state, to listen for a connection from the client.
Accept: This is used at the server to accept a connection from the client.
Open: This feature is for all sockets. It is used to create and automatically open a socket. A socket is like a door. It has to open to let data go into the channel and to let data come from the channel through the same door. When a socket is open, system resources (memory) are allocated to it.
Close: This feature is for all sockets. It is used to close and automatically destroy a socket. A socket is like a door. It has to close to prevent data from going into the channel and to prevent data from coming from the channel through the same door. When a socket is closed, the system resources (memory) allocated to it are released.
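The features above correspond closely to Perl's built-in functions socket, bind, listen, accept, connect and close. The following is a minimal sketch that runs both the server side and the client side in one program, on the loopback address, so that it can be tried without a network. The variable names and the message are my own, and port 0 lets the operating system choose a free port. A real server and client would be two separate programs.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_STREAM INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Server: create (open) a socket, bind it to a socket address, then listen.
socket(my $server, AF_INET, SOCK_STREAM, 0) or die "socket: $!";
bind($server, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($server, 1) or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($server));

# Client: create (open) a socket and connect to the server's socket address.
socket(my $client, AF_INET, SOCK_STREAM, 0) or die "socket: $!";
connect($client, pack_sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

# Server: accept the connection; this gives a new socket for this client.
accept(my $connection, $server) or die "accept: $!";

# Client sends one line; server reads it from the accepted socket.
defined(send($client, "hello from the client\n", 0)) or die "send: $!";
my $line = <$connection>;
print "server received: $line";

# Close every socket so that its system resources are released.
close $_ for $connection, $client, $server;
```

Note that accept returns a second socket at the server; the listening socket stays free to accept further clients.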
A port is a construct which works with a socket to identify a program at one end of a socket channel. The IP address identifies the computer, the port identifies the program. A port is indicated by a number (integer). However, a port refers to a program (application). A program offers a service, so a port can be seen as a service (e.g. website service).
The Internet Assigned Numbers Authority (IANA) has decided on certain ranges of numbers for all the different possible services. There are three ranges: the well-known ports, the registered ports, and the dynamic (private) ports. The numbers for well-known ports range from 0 to 1023. Registered ports range from 1024 to 49151. The dynamic ports range from 49152 to 65535. The dynamic range is also called the private range or the ephemeral range.
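The three ranges can be written down directly as a small function. The name port_range is made up for this illustration; the boundaries are the ones given above.

```perl
use strict;
use warnings;

# Hypothetical helper: names the IANA range a port number falls in,
# or returns undef if the value is not a valid port number.
sub port_range {
    my ($port) = @_;
    return unless defined $port && $port =~ /^\d+$/ && $port <= 65535;
    return 'well-known' if $port <= 1023;
    return 'registered' if $port <= 49151;
    return 'dynamic';
}

for my $port (80, 443, 1024, 49151, 49152, 65535) {
    printf "%5d is a %s port\n", $port, port_range($port);
}
```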
Most websites are at port 80, which is in the well-known range. If you are doing an experiment on ports, you should choose a port number from the private range, 49152 to 65535. Any number chosen here is unofficial and can conflict with another service on your system that has already chosen the same number.
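One way to cope with such a conflict in an experiment is to try the numbers of the private range one after the other until a bind succeeds. The sketch below does this on the loopback address; the function name bind_in_private_range is made up for this example, and it is only one possible approach.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_STREAM INADDR_LOOPBACK pack_sockaddr_in);

# Hypothetical helper: binds a new socket to the first free port between
# $first and $last. Returns the socket and the port, or an empty list
# if every port in the range is already taken.
sub bind_in_private_range {
    my ($first, $last) = @_;
    for my $port ($first .. $last) {
        socket(my $socket, AF_INET, SOCK_STREAM, 0) or die "socket: $!";
        return ($socket, $port)
            if bind($socket, pack_sockaddr_in($port, INADDR_LOOPBACK));
        close $socket;   # bind failed, most likely the port is in use
    }
    return;
}

my ($socket, $port) = bind_in_private_range(49152, 65535)
    or die "no free port in the private range";
print "bound to port $port\n";
```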
That is it for this part of the series. We stop here and continue in the next part.
Related Links
Internet Sockets and Perl
Perl pack and unpack Functions
Writing MySQL Protocol Packets in PurePerl
Developing a PurePerl MySQL API
Using the PurePerl MySQL API
More Related Links
PurePerl MySQL API
Perl Course - Professional and Advanced
Major in Website Design
Web Development Course
Producing a Pure Perl Library