Broad Network


DOM Tree

DOM Basics for HTML – Part 1

DOM for HTML

Foreword: In this part of the series I introduce DOM and I explain what is known as the DOM Tree or Node Tree. DOM stands for “Document Object Model”.

By: Chrysanthus Date Published: 13 May 2015

Introduction

This is part 1 of my series, DOM Basics for HTML, of the volume, DOM for HTML. In this part of the series I introduce DOM and I explain what is known as the DOM Tree or Node Tree. DOM stands for “Document Object Model”. ECMAScript enables you to make a website interactive. However, it has its limits: on its own, the language has no way to create an HTML document or to change the values of HTML element attributes. It needs DOM for that.

DOM is an Application Programming Interface (API). It is part of your browser, and today it is considered part of HTML. It is a set of software objects in the computer’s memory, which represent the HTML elements displayed on a web page. A software object is a region in memory that holds a programming entity. These objects have attributes and methods that can be used to modify the HTML elements displayed on the screen. Attributes are like variables (in ECMAScript they are usually called properties). Methods are like functions. As indicated above, you can even use DOM to create a new HTML document. ECMAScript is used to program DOM.

Pre-Knowledge
This series is part of a program called Major in Website Design. You should click the link titled “Major in Website Design” below, to know what you should have studied before reaching here.

A Basic HTML Document
A basic HTML document looks like this:

<!DOCTYPE html>
<html>
 <head>
  <title>Simple Page</title>
 </head>
 <body>
  <h1>Simple Page</h1>
  <p>This is a <a href="demo.htm">simple</a> sample.</p>
  <!-- this is a comment -->
 </body>
</html>

Elements
In the above code, html is an element. It may not look like it, but it is an element. It contains two other elements, which are the head and body elements. The head element has the title element. The body element (above) has an h1 element, a paragraph element and a comment. The paragraph element has a hyperlink element (the a element). Apart from the comment, which is not an element, these are all the elements in the above document code.

The root element of an HTML document is always the html element.

Nodes
You can use ECMAScript to create a software object in the computer’s memory. An object that belongs to the DOM tree is called a node; a node that represents an HTML element is an element node. Nodes are normally created by the browser to represent the elements, text and comments in a document.

Not all nodes have corresponding elements. The document itself is a node, but it does not have a corresponding element. However, a document should have the html element. The HTML element should have the head and the body elements. It is recommended that the head element has at least the title element. The body element might be empty, but that does not produce a practical document.

Note: An attribute of a node is a variable in the node. A method of a node is a function in the node.
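
To make the note above concrete, here is a minimal sketch of an object that has attributes and a method. The makeNode function is hypothetical and is only a stand-in; real nodes are created by the browser. The names nodeName, nodeValue, childNodes and appendChild are borrowed from DOM so that the model looks like the real thing.

```javascript
// Hypothetical stand-in for a DOM node (real nodes come from the browser).
function makeNode(nodeName, nodeValue) {
  return {
    nodeName: nodeName,    // attribute: a variable in the node
    nodeValue: nodeValue,  // attribute: the text data, or null for an element
    childNodes: [],        // attribute: the children of this node
    // method: a function in the node; it adds a child and returns it
    appendChild: function (child) {
      this.childNodes.push(child);
      return child;
    }
  };
}

// Model the title element and its text content.
const title = makeNode("TITLE", null);
title.appendChild(makeNode("#text", "Simple Page"));

console.log(title.nodeName);                // TITLE
console.log(title.childNodes[0].nodeValue); // Simple Page
```

Reading title.nodeName uses an attribute; calling title.appendChild(...) uses a method.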

DOM Tree
The above basic document can be described by the following tree:

|_DOCTYPE: html
|
|_html
    |
    |_head
    |   |
    |   |_text: \n sp sp
    |   |
    |   |_title
    |   |   |
    |   |   |_text: Simple Page
    |   |
    |   |_text: \n sp
    |
    |_text: \n sp
    |
    |_body
        |
        |_text: \n sp sp
        |
        |_h1
        |   |
        |   |_text: Simple Page
        |   
        |_text: \n sp sp
        |
        |_p
        |   |
        |   |_text: This is a
        |   |
        |   |_a href="demo.htm"
        |   |   |
        |   |   |_text: simple
        |   |
        |   |_text:  sample.
        |
        |_text: \n sp sp
        |
        |_comment: this is a comment
        |
        |_text: \n sp \n

Remember, a node is a software object in memory; an element is what you see (has an effect) on the screen. Any element has an equivalent node in memory; but not all nodes have equivalent elements on screen. An element has one or two html tags.

The tree diagram begins with the DOCTYPE node. The DOCTYPE node (object in memory) represents the DOCTYPE declaration, which indicates the type of document we are dealing with; here, the type is an HTML document. The DOCTYPE node is not the document node. The document node is also a software object in memory; it is the parent of both the DOCTYPE node and the html node, but it is not drawn in the tree above. A document should have the html element, which should have the head element and the body element.

Unlike a vegetation tree, whose root is at the bottom and which grows upward, the DOM tree has its root at the top and grows downward. The diagram begins with the DOCTYPE node, followed at the same level by the html node, which represents the double-tag html element. The html node has a growth downward which branches to the head node, to a text node and to the body node. The text node comes before the body node; I will talk about such text nodes soon. The head node is for the double-tag head element and the body node is for the double-tag body element.
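
The top of the tree can be sketched with plain ECMAScript objects. This is a hand-built model, not the browser's own objects; the variable names (documentNode, gap and so on) are my own, while the attribute names nodeName, nodeType and childNodes, and the type numbers, are the ones DOM uses.

```javascript
// Hand-built model of the top of the tree for the Simple Page document.
const head = { nodeName: "HEAD", nodeType: 1 };                  // element node
const gap  = { nodeName: "#text", nodeType: 3, data: "\n " };   // whitespace text node
const body = { nodeName: "BODY", nodeType: 1 };                  // element node
const html = { nodeName: "HTML", nodeType: 1, childNodes: [head, gap, body] };
const doctype = { nodeName: "html", nodeType: 10 };              // DOCTYPE node

// The document node is the parent of the DOCTYPE node and the html node.
const documentNode = {
  nodeName: "#document",
  nodeType: 9,
  childNodes: [doctype, html]
};

// List the children of the html node, in document order.
const names = html.childNodes.map(function (node) { return node.nodeName; });
console.log(names.join(", ")); // HEAD, #text, BODY
```

In a browser, the real document object plays the role of documentNode here.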

Recall: this tree is for the previous document code above, with the title, Simple Page. The head node has a growth downward which branches to the title node. Again a node is a software object in memory and an element is what you see or has an effect on the screen (web page). The title element has as content or data, Simple Page. In the memory, this data is a text node (a software object). In the tree, the title node has a growth, which leads to the text node with data, “Simple Page”.

Note: A software object is simply referred to as an object.

In the tree, the body node has a growth, which branches to the h1 node, for the h1 element. The h1 node has a growth, which leads to a text node that has the data, Simple Page. The title node and the h1 node each have a child node, which is a text node. These are two separate text nodes, but they have the same data, which is “Simple Page”.

The growth of the body node has a branch to the p node, for the paragraph element. The paragraph element is:

  <p>This is a <a href="demo.htm">simple</a> sample.</p>

The content of this paragraph element has another element, which is the a element. The content of the a element is the text, simple. This text is represented in memory by a text node whose data is “simple”. The a element itself has a corresponding a node, whose child is that text node. In the paragraph, the phrase “This is a ” has a corresponding text node whose data is “This is a ”. The phrase “ sample.” at the end of the paragraph has a corresponding text node whose data is “ sample.”. Note that “This is a ” and “ sample.” do not share the same text node, because they are separated by the a element.

The p node has a growth, which branches to a text node whose data is, “This is a ”; it also branches to the a node; and it also branches to a text node whose data is, “ sample.”. The a node has a growth, which leads to a text node whose data is “simple”.

The growth of the body node has a branch to a comment node, for the comment in the document code. A comment is not an element, but it still has a node. The data for the comment node is “this is a comment”.

Note: When writing a text node or a comment node in the tree, you have to indicate the text data.

Also note: When writing any node in the tree, if the node has attributes, you have to indicate the attributes in the tree, as with href="demo.htm" for the a node, above.
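
The two notes above can be sketched as a small function that writes out a tree, showing the text data for text and comment nodes and the attributes for element nodes. This is only a sketch under assumptions: printTree, label and showText are names I made up, and the p object is a hand-built model of the paragraph so that the code runs without a browser. The attribute names nodeType, nodeName, data, attributes and childNodes are the ones DOM uses, so the same function should also accept a real node.

```javascript
// Node type numbers used by DOM.
const ELEMENT = 1, TEXT = 3, COMMENT = 8, DOCTYPE = 10;

// Show whitespace-only text in the notation of the tree above (\n and sp).
function showText(data) {
  if (!/^[ \n]+$/.test(data)) return data;
  return Array.from(data, function (ch) {
    return ch === "\n" ? "\\n" : "sp";
  }).join(" ");
}

// Build the label for one node: data for text and comments, attributes for elements.
function label(node) {
  switch (node.nodeType) {
    case ELEMENT: {
      const attrs = Array.from(node.attributes || [])
        .map(function (a) { return " " + a.name + '="' + a.value + '"'; })
        .join("");
      return node.nodeName.toLowerCase() + attrs;
    }
    case TEXT:    return "text: " + showText(node.data);
    case COMMENT: return "comment: " + node.data;
    case DOCTYPE: return "DOCTYPE: " + node.name;
    default:      return node.nodeName;
  }
}

// Walk the node and its descendants, one output line per node.
function printTree(node, depth = 0, lines = []) {
  lines.push("    ".repeat(depth) + "|_" + label(node));
  for (const child of Array.from(node.childNodes || [])) {
    printTree(child, depth + 1, lines);
  }
  return lines;
}

// Hand-built model of the paragraph from the document code (an assumption,
// standing in for the node the browser would create).
const p = {
  nodeType: ELEMENT, nodeName: "P", attributes: [],
  childNodes: [
    { nodeType: TEXT, data: "This is a " },
    { nodeType: ELEMENT, nodeName: "A",
      attributes: [{ name: "href", value: "demo.htm" }],
      childNodes: [{ nodeType: TEXT, data: "simple" }] },
    { nodeType: TEXT, data: " sample." }
  ]
};

console.log(printTree(p).join("\n"));
```

The output has five lines: the p node, its three children, and the text node under the a node, with href="demo.htm" written next to the a node.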

Text Node
There are two kinds of text nodes: the text node which is the content of an element, such as the content of the a element, and the text node which is a sequence of newline characters and keyboard spacebar characters. I have talked about the former, above. The latter kind occurs as follows: as the author (programmer) types the code for the HTML document, at the end of a tag, he can press the keyboard Enter key, so introducing the newline character. He may then press the spacebar key for indentation before typing the next element or inner element, so introducing space characters. In the tree, the newline character is indicated by \n and the space character is indicated by sp. Do not confuse \n with <br>: <br> is an element that produces a line break in the web page display, while \n is a character in the typed code, which the browser normally does not display as a line break.

In the tree, you have text nodes such as \n sp sp, \n sp and \n sp \n. To explain their presence, or the presence of any node in the tree, you have to compare the document code above and the tree diagram. While explaining the presence of this second type of text node, you have to ignore the presence of the end tags of elements. I leave that explanation as an exercise for you.

The second type of text node is called whitespace. Note: whitespace before the head element start tag in the document code is dropped by the browser and not included in the DOM (tree). Whitespace after the body element end tag in the document code is placed at the end of the body element; this is done by the browser. Note: whitespace text nodes are not displayed by the browser, but they are present in the tree.
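
A script that walks the tree often needs to tell whitespace text nodes apart from the other nodes. The following is a minimal sketch of one way to do that. The function name isWhitespaceNode and the bodyChildren array are my own; the array is a hand-built model of the children of the body node in the tree above, so that the code runs without a browser.

```javascript
const TEXT_NODE = 3; // node type number for text nodes in DOM

// True for a text node whose data holds only whitespace characters
// (space, tab, newline, form feed, carriage return).
function isWhitespaceNode(node) {
  return node.nodeType === TEXT_NODE && /^[ \t\n\f\r]*$/.test(node.data);
}

// Hand-built model of the children of the body node (an assumption).
const bodyChildren = [
  { nodeType: 3, data: "\n  " },
  { nodeType: 1, nodeName: "H1" },
  { nodeType: 3, data: "\n  " },
  { nodeType: 1, nodeName: "P" },
  { nodeType: 3, data: "\n  " },
  { nodeType: 8, data: " this is a comment " },
  { nodeType: 3, data: "\n \n" }
];

// Keep everything except the whitespace text nodes.
const meaningful = bodyChildren.filter(function (node) {
  return !isWhitespaceNode(node);
});

console.log(bodyChildren.length); // 7
console.log(meaningful.length);   // 3
```

Of the seven children of the body node, four are whitespace; the h1 node, the p node and the comment node remain.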

In this part of the series you have had an introduction to what DOM is; you should now know what the DOM tree (Node tree) is and how to compare it to the document code. The DOM tree consists of nodes and the document code consists of elements. All elements have corresponding nodes in memory, but not all nodes have corresponding elements in the document code.

That is it for this part of the series. We take a break here and continue in the next part with the relationship between nodes and the different types of nodes.

Chrys

Related Links

DOM Basics for HTML
DOM Event Basics for HTML
HTML Text and Other Elements in DOM
HTML Grouping and Sectioning Content Elements in DOM
DOM and HTML Embedded Content
HTML Canvas 2D Context
More Related Links
PurePerl MySQL API
Major in Website Design
Web Development Course
Producing a Pure Perl Library
