Quick Introduction to HTML
What is HTML?
- HyperText Markup Language
- The language in which World Wide Web pages are written.
  When you look at a web page with a web browser such as Internet Explorer or Firefox, you are looking at HTML being rendered (displayed graphically) by the browser.
- A page formatting language.
  This is the “markup” part: text is marked up to have various formatting, such as bold, italic, etc.
- A language that provides for references in one document to other documents.
  This is the “hypertext” part: the document can contain links to other documents.
Why learn HTML?
Many web page authors build their web sites using commercial software packages, called web site builders. These packages shield the author from the HTML code they produce. This is a good thing, by and large, allowing authors to produce very elaborate and beautiful pages conveniently. So why bother with HTML?
Web page builders ultimately produce HTML. To make full use of a web page builder, and to understand its limitations, you should understand the HTML that it produces.
Often, you may want to alter a site, but don’t have access to the software with which it was produced. In this case, some knowledge of HTML is indispensable.
HTML files
An HTML file...
- is a plain text file.
  It can be edited with any text editor, such as Notepad on Windows, Alpha on the Mac OS, or vi on Unix.
- has a name that ends in .html by convention (or .htm on DOS-based systems)
- can be viewed using your web browser.
  From the “File” menu, choose “Open...”, then find the file on your hard disk.
- contains text and HTML “tags” that describe how the text is to be displayed.
  The rest of this document is about these tags.
Tags and attributes
All commands in HTML are tags, which consist of a special
keyword between angle brackets. For instance, the tag
<em>
tells the browser to emphasize the text that follows.
Most HTML tags require an end tag, which looks just like the
start tag except that the keyword is preceded by a slash, like this one:
</em>
which tells the browser to stop emphasizing the text.
So the HTML code
This is <em>emphasized</em> text.
is rendered in a web browser as
This is *emphasized* text.
A few tags, notably line break <br>, horizontal rule
<hr> and <img> for images, are
self-terminating, meaning they don’t take an end tag. The
XHTML-compatible way to code them, used throughout this document,
is to end the tag with a slash (in HTML5 the slash is optional), like
so:
<br />, <hr />, <img.../>.
Any tag can also contain attributes, which provide further description for the element.
For example, the tag img which loads an image file into
the document, requires a path to the image file, specified by
the src attribute, and can take an alt
attribute that gives an alternative textual version of the image.
Such a tag might look like this:
<img src="/path-to-imagefile"
alt="my image description" />
An attribute consists of the attribute name, followed by an equals sign, then the value, which is a character string in quotes.
Comments
HTML provides for code comments, which are text that you can put in the code but that does not show up in the web page.
You can use comments to make a memo to yourself about why you coded something in some particular way, or to “comment out” a chunk of HTML code that you want to disable temporarily.
Any text between the “begin comment” marker
<!--
and the “end comment” marker
-->
is completely ignored by web browsers.
For instance, if your HTML document contains the code
<!-- This is just a comment. -->
the browser will not display the code.
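As a sketch of the “comment out” use mentioned above, the fragment below disables one paragraph while leaving the other visible. The paragraph text is invented for illustration.

```html
<!-- Place this fragment inside the body block. -->
<p>This paragraph is displayed.</p>
<!-- Disabled for now:
<p>This paragraph is not displayed.</p>
-->
```

Comments cannot be nested: the first end marker the browser meets closes the comment, so it is safest to comment out only code that contains no comments of its own.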
Document structure
All HTML documents should contain at least the html
start and end tags, to indicate where the HTML begins and ends.
These html tags should enclose a single
head block, followed by a single body block:
<html>
<head>
The title and other coding information goes in this head block
</head>
<body>
The part of your document that will show in the browser goes in this body block
</body>
</html>
Most of the content of the web page goes in the body block. Some information about the web page goes in the head block.
The title
The only head block tag we will discuss here is
<title>.
Typically, a web browser will display the title of the document at
the top of the browser window.
Also, in browsers that have a “bookmarks” or “favorites” list, the
title is the name given to the document in the list.
For example, if you put the code
<title>My Web Page</title>
in the head block of your HTML document, the browser should
put the text “My Web Page” at the top of the window when you
look at your web page.
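Putting the pieces so far together, a complete minimal document might look like the sketch below. The file name mypage.html and the body text are assumptions made for illustration.

```html
<!-- Save as, say, mypage.html and open it from the browser's File menu. -->
<html>
<head>
<title>My Web Page</title>
</head>
<body>
<p>Hello, world. This text shows in the browser window.</p>
</body>
</html>
```

The title appears in the window or tab label, not in the page itself; only the content of the body block is drawn in the page area.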
Text
The information content of a typical HTML document is its text. Any text in your document should be enclosed in some kind of tag.
In most tags, runs of white space, such as consecutive space
characters, tab characters, and line endings, are replaced by a
single space.
Line-breaking within a tag is handled automatically by the browser.
(The <pre> tag is the one exception; see below.)
To specify how the page text appears in the web browser, you must
use HTML tags or styles.
To make a vertical break between sections of text, such as paragraphs or section headers, put the sections between separate pairs of block tags. There are several types of block tags.
To specify the formatting of the text (the font size, shape, etc.) within a single line, use inline markup tags.
Paragraphs and lines
The simplest block tag is the paragraph tag, <p>, which delimits paragraphs of text. Usually text in two separate paragraphs is rendered with extra vertical space between the paragraphs. So this code
<p>
My first paragraph.
</p>
<p>
My second paragraph.
</p>
is rendered
My first paragraph.
My second paragraph.
To force a line break within a paragraph, use the line break tag, <br /> (note that the line break tag is self-terminating, so it doesn’t require an end tag).
The one exception to the rule about white space is the case of text between the “pre-formatted” tags <pre>, within which the white space of the text is rendered as-is. For example, the code
<pre>
line 1
line 2
</pre>
is rendered
line 1
line 2
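The line break tag described above can be sketched with a postal address, where each line must start on a new line but the whole address is still one paragraph. The name and address are invented for illustration.

```html
<!-- Place this fragment inside the body block. -->
<p>
Jane Doe<br />
123 Example Street<br />
Springfield
</p>
```

Without the two line break tags, the line endings in the code would be treated as ordinary white space and the address would be rendered on a single line.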
Headings
There are six levels of headings, with tags <h1> through <h6>. Several such headings are used in this document to label the sections and subsections. The <h1> tag is usually reserved for the document title at the top of the page, and higher-numbered headings are used for the titles of different levels of subsections of the document.
It would mess up the formatting of this document to actually render an example heading here. But the code might look like:
<h1>My Document Heading</h1>
Each heading forms its own vertical block, and is usually rendered in a large font, or boldface or italic, according to the depth of its subsection.
Typically, a single <h1> heading is used at the top of a web page to display its title (so the <title> and the <h1> heading should have similar content).
Content markup
The following tags are used to mark up inline text. The browser is free to format such markup as it chooses.
<em> - emphasis.
<strong> - stronger emphasis.
<cite> - a citation or a reference to other sources.
<dfn> - definition of a term.
<code> - a fragment of computer code.
<samp> - sample output from programs, scripts, etc.
<kbd> - text to be entered on a keyboard.
<var> - a computer program code variable or argument.
<abbr> - abbreviation (e.g., WWW, HTTP, URI, Mass., etc.).
<acronym> - acronym (e.g., WAC, radar, etc.).
<q> - a short inline quotation, as in: did somebody say “quote”? (Note: this tag is ignored by some browsers.)
<blockquote> - a whole passage inset from the text.
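A short sketch of several content markup tags used together in one paragraph follows. The sentences are invented, and the title attribute on <abbr>, which this document does not cover, is an optional extra that supplies the expansion of the abbreviation.

```html
<!-- Place this fragment inside the body block. -->
<p>
To see the page, type <kbd>mypage.html</kbd> into the address bar.
The variable <var>n</var> holds the count, as in <code>n = n + 1</code>.
<abbr title="HyperText Markup Language">HTML</abbr> is <em>not</em>
a programming language, and tags <strong>must</strong> be closed.
</p>
```

Because these tags describe what the text is rather than how it looks, the exact appearance is left to the browser.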
Formatting markup
<i> - italic text style.
<b> - bold text weight.
<tt> - teletype or monospaced text family.
<sub> - subscript.
<sup> - superscript.
<big> - bigger font size.
<small> - smaller font size.
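As an illustration, subscripts and superscripts are the formatting tags most often needed in ordinary text. A minimal sketch, with invented sentence content:

```html
<!-- Place this fragment inside the body block. -->
<p>
Water is H<sub>2</sub>O, and E = mc<sup>2</sup>.
<b>Note:</b> <i>all</i> prices are approximate.
<small>Terms and conditions apply.</small>
</p>
```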
Special characters
You can type most English text directly into HTML, and it will be rendered by a browser as it is typed. However, there are a few exceptions. Then there is the question of how to handle letters and symbols that aren’t in the English alphabet.
HTML defines a large number of special character entities. They are coded with an ampersand (“&”) followed by a keyword, then a semicolon (“;”). We list some of the most common here.
| code | char | description |
|---|---|---|
| &amp; | & | ampersand |
| &lt; | < | less than |
| &gt; | > | greater than |
| &nbsp; |   | non-breaking space |
| &copy; | © | copyright sign |
| &reg; | ® | registered sign |
| &trade; | ™ | trademark sign |
| &cent; | ¢ | cents |
| &pound; | £ | pounds |
| &yen; | ¥ | yen |
| &euro; | € | euro |
| &deg; | ° | degrees |
Notice that the first three are necessary in HTML code because the ampersand is used to code character entities, and the angle brackets are used to code tags.
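As a sketch of why the ampersand and angle bracket entities matter, the paragraph below displays two tags literally instead of letting the browser interpret them. The sentence and the company name are invented for illustration.

```html
<!-- Place this fragment inside the body block. -->
<p>
Use the &lt;em&gt; tag for emphasis &amp; the &lt;strong&gt; tag for
stronger emphasis. &copy; Example&nbsp;Co.
</p>
```

A browser shows this as: Use the <em> tag for emphasis & the <strong> tag for stronger emphasis. © Example Co. The non-breaking space keeps “Example” and “Co.” from being split across two lines.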
Many letters from Western European alphabets are also among the HTML character entities. For a complete list, see HTML 4 Character Entities.
Note that, while it is acceptable to include any particular character in a document this way, you should not use the character entities for typing text in, say, French or Greek. That is not what they are for. This is a job for a character encoding. By specifying the document’s encoding, one can accommodate (almost) any of the world’s writing systems.
Careful! A few punctuation marks used in English often cause problems. The most common are the “curly quotes” and “dashes”:
| code | char | description |
|---|---|---|
| &lsquo; | ‘ | left single quote |
| &rsquo; | ’ | right single quote; apostrophe |
| &ldquo; | “ | left double quote |
| &rdquo; | ” | right double quote |
| &mdash; | — | “m” (long) dash |
| &ndash; | – | “n” (short) dash |
You probably don’t want to use HTML character entities to make curly quotes. But if you type these characters directly into your document, be sure that the character encoding makes it plain how they are to be interpreted. See Internationalizaton: Curley quotes and dashes.
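One common way to state the encoding is a meta tag in the head block. This tag is not covered in this document, so treat the following as a minimal sketch; it assumes that your text editor really saves the file as UTF-8, and the page text is invented.

```html
<html>
<head>
<!-- Tells the browser that the bytes of this file are UTF-8 text. -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Encoding test</title>
</head>
<body>
<p>“Curly quotes”, an en dash (–) and letters such as é typed directly.</p>
</body>
</html>
```

If the declared encoding and the encoding actually used to save the file disagree, the quotes and accented letters are likely to show up as garbage characters.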
Lists
HTML is especially well suited for outline-style documentation, such as this document. It provides several flexible types of lists, which may be nested.
Ordered lists
Numbered lists can be coded using the ordered list tag <ol>. Items in the list are indicated with the list item <li> tag.
The numbering style can be specified with the type attribute, which can take values
- "1" for Arabic numbers,
- "a" for lowercase alphabetic style,
- "A" for uppercase alphabetic style,
- "i" for lowercase Roman numerals,
- "I" for uppercase Roman numerals.
For example, the code
<ol type="I">
<li>first</li>
<li>second</li>
<li>third</li>
</ol>
is rendered as
I. first
II. second
III. third
Unordered lists
Unordered lists, whose items are optionally marked by bullets, can be coded using the unordered list tag <ul>. Items in the list are indicated with the list item <li> tag.
The style of bullets is specified with the type attribute, which can take values "disc", "circle", "square", or "none".
For example, the code
<ul type="disc">
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
is rendered as
- first
- second
- third
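Since lists may be nested, an outline can be sketched by placing a whole list inside a list item of another list. The item text below is invented for illustration.

```html
<!-- Place this fragment inside the body block. -->
<ol type="1">
<li>Fruit
  <ul type="disc">
  <li>apples</li>
  <li>pears</li>
  </ul>
</li>
<li>Vegetables
  <ul type="circle">
  <li>carrots</li>
  </ul>
</li>
</ol>
```

Note that each inner list sits inside its parent <li>, before that item’s end tag, rather than directly inside the outer <ol>.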
Definition lists
A list of items, with definitions, can be coded with the definition list tag <dl>. The items are indicated with the definition term <dt> tag. The definition of each item follows the definition description <dd> tag.
For example, the code
<dl>
<dt>first</dt>
<dd>first definition</dd>
<dt>second</dt>
<dd>second definition</dd>
</dl>
is rendered as
- first
- first definition
- second
- second definition
Tables
Tables are one of the most powerful things in HTML. Beyond allowing for neatly organized table data, they are the only means in pure HTML to arrange text into columns.
The data in the table is placed between <table>
start and end tags. Each row of the table is indicated by a table
row <tr> tag. Each item in a row is indicated
by either a table data <td> or a table heading
<th> tag.
The border of the table is controlled by the border
attribute. The default value of "0" indicates no
border. Other values are typically interpreted as the width of
the border line.
Both the <table> and <td> tags
take a width attribute, which can be a percentage, such
as width="100%", or a number of pixels. A percentage
here refers to the width of the entity that contains the table.
For example this code
<table border="1">
<tr><th>header one</th><th>header two</th></tr>
<tr><td>data one</td><td>data two</td></tr>
<tr><td>data three</td><td>data four</td></tr>
</table>
is rendered as
| header one | header two |
|---|---|
| data one | data two |
| data three | data four |
The spacing between table cell contents and its border can be
specified with the cellpadding attribute of the
<table> tag, and the space between cells in the table
can be specified with the cellspacing attribute.
The value of each of these attributes is a number, which is rendered as
a number of pixels.
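As a sketch of these two attributes, the table below asks for 8 pixels of padding inside every cell and no gap between cells. The values and the cell contents are arbitrary choices for illustration.

```html
<!-- Place this fragment inside the body block. -->
<table border="1" cellpadding="8" cellspacing="0" width="50%">
<tr><th>item</th><th>price</th></tr>
<tr><td>pencil</td><td>50&cent;</td></tr>
<tr><td>notebook</td><td>&pound;2</td></tr>
</table>
```

Raising cellspacing separates the cells from one another, while raising cellpadding gives the text inside each cell more room.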
By default, the contents of a cell are centered vertically in the
cell. The valign attribute of the
<td> tag can force the contents to be aligned to
the top or bottom of the cell with the top and
bottom values, respectively, or force the first line of
text of the cell to align with that of the other cells with the
baseline value. For example, the code
<table border="1">
<tr><td width="10">several lines of text</td>
<td width="10" valign="baseline">fewer lines</td></tr>
</table>
is rendered as
| several lines of text | fewer lines |
Without the valign="baseline" attribute, it is rendered as
| several lines of text | fewer lines |
A single cell in a table can be made to cross multiple rows and
columns by specifying its rowspan and colspan
attributes.
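A minimal sketch of spanning cells follows: one heading cell covers two columns and another covers two rows. The names and numbers are invented for illustration.

```html
<!-- Place this fragment inside the body block. -->
<table border="1">
<!-- "name" covers columns 1 and 2; "age" covers rows 1 and 2. -->
<tr><th colspan="2">name</th><th rowspan="2">age</th></tr>
<!-- Only two cells here: column 3 is already taken by "age". -->
<tr><th>first</th><th>last</th></tr>
<tr><td>Jane</td><td>Doe</td><td>30</td></tr>
</table>
```

When a cell spans rows, the following rows must contain correspondingly fewer cells, as the second row does here.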
For more thorough information on HTML tables, see Quick HTML—Tables.
Horizontal Rule
The horizontal rule tag makes a horizontal line across the page.
It is self-terminating, so is coded as <hr />.
There are tricks with <table> and
horizontal rule that can make shorter horizontal lines.
For example, the code
<table width="70%" border="0" cellspacing="0" cellpadding="0">
<tr><td width="10%"></td><td width="80%"><hr /></td></tr>
</table>
is rendered as a horizontal line that spans only part of the page width.
URLs: Uniform Resource Locators
Web pages are best when they contain references to other documents on the Internet, such as pictures and other web pages. A picture in a web page is a separate file from the HTML file, and when you click on a link in a web page, that might take you to another HTML file altogether.
All outside documents are referred to within an HTML document in the same way, using a URL, or Uniform Resource Locator. A URL indicates the means by which to get a file, which Internet site the file is on, and a path to that file.
A typical URL looks like this:
protocol://server-address/dir1/dir2/file-name
The protocol is the network communications
protocol to use to get the file. It is often http, which
stands for “hypertext transfer protocol”, which is the main protocol
of the World Wide Web. The server-address is
the Internet address of the server the file is on, for instance
“www.w3.org”. The rest of the URL is the directory path and filename
of the file.
The protocol and server-address can be left off to indicate that
the file is on the same server as the current web page. If the
URL begins with a slash, the directory path is assumed to be
relative to the default directory of the server:
/dir1/dir2/file-name
If the URL doesn’t begin with a slash, the directory path is assumed
to be relative to the current HTML document:
dir2/file-name
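As a worked example of the three forms, suppose the current page is at the hypothetical address http://www.example.com/dir1/page.html. The sketch below uses the anchor tag, which is covered in the Links section, and the comments show the address that each form resolves to under that assumption.

```html
<!-- Assumed address of this page: http://www.example.com/dir1/page.html -->

<!-- Complete URL: used exactly as written. -->
<a href="http://www.example.com/dir1/dir2/file.html">complete</a>

<!-- Begins with a slash: resolved from the server's default directory,
     giving http://www.example.com/dir1/dir2/file.html -->
<a href="/dir1/dir2/file.html">relative to the server</a>

<!-- No leading slash: resolved from the directory of the current page,
     giving http://www.example.com/dir1/dir2/file.html -->
<a href="dir2/file.html">relative to this document</a>
```

All three links lead to the same file here. The relative forms keep working if the whole site is moved to another server, which is the usual reason to prefer them for links within a site.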
Images
An image can be included in a web page by means of the image tag
<img>. The image itself is contained in a
separate file, usually either a GIF or a JPEG file. Generally, the
code looks like
<img src="URL-to-imagefile"
alt="my image description" />
where URL-to-imagefile describes the
location of the image file to be displayed.
The size of the image isn’t known until the image is successfully
loaded. You can leave space in the page for the image by specifying
the width and height attributes of the
image tag.
If your image contains any information that might be lost if the
image doesn’t load, you can specify this in the text of the
alt attribute. This text is also displayed by
text-only web browsers, such as lynx.
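A sketch of an image tag that reserves space and carries its information in the alt text follows. The file name, the dimensions, and the figures in the description are hypothetical.

```html
<!-- Place this fragment inside the body block. -->
<!-- width and height are in pixels and should match the real image size. -->
<img src="images/chart.gif"
     alt="Bar chart: sales rose from 10 units in May to 25 units in June"
     width="320" height="240" />
```

With the width and height given, the browser can lay out the surrounding text before the image arrives, so the page does not shift when the image finishes loading.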
Links
A web page can contain links to other web pages, and also links to other parts of the same page. These links put the hyper into hypertext.
To make a link to another web page, use the anchor tag
<a> with the href attribute specifying
the URL of the other page.
For example, the code
<a href="http://www.w3.org/">
The W3 Organization</a>
is rendered as a link with the text “The W3 Organization”.
If you click on it with your mouse, the browser will take you to the
web page specified by the href attribute value,
"http://www.w3.org/".
For more information on URLs, see “What is a URL?” We will show a few simple cases here.
The path can be a complete URL, like
http://server-name/dir1/dir2/other.html
or a path relative to the current HTML document
dir2/other.html
or a path relative to the root directory of the server of
the current page
/dir1/dir2/other.html
To make a link to another part of the current page, use the
anchor tag with the id attribute to label a place
to go in the document. For instance, I put the code
<a id="WHAT"></a>
at the beginning of this document. The following code
<a href="#WHAT">Back
to -->What is HTML</a>
is rendered as a link with the text “Back to -->What is HTML”.
Click on it to see what it does.
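In current browsers the id attribute can usually be placed on the element you want to jump to, such as a heading, instead of on an empty anchor. A minimal sketch, in which the id value LISTS and the file name other.html are hypothetical:

```html
<!-- The place to jump to: -->
<h2 id="LISTS">Lists</h2>
<p>Text of the section goes here.</p>

<!-- Elsewhere on the same page: -->
<a href="#LISTS">Jump to the Lists section</a>

<!-- From a different page in the same directory: -->
<a href="other.html#LISTS">Lists section of the other page</a>
```

Each id value should be used only once per page, so that the browser knows unambiguously where to go.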