Quick Introduction to HTML
Automatic Generation of Content
Many of the pages you see on the World Wide Web do not correspond to HTML files on a hard disk. They are automatically generated. For instance, search results pages are all automatically generated.
Common uses of automatic generation are to:
- respond to an HTML form submission
- include content looked up from a database
- tailor a page for the user’s preferred language or operating system, or other preferences stored in a cookie.
This page will not teach you how to generate web content automatically (which involves bona fide computer programming). It is intended to familiarize you with some of the concepts and technologies involved.
Since automatic content generation is performed on the server computer, two issues that arise are security and resource usage on the server.
The oldest way to generate a web page automatically is with CGI, the Common Gateway Interface. This is a very simple approach to automatically generated web content, but it has its limitations.
When a user accesses the CGI program (either by submitting a web form or accessing the script directly by HTTP), the server starts a separate process for the CGI program. This program takes as its input a standard representation of the content of a form, and produces output that is the HTML of the web page.
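As a rough illustration of the input and output just described, here is a minimal sketch of a CGI program in Python. It assumes a GET request, for which the server passes the URL-encoded form data in the QUERY_STRING environment variable. The form field "name" and the greeting are invented for the example.

```python
import html
import os
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build the whole CGI response: header, blank line, then the HTML."""
    # parse_qs decodes URL-encoded form data such as "name=Ada&lang=en".
    fields = parse_qs(query_string)
    # "name" is a hypothetical form field; use a default if it is absent.
    name = fields.get("name", ["stranger"])[0]
    # Escape user input before placing it inside HTML.
    body = f"<html><body><h1>Hello, {html.escape(name)}!</h1></body></html>"
    # The header section ends with a blank line, then the document follows.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server sets QUERY_STRING before it launches the program.
    print(render_page(os.environ.get("QUERY_STRING", "")), end="")
```

Calling render_page("name=Ada") returns the header, a blank line, and a page that greets Ada. Everything the program writes to standard output is what the server sends back to the browser.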
CGI programs can be written in any programming language. The scripting languages Perl, Python and VBScript are popular because they have good text handling functions and are relatively easy to use. For this reason, one often hears the term “CGI script” used to refer to any CGI program.
One limitation with CGI is that the program is written in some programming language (not HTML!). It requires a person with some understanding of the language to make any changes to the page.
Another serious practical limitation has to do with system resources: the server actually launches a separate program each time a user accesses a CGI program, so if a lot of users start using CGI at once, the server computer can quickly become overloaded.
Finally, CGI puts no restriction on the function of the CGI program: it could in principle do anything on the server computer. Unless you have some special relation with your ISP, a responsible system administrator is unlikely to permit you to put your own CGI script on their computer. It is just too great a security risk.
There are packages composed of sets of tested CGI programs that perform common sorts of chores, such as mailing you when someone fills out a form in your web page. Your ISP may well have such a package installed.
Several technologies, collectively known as server-side scripting, generate web pages on the server computer. These include PHP, ASP, CFM, and JSP.
Unlike a CGI program, which generates all of the page's HTML itself, a page made with these technologies can start as a web page written mostly in HTML, with embedded code written in the scripting language. When a user accesses the page, the server interprets and executes the embedded code to generate more HTML in the page.
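The idea of a mostly static page with a little embedded code can be sketched in Python as follows. This is only a toy model, not how any of the real technologies work internally: the "<?= ... ?>" marker is borrowed in appearance from PHP's short echo tag, but here it merely names a value to look up, and the page text and the "year" value are invented for the example.

```python
import re
from datetime import date

# A mostly static page with one embedded expression. The marker syntax is
# an assumption of this sketch; each real technology has its own.
PAGE = """<html><body>
<p>Welcome to our site.</p>
<p>This year is <?= year ?>.</p>
</body></html>"""

EMBEDDED = re.compile(r"<\?=\s*(\w+)\s*\?>")


def render(page: str, values: dict) -> str:
    """Replace each embedded marker with a value computed on the server."""
    def substitute(match: re.Match) -> str:
        return str(values[match.group(1)])
    return EMBEDDED.sub(substitute, page)


if __name__ == "__main__":
    # Only the marked spot changes; the surrounding HTML passes through.
    print(render(PAGE, {"year": date.today().year}))
```

The surrounding HTML is left exactly as written, so someone who knows only HTML can still edit most of the page.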
(Note the difference between server-side scripting and browser scripting: code of a server-side script is executed by the web server on the server computer, whereas code of a browser script is executed by the web browser on the user’s computer.)
Since the bulk of the page may be written in simple HTML, a web site administrator should be able to modify such pages easily.
Since the scripts are executed by the web server itself, rather than by a separately launched program, the “one program launch per user” issue of CGI scripts does not occur.
Server-side scripting makes a lot of sense if your page is mostly static, and you just want to look up a small part of the content in a database.
Here are descriptions of some of the more popular server-side scripting technologies.
PHP (PHP Hypertext Preprocessor)
PHP is an open-source scripting language that has become the most popular dynamic content technology on the web. The fact that it’s completely free doesn’t hurt.
The language looks very much like the popular programming language “C”, so most computer programmers have little trouble learning it.
ASP (Active Server Pages)
Microsoft’s ASP is a “framework” with multiple-language support: at least VBScript and JScript may be used for scripting. However, at the time of this writing, almost all ASP pages are written in VBScript.
This is a Microsoft-only technology: really, it runs only on Microsoft servers. Recent versions are packaged as part of Microsoft’s “.Net” framework. You have to buy it.
For several years, ASP was the most popular dynamic content technology on the web. The ascendancy of PHP, together with the association of ASP with some very poor Microsoft web servers, has caused its popularity to slip.
CFM (ColdFusion Markup)
Adobe (previously Macromedia) ColdFusion is a rapid page development environment. Its proprietary scripting language, CFM, is tag-based, meaning that it looks rather like HTML.
ColdFusion is meant to be easy for a non-programmer to set up, though the author cannot confirm to what degree it succeeds. Unlike some of the alternatives, this solution costs money.
JSP (JavaServer Pages)
JSP is the Java-based contender. Somebody has to be proficient with Java to use this system. Also, the server has to be running a Java Virtual Machine. It can all be set up for free, though.
In JSP, the individual programs that build web pages (called servlets) typically run in a single Java virtual machine on the server machine. This can be much more efficient than running individual programs in the server’s operating system.
JSP is said to be the most resource-efficient solution of its class. Unlike some of the other technologies, it is platform-independent, so you can easily move your site to a different machine or operating system.
You might consider this for a heavily-used site, or if you are already involved with Java products.
Most web servers provide server-side includes (SSI), a means of including a fragment of HTML or text within another HTML document.
This is useful for sites which must present an identical fragment of HTML on many pages of the site, for example, a directory listing or a copyright notice. To change the listing or notice throughout the site, only the file containing the fragment needs to be changed.
Server-side includes is a very limited technology, but it is much simpler than server-side scripting. No programming is involved—one simply puts at the desired point in the HTML document a command to include a certain file. It is relatively safe, so a site administrator who would not give clients access to server-side scripting might consider providing server-side includes.
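To make the mechanism concrete, here is a small Python sketch of what a server does when it expands an include command. The directive form matches the one used by Apache's mod_include. The file name, the fragment text, and the use of a dictionary in place of files on disk are assumptions of the sketch.

```python
import re

# Directive form: <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(document: str, fragments: dict) -> str:
    """Replace each include directive with the text of the named fragment."""
    def load(match: re.Match) -> str:
        # A real server would read the file from disk; a dict stands in here.
        return fragments[match.group(1)]
    return INCLUDE.sub(load, document)


# Hypothetical shared fragment and a page that includes it.
FRAGMENTS = {"copyright.html": "<p>Copyright Example Site</p>"}
PAGE = (
    "<html><body><h1>Home</h1>\n"
    '<!--#include file="copyright.html" -->\n'
    "</body></html>"
)

if __name__ == "__main__":
    print(expand_includes(PAGE, FRAGMENTS))
```

Changing the one entry in FRAGMENTS changes the notice on every page that includes it, which is the whole appeal of the technique.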
A page can also generate content with a browser script, but the applications of this approach are rather limited. In particular, it will work only with browsers that support the scripting technology being used. A more serious limitation is that a browser script doesn’t have access to most of the resources on the server, such as databases.
An advantage to a browser script approach is that such pages can be placed on an ISP’s web site where you do not have permission to run other dynamic content technologies. It also takes up no more resources on the server than a simple web page.
One such technique is to have a script in the page determine something about the user, then fetch appropriate content from the ISP site, then display the information.
For example, a browser script can determine the user’s preferred human language. Then it can fetch a file containing text appropriate for that language from the same web site as the page, and then generate the web page using that text.
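The selection step in this example can be sketched as follows. A real browser script would be written in a browser scripting language such as JavaScript; Python is used here only to show the logic. The function names and the file naming scheme are hypothetical.

```python
def choose_language(preferred: list, available: set, default: str = "en") -> str:
    """Return the first preferred language for which the site has text."""
    for tag in preferred:
        tag = tag.lower()
        if tag in available:
            return tag
        # Fall back from a regional tag such as "fr-ca" to the base "fr".
        base = tag.split("-")[0]
        if base in available:
            return base
    return default


def text_file_for(language: str) -> str:
    # Hypothetical naming scheme for the per-language text files.
    return f"text.{language}.html"


if __name__ == "__main__":
    language = choose_language(["fr-CA", "en"], {"en", "fr"})
    print(text_file_for(language))
```

The script would then fetch the chosen file from the same web site and build the page from its text.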
Most users expect their web browsing activity to be anonymous. Likewise, the identity of a site visitor should not be the business of a public web site’s owner. However, this anonymity causes a practical problem.
The general problem arises when a choice the user makes on one page of a site should affect the later behavior of the site. This is what HTTP cookies are for: with them, the site can save such information for later retrieval.
Cookies are part of the HTTP standard, and are passed with other information about the web page in the HTTP headers. They may be stored by the user’s browser, on the user’s computer. They are used by the server to determine how to alter the content of web pages based on previous actions of the user at the site.
A typical example is a login page. A user logs in on one page at a site. In order for the site to later know that the user is logged in, the site sends a cookie to the user’s browser, which stores it along with a record of which site sent it. When the user then browses other pages at the same site, the browser sends the cookie back to the site with each request. A script at the site can then produce pages appropriate for a user who is logged in.
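The login exchange can be sketched with Python's standard http.cookies module. The site sends the cookie in a Set-Cookie response header, and the browser returns it in a Cookie request header. The cookie name "session", the one-hour lifetime, and the function names are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def login_response_header(session_id: str) -> str:
    """Value of the Set-Cookie header sent after a successful login."""
    cookie = SimpleCookie()
    cookie["session"] = session_id       # "session" is a hypothetical name
    cookie["session"]["path"] = "/"      # valid for every page of the site
    cookie["session"]["max-age"] = 3600  # browser discards it after an hour
    # header="" leaves out the "Set-Cookie:" prefix and returns the value.
    return cookie.output(header="").strip()


def session_from_request(cookie_header: str):
    """Read the session id from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print(login_response_header("abc123"))
    print(session_from_request("session=abc123; theme=dark"))
```

A request that arrives without the cookie yields None, so the script can treat that visitor as not logged in.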
A browser will return a cookie only to the site that originally sent it, so the information is safe from other sites. Also, a cookie only contains information from the site that sent it, not information from the user’s computer, so all the site can learn from cookies is how the user has interacted with the site previously. Cookies also expire after a certain amount of time, and are then deleted from the user’s computer.
As with any other technology, you shouldn’t rely on cookies unless you have a specific need for them (for example, your site requires a login). Be aware that most browsers permit cookies to be turned off, and that many users choose to do so.
It should be noted that some early browsers had bugs in their cookie mechanism that permitted bad people to get at the user’s computer. We hope that all such problems have long since been solved.