

Introduction to CGI programming for web servers

CGI programming is a set of techniques and specifications that allow servers on the World Wide Web to provide a measure of interactivity. This class uses Perl as its implementation language, though Perl is not required for writing a CGI program. The class also assumes the use of cgi.peak.org, a machine Peak provides to run user-written CGI applications in a restricted environment.

It is expected that people taking this course are familiar with at least one programming language and with the basic ideas of how the web looks from the user's point of view. It is also very useful to have a web server available for practicing the techniques discussed here. This last requirement is not always easy to satisfy, because user-written CGI programs raise significant security issues.

This class is organized as two sessions. The first session discusses the issues involved in CGI programming and then works through sample programs; it ends with a number of suggested exercises for the students to work through. The second session gives students a chance to bring questions and problems to the instructor for resolution.

These course notes are available on the World Wide Web at http://www.peak.org/~regan/cgi/. The on-line version has the advantage of linking to the FAQs and other on-line references, which are too bulky to print or would go out of date too quickly.


Books and References

CGI programming is part of the World Wide Web, so you would expect to find many references on the web. You will not be disappointed.

In addition, there are a couple of printed books which are very useful to have available.

There are also several newsgroups which discuss issues related to CGI programming and Perl.

	comp.infosystems.www.authoring.cgi  CGI programming.
	comp.lang.perl.misc                 The Perl language in general.
	comp.lang.perl.announce             Announcements about Perl. (Moderated)


What is CGI?

In the standard model of the web, the browser requests a page, and the server retrieves the appropriate file and delivers it to the browser. The Common Gateway Interface (CGI) is a specification for how general programs can be run on the server in response to a web request. This differs from Java applets, which run on the client machine. Ideally, a program written to the CGI specification will run on any server that has the appropriate runtime system. The CGI specification also does not require any particular implementation language. This course focuses on Perl, but other languages such as C/C++, shell scripts, AppleScript, and Visual Basic are all reasonable choices for writing CGI programs on the appropriate operating systems. Perl was chosen because it is available on most operating systems that run web servers, it is well suited to the string processing that CGI programming involves, and it is available on cgi.peak.org.

A CGI program is started each time it is referenced. The server passes it certain information in environment variables, such as the name of the machine running the browser; NCSA has information on these variables. The CGI program also receives whatever the user entered into HTML forms.

Using this information, the CGI program can produce a custom HTML web page which the user then views. Everything comes down to standard HTML.
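To make that concrete, here is a minimal sketch of a complete CGI program in Perl. It prints the CGI header line, the blank line that ends the headers, and then plain HTML. It uses the REMOTE_ADDR environment variable (the browser's address), on the assumption that REMOTE_HOST may be missing when the server does not look up host names. The subroutine names html_escape and render_page are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML so that values taken
# from the request cannot inject markup into the page.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build the complete response: the CGI header, a blank line, then HTML.
sub render_page {
    my ($addr) = @_;
    my $safe = html_escape($addr);
    return "Content-type: text/html\n\n"
         . "<html><head><title>Hello</title></head>\n"
         . "<body><p>Hello, visitor from $safe.</p></body></html>\n";
}

# REMOTE_ADDR is set by the web server; fall back to a placeholder when
# the script is run from the command line for testing.
print render_page($ENV{REMOTE_ADDR} // 'an unknown address');
```

Running the script from a shell prints the same text the browser would receive, which is a handy way to check the output before installing it.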


CGI Security

User-written CGI programs are not available at all ISPs because of the special security issues involved. Some of these issues only affect the user who wrote or installed the program. If that were all that was at stake, ISPs wouldn't worry much, just as they don't worry about users' HTML code. Unfortunately, much more is at stake: on a standard web server, other users' files and system files on the CGI machine can be compromised. The following discussion explains what to watch for on a standard web server and what to watch for on Peak's CGI machine. Most of it is oriented toward Unix-based web servers.

  • Who are you? A program running on a Unix computer runs as a specific user, and the web server is no exception. Each user can read certain files and directories, and write to others. If the web server ran as root, any security problem would allow any file on the system to be examined and changed, so that isn't done. If the server runs as a standard non-root user, then every CGI program can read or write any file that any other CGI program can reach. This is the model a standard web server works under, and it means that the author and installer of a CGI program must trust everyone else who can write and install CGI programs on that machine. If instead each CGI program runs as its owner, it can only reach the files and directories that you can reach yourself, so a security problem you introduce can only affect you. This is the model that cgi.peak.org operates under.
  • What files can you get to? A CGI program can only reach the filesystems available to the web server machine, and then only the files it has permission to access. Peak deals with this by ensuring that cgi.peak.org shares no filesystems with other Peak machines. Thus none of your files on Peak's other machines is in danger from a dangerous CGI program. It also means that even if the entire CGI machine is subverted, it will be hard to launch attacks on the rest of Peak's machines.
  • How do you get information to and from the CGI machine? There is a directory on the user machine called red which is mirrored on the CGI machine as red. Copies are made in both directions on a regular basis. You can also use FTP to move files to the CGI machine. There are very few login accounts on the CGI machine.
  • Security problem #1: The first security issue is a malicious CGI programmer. It is possible to write a simple CGI program that will execute any command on the CGI machine. Don't do it; this can give far too much information to potential vandals.
  • Form handling: One of the most common ways CGI programs are attacked is by passing unexpected data in form variables. While you may think your CGI program will only be started by someone using the page that sets up your HTML form, the bad guys can easily bypass that step and send you anything they want. Some easy ways this is abused are:
    • File handling: If you use one of the form items to build a filename and expect that item to be a simple name, a vandal can pass in something like ../../../etc/passwd to see what happens. Make sure the filenames you construct are reasonable, for example by rejecting any value that contains a slash or starts with a dot.
    • System commands: Sometimes you will run an existing command on the CGI machine with Perl's system() function or with backquotes. If the user has control over the parameters passed, they can cause havoc. Consider a simple program that is supposed to run finger on a user name supplied from the web:
      		print `finger $UserName`;
      
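      Then consider what happens when the user supplies a user name of "fred; rm -rf /": the backquotes pass the whole string to a shell, which runs finger fred and then rm -rf /, removing every file the CGI program's user is allowed to delete.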
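Here is a minimal sketch of one way to defend against both abuses: check each form value against a whitelist of allowed characters, and run external commands without a shell. The character sets and length limits are my own assumptions; adjust them to what your program really needs. The list form of a piped open hands the argument directly to finger, so characters like ";" are never interpreted. Perl's taint mode (the -T switch) is a further safeguard worth reading about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Whitelist check for a user name: letters, digits, underscore, hyphen,
# at most 32 characters, and not starting with a hyphen (which a command
# could mistake for an option). Returns the name or undef.
sub valid_username {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /\A[A-Za-z0-9_][A-Za-z0-9_-]{0,31}\z/ ? $name : undef;
}

# Whitelist check for a file name inside one fixed directory: no slashes
# and no leading dot, so "../../../etc/passwd" and ".htaccess" fail.
sub valid_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return $name =~ /\A[A-Za-z0-9_-][A-Za-z0-9_.-]{0,63}\z/ ? $name : undef;
}

# Run finger without a shell. With the list form of a piped open, the
# user name is a single argument, so ";" or "|" have no special meaning.
sub finger_output {
    my ($name) = @_;
    my $user = valid_username($name);
    return "Invalid user name.\n" unless defined $user;
    open(my $fh, '-|', 'finger', $user) or return "Cannot run finger: $!\n";
    local $/;                     # read the whole output at once
    my $out = <$fh>;
    close($fh);
    return defined $out ? $out : '';
}
```

Rejecting anything outside a known-good pattern is safer than trying to list every dangerous character, because it is easy to forget one.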

For more information on security issues of web servers and CGI programming in general, please look at:

Please look at the notes for Peak's CGI machine for more information on how Peak operates it.


Example CGI programs

This section discusses several CGI programs.

First, some apologies are in order. These programs are mostly hacks put together for fun, for learning, or to accomplish a simple task. I did not necessarily do things The Right Way; I did them my way. In particular, it would have been better to use Perl's checking features such as -w and use strict;, and to use the CGI.pm library for the routine work of web programming. On the other hand, these are fairly simple, self-contained programs.

cgi_echo.cgi is one of the first CGI programs I wrote, and it shows: its output is very simple. However, I still use it as a starting point for new programs because it contains all of the essential parts of a CGI program. It displays all of the environment variables and the contents of the HTML form variables so that I know what I have to work with.
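This is not cgi_echo.cgi itself, but a sketch of a program with the same purpose, written with use strict and warnings. It decodes form data by hand to show what is involved; CGI.pm would normally do this work. The decoding handles the common application/x-www-form-urlencoded case only, and a repeated field name keeps just its last value, which is a simplification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one urlencoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "a=1&b=2" into a hash of decoded names and values.
sub parse_form {
    my ($data) = @_;
    my %form;
    for my $pair (split /[&;]/, $data // '') {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $form{ url_decode($name) } = url_decode($value // '');
    }
    return %form;
}

sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g;
    return $s;
}

# GET requests carry form data in QUERY_STRING; POST requests send
# CONTENT_LENGTH bytes on standard input.
my $data = $ENV{QUERY_STRING} // '';
if (($ENV{REQUEST_METHOD} // '') eq 'POST') {
    read(STDIN, $data, $ENV{CONTENT_LENGTH} // 0);
}
my %form = parse_form($data);

print "Content-type: text/html\n\n";
print "<html><body><h2>Environment</h2><pre>\n";
print html_escape("$_=$ENV{$_}"), "\n" for sort keys %ENV;
print "</pre><h2>Form variables</h2><pre>\n";
print html_escape("$_=$form{$_}"), "\n" for sort keys %form;
print "</pre></body></html>\n";
```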

error_log.cgi is a trivial program I wrote for use on cgi.peak.org. It basically shows the output of tail on the error_log file. Seeing the end of the web server's error log is very useful, but most people do not have direct access to that file, so this program makes the information available. As this example shows, CGI programs don't have to be large, especially if the work is done by other programs.
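A sketch of the same idea follows. Unlike the original, which uses tail, it reads the file in Perl, keeping a sliding window of the last lines, so no external command is run. The log path and line count are assumptions; the real location depends on how the server is configured. The lines are HTML-escaped because error log entries often contain request data that a vandal controls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path and line count; adjust for your server.
my $LOG   = '/var/log/httpd/error_log';
my $LINES = 50;

# Return the last $n lines of $path, keeping only $n lines in memory.
sub last_lines {
    my ($path, $n) = @_;
    open(my $fh, '<', $path) or return ("Cannot open $path: $!\n");
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > $n;
    }
    close($fh);
    return @window;
}

sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g;
    return $s;
}

print "Content-type: text/html\n\n<html><body><pre>\n";
print html_escape($_) for last_lines($LOG, $LINES);
print "</pre></body></html>\n";
```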

lunch.cgi is a program I wrote on a whim. A group of people from work go out to eat together once a week. We can never remember where we ate last, and the choice is always a last-minute decision. This simple voting program, which has no security, lets each person vote for where they want to eat any time before we go out, so that a consensus forms over time rather than at the last minute.

People have different ideas about how pages should look, and the people who make pleasing HTML are often not the people who write the CGI programs. So I have moved a large portion of the boilerplate HTML into text files that can be edited just like standard HTML files, plus a few additional commands that tell the CGI program where to fill in the details that differ from person to person. This is done with a program called webc. This approach is overkill for this little application, but it was convenient for what I wanted to do. Some of the associated files are shown below; they aren't important except to show the linkage between webc and the application.
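The general technique of filling in a designer-edited template can be sketched in a few lines. This is not webc and does not use its commands; the {{NAME}} placeholder syntax and the fill_template name are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each {{NAME}} placeholder with the matching value from %$vars.
# Unknown names are left in place so that mistakes stay visible.
# The {{...}} syntax is invented for this sketch; it is not webc's.
sub fill_template {
    my ($template, $vars) = @_;
    $template =~ s/\{\{(\w+)\}\}/exists $vars->{$1} ? $vars->{$1} : "{{$1}}"/ge;
    return $template;
}

# In practice the template would be read from a file that a designer
# edits like ordinary HTML.
my $page = <<'HTML';
<html><body>
<h1>Lunch vote</h1>
<p>Hello, {{USER}}. Your current choice is {{CHOICE}}.</p>
</body></html>
HTML

print "Content-type: text/html\n\n";
print fill_template($page, { USER => 'fred', CHOICE => 'Pizza' });
```

Any value that comes from the user should be HTML-escaped before it is passed in.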

You can try the lunch program.

The above example shows a CGI program that demonstrates the interaction of HTML forms, minimal security, environment variables used to identify the user, and file access.
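Two of those pieces, identity from environment variables and safe file access, can be sketched as follows. This is not lunch.cgi's code. The one-line-per-voter tab-separated file format and the subroutine names are assumptions. REMOTE_USER is set only when the server has authenticated the user; the fallback to the client address is weak identification. The exclusive lock keeps two simultaneous requests from overwriting each other's votes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Who is voting? Prefer an authenticated name, else the client address.
sub voter_id {
    return $ENV{REMOTE_USER} // $ENV{REMOTE_ADDR} // 'unknown';
}

# Tabs and newlines would break the one-line-per-vote file format.
sub clean { my ($s) = @_; $s =~ s/[\t\r\n]+/ /g; return $s; }

# Record (or replace) one person's vote and return all votes.
sub record_vote {
    my ($file, $voter, $choice) = @_;
    open(my $fh, '+>>', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $file: $!";
    seek($fh, 0, SEEK_SET);               # read from the beginning
    my %votes;
    while (my $line = <$fh>) {
        chomp $line;
        my ($who, $what) = split /\t/, $line, 2;
        $votes{$who} = $what if defined $what;
    }
    $votes{$voter} = $choice;
    truncate($fh, 0) or die "Cannot truncate $file: $!";
    seek($fh, 0, SEEK_SET);
    print $fh "$_\t$votes{$_}\n" for sort keys %votes;
    close($fh);                           # also releases the lock
    return %votes;
}
```

A CGI program would call something like record_vote('votes.txt', clean(voter_id()), clean($choice)), where the file name and $choice (taken from the form) are hypothetical.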


Assignments

This section lists several possible exercises. Some include suggestions to help you on your way, and in some cases the suggestions go all the way to code. Only look at these suggestions when you run into roadblocks. Don't feel compelled to do these assignments if you have a project of your own.

Write a CGI program that returns a randomly chosen image file from a specified directory. The name of the directory is specified in the URL like:

	http://cgi.peak.org/~regan/randimg.cgi?~regan/Images
This can be used in an <img src="..."> HTML element.

Take care to restrict which directories can have files delivered in this fashion. One way is to demand that a file of a particular name exists in the directory with the file requested.

This requires sending the image data itself, with an appropriate Content-type header, rather than a URL.
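If you get stuck, here is one possible shape for a solution, offered as a sketch rather than the intended answer. The marker file name .randimg-ok and the list of image types are assumptions. Turning a URL-style name like ~regan/Images into a real directory is site-specific and left out; the sketch treats the query string as a path. The Status header is the standard CGI way to report an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker: a directory may serve random images only if its
# owner has created a file with this name in it.
my $MARKER = '.randimg-ok';
my %TYPES  = (gif => 'image/gif', jpg => 'image/jpeg',
              jpeg => 'image/jpeg', png => 'image/png');

# Return the path of a random image in $dir, or undef if the directory
# is not marked as allowed or holds no images.
sub pick_image {
    my ($dir) = @_;
    return undef unless -d $dir && -e "$dir/$MARKER";
    opendir(my $dh, $dir) or return undef;
    my @images = grep { /\.(\w+)\z/ && exists $TYPES{lc $1} && -f "$dir/$_" }
                 readdir($dh);
    closedir($dh);
    return undef unless @images;
    return "$dir/" . $images[int(rand(@images))];
}

# Send the image bytes themselves, preceded by a matching Content-type.
sub send_image {
    my ($path) = @_;
    my ($ext) = $path =~ /\.(\w+)\z/;
    my $type  = $TYPES{lc $ext};
    open(my $fh, '<:raw', $path) or return 0;
    binmode(STDOUT);
    print "Content-type: $type\n\n";
    local $/;
    print <$fh>;
    close($fh);
    return 1;
}

if (defined $ENV{REQUEST_METHOD}) {
    # Site-specific mapping from the query string to a directory omitted.
    my $path = pick_image($ENV{QUERY_STRING} // '');
    unless (defined $path && send_image($path)) {
        print "Status: 404 Not Found\nContent-type: text/plain\n\nNo image available.\n";
    }
}
```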

Hints

Write a CGI program that takes a test file which looks something like:

	Header	Example test
	Q	What is the square root of 16?
	A	1:256 *2:4 3:16 4:59

	Q	Approximately how many people live in Corvallis OR?
	A	1:5,000 2:10,000 *3:45,000 4:100,000
The Header line renders an appropriate header in the HTML. Each Q line gives the question text, and each A line gives the possible answers. The correct answer is preceded by a *.
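One way to read this format is sketched below, as an example under assumptions rather than the required design. It assumes each line is a tag (Header, Q, or A), whitespace, then text, and that answer choices are separated by whitespace that comes just before the next "N:" or "*N:", so an answer like 5,000 survives intact. The grading helper assumes the form returns one choice number per question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the test format into { header => ..., questions => [...] }.
sub parse_test {
    my (@lines) = @_;
    my %test = (header => '', questions => []);
    for my $line (@lines) {
        chomp $line;
        $line =~ s/\A\s+//;
        my ($tag, $text) = split /\s+/, $line, 2;
        next unless defined $text;
        if ($tag eq 'Header') {
            $test{header} = $text;
        } elsif ($tag eq 'Q') {
            push @{ $test{questions} }, { text => $text, choices => [], correct => undef };
        } elsif ($tag eq 'A' && @{ $test{questions} }) {
            my $q = $test{questions}[-1];
            for my $choice (split /\s+(?=\*?\d+:)/, $text) {
                my ($star, $num, $answer) = $choice =~ /\A(\*?)(\d+):(.*)\z/ or next;
                push @{ $q->{choices} }, { num => $num, text => $answer };
                $q->{correct} = $num if $star;
            }
        }
    }
    return \%test;
}

# $responses->[i] is the choice number picked for question i.
# Returns (number correct, number of questions).
sub grade {
    my ($test, $responses) = @_;
    my $qs = $test->{questions};
    my $right = 0;
    for my $i (0 .. $#$qs) {
        my $r = $responses->[$i];
        $right++ if defined $r && defined $qs->[$i]{correct}
                 && $r eq $qs->[$i]{correct};
    }
    return ($right, scalar @$qs);
}
```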

The name of the test file is specified in the URL like:

	http://cgi.peak.org/~regan/webtest.cgi?test1
Once the user fills in the test, write a new page indicating how the user did, and mail the results to an appropriate email address. It will be appropriate to ask the person taking the test to identify themselves.
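For the mail step, a common approach on Unix is to pipe a message to sendmail; the sketch below assumes sendmail lives at /usr/sbin/sendmail, which varies between systems. Its -t flag takes recipients from the To: header. The name the test taker types in is untrusted, so newlines are removed: a newline in a header value would let a vandal add headers of their own, such as extra recipients. The subroutine names are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the report message. CR/LF are stripped from user-supplied text
# so it cannot start a new header line.
sub compose_report {
    my ($to, $test, $name, $right, $total) = @_;
    for ($name, $test) { s/[\r\n]+/ /g; }
    return "To: $to\n"
         . "Subject: Test results for $test\n"
         . "\n"
         . "$name answered $right of $total questions correctly.\n";
}

# Hand the message to sendmail (path is an assumption). Returns false
# if sendmail could not be started or exited with an error.
sub send_report {
    my ($message) = @_;
    open(my $mail, '|-', '/usr/sbin/sendmail', '-t') or return 0;
    print $mail $message;
    return close($mail);
}
```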

Hints


If you have any questions about this course, feel free to drop me a message at regan@peak.org and I'll get back to you.

                Dave Regan
		regan@peak.org



Last modified 27 May 2006
Dave Regan
http://www.peak.org/~regan/