Introduction to CGI programming for web servers
CGI programming is a set of techniques and specifications for
allowing servers on the World Wide Web to provide a measure of interactivity.
This class uses Perl as its implementation language, though Perl
is not required for writing a CGI program.
In addition, the class assumes the use of cgi.peak.org, a machine
at Peak that supports user-written CGI applications in a restricted
environment.
It is expected that the people who take this course have
some familiarity with a programming language, and with the
basic ideas of how the web looks from the user's point of view.
It is also very useful to have a web server available
to practice the techniques discussed here. This last requirement
is not always an easy one to satisfy, as there are significant
security issues involved.
This class is organized as two sessions. The first session discusses
the issues involved in CGI programming and then works through sample
programs. At the end of the class, a number of suggested exercises
are available for students to work through. In the second session,
students can bring questions and problems to the instructor for
resolution.
These course notes are available on the World Wide Web at
http://www.peak.org/~regan/cgi/.
The on-line version of these notes has the advantage of linking
into the FAQs and other on-line references which are too bulky
or would get out of date too quickly to print out.
Books and References
CGI programming is part of the World Wide Web, so you
would expect to find many references on the web. You will not
be disappointed. In addition, there are a couple of printed books
which are very useful to have available.
There are also several newsgroups which discuss issues
related to CGI programming and Perl:
- comp.infosystems.www.authoring.cgi: CGI programming.
- comp.lang.perl.misc: The Perl language in general.
- comp.lang.perl.announce: Announcements about Perl. (Moderated)
What is CGI?
In the standard model of web pages, the web browser requests
a page, and the server retrieves the appropriate file and delivers
it to the browser. The Common Gateway Interface (CGI) is a specification
for how general programs can be run on the server in response to a
web request. This differs from Java applets, which run on the
client machine. Ideally, programs written to the CGI
specification will run on any server which has the appropriate
runtime system. In addition, the CGI specification does not
require any particular implementation language. This course
will focus on Perl as an implementation language, but
other languages such as C/C++, shell scripts, AppleScript,
and Visual Basic are all reasonable choices for writing CGI
programs on the appropriate operating systems. This class focuses
on Perl because it is available for most operating systems
that run web servers, is reasonably well suited to the string processing
tasks encountered in CGI programming, and is available
on cgi.peak.org.
CGI programs are started each time they are referenced.
The server passes them information in environment variables,
such as the name of the machine the browser is running on,
among others. NCSA has information on these variables.
The CGI program also receives the information the user entered into HTML forms.
Using this information, the CGI program can produce a custom
HTML web page, which the user then views.
Everything comes down to standard HTML.
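To make that flow concrete, here is a minimal sketch of a CGI program in Perl. It is an illustration rather than one of the course programs, and the file name hello.cgi is hypothetical. It reads REMOTE_HOST, falling back to REMOTE_ADDR because a server that skips reverse DNS lookups may leave REMOTE_HOST unset, then writes a Content-type header, a blank line, and ordinary HTML.

```perl
#!/usr/bin/perl
# hello.cgi (hypothetical name): a minimal CGI program.
use strict;
use warnings;

# Escape the characters that are special in HTML so that values taken
# from the request cannot inject markup into the page.
sub html_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build the whole response: a Content-type header, a blank line that
# ends the headers, and then the HTML body.
sub build_page {
    my ($host) = @_;
    my $who = html_escape($host);
    return "Content-type: text/html\n\n"
         . "<html><head><title>Hello</title></head>\n"
         . "<body><p>Hello, visitor from $who.</p></body></html>\n";
}

# The server passes request details in environment variables.
my $host = $ENV{REMOTE_HOST} || $ENV{REMOTE_ADDR} || 'an unknown host';
print build_page($host);
```

Run from the command line, where neither variable is set, it prints the same page with "an unknown host"; that is a handy way to check the output before installing it on the server.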
CGI Security
User-written CGI programs are not available at all ISPs because
of the special security issues involved. Some of these issues
only affect the user who wrote or installed the program.
If that were all that was at stake, ISPs wouldn't worry much,
just as they don't worry about users' HTML code. Unfortunately,
much more is at stake: on a standard web server, other users' files
and system files on the CGI machine can be compromised.
The following discussion explains what to watch for on a standard
web server and what to watch for on Peak's CGI machine. Most of it
is oriented to Unix-based web servers.
- Who are you?
A program running on a Unix computer runs as a specific user.
The web server is no exception. Each user can read certain
files and directories, and write to other files and directories.
If the web server were to run as the user root then
any problem in security would allow any file on the system
to be examined and changed. Obviously that isn't done.
If the web server runs as an ordinary non-root user, then every CGI
program can read or write any file that any other CGI program can
get to. This is the model that a standard web server works
under. It means that the author and installer of a CGI program
must trust everyone else who can write and install CGI programs
on that machine.
If each CGI program instead runs as the user who owns it, then the
program can only get to the files and directories that you
can yourself. Thus a security problem that you introduce
can only affect you. This is the model that cgi.peak.org
operates under.
- What files can you get to?
You can only get to the filesystems available to the web server
machine, and then only to those files that the program has permission
to get to. Peak deals with this issue by ensuring that cgi.peak.org
has no filesystems mounted from other Peak machines. Thus none
of your normal user files are in danger from a dangerous CGI program.
It also means that even if the entire CGI machine is subverted,
it will be hard to launch attacks on the rest of Peak's machines.
- How do you get information to and from the CGI machine?
There is a directory on the user machine called red
which is mirrored on the CGI machine as red. Copies
are made in both directions on a regular basis. You can also
use FTP to move files to the CGI machine. There are very few
login accounts on the CGI machine.
- Security problem #1
The first security issue is a malicious CGI programmer.
It is possible to write a simple CGI program which will execute
any command on the CGI machine. Don't do it.
This can give far too much information to potential vandals.
- Form handling
One of the most common ways that CGI programs are attacked is by
passing unexpected data to your CGI program in form variables.
While you may think that your CGI program will only be started
because someone used your page which set up an HTML form, the
bad guys can easily bypass that step and send you anything that
they want. Some easy ways that this is abused are:
For more information on security issues of web servers and CGI programming
in general, please look at:
Please look at the notes for Peak's CGI machine
for more information on how Peak operates it.
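Returning to form handling: because anyone can send any data to a CGI program, a common defensive pattern is to accept a form value only if it matches exactly what you expect (a whitelist), and never to hand request data to a shell. The sketch below uses only core Perl; the helper names checked_value and run_without_shell are hypothetical. The list form of a piped open passes each argument directly to the program, so shell metacharacters in the data are never interpreted. Perl's taint mode (the -T switch) goes further and refuses to use outside data in shell commands or in operations that change files until it has been extracted through a regular-expression capture like the one in checked_value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value if the whole string matches $pattern, or undef if it
# does not. \A and \z anchor the match to the entire string, so a
# trailing newline or extra text makes the check fail. Returning the
# regex capture is also how data is untainted under perl -T.
sub checked_value {
    my ($value, $pattern) = @_;
    return undef unless defined $value;
    return $value =~ /\A($pattern)\z/ ? $1 : undef;
}

# Run an external program and return its output, without a shell.
# With the list form of a piped open, each argument reaches the program
# unchanged, so characters such as ; | ` $ are not treated as shell
# syntax. Always pass the program and its arguments as separate
# elements: a single-element list can still be handed to the shell.
sub run_without_shell {
    my (@command) = @_;
    open(my $fh, '-|', @command) or die "cannot run $command[0]: $!";
    local $/;    # read all of the output at once
    my $output = <$fh>;
    close($fh);
    return defined $output ? $output : '';
}
```

For example, a field that should hold a login name might be checked with checked_value($name, qr/[a-z][a-z0-9]{0,15}/), and the request rejected outright when the result is undef.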
Example CGI programs
This section discusses several CGI programs.
First, some apologies are in order. These programs are mostly
just hacks put together for fun, learning, or to accomplish a
simple task. I did not necessarily do things The Right Way;
I did them my way. In particular, it would have been better if
I had used Perl's warning and checking features, such as the -w
switch and use strict;. In addition, I should have used the
CGI.pm library to do the routine work of web programming.
The flip side is that these are fairly simple, self-contained programs.
cgi_echo.cgi
is one of the first CGI programs I wrote, and it shows.
It has very simple output.
However, I still use it as a basis when starting on a new program
because it contains all of the essential parts of a CGI program.
The purpose of this program is to display all of the environment
variables and the contents
of the HTML form variables so that I know what I have to work with.
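The original cgi_echo.cgi is not reproduced here, but the following sketch does the same job: it prints every environment variable and then every form variable. It decodes form data by hand so the mechanics are visible. GET requests carry the form in QUERY_STRING, while POST requests send it on standard input, with CONTENT_LENGTH giving its size in bytes. In a real program, CGI.pm's param method does this work for you. Every value is HTML-escaped, since anyone can send anything.

```perl
#!/usr/bin/perl
# A sketch in the spirit of cgi_echo.cgi (not the original program).
use strict;
use warnings;

# Escape characters that are special in HTML.
sub html_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Decode one application/x-www-form-urlencoded component:
# '+' stands for a space and %XX is a byte written in hex.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "a=1&b=2" into [name, value] pairs, keeping their order and any
# repeated names (a group of checkboxes can send the same name twice).
sub parse_form {
    my ($data) = @_;
    my @pairs;
    for my $item (split /[&;]/, $data) {
        next if $item eq '';
        my ($name, $value) = split /=/, $item, 2;
        $value = '' unless defined $value;
        push @pairs, [ url_decode($name), url_decode($value) ];
    }
    return @pairs;
}

# GET puts the form data in QUERY_STRING; POST sends it on standard
# input, and CONTENT_LENGTH says how many bytes to read.
sub read_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $length = $ENV{CONTENT_LENGTH} || 0;
        my $data = '';
        read(STDIN, $data, $length) if $length > 0;
        return $data;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

print "Content-type: text/html\n\n";
print "<html><head><title>CGI echo</title></head><body>\n";
print "<h2>Environment variables</h2>\n<ul>\n";
for my $key (sort keys %ENV) {
    printf "<li>%s = %s</li>\n", html_escape($key), html_escape($ENV{$key});
}
print "</ul>\n<h2>Form variables</h2>\n<ul>\n";
for my $pair (parse_form(read_form_data())) {
    printf "<li>%s = %s</li>\n", html_escape($pair->[0]), html_escape($pair->[1]);
}
print "</ul>\n</body></html>\n";
```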
error_log.cgi
is a trivial program I wrote for use on cgi.peak.org.
It is basically
the output of tail
on the error_log file.
It is very useful to see what is at the end of the web server's
error log file. However, as most people do not have access to this
file directly, this program makes that information available.
As this example shows, CGI programs don't have to be large,
especially if the work is done in other programs.
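Here is a sketch of the same kind of viewer. Rather than running tail, it reads the log directly in Perl and keeps only the last lines in memory, which avoids starting another program or a shell; that is a substitution made for illustration, not necessarily how error_log.cgi works. The log path is hypothetical, since every server keeps its error log somewhere different. Log lines are HTML-escaped because they can contain text supplied by whoever made the failing request.

```perl
#!/usr/bin/perl
# A sketch of an error-log viewer in the spirit of error_log.cgi.
use strict;
use warnings;

my $LOG_FILE  = '/usr/local/etc/httpd/logs/error_log';    # hypothetical path
my $MAX_LINES = 50;

# Escape characters that are special in HTML.
sub html_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Return the last $n lines of $path. The file is read once, and only a
# sliding window of $n lines is kept, so a large log does not fill memory.
sub last_lines {
    my ($path, $n) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > $n;
    }
    close($fh);
    return @window;
}

print "Content-type: text/html\n\n";
print "<html><head><title>Error log</title></head><body>\n<pre>\n";
my @lines = eval { last_lines($LOG_FILE, $MAX_LINES) };
if ($@) {
    print html_escape("Could not read the log: $@");
}
else {
    print html_escape($_) for @lines;
}
print "</pre>\n</body></html>\n";
```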
lunch.cgi
is a program which I wrote on a whim.
A number of people from work go out to eat together once a week.
We can never remember where we ate last, and it is always a last
minute decision.
This simple voting program, which has no security, lets each person
vote for where they want to eat at any time before we go out, so
that a consensus is reached over time rather than at the last minute.
People have different ideas about how a page should look, and
the people who make pleasing HTML are often not the people who
write the CGI programs. So I have pulled a large portion of
the boilerplate HTML into text files which can be edited just
like standard HTML files. These files contain a few additional commands
which tell the CGI program where to fill in the details that
differ from person to person.
This is done with a program called
webc.
This approach is overkill for this little application,
but was convenient for what I wanted to do.
Some of the associated files for this are shown below. They aren't
important except to show the linkage between webc and the application.
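webc's own command syntax is not described here, so the following sketch is not webc. It uses a made-up {{name}} placeholder syntax to show the general idea of keeping the HTML in editable text files and letting the CGI program fill in the per-person details. The function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each {{name}} placeholder with $values->{name}. The {{...}}
# syntax is invented for this sketch; webc has its own commands.
# Unknown names are left in place so that mistakes stay visible on the
# page. Values are inserted as-is, so escape any user-supplied text first.
sub fill_template {
    my ($template, $values) = @_;
    $template =~ s/\{\{(\w+)\}\}/exists $values->{$1} ? $values->{$1} : '{{' . $1 . '}}'/ge;
    return $template;
}

# Read a template file that an HTML author can edit like any other page.
sub fill_template_file {
    my ($path, $values) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    local $/;    # slurp the whole file
    my $text = <$fh>;
    close($fh);
    return fill_template($text, $values);
}
```

A program would then print a Content-type header followed by the result of fill_template_file on its template, so the HTML author and the programmer can each work on their own files.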
You can try the
lunch program.
The example above shows a CGI program that combines HTML forms,
minimal security, environment variables to establish identity,
and file access.
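The record-and-tally core of a program like lunch.cgi might look something like the sketch below; it is not the actual lunch.cgi. The votes file and its tab-separated format are assumptions. Identity comes from REMOTE_USER, which the server sets only after it has authenticated the user, with REMOTE_ADDR as a weak fallback. An flock keeps two people who vote at the same moment from interleaving their writes, and a later vote from the same person replaces an earlier one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Who is voting? REMOTE_USER is set only when the server has authenticated
# the user; the client address is a weak fallback (shared machines and
# proxies make it unreliable), in keeping with "minimal security".
sub current_voter {
    return $ENV{REMOTE_USER} || $ENV{REMOTE_ADDR} || 'unknown';
}

# Append one "voter<TAB>choice" record. An exclusive lock keeps two
# simultaneous submissions from interleaving their lines.
sub record_vote {
    my ($file, $voter, $choice) = @_;
    for ($voter, $choice) { s/[\t\r\n]+/ /g }    # keep one record per line
    open(my $fh, '>>', $file) or die "cannot append to $file: $!";
    flock($fh, LOCK_EX) or die "cannot lock $file: $!";
    print {$fh} "$voter\t$choice\n";
    close($fh) or die "cannot close $file: $!";
    return;
}

# Count each voter's most recent choice. Returns a hash of choice => votes,
# or an empty hash if the file cannot be opened (e.g. nobody has voted yet).
sub tally_votes {
    my ($file) = @_;
    open(my $fh, '<', $file) or return ();
    flock($fh, LOCK_SH) or die "cannot lock $file: $!";
    my %latest;
    while (my $line = <$fh>) {
        chomp $line;
        my ($voter, $choice) = split /\t/, $line, 2;
        next unless defined $choice;
        $latest{$voter} = $choice;    # later lines replace earlier votes
    }
    close($fh);
    my %count;
    $count{$_}++ for values %latest;
    return %count;
}
```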
Assignments
This section lists several possible exercises for you to do.
Some of these exercises have suggestions to help you on your way.
In some cases, the suggestions go all the way to coding. Only
look at these suggestions when you run into roadblocks.
Don't feel compelled to do these assignments if you have a
project you want to do.
Write a CGI program which delivers a randomly chosen file from a
specified directory.
The name of the directory is specified in the URL like:
http://cgi.peak.org/~regan/randimg.cgi?~regan/Images
This can be used in an <img src="..."> HTML tag.
Take care to restrict which directories can have files delivered
in this fashion. One way is to demand that a file with a particular
name exists in the directory containing the requested file.
This requires that the program send the image data itself, rather
than a URL pointing to it.
Hints
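If you get stuck, here is one possible starting point, not the only design. It assumes the ~user/dir part of the URL has already been turned into a real directory path (how to do that depends on the server), and it uses a hypothetical marker file named .randimg-ok to mark the directories that may be served. The image bytes are sent directly after a Content-type header, which is what lets the program's URL go straight into an img tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker file: a directory may be served only if it contains it.
my $MARKER = '.randimg-ok';

# File extensions this sketch will serve, with their MIME types.
my %TYPE_FOR = (
    gif  => 'image/gif',
    jpg  => 'image/jpeg',
    jpeg => 'image/jpeg',
    png  => 'image/png',
);

# Return the path of a randomly chosen image in $dir, or undef if the
# directory is not marked as servable or contains no images.
sub pick_random_image {
    my ($dir) = @_;
    return undef unless -d $dir && -e "$dir/$MARKER";
    opendir(my $dh, $dir) or return undef;
    my @images = grep { /\.(\w+)\z/ && $TYPE_FOR{ lc $1 } && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return undef unless @images;
    return "$dir/" . $images[ int rand @images ];
}

# Send the image bytes themselves, not a URL. Expects a path returned
# by pick_random_image, so its extension is one of %TYPE_FOR's keys.
sub send_image {
    my ($path) = @_;
    my ($ext) = $path =~ /\.(\w+)\z/;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);       # image data is binary
    binmode(STDOUT);
    local $/;           # read the whole file at once
    my $data = <$fh>;
    close($fh);
    print "Content-type: $TYPE_FOR{ lc $ext }\n\n", $data;
    return;
}
```

A full program would call pick_random_image on the requested directory, answer with an error (or a default image) when it returns undef, and otherwise call send_image.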
Write a CGI program which takes a test file which looks something
like:
Header Example test
Q What is the square root of 16?
A 1:256 *2:4 3:16 4:59
Q Approximately how many people live in Corvallis OR?
A 1:5,000 2:10,000 *3:45,000 4:100,000
The Header line produces an appropriate header in the HTML.
The Q lines give the question text.
The A lines give the possible answers.
The correct answer is preceded by a *.
The name of the test file is specified in the URL like:
http://cgi.peak.org/~regan/webtest.cgi?test1
Once the user fills in the test, write a new page indicating
how the user did, and mail results back to an appropriate email
address. It will be appropriate to ask the person taking the
test to identify themselves.
Hints
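One possible starting point for the test program is a parser for the file format above, plus a scorer. The sketch assumes answers are numbered 1, 2, 3, ... in order, and that each answer ends where the next "number:" (optionally starred) begins, which lets answers contain commas such as 45,000. The data structure and the names parse_test and score_test are hypothetical; generating the HTML form and mailing the results are left to you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the test-file format into { header => ..., questions => [...] },
# where each question is { text => ..., answers => [...], correct => N }.
sub parse_test {
    my ($text) = @_;
    my %test = (header => '', questions => []);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;
        if ($line =~ /\AHeader\s+(.*)\z/) {
            $test{header} = $1;
        }
        elsif ($line =~ /\AQ\s+(.*)\z/) {
            push @{ $test{questions} }, { text => $1, answers => [], correct => undef };
        }
        elsif ($line =~ /\AA\s+(.*)\z/) {
            my $list = $1;
            my $q = $test{questions}[-1] or die "answer line before any question\n";
            # Each answer runs up to the next "N:" or "*N:", or to the end.
            while ($list =~ /(\*?)(\d+):(.*?)(?=\s+\*?\d+:|\z)/g) {
                my ($star, $num, $answer) = ($1, $2, $3);
                $q->{answers}[ $num - 1 ] = $answer;
                $q->{correct} = $num if $star;
            }
        }
    }
    return \%test;
}

# Count correct responses. $responses maps a 0-based question index to
# the answer number the user chose (a string straight from the form).
sub score_test {
    my ($test, $responses) = @_;
    my $right = 0;
    for my $i (0 .. $#{ $test->{questions} }) {
        my $want = $test->{questions}[$i]{correct};
        my $got  = $responses->{$i};
        $right++ if defined $want && defined $got && $got eq $want;
    }
    return $right;
}
```

Comparing with eq rather than == avoids warnings when a form sends something that is not a number.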
If you have any questions about this course, feel free to drop me a message
at regan@peak.org and I'll get back to you.
Dave Regan
regan@peak.org