CS 3733 Operating Systems, Fall 2002 Assignment 1


Due Thursday, September 19, 2002

Assignment 1 has 5 parts.


Introduction
The purpose of this assignment is to test your skills at C programming and testing, including the use of strings and pointers. You will also write routines that may be useful in a later assignment.

Web browsers and web servers communicate using a protocol called HTTP. In the simplest situation, a browser sends a request to a web server and the web server sends a reply. The initial line of a request has the following format:
Method SP request-URI SP HTTP-version CRLF
SP represents white space, any number of blank and tab characters. CRLF represents a carriage return followed by a line feed. The Method is one of a small number of valid requests. The one of interest to us is GET.
The Request-URI gives information about the location of the requested resource. For this assignment, the HTTP-version will be HTTP/1.0.

Some request-URIs specify a host name while others do not. If the host name is not specified, the request-URI is just an absolute path, that is, a path to a file starting with an initial / character.

If the host name is specified, we will assume that the request-URI starts with http://, which is immediately followed by the host name and the absolute path.
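The two forms of request-URI can be told apart by looking at the first few characters. The following is a minimal sketch; uri_kind is a hypothetical helper name that is not part of the assignment, and it assumes the request-URI has already been made into a string.

```c
#include <string.h>

/* Hypothetical helper, not required by the assignment.
 * Returns 1 if the request-URI names a host (it starts with "http://"),
 * 0 if it is an absolute path (it starts with '/'), and -1 otherwise. */
int uri_kind(const char *uri)
{
    if (strncmp(uri, "http://", 7) == 0)
        return 1;
    if (uri[0] == '/')
        return 0;
    return -1;
}
```

For example, uri_kind("/index.html") returns 0, while uri_kind("http://www.example.com/index.html") returns 1.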

In this assignment you will be parsing the initial line of an HTTP request and outputting information about the request.

You may use any of the C library routines to do this assignment, except for those whose man pages specifically indicate that they are unsafe in multithreaded applications. In particular, you may not use strtok. You should also avoid using strtok_r for this assignment. The test programs you write (the ones whose names end in test) may assume that input lines have a reasonable maximum length and may truncate the input if necessary. The other programs may not assume anything about the size of the input. No program that you write should allow a buffer overflow to occur. For example, avoid using gets, but fgets is OK.
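One way a test program might read lines with fgets and truncate long ones is sketched below. The name read_line_truncated is hypothetical, and the choice to always leave a newline at the end of the buffer is an assumption made so that the result can be passed to functions that expect a newline-terminated line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for a test program.
 * Reads one line into buf, which holds size bytes. A line that does not fit
 * is truncated and the rest of it is discarded. The buffer always ends with
 * a newline followed by a string terminator.
 * Returns 1 if a line was read, 0 on end of file or error. */
int read_line_truncated(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size < 2 || fgets(buf, (int)size, fp) == NULL)
        return 0;
    if (strchr(buf, '\n') != NULL)
        return 1;                       /* the whole line fit */

    /* No newline was stored: skip whatever is left of this line. */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    len = strlen(buf);
    if (len + 1 < size) {
        buf[len] = '\n';                /* room to append a newline */
        buf[len + 1] = '\0';
    } else {
        buf[len - 1] = '\n';            /* buffer full: overwrite last byte */
    }
    return 1;
}
```

Because fgets never stores more than size - 1 characters plus the terminator, the buffer cannot overflow regardless of the length of the input line.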


Part 0: Looking for 3 tokens
Make a clean directory called assign1 for this assignment. Make a part0 subdirectory for this part. Copy all of the files from /usr/local/courses/cs3733/fall2002/assign1/part0 into this directory. Write a function with the following prototype:

int check_tokens(const char *inlin);
that returns 1 if the array given by inlin contains exactly 3 tokens. The array is terminated by a newline character. It returns 0 if it does not contain exactly three tokens. Tokens are separated by an arbitrary number of blanks. The line may contain leading blanks and trailing blanks. The newline may have a carriage return before it. Any character other than a blank, a newline or a carriage return will be considered part of a token. Note that this is not a string as it may not have a string terminator. You are not allowed to change or even examine any characters before the start of inlin or after the newline. Put your function in a file called check_tokens.c. Note that check_tokens is not allowed to change the input line, even temporarily.
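The counting rule above can be sketched as a single left-to-right scan that stops at the newline. This is only one possible approach, not a reference solution. It assumes that a carriage return is treated like a blank wherever it appears, which is one reading of the rule that a carriage return is never part of a token.

```c
/* Sketch: count how many times the scan moves from "between tokens" to
 * "inside a token". Nothing at or after the newline is examined except the
 * newline itself, and the line is never modified. */
int check_tokens(const char *inlin)
{
    int count = 0;
    int in_token = 0;
    const char *p;

    for (p = inlin; *p != '\n'; p++) {
        if (*p == ' ' || *p == '\r') {
            in_token = 0;               /* separator: leave any token */
        } else if (!in_token) {
            in_token = 1;               /* first character of a new token */
            count++;
        }
    }
    return count == 3;
}
```

With this sketch, a line such as "  GET   /  HTTP/1.0  " followed by a newline gives 1, while a line containing only "GET /" gives 0.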

Write a main program called check_tokens_test for testing this. Your test program will read lines from standard input, pass them to check_tokens, and display the result in human-readable form. Use separate compilation and the makefile you copied to compile and lint this program. You will hand in the lint output and sample output generated by testing the program. Use cut-and-paste to put the lint output and the test output in a file to be printed. You will be graded on the correctness and simplicity of your program as well as the test data that you hand in.


Part 1: Parsing the initial request
Copy all of the files from your part0 directory into a new directory called part1. Execute "make clean". Rename check_tokens_test.c to parse_initial_test.c, rename check_tokens.c to parse_initial.c, and modify the makefile accordingly. Write the function parse_initial, which has the following prototype:

int parse_initial(char *inlin, char **commandp, char **serverp, char **pathp,
                  char **protocolp, char **portp);
The first parameter will be as before, but note that it is no longer constant. You will modify the line so that a string terminator is put in after each token. If it returns without error, *commandp will point to the first token, *pathp will point to the second token, and *protocolp will point to the third token. Each of these will now be a string because of the string terminators you insert. The *serverp and *portp parameters will be set to NULL. (These will be used in Part 2.) The parse_initial function will return 0 on error or 1 on success as before. Modify parse_initial_test so that if there is no error, it outputs the command, path, and protocol on separate lines in a nice format. The only change to the input line that should be made is to replace 3 bytes with string terminators. Again, you are not allowed to modify or examine anything before the start of inlin or after the newline.
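A rough sketch of the Part 1 behavior is given here as one possible approach, not as a reference solution. It assumes the same token rules as Part 0, and it assumes that the line should be left untouched when an error is returned, so the token boundaries are recorded first and the terminators are written only at the end.

```c
#include <stddef.h>

/* Sketch of Part 1 only: the server and port are always reported as NULL. */
int parse_initial(char *inlin, char **commandp, char **serverp, char **pathp,
                  char **protocolp, char **portp)
{
    char *start[3];                     /* first character of each token */
    char *end[3];                       /* first character after each token */
    int count = 0;
    int in_token = 0;
    char *p;
    int i;

    for (p = inlin; *p != '\n'; p++) {
        if (*p == ' ' || *p == '\r') {
            if (in_token)
                end[count - 1] = p;     /* token ends at this separator */
            in_token = 0;
        } else if (!in_token) {
            if (count == 3)
                return 0;               /* fourth token: line left untouched */
            start[count++] = p;
            in_token = 1;
        }
    }
    if (in_token)
        end[count - 1] = p;             /* token runs right up to the newline */
    if (count != 3)
        return 0;

    for (i = 0; i < 3; i++)
        *end[i] = '\0';                 /* exactly three bytes are replaced */
    *commandp = start[0];
    *pathp = start[1];
    *protocolp = start[2];
    *serverp = NULL;
    *portp = NULL;
    return 1;
}
```

Only the byte immediately following each token is overwritten, so any additional blanks between tokens remain in the line unchanged.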

The example below shows a case in which there are two blanks after the GET and after the path and no blanks after the protocol. The first blank after the GET and the first blank after the path are replaced, along with the carriage return. Note that if the optional carriage return had not been there, the line feed character would have been replaced.


Part 2: Parsing the server and port number
Copy all of the files from your part1 directory into a new directory called part2. This will behave exactly as part 1 unless the second token starts with http:// and also contains at least one additional / character. In this case the name of the server is between the second / following http: and the next / that starts the path. You must move the server name to the left one character and put a string terminator between the server name and the path. Set the serverp parameter to point to the server. The figure below shows this in a case in which there are two blanks after the GET and after the path and no blanks after the protocol.

Now also handle the case in which a port number is given. If a server is given and the server string contains a colon character (:), also replace the colon with a string terminator and have the portp pointer point to the character after the colon.
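The shift of the server name and the optional port split might look like the sketch below. The helper name split_uri is hypothetical, and it assumes that the second token has already been made into a string as in Part 1.

```c
#include <string.h>

/* Hypothetical helper: token is the second token, already terminated.
 * Sets *pathp, *serverp and *portp in the way Part 2 describes. */
void split_uri(char *token, char **serverp, char **portp, char **pathp)
{
    char *slash;
    char *colon;
    size_t len;

    *serverp = NULL;
    *portp = NULL;
    *pathp = token;

    if (strncmp(token, "http://", 7) != 0)
        return;                         /* no host: same as Part 1 */
    slash = strchr(token + 7, '/');
    if (slash == NULL)
        return;                         /* no additional '/': same as Part 1 */

    len = (size_t)(slash - (token + 7));
    memmove(token + 6, token + 7, len); /* move server left one character */
    token[6 + len] = '\0';              /* terminator sits just before path */
    *serverp = token + 6;
    *pathp = slash;                     /* path keeps its leading '/' */

    colon = strchr(*serverp, ':');
    if (colon != NULL) {
        *colon = '\0';                  /* split the server from the port */
        *portp = colon + 1;
    }
}
```

Moving the server name one character to the left frees the byte just before the path for the terminator, so the path can keep its leading / character without any bytes being added to the line.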

Modify your parse_initial_test from Part 1 so that it also prints the server and the port if these pointers are not NULL.


Part 3: Logging HTTP GET commands

Copy all of the files from /usr/local/courses/cs3733/fall2002/assign1/part2 into a part3 directory along with your parse_initial.c. You will have a makefile and a few test files. Write a function with the following prototype:

int command_logger(int fd, char *inlin);
Put this in the file command_logger.c. The first parameter is an open file descriptor and the second is a line to be interpreted as a possible HTTP GET command line. Note again that this parameter might not be a string. This function writes logging information to the given file descriptor in the format described below. The first line is a copy of the line given by inlin. Next is either a single line indicating that this line does not have exactly three tokens, or five lines giving the five parts on separate lines. If it does not have three tokens (as determined by parse_initial), output the following line exactly:
Line does not contain 3 tokens.
After the period there should be a single line feed character. If it is valid, output 5 lines in the format below:
Command: commandname
Server: servername
Port: port
Path: pathname
Protocol: protocolname
Each of these lines should end in a single line feed with no carriage returns. There should be a single blank character after the colon. The commandname, servername, port, pathname, and protocolname should come from parse_initial and contain no white space. If parse_initial returns a NULL pointer for the server or port, then the line feed will appear after the blank following the colon on the corresponding line.
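The five-line format, including the treatment of a NULL server or port, could be produced with snprintf as in the following sketch. The helper name format_fields is hypothetical, and the caller is assumed to supply a buffer and its size.

```c
#include <stdio.h>

/* Hypothetical helper: format the five log lines into buf.
 * A NULL server or port is shown as an empty value, so the line feed comes
 * directly after the blank that follows the colon.
 * Returns the number of characters the full text needs (not counting the
 * terminator), which may be larger than what fit in buf. */
int format_fields(char *buf, size_t size, const char *command,
                  const char *server, const char *port,
                  const char *path, const char *protocol)
{
    return snprintf(buf, size,
                    "Command: %s\nServer: %s\nPort: %s\nPath: %s\nProtocol: %s\n",
                    command,
                    server != NULL ? server : "",
                    port != NULL ? port : "",
                    path,
                    protocol);
}
```

The explicit NULL checks matter because passing a NULL pointer to the %s conversion has undefined behavior.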

The function command_logger returns 0 on success and -1 on error. The command_logger should only return an error if an I/O error occurs. It is not considered an error for command_logger if parse_initial returns 0. The only I/O command_logger does is to the file descriptor parameter as described above.
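Since only I/O errors count as failures, it may help to isolate the write call in a small helper that reports 0 or -1. The sketch below is one way to do this for Part 3; the name write_all is hypothetical, and retrying after a partial write or an interrupted call is an assumption about what should count as success.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all n bytes of buf to fd.
 * Retries after a partial write or an interruption by a signal.
 * Returns 0 on success and -1 on an I/O error. */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            return -1;                  /* genuine I/O error */
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}
```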

Lint and compile your program using the makefile given. The logger_test program has been supplied. It reads lines from standard input and sends them to command_logger using standard output for the output. Test your program thoroughly. When you think it is working correctly, execute:

   logger_test < infile > outfile
The resulting outfile should be identical to outfile_correct. Use diff to compare these. There should be no differences.


Part 4: Final test

Copy all of your files from your part3 directory into a part4 directory. Then copy the files from /usr/local/courses/cs3733/fall2002/assign1/part4 into this directory. This will give you a new makefile and a new object file called tester.o. Rewrite command_logger so that all output generated by a given call to it is written with a single write statement. Do not make any assumptions about the size of the input line. This will require that you allocate and free storage inside command_logger. Under no circumstances should command_logger allocate any memory that is not freed before it returns. If it does, this is called a memory leak. Memory leaked in this way cannot be freed without terminating the process. Test to see that your program produces exactly the same output as in Part 3.
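The allocate, write once, and free pattern can be sketched as follows. The helper name log_with_single_write and its parameters are hypothetical: it takes the unmodified input line and an already formatted report string. Treating a short write as an error is an assumption.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: output the raw line (up to and including its
 * newline) followed by a report string, using exactly one write call.
 * Returns 0 on success and -1 on failure. */
int log_with_single_write(int fd, const char *inlin, const char *report)
{
    size_t line_len = 0;
    size_t report_len = strlen(report);
    char *buf;
    ssize_t written;

    while (inlin[line_len] != '\n')     /* measure the line; it may not be */
        line_len++;                     /* a string, so strlen is not used */
    line_len++;                         /* include the newline itself */

    buf = malloc(line_len + report_len);
    if (buf == NULL)
        return -1;
    memcpy(buf, inlin, line_len);
    memcpy(buf + line_len, report, report_len);

    written = write(fd, buf, line_len + report_len);
    free(buf);                          /* freed on every path after malloc */
    return written == (ssize_t)(line_len + report_len) ? 0 : -1;
}
```

Because parse_initial overwrites bytes in the line, the copy of the original line has to be taken before parse_initial is called. The buffer size is computed from the input, so no fixed limit on the line length is needed.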

When you are satisfied that your program is working correctly, run the tester program. If all goes well, you should have two messages that say "Congratulations". If not, try to find your errors and correct them. You should only run the tester program when you think your program is working correctly. Try to run the tester program as few times as possible. Keep track of the number of times you ran it. If tester reports an error, the file tester.log will contain information about which test failed.



Handing in the program
Print out the cover page and fill in the page numbers of the various parts of the assignment. Include all of the documents indicated on the cover page. Use a single staple in the upper left corner to attach the cover page to your output. Hand everything in at the beginning of class on the due date.