The fork system call is often used to create multiple cooperating processes to solve a single problem. This is especially useful on a multiprocessor where different processes can truly be run in parallel. In the best case, if we have N processes running in parallel and each process works on a subset of the problem, then we can solve the problem in 1/N the time it takes to solve the problem using one processor.
If it were always that easy, we would all be writing and running parallel programs. It is rarely possible to divide up a problem into N independent subsets. Usually, the results of each subset need to be combined in some way. There is also a difficult tradeoff to make between the benefits of parallelism and the cost of starting up parallel processes and collecting results from different processes.
For this assignment, you will write a parallel sorting program using Unix processes (i.e., fork ). We can hope to gain some benefit from using more than one process even on a uniprocessor since we will be reading data from a file, so we might win by letting the scheduler overlap the computation of one process with the file I/O of another process. We would hope to see a performance improvement if the program is run on a multiprocessor.
If we genuinely wanted to write a fast parallel sort program on a shared-memory system, we would use a thread package rather than multiple Unix processes. Linux/Unix processes are costly to create, and the communication mechanisms between processes are also expensive. However, a large part of the purpose of the assignment is to give you practice creating and using multiple processes, and it is also interesting to measure the performance of this kind of program.
The data to be sorted is a list of records where a record contains a word and its frequency measure (see struct rec in helper.h ). The records are written to the file in sequence in binary format. This means that you can use fread to read one record at a time or multiple records in one fread call.
You will write a C program called psort that takes 3 arguments:
the number of processes to create,
the name of the input file to sort, and
the name of the file to write the output to.
It will be called as shown below. You must use getopt to read in the command-line arguments. The
command will give you the correct man page, which has a nice example that you can use as a
psort ‐n <number of processes> ‐f <inputfile> ‐o <outputfile>
If the correct number of command-line arguments and the correct options ( ‐n , ‐f , and ‐o ) are provided, you may assume that the values corresponding with the options are valid. If the incorrect number of