Description
Given a one–dimensional array of complex or real input values of length N, the Discrete Fourier Transform consists af an array of size N computed as follows:
N 
1 

k^{X} 
where j = ^{p} 

H[n] = 
W ^{nk}h[k] 
where W = e ^{j2} ^{=N} = cos(2 =N) jsin(2 =N) 
1 
(1) 

=0 
For all equations in this document, we use the following notational conventions. h is the discrete–time sampled signal array. H is the Fourier transform array of h. N is the length of the sample array, and is always assumed to be an even power of 2. n is an index into the h and H arrays, and is always in the range 0 : : : (N 1). k is also an index into h and H, and is the summation variable when needed. j is the square root of negative one.
The above equation clearly requires N^{2} computations, and as N gets large the computation time can become excessive. There are a number of well–known approaches that reduce the comutation time considerably, but for the purpose of this assignment you can just use the simple double summation shown above.
Given a two–dimensional matrix of complex input values, the two–dimensional Fourier Transform can be computed with two simple steps. First, the one–dimensional transform is computed for each row in the matrix individually. Then a second one–dimensional transform is done on each column of the matrix individually. Note that the transformed values from the first step are the inputs to the second step.
If we have several CPU’s to use to compute the 2D DFT, it is easy to see how some of these steps can be done simulataneously. For example, if we are computing a 2D DFT of a 16 by 16 matrix, and if we had 16 CPUs available, we could assign each of the 16 CPU’s to compute the DFT of a given row. In this simple example, CPU 0 would compute the one–dimensional DFT for row 0, CPU 1 would compute for row 1, and so on. If we did this, the first step (computing DFT’s of the rows) should run 16 times faster than when we only used one CPU.
However, when computing the second step, we run into difficulties. When CPU 0 completes the one–dimensional DFT for row 0, it would presumably be ready compute the 1D DFT for column 0. Unfortunately, the computed results for all other columns are not available to CPU 0 easily. We can solve this problem by using message passing. After each CPU completes the first step (computing 1D DFT’s for each row), it must send the required values to the other processes using MPI. In this example, CPU 0 would send to CPU 1 the computed transform value for row 0, column 1, and send to CPU 2 the computed transform value for row 0, column 2, and so on. When each CPU has received k messages with column values (k is the total number of columns in the input set), it is then ready to compute the column DFT.
Finally, each CPU must report the final result (again using message passing) to a designated CPU responsible for collecting and printing the final transformed value. Normally, CPU 0 would be chosen for this task, but in fact any CPU could be assigned to do this.
We are going to use 16 CPUs in the deepthought cluster to perform the 2d DFT using distributed computing. http://support.cc.gatech.edu/facilities/instructionallabs/deepthoughtcluster/
howtorunjobsonthedeepthoughtcluster
For now ignore the discussion about using qsub to submit job requests. We will discuss this in class.
Copying the Project Skeletons

Log into deepthought19.cc using ssh and your prism login name.

Copy the files from the ECE6122 user account using the following command: /usr/bin/rsync avu /nethome/ECE6122/FourierTransform2D . Be sure to notice the period at the end of the above command.
1

Change your working directory to FourierTransform2D cd FourierTransform2D

Copy the provided fft2dskeleton.cc to fft2d.cc as follows: cp fft2dskeleton.cc fft2d.cc

Then edit fft2d.cc to implement the transform.


Implement a simple one–dimensional DFT using the double summation approach in the equations above.



Use MPI send and receive to send partial information between the 16 processes.



Use CPU at rank zero to collect the final transformed values from all other CPU’s, and write these results to a file called MyAfter2D.txt using the SaveImageData method in class InputImage.


Compile your code using make as follows: make
Resources

fft2dskeleton.cc is a starting point for your program.

Complex.cc and Complex.h provide a completed C++ object containing a complex (real and imaginary parts) value.

Makefile is a file used by the make command to build fft2d.

Tower.txt is the input dataset, a 256 by 256 image of the Tech tower in black and white.

after1D.txt is the expected value of the DFT after the initial one–dimensional transform on each row, but before the column transforms have been done. This is for debugging only as the assignment does not need to write the one–dimensional transformed results.

after2D.txt is the expected output dataset after the two dimensional transformation, a 256 by 256 matrix of the transformed values.

InputImage.cc and InputImage.h that will ease the reading of the input data. This object has several useful functions to help with managing the input data.

DO NOT ASSUME that the MPI size will always be 16. Instead implement the program with a variable (perhaps nCPUs) that contains the MPI size.


The InputImage constructor, which has a char_{*} argument specifying the file name of the input image.



The GetWidth() function that returns the width of the image.



The GetHeight() function that returns the height of the image.



The GetImageData() function returns a onedimensional array of type Complex representing the original timedomain input values.



The SaveImageData function writes a file containing the transformed data.



The SaveImageDataReal function writes a file with only the real part of the image transformed data.

Output File Naming Convention In order to ease the grading procedure, you must write the results of the 2D transform on a file named MyAfter2D.txt. For debugging, if you want the results of the 1D transforms, write the results to a file named MyAfter1D.txt. For graduate students also computing the reverse transform, write those results to a file named MyAfterInverse.txt.
2
Graduate Students only . After the 2D transform has been completed, use MPI again to calculate the Inverse transform, and write the results to file MyAfterInverse.txt. Use the SaveImageDataReal function to write results. This function writes the real part only (as the imaginary parts should be zero or near zero after the inverse.
Turning in your Project. Details about turning in your program will be provided.
Some thoughts on implementing the 2D DFT

Consider using Rank 0 as the traffic cop to handle the data collection and dissemination. This means you would actually need 17 ranks, using one (rank 0) as the coordinator, and the other 16 ranks as the “worker” processes.

After reading in the original Tower.txt, create a second array of size (width * height) which represents the H array (the transformed data). We need this because the simple DFT algorithm given above cannot transform inplace. In other words, the H array and the h array are different areas of memory.

Clearly a working one–dimensional DFT is needed before a correct two–dimensional DFT can be implemented. Consider using a single CPU (no MPI) to read the Tower.txt file (using InputImage), and doing a one– dimensional DFT on all rows. Then save the resulting file (again using the InputImage.SaveImageData) and comparing to after1d.txt.

Once the one–dimensional DFT is working, use MPI as described above, and assign each CPU the correct number of rows, and what row they should start on. Assuming nCpus is the variable containing the number of CPU’s, nRows is the number of rows in the image, and myRank is the rank number of this CPU, then the number of rows per CPU is nRows / nCpus, and the starting row number for each CPU is nRows / nCpus _{*} myRank.

After computing the one–dimensional DFT on each assigned row use MPI to send information to all other CPU. Each CPU will need to send nCpus – 1 messages, one message to all other CPUs. Also you will likely need a separate array of type Complex, as the data to be send is not necessarily in sequential memory locations.

You shoudl consider a mix of blocking and nonblocking MPI calls to move the data around the various ranks as needed.

Also consider using the Tag field in the send and receive calls to get the received data in the proper location as you move the rows and columns around.

After receiving information from all of your peers (and storing that information in the H array in the right place), perform the columnwise one–dimensional DFT on each of your columns. You might need yet another array of size (width * height) for this as well, although you can actually use a smaller array (each CPU only uses operates on a subset of all the columns).

After the second set of transforms, use MPI to send all computed information to the master (CPU rank 0). You can use nonblocking send here, as we are sure CPU 0 will eventually call recv.

Finally, write out the final 2d transform using SaveImageData.
3