Problem Set 4 Solution




Please read the late submission and collaboration policy on the course website:

All code will be tested on Anaconda for Python 3.6, available from

This problem set is provided as a ZIP file. You are reading the main problem set PDF in the root pset4/ directory. Look in the pset4/code sub-directory for code for different problems, with data in pset4/code/inputs. You will be required to modify and fill in various functions in the .py files in the code directory. Running this code (e.g., with python ./ and so on) will create output images in the pset4/code/outputs directory.

  1. Complete the code files in pset4/code/ by filling out the required functions.

  2. Run each program to generate output images in the pset4/code/outputs directory.

  3. Create a PDF report with LaTeX. Use the template provided on the course website (scroll down to the resources section at the bottom). In particular, make sure you fill out your name/wustl key (top of the .tex file) to populate the page headers. Also fill out the final information section describing how long the problem set took you, who you collaborated / discussed the problem set with, as well as any online resources you used.

The main body of the report should contain responses to the math questions, and also include results and figures for the programming questions as asked for. These figures will often correspond to images generated by your python code in the pset4/code/outputs/ directory.

Place this report file as solution.pdf in the root pset4/ directory.

Then, zip all the contents and sub-directories of pset4/ into a single ZIP file, and upload it to blackboard.

PROBLEM 1 (Total: 30 points)

Consider a pair of cameras with projection matrices $K[I \mid 0]$ and $K[I \mid t]$, where $I$ is the $3 \times 3$ identity matrix, $0$ is a 3-dimensional all-zeros vector, $t$ is a three-dimensional vector representing translation, and both share the intrinsic matrix

$$K = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

i.e., they have a focal length of f in pixels, and we are assuming that image co-ordinates are zero-centered.

(a) Consider the horizontal stereo case, where $t = [-t_x, 0, 0]^T$ (where $t_x > 0$), that is, the second camera is shifted "right" by $t_x$ (in world length units, say meters). Show that if a point in the world projects to $(x, y)$ in camera 1 and $(x', y')$ in camera 2, then $y' = y$, $x' \leq x$, and that the disparity $d$ is given by

$$d = x - x' = \frac{f\, t_x}{Z},$$

where Z is the depth (i.e., distance in the view direction from the camera center, or z co-ordinate) of the world point (in meters). (10 points).

(b) Assume the same pair of camera projection matrices as in part (a), and consider a set of 3D world points that lie on a plane such that the world co-ordinates $(X, Y, Z)$ for all points satisfy the equation $\alpha X + \beta Y + \gamma Z = k$ for some plane parameters $(\alpha, \beta, \gamma, k)$. Show that for all such points, the disparity $d$ is a linear function of the projected (left) image co-ordinates $(x, y)$, as $d = ax + by + c$. Derive an expression for $(a, b, c)$ in terms of the world plane parameters $(\alpha, \beta, \gamma, k)$ and the camera parameters $f, t_x$. (10 points).

(c) Now, consider the problem when the second camera has a translation vector $t = [0, 0, -t_z]^T$ (where $t_z > 0$). For this case, derive expressions for the projected location $(x', y')$ in camera 2 in terms of the location $(x, y)$ in camera 1, and the depth of the point $Z$ (from the center of camera 1). Assuming that $Z > t_z$ for it to be visible in both cameras, what is the set of possible co-ordinates $(x', y')$ that could match to $(x, y)$ (this is when you don't know the depth $Z$)? (10 points).
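The relations in parts (a) and (b) can be sanity-checked numerically before writing up the derivation. The sketch below is only a check, not a proof. It assumes the second camera uses $t = [-t_x, 0, 0]^T$ (so its center sits at $+t_x$), writes the plane coefficients as `alpha`, `beta`, `gamma`, and uses made-up values for `f`, `tx`, and the plane. The helper `project` is a hypothetical name, and the `(a, b, c)` expression is one candidate form that you should confirm by hand.

```python
import numpy as np

def project(K, t, Xw):
    """Project world points Xw (N x 3) through the camera K [I | t]."""
    p = (Xw + t) @ K.T            # rows are homogeneous image points
    return p[:, :2] / p[:, 2:3]   # perspective divide gives (x, y)

f, tx = 500.0, 0.1                # illustrative values only (pixels, meters)
K = np.diag([f, f, 1.0])
t1 = np.zeros(3)
t2 = np.array([-tx, 0.0, 0.0])    # assumed sign: camera 2 center at (+tx, 0, 0)

# Sample points on a plane alpha*X + beta*Y + gamma*Z = k (made-up parameters).
alpha, beta, gamma, k = 0.2, -0.1, 1.0, 5.0
rng = np.random.default_rng(0)
XY = rng.uniform(-1.0, 1.0, size=(6, 2))
Z = (k - alpha * XY[:, 0] - beta * XY[:, 1]) / gamma
Xw = np.column_stack([XY, Z])

xy1, xy2 = project(K, t1, Xw), project(K, t2, Xw)
disparity = xy1[:, 0] - xy2[:, 0]

# Part (a): disparity predicted from depth alone.
d_depth = f * tx / Z

# Part (b): candidate linear form d = a*x + b*y + c in left-image coordinates.
a, b, c = alpha * tx / k, beta * tx / k, gamma * f * tx / k
d_plane = a * xy1[:, 0] + b * xy1[:, 1] + c

print("max |d - f*tx/Z|      :", np.max(np.abs(disparity - d_depth)))
print("max |d - (ax+by+c)|   :", np.max(np.abs(disparity - d_plane)))
```

Both printed differences should be at the level of floating-point round-off if the formulas hold.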

PROBLEM 2 (Total: 15 points)

(a) Implement the function buildcv in code/ to build a cost-volume. The function will be provided two gray scale images and the maximum possible disparity value as input, and should return a three-dimensional array of size $H \times W \times (D_{\max} + 1)$, where the $[y, x, d]$ element is the cost of matching pixel $(x, y)$ in the left image to $(x - d, y)$ in the right image (for $0 \leq d \leq D_{\max}$). Use the Hamming distance of the $5 \times 5$ census transform as in the last problem set. For values where $x - d < 0$, set the cost to 24 (corresponding to the maximum possible Hamming distance).

The support code will call this function with grayscale versions of the left and right image, and then simply compute the disparity map by doing an arg min to produce a color-mapped disparity image outputs/prob2a.jpg. Include this in your report. (5 points).
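A minimal sketch of the cost-volume construction, under several assumptions: the argument names `(left, right, dmax)` are guesses at the signature, the census bit convention (neighbor greater than center) and edge-replicated borders may differ from the previous problem set, and `census` and `popcount` are hypothetical helpers. It also assumes `dmax` is smaller than the image width.

```python
import numpy as np

def census(img, r=2):
    """5x5 census transform: one bit per neighbour, packed into a uint32."""
    H, W = img.shape
    pad = np.pad(img, r, mode='edge')   # border handling is an assumption
    code = np.zeros((H, W), dtype=np.uint32)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                # skip the centre pixel: 24 bits in total
            nb = pad[r + dy:r + dy + H, r + dx:r + dx + W]
            code = (code << np.uint32(1)) | (nb > img).astype(np.uint32)
    return code

def popcount(v):
    """Number of set bits (out of the low 24) in each element of v."""
    count = np.zeros(v.shape, dtype=np.uint32)
    for _ in range(24):
        count += v & np.uint32(1)
        v = v >> np.uint32(1)
    return count

def buildcv(left, right, dmax):
    """Cost volume cv[y, x, d] = Hamming(census_left[y, x], census_right[y, x - d])."""
    H, W = left.shape
    cl, cr = census(left), census(right)
    cv = np.full((H, W, dmax + 1), 24, dtype=np.float32)  # default for x - d < 0
    for d in range(dmax + 1):
        # Left columns d..W-1 line up with right columns 0..W-1-d.
        cv[:, d:, d] = popcount(cl[:, d:] ^ cr[:, :W - d])
    return cv
```

The loop runs over disparities rather than pixels, so each iteration is a whole-image XOR and bit count.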

(b) Next, implement bfilt to smooth this cost volume using Bilateral filtering with the left RGB image as the guide.

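Formally, given a cost volume $C[n, d]$, you want to return a smoothed volume $\bar{C}[n, d]$, where

$$\bar{C}[n_1, d] = \sum_{n_2} B[n_1, n_2]\, C[n_2, d].$$

Here, $B[n_1, n_2]$ is spatially-varying, and defined with respect to the RGB left image $X$:

$$B[n_1, n_2] \propto \exp\left(-\frac{|n_1 - n_2|^2}{2\sigma^2} - \frac{|X[n_1] - X[n_2]|^2}{2\sigma_I^2}\right), \qquad \sum_{n_2} B[n_1, n_2] = 1.$$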



This is therefore very similar to the standard Bilateral filter (from Problem Set 1), except that you are computing the kernel based on the RGB image X, and applying to the cost volume C.

The support code will call your function (with appropriate parameters for window size, and spatial and color variances for the kernel), and again do an arg min to produce a second disparity map outputs/prob2b.jpg. Include this in your report. (10 points).
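One way to sketch the guided bilateral smoothing is to loop over window offsets and accumulate a weighted sum and a normalizer for all pixels at once. The signature below is an assumption: `K` is taken to be the window half-width, `sigma_s` and `sigma_i` are taken to be standard deviations (the support code may pass variances or a full window size instead), and neighbors falling outside the image are simply skipped. It also assumes `K` is smaller than both image dimensions.

```python
import numpy as np

def bfilt(cv, guide, K, sigma_s, sigma_i):
    """Smooth cv (H x W x D) with bilateral weights from guide (H x W x 3)."""
    H, W, _ = cv.shape
    guide = guide.astype(np.float64)
    num = np.zeros(cv.shape, dtype=np.float64)   # weighted sum of costs
    den = np.zeros((H, W, 1), dtype=np.float64)  # sum of weights (normalizer)
    for dy in range(-K, K + 1):
        for dx in range(-K, K + 1):
            # Centres n1 whose neighbour n2 = n1 + (dy, dx) lies inside the image.
            y0, y1 = max(0, -dy), min(H, H - dy)
            x0, x1 = max(0, -dx), min(W, W - dx)
            g1 = guide[y0:y1, x0:x1]
            g2 = guide[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
            w = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2)
                       - np.sum((g1 - g2) ** 2, axis=2) / (2.0 * sigma_i ** 2))
            w = w[:, :, None]                    # broadcast over disparities
            num[y0:y1, x0:x1] += w * cv[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
            den[y0:y1, x0:x1] += w
    return num / den   # den >= 1 because the zero offset always has weight 1
```

Dividing by the accumulated weights enforces the normalization of the kernel at every pixel, including near borders where the window is truncated.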

PROBLEM 3 (Total: 40 points)

(a) Implement the function viterbilr in code/ to implement the forward-backward algorithm described in class to find an exact solution for disparity optimization with a smoothing cost applied only along horizontal neighbors. You will also need to copy the buildcv function from problem 2. Given a cost volume $C$, and values of $P_1$ and $P_2$, your function should return a disparity map $d[n]$ that exactly minimizes the cost function:



$$d = \arg\min_{d} \sum_{n} C[n, d[n]] + \sum_{(n, n')} S(d[n], d[n']),$$

where the second summation is over pairs of pixels that are horizontal neighbors, and the smoothness cost is defined as:


$$S(d, d') = \begin{cases} 0 & \text{if } d = d' \\ P_1 & \text{if } |d - d'| = 1 \\ P_2 & \text{otherwise.} \end{cases}$$

Implement this to be efficient as described in class. You will need to do a for loop over the x co-ordinate as you go left-to-right and back, but you should operate on all lines in parallel, and do efficient minimization over disparities.

Running code/ will call your function and store the returned disparity map in outputs/prob3a.jpg.

Include this in your report. (20 points).
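The row-wise dynamic program can be sketched as follows. This is an illustration rather than the version presented in class. It assumes the function receives the cost volume directly as `viterbilr(cv, P1, P2)` and that $0 \leq P_1 \leq P_2$, which is what allows the minimization over the previous disparity to reduce to four candidates per entry. The array `back` is a hypothetical name for the stored back-pointers.

```python
import numpy as np

def viterbilr(cv, P1, P2):
    """Exact per-row minimisation of data cost plus horizontal smoothness.

    cv has shape H x W x D. Assumes 0 <= P1 <= P2.
    """
    H, W, D = cv.shape
    rows, ds = np.arange(H), np.arange(D)
    back = np.zeros((H, W, D), dtype=np.int64)  # best d at x-1 for each d' at x
    cbar = cv[:, 0, :].astype(np.float64)       # running cost ending at column 0
    for x in range(1, W):                       # forward pass, all rows at once
        best = np.argmin(cbar, axis=1)
        cand = np.full((H, D, 4), np.inf)
        idx = np.zeros((H, D, 4), dtype=np.int64)
        cand[:, :, 0] = cbar                    # same disparity, no penalty
        idx[:, :, 0] = ds
        cand[:, 1:, 1] = cbar[:, :-1] + P1      # previous disparity d' - 1
        idx[:, 1:, 1] = ds[:-1]
        cand[:, :-1, 2] = cbar[:, 1:] + P1      # previous disparity d' + 1
        idx[:, :-1, 2] = ds[1:]
        cand[:, :, 3] = cbar[rows, best][:, None] + P2  # any other disparity
        idx[:, :, 3] = best[:, None]
        choice = np.argmin(cand, axis=2)[:, :, None]
        back[:, x, :] = np.take_along_axis(idx, choice, axis=2)[:, :, 0]
        cbar = cv[:, x, :] + np.take_along_axis(cand, choice, axis=2)[:, :, 0]
    d = np.zeros((H, W), dtype=np.int64)
    d[:, -1] = np.argmin(cbar, axis=1)          # best final disparity per row
    for x in range(W - 1, 0, -1):               # backward pass
        d[:, x - 1] = back[rows, x, d[:, x]]
    return d
```

Each forward step costs on the order of $H \times D$ operations instead of $H \times D^2$, because the $P_2$ candidate needs only the row-wise minimum of the running cost.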

(b) Implement the function SGM in code/ (you will again need to copy the buildcv function to this file) to perform semi-global matching as described in class. This function should form "augmented" cost volumes by traversing the image in four directions: left-to-right, right-to-left, top-to-bottom, and bottom-to-top, take their sum, and compute the disparity map as the arg min of this summed volume. The smoothness cost function to be used is the same as in part (a).

Running code/ will generate outputs/prob3b.jpg. Include this in your report. (20 points).
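A compact sketch of the four-direction aggregation is shown below. It writes a single left-to-right pass and reuses it on flipped and transposed views for the other three directions. Several things here are assumptions: the function is taken to receive a cost volume as `SGM(cv, P1, P2)`, `aggregate_lr` is a hypothetical helper, and the recursion subtracts the previous column's minimum in the style of the standard SGM formulation, which may differ in detail from the version described in class. It also assumes $0 \leq P_1 \leq P_2$.

```python
import numpy as np

def aggregate_lr(cv, P1, P2):
    """Left-to-right augmented cost volume for cv of shape H x W x D."""
    H, W, D = cv.shape
    out = np.empty((H, W, D), dtype=np.float64)
    out[:, 0] = cv[:, 0]
    for x in range(1, W):
        prev = out[:, x - 1]
        pmin = prev.min(axis=1, keepdims=True)
        up = np.full_like(prev, np.inf)
        up[:, 1:] = prev[:, :-1] + P1            # came from disparity d - 1
        down = np.full_like(prev, np.inf)
        down[:, :-1] = prev[:, 1:] + P1          # came from disparity d + 1
        best = np.minimum(np.minimum(prev, pmin + P2), np.minimum(up, down))
        # Subtracting pmin is a per-pixel constant, so it leaves the arg min
        # unchanged while keeping the values from growing along the path.
        out[:, x] = cv[:, x] + best - pmin
    return out

def SGM(cv, P1, P2):
    """Sum the four directional volumes and take the arg min over disparity."""
    cvt = cv.transpose(1, 0, 2)                  # swap rows and columns
    lr = aggregate_lr(cv, P1, P2)
    rl = aggregate_lr(cv[:, ::-1], P1, P2)[:, ::-1]
    tb = aggregate_lr(cvt, P1, P2).transpose(1, 0, 2)
    bt = aggregate_lr(cvt[:, ::-1], P1, P2)[:, ::-1].transpose(1, 0, 2)
    return np.argmin(lr + rl + tb + bt, axis=2)
```

Reusing one pass on flipped and transposed views keeps the recursion in a single place, at the cost of a few temporary arrays.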

PROBLEM 4 (Total: 15 points)

Implement the function lucaskanade in code/ to perform Lucas Kanade optical flow estimation. Your function will be provided two grayscale image frames, and a window size to aggregate equations / gradient moments from.

Use convolutions with the filters $f_x$ and $f_y$ defined in the code to compute the x and y image derivatives. (Also, apply these derivatives to the 'average' of the two frames.)

Running code/ will call your function on the two frames in the inputs/ directory, and display a “quiver” plot. Save this plot as an image, and include it in your report.
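A minimal sketch of windowed Lucas-Kanade flow, solving the 2x2 normal equations in closed form at every pixel. The derivative filters `fx` and `fy` below are central-difference stand-ins, since the actual kernels live in the course code. The helper `conv2_same`, the small regularizer `eps`, the edge-replicated borders, and the requirement that the window size `W` be odd are all assumptions of this sketch.

```python
import numpy as np

# Stand-ins for the derivative filters "defined in the code" (assumed to be
# central differences here; the course files may use different kernels).
fx = np.array([[0.5, 0.0, -0.5]])
fy = fx.T

def conv2_same(img, k):
    """2D convolution with 'same' output size, edge-replicated borders,
    for odd-sized kernels. scipy.signal.convolve2d is the usual alternative."""
    kh, kw = k.shape
    H, W = img.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='edge')
    out = np.zeros((H, W), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += k[kh - 1 - i, kw - 1 - j] * pad[i:i + H, j:j + W]
    return out

def lucaskanade(img1, img2, W, eps=1e-6):
    """Return per-pixel flow (u, v) from frame img1 to frame img2."""
    img1 = img1.astype(np.float64)
    img2 = img2.astype(np.float64)
    avg = 0.5 * (img1 + img2)            # derivatives on the frame average
    Ix, Iy = conv2_same(avg, fx), conv2_same(avg, fy)
    It = img2 - img1
    box = np.ones((W, W))                # window that aggregates the moments
    Ixx = conv2_same(Ix * Ix, box) + eps # eps keeps the system invertible
    Iyy = conv2_same(Iy * Iy, box) + eps
    Ixy = conv2_same(Ix * Iy, box)
    Ixt = conv2_same(Ix * It, box)
    Iyt = conv2_same(Iy * It, box)
    # Solve [[Ixx, Ixy], [Ixy, Iyy]] [u, v]^T = -[Ixt, Iyt]^T per pixel.
    det = Ixx * Iyy - Ixy ** 2
    u = (-Iyy * Ixt + Ixy * Iyt) / det
    v = (Ixy * Ixt - Ixx * Iyt) / det
    return u, v
```

The returned `u` and `v` arrays can be passed, typically subsampled, to matplotlib's `quiver` to produce the plot requested above.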