Realistic Graphics Lab

Homework 2

Homework 2

CS328 — Numerical Methods for Visual Computing and Machine Learning

Out on Thursday 14/10/2021, due on Thursday 28/10/2021

This notebook contains literate code, i.e. brief fragments of Python surrounded by descriptive text. Please complete/extend this notebook for your homework submission:

  • In addition to your code, please also provide a short description of what your solution is doing and how it works, either by adding comments or in an extra markdown cell.
  • Before handing in, please make sure that your notebook runs from top to bottom after selecting "Kernel->Restart & Run All" without causing any errors. To simplify the grading process, please do not clear the generated output.

Make sure to use the reference Python distribution so that project files can be opened by the TAs. In this course, we use Anaconda, specifically the version based on Python 3.8.

Please keep in mind that homework assignments must be done individually.

Problem -1: Warmup (not graded)

$$ \newcommand{\vb}{\mathbf{b}} \newcommand{\vc}{\mathbf{c}} \newcommand{\vx}{\mathbf{x}} \newcommand{\mA}{\mathbf{A}} \newcommand{\mL}{\mathbf{L}} \newcommand{\mU}{\mathbf{U}} \newcommand{\mP}{\mathbf{P}} \newcommand{\mI}{\mathbf{I}} $$

The following Yes/No and paper & pencil questions are meant to check your comprehension of lecture and reading material in addition to linear algebra prerequisites.

  1. Recall that a permutation matrix $\mP$ has exactly one entry with value $1$ in each row and column and zeros elsewhere. Prove that $\mP^{-1}=\mP^T$. You'll need the definitions of matrix multiplication and the matrix inverse, namely that $\mA\mA^{-1}=\mI$. (where $\mI$ is the identity matrix)

  2. Which of the following parametric models can be fit to data using linear regression techniques? $$ \begin{align*} f_1(x, \vc) &= \vc_0 + \vc_1 x^2 + \vc_2 \sin(x) - \sqrt{\vc_3}\\ f_2(x, \vc) &= \vc_0^3 x^3 - x\\ f_3(x, \vc) &= \exp(\vc_0 x)\\ \end{align*} $$ here, $x$ represents the independent variable and $\vc$ is a vector containing the model parameters.

  3. A matrix $\mA\in\mathbb{R}^{n\times n}$ is said to be symmetric and positive definite (SPD) iff $\mA=\mA^T$ and $\vx^T\mA\vx>0$ for all $\vx\in\mathbb{R}^n$ with $\vx\ne \mathbf{0}$. It's useful to know whether matrices are SPD, since specialized storage representations (don't store entries below the diagonal) and factorization algorithms (Cholesky decomposition) can exploit these properties, reducing the expense of linear system solving by approximately 50%.

    Now recall the definition of the normal equations corresponding to the least squares system $\mA\vx\approx\vb$: $$ \mA^T\mA\vx=\mA^T\vb. $$ Assuming that we'd like solve a least squares problem with this approach, it would naturally be useful to know whether the product matrix $\mA^T\mA$ is symmetric and positive definite so that these more efficient algorithms can be used. Can you prove that $\mA^T\mA$ is SPD? (you'll likely need to introduce some extra assumptions on $\mA$ for this)

0 Prelude

As in the last assignment we'll begin by importing essential NumPy/SciPy/Matplotlib components that are needed to complete the exercises. The first two lines instruct Matplotlib to embed figures directly into the notebook and render them in sufficient quality for modern high-DPI ("retina") displays.

In [ ]:
%matplotlib inline
%config InlineBackend.figure_format='retina'

import numpy as np
import scipy.linalg as la
from matplotlib import pyplot as plt

You will get to run numerical algorithms on high resolution images in this homework, which will involve a fair amount of plotting and visualization of intermediate results. The Matplotlib default settings unfortunately cause figures to be shown at a tiny resolution, so this next line changes the settings to make all figures large by default. The second line ensures low resolution images are rendered with sharp pixel contours instead of a blurry approximation, which will be helpful later on.

In [ ]:
plt.rcParams['figure.figsize'] = (20.0, 10.0)
plt.rcParams['image.interpolation'] = 'nearest'

Problem 1: Fitting a function to data, a.k.a. Regression Analysis (15 points)

The following plot visualizes the total population of the Canton Vaud over a period of 41 years from 1979 to 2019 (source).

In [ ]:
population = np.array([
    518515, 521658, 525303, 530013, 533344, 531374, 535333, 541343, 548138, 555458, 562688, 574661, 583903,
    589657, 593190, 598587, 602172, 603156, 604440, 607879, 612276, 615978, 621784, 627933, 635850, 644097,
    650791, 658659, 668581, 684922, 697802, 708177, 721561, 729971, 743317, 755369, 767497, 778251,  794384,
    800162, 806088], dtype=np.float64)
years = np.arange(1979, 2020, dtype=np.float64)
assert len(population) == len(years)

plt.plot(years, population, 'o');

In this exercise, you will apply the tools of linear regression analysis and and linear least squares to fit an approximate parametric model to this dataset.

  1. TODO: Solve a linear system via the LU factorization of the normal equations (via la.lu_factor() and la.lu_solve() similar to what was shown in class) to fit a degree-6 plynomial to this dataset, which maps from years to population values. Plot the original data, as well as your resulting polynomial (using a higher resolution with 300 evaluations between 1979 and 2019) in a figure below to compare the trend to the model.
    Why are we using the normal equations here instead of directly applying the LU factorization?
    Assuming this very simple model is correct, how large would you predict the population of Vaud to be in the year 2025?

  2. TODO: Now create a similar figure where you fit again a degree-6 polynomial, but using the economy size QR factorization (and without the use of the normal equations). The functions to use are la.qr(matrix, mode='economic') and la.solve_triangular(). Which of the methods (1. or 2.) produces the solution with a lower residual?

  3. TODO: What is the condition number of the product matrix $\mathbf{A}^T\mathbf{A}$ created by the normal equation approach in subproblem 1? (hint: use np.linalg.cond()). Relate this information to your observations in subproblem 2.

  4. TODO: Use the same approch as in subproblem 1 (LU factorization), but this time without the normal equations (why?) to fit a degree-40 polynomial to the data. Does it result in a closer fit to the collected data? What could be the problem(s)?

Hint: A degree-n polynomial has the form $p(x) = \sum_{k=0}^{n} c_k x^k$ with the $n+1$ constants $c_0, \dots, c_n$.

In [ ]:
# Solution part 1
In [ ]:
# Solution part 2
In [ ]:
# Solution part 3
In [ ]:
# Solution part 4

2 Developing a raw photograph (85 points)

Virtually any computer device in existence today is equipped with at least one digital camera; with over 5 billion users of digital photography, this technology has now fully permeated our society. In this exercise, you will get to build your own image processing pipeline that applies color space transformations to turn the raw data measured by a digital camera into a usable image that can be viewed on a computer screen. In the following, we will explain what this means, and how to implement the associated steps using tools from numerical linear algebra. But first, we shall begin with a brief review of the process that takes place when taking a picture.

A digital camera uses an assembly of optical elements to focus the incident light onto a silicon sensor that consists of millions of tiny regions arranged in a regular grid. The silicon is sensitive to light, and each small region on the sensor (generally referred to as a pixel) measures the portion of light that falls on it. When no picture is being taken, the sensor is either inactive or used for a live preview (as in mobile phones or smaller cameras). When the actual photograph is taken, it is important to be able to control the precise amount of time during which light is collected by the sensor. This is realized by means of the shutter, a small mechanical or electronic barrier that prevents light from reaching the sensor. The shutter can open and close very quickly, in about 1/4000th of a second.

When taking a picture, the camera performs a sequence of steps in rapid succession: first, the shutter opens, allowing light to reach the pixels on the sensor. For a brief duration (known as the exposure time), the sensor collects all light that reaches the surface. To be able to distinguish colors, each pixel on the sensor is covered with a tiny filter that will only permit certain wavelengths of light to pass through. Due its physical nature, light reaches the sensor in discrete amounts not unlike droplets of rain that fill a number of buckets with water. When the exposure time is very short, each pixel may only have received a few droplets, and the image is very noisy. When the exposure time is too long, the buckets will fill up completely, and it is impossible to recover a usable image. Once the exposure time has elapsed, the shutter closes so that the measurement ceases to change any further.

Once the measurement is frozen, the camera proceeds to perform a full readout of the data associated with each pixel (i.e. the "fill height" of each bucket in the illustration above). This data is known as a raw image: a representation of the original sensor data without any additional processing. Raw images play a similar role as negatives in traditional photography: they contain all of the information that is needed to eventually create an image, but they aren't yet usable as an image on their own. Converting a raw image into an actual image entails decoding the information on the sensor and translating this data into displayable red/green/blue intensities that "make sense" on a computer screen, and which reproduce the colors that a human would have observed in the moment when the photo was taken.

Analogous to classical photography, the process of turning the raw sensor data into a usable image is referred to as "developing" the image. Cell phone cameras generally perform this step automatically without user intervention. On the other hand, most professional cameras allow the user to choose between developing the image automatically within the camera or storing undeveloped raw images that can be manually developed by the user later on. The latter has become extremely popular, since raw image contain more information than would be available in a fully processed image (such as a JPEG image). For instance, JPEG files may not be able to represent the full range of colors perceived by the camera's sensor, and the lossy compression implies that there is some loss of image quality. A range of commercial (Adobe Photoshop, Lightroom, ..) and free (RawTherapee, darktable, ..) software tools can be used to develop raw image with full artistic control over the output.

Developing an image can be an arbitrarily complex process, though the simplest version can be reduced to only four steps:

  1. Loading the raw image data
  2. Demosaicing
  3. Color space transformation
  4. Gamma correction

We will now walk through each of these steps (slightly out of order):

2.1 Loading the raw image data

Begin by downloading the following file (~72 megabytes), which contains the raw sensor data of two photographs taken with a Canon 6D camera: http://rgl.s3.eu-central-1.amazonaws.com/media/uploads/wjakob/2016/10/20/lacleman_raw.npz.

Place the file into the same directory as the CS328 - Homework 2.ipynb file so that it is easy to load from Python. The following set of commands then import the file and extract all relevant data (this will take a few seconds):

In [ ]:
data = np.load("lacleman_raw.npz")
tree = data['tree']
whitebal = data['whitebal']
patches_target = data['patches_target']

The file contains three 2D NumPy arrays—two high resolution photographs (tree and whitebal) with approximately 20 million pixels each, and a mysterious small array (patches_target) that we'll need later on. We will only focus on the tree array for now.

In [ ]:

2.2 Demosaicing (15 points)

Let's try to display one of the raw images using Matplotlib!

In [ ]:

The image is just a raw grid of sensor data without explicit color information, hence Matplotlib decided to use a heat color map to visualize the array contents. Purple, blue, green, and yellow colors correspond to pixel values of increasing magnitude. A strange grid-like artifact should be apparent when inspecting the image more carefully.

TODO Do the following to zoom into a smaller part of the image: plot the image region between rows 1000...1100 and columns 1000...1100 (hint: use NumPy array slicing). Make use of the color map (cmap) parameter of imshow to ensure that Matplotlib visualizes the image data using a grayscale (gray) color map.

In [ ]:
## TODO...

Very strange.. what is going on here? The pattern you see turns out to be related to the design of the sensor that was discussed earlier. Because it's impossible for a single pixel to distinguish among different colors, the sensor consists of a regular arrangement of pixels that are sensitive to red, green, and blue. This is known as a Bayer grid. Note that the sensor contains twice as many green pixels as red or blue pixels; the reason for this physiology: the eyes of humans are more sensitive to green wavelengths of light, hence it makes sense to capture the associated color information at a higher resolution.

Unfortunately, this kind of representation is not particularly useful: to be able to print or display images, we'll generally want to have information about all colors at every pixel. The process of taking the partial information from the Bayer grid and converting it into a full color image is known as demosaicing. Very high quality demosaicing algorithms exist that do a great job at filling in the missing information, but they can be quite complicated to implement.

Here, we will implement the most basic version of demosaicing possible: instead of trying to infer the missing pixel values to obtain a full-color image with 5494x3666 pixels, we generate an image at half resolution (2747x1833 pixels) by collapsing repeated groups of 4 monochromatic pixels into a single colored pixel. We'll simply ignore one of the green pixels as part of this step for simplicity.

TODO: Create a function demosaic(image), which takes a raw image (a 5494x3666 array) as input and performs the operation discussed above, returning a 2747x1833 3-tensor (i.e. a matrix with a third dimension for color with the ordering R, G, B). The easiest way to do this is by extracting the red, green, and blue channels into their own images and then using np.dstack to create the 3-tensor. The figure above depicts the Bayer grid arrangement used by the Canon 6D camera that took this image. You probably shouldn't need more than 5 lines of code for the demosaicing function.

A general suggestion for this assignment: since we're working with fairly large images, it's important to use vectorized NumPy function—in particular, you should never explicitly need to loop over the pixels of the image.

In [ ]:

TODO: Run the demosaicing function on the tree image and plot the result.

In [ ]:

2.3 Gamma correction (10 points)

Assuming that you got all the color channels into the right places, the grid-like patterns should disappear and the previous visualization should have resulted in a very dark image of a tree on our beautiful Geneva Lake with a strange blue-green tint. What's going on here?

As discussed before, the color values in the raw image are analogous to the "fill level" of a bucket of water. There is a direct (linear) relationship between how much light a sensor pixel received and the magnitude of the corresponding entry in the raw image.

Somewhat surprisingly, this way of representing image intensity values is not directly suitable for display on a modern computer monitor. Current displays assume a more complex (nonlinear) relationship between pixel values and the brightness that will be produced as a result. In other words: when doubling the value stored in a pixel, the actually visible pixel brightness will not generally double as a result. This nonlinear mapping is known as the gamma curve of the display and is defined as $$ I = V^\gamma $$ where $I$ is the resulting pixel brightness, $V$ is the pixel's value, and $\gamma$ is a device dependent constant—usually $\gamma\approx 2.2$. To be able to display a raw image properly, we'll want to go the other way, i.e. to find out which pixel value to set so that the display reproduces the brightness that was originally recorded on the sensor.

TODO: Define a function invgamma(image) that takes an arbitrary image of intensities and inverts the gamma curve (i.e. it computes $V$ given $I$), returning an array of pixel values of the same size. Implement the function in such a way that negative values are mapped to zero and values greater than 1.0 are mapped to 1.0 (this will be useful to avoid some potential pitfalls later on). You probably shouldn't need much more than 5 lines of code to do this.

In [ ]:

TODO: Plot the tree image once more, but this time run it through both the demosaic and the invgamma function.

In [ ]:

2.4 Color space transformations (60 points)

2.4.1 Color space transformations (5 points)

Assuming a correct implementation of the previously discussed components, you should now see an overall brighter image of the tree with natural contrast. However, the colors of the tree still look "weird" (it's the same unnatural blue-green tint we observed before). Why does this happen?

There is an infinite space of possible varieties when building screens or sensors that emit or measure certain colors of light. Any particular RGB screen or camera will need to commit to a specific set of emission colors or sensitivities, which conceptually defines the axes of a 3D color coordinate system that is associated with that device. Unfortunately, this means that the red, green and blue colors perceived by the camera's sensor are generally not the same colors that are produced by the corresponding phosphors or LEDs on your screen. In other words, a color (such as [0.0, 0.5, 0.5], which could be a dark blue green) captured by one device will need to be transformed so that it truly reproduces that same color on another device.

The above discussion greatly simplifies some aspects of color theory, but this description shall be sufficient for our purposes here. The tool we'll use to transform colors is simple: matrix-vector multiplication! The following fragment defines a small helper function that applies a color space transformation to every pixel in an image.

In [ ]:
def apply_colormatrix(image, matrix):
    ``image`` is assumed to be an RGB image represented
    as a NumPy array with shape [width, height, 3]. 
    ``matrix`` is a 3x3 color transformation matrix
    that will be applied to the color values of each pixel
    return image @ matrix.T

TODO: Let's try using it! Define a $3\times 3$ matrix that swaps the blue and red color channel (what should this matrix look like?). Use it to transform the colors of the tree image and visualize the result.

Note that you need to be careful when applying color transformation matrices: the sequence is always

  1. Demosaic
  2. Color transformation
  3. Gamma correction

In particular, be careful not to swap steps 2 and 3. It doesn't make sense to transform a (nonlinear) gamma representation of colors using (linear) matrix-vector multiplication.

In [ ]:

2.4.2 Color checkers

Without any additional knowledge, it's unclear what transformation should be applied to ensure correct color reproduction. In the context of photography, the way that this problem is generally solved is by taking a picture of a color checker, which looks something like this:

A color checker is a standardized piece of cardboard, usually with 24 squares of different colors. The patch colors are specially chosen to represent a range of colors found on man-made and natural materials such as human skin or plants.

We define the following helper function, which plots a given set of color checker values in a 6x4 grid similar to the above image.

In [ ]:
def plot_colorchecker(patch_data, title = ''):
    plt.figure(figsize=(6.0, 4.0))
    for i in range(4):
        for j in range(6):
            plt.subplot(4, 6, i*6+j+1)
            plt.imshow(invgamma(patch_data[np.newaxis, np.newaxis, i*6+j, :]))

Importantly, the red, green, and blue pixel intensities that are needed to reproduce the color checker colors on modern computer screens are known constants. They are stored in the mysterious patch_data array we loaded earlier. Thus, we should be able to plug them into the plot_colorchecker function to generate an image that resembles the above photograph:

In [ ]:
plot_colorchecker(patches_target, title='Reference color checker');

2.4.3 Obtaining and visualizing all required information (15 points)

When I took the photograph of the tree on Geneva Lake, I also made a second photograph of a color checker, which is stored in the whitebal array.

TODO: Use the functions defined earlier to visualize a demosaiced & gamma-corrected portion of rows 680..1150 and columns 1020:1730 of the whitebal image.

In [ ]:

To continue, we'll need to have a way of of extracting the captured colors of the individual patches in the whitebal image. The following function might be useful:

In [ ]:
def extract_patch(image, i, j):
    Extracts the central portion of the color patch ``(i, j)`` from ``image``
    ``image`` is assumed to be a demosaiced linear (i.e. *not* gamma-corrected)
    version of the color checker image
    ``i`` denotes the row of the color checker (from 0 to 3)

    ``j`` denotes the column of the color checker (from 0 to 5)

    start = np.array([i,   j  ]) * 113 + np.array([690, 1040]) + np.array([30, 30])
    end   = np.array([i+1, j+1]) * 113 + np.array([690, 1040]) - np.array([30, 30])

    return image[start[0]:end[0], start[1]:end[1], :]

TODO: Write a piece of code, which extracts all 24 patch colors from the whitebal image and converts it into the same format as the reference colors stored in patches_target (i.e. a $24\times 3$ matrix). Store the resulting information in an array named patches_camera.

In [ ]:

TODO: Apply the plot_colorchecker function to the patches_camera array generated above. The result should match the photograph from earlier (i.e. a weird color checker with a blue-green tint). Be careful that your extraction code does not change the ordering of patches (that much should be visible even with the distorted colors)

In [ ]:

2.4.4 Deriving the linear system for the color space transformation (25 points)

Now begins the truly interesting part: we need to find a color space transformation that will allow us to convert red/green/blue values on the sensor to meaningful red/green/blue values on a modern display. We've previously taken an image of a color checker and thus know the value of each patch in the camera's RGB coordinate system (stored in patches_camera). We also know, what value it should theoretically have in the display RGB coordinate system (patches_target).

A suitable color space transformation $f$ should thus satisfy a set of equations of the form $$ f(\mathbf{c}_{\mathrm{camera}, i}) = \mathbf{c}_{\mathrm{target}, i} \qquad (i=1, \dots, 24), $$ where f applies a color-space transformation using a matrix-vector multiplication $$ f(\mathbf{c}) = \mathbf{F}\mathbf{c}. $$

Here, $\mathbf{F}$ is a $3\times 3$ matrix, i.e. there are 9 unknowns. On the other hand, we have observed the RGB values of 24 different color patches, which means that there were 72 observations. Since there are more observations than unknowns, the problem does not have a solution in the traditional sense. However, perhaps we can find a solution such that $$ f(\mathbf{c}_{\mathrm{camera}, i}) \approx \mathbf{c}_{\mathrm{target}, i} \qquad (i=1, \dots, 24). $$

This is a least squares problem!

In the following we want to convert the above set of equations into a concrete least squares system of the form $\mA\vx = \vb$ that we can then solve.

TODO: In this task the the matrix $\mathbf{F}$ above is the unkown, which is a bit unusual. So instead, we reinterpret this $3\times 3$ matrix as a $9\times 1$ column vector $\vx$. This however also forces us to re-arrange the rest of the problem somewhat. Write code that builds the corresponding matrix $\mA$ and right-hand-side vector $\vb$.

This is the hardest part of this problem. You'll want to carefully think about how to solve the problem on pencil and paper first before implementing it in Python. A few good topics to think about:

  • Where should the camera color observations go?
  • Where do the target pixel values go?
  • What dimensions should the system have?
  • What kind of structure does the system matrix have? (A hint: it's not a full matrix -- some elements will be filled with zeros.)

In this exercise, you must design a single linear system solving for all channels at once, not three systems solving for R, G and B independently.

TODO: Describe the structure of your matrix. You may add a picture of your hand-drawn sketch. (It's fine to upload it as a separate file on Moodle or to embed it in the notebook.)

Hint: Two functions could be helpful here: np.ravel() rearranges the contents of a matrix into a tall column vector. np.reshape() goes the other way and permits rearranging a vector back into a matrix.

In [ ]:

2.4.5 Solve for the color space transformation matrix (15 points)

TODO (10 points): Solve the above system using any of the methods discussed in class. Note that you will need to invert the transformation from matrix $\mathbf{F}$ to vector $\vx$ above to get the final color-space transformation.

In [ ]:

TODO: Execute the following code to apply your computed color space transformation to the color values in patches_camera and visualize the result using the plot_colorchecker function. The result should look virtually identical to the reference colors.

In [ ]:
plot_colorchecker(patches_camera, title = 'Before color transformation')
plot_colorchecker(apply_colormatrix(patches_camera, x), title = 'After color transformation')

TODO (5 points): Finally!! Now, apply the color space transformation to the tree image and plot the result.

You have learned how digital photographs are developed and implemented an end-to-end pipeline that solves a least squares problem to obtain calibrated colors. Congratulations! :)

In [ ]: