Project 1: Images of the Russian Empire: Colorizing the Prokudin-Gorskii Photo Collection

by Maya Joiner

In this project, I attempt to create a single color image by aligning 3 Prokudin-Gorskii glass plate images (each representing the red, green, and blue color channels). The data is one long image/matrix that I cut into 3 parts and label r, g, and b. In order to align the three channels together, I use the exaustive aligning method for small .jpg images and the image pyramid method for large .tif files to align the red and green channel onto the blue channel (more explanation on the methods below).

Exaustive Search

For cathedral.jpg, monastery.jpg, and tobolsk.jpg, I used the exaustive search approach because these photos were smaller than the others. I searched the best shift of the red and green channels compared to the blue channel out of a -15 to 15 window of possible displacements. To decide which shift was the "best" shift, I tried using both the Euclidian method (L2 norm) and NCC (Normalized Cross Correlation). More specifically, I chose whichever shift either minimized the L2 norm or maximized NCC. However, for all the photos, taking the L2 norm seemed to work better. Once I got the best shift, I used those values to shift the channels by that much using the np.roll function.

For monastery.jpg and tobolsk.jpg, even the L2 norm version seemed to be a bit off (which I figured was because of the border around the photos). Therefore, I used only a 300x300 subset of the image instead of the entire 341x391 image. To do this, I got the x and y center coordinates of the image and added/subtracted 150 to the x and y coordinates to get the x and y bounds. Taking out the border helped make the image more clear.

Image Pyramid

For larger files ending in .tif, I used the image pyramid method. Upon researching about the image pyramid method (sources at end), these were my findings (sources below):

  1. Make image smaller by half every time (multiply by 0.5 recursively for 4 levels)
  2. Align smallest image (top of the pyramid)
  3. Upsample by 2x and align the image using the upsampled shift. The upsampled image will be same size as original image but lower res
Conceptually, as the photos get very large, it is computationally expensive to use the exaustive approach (since it uses a double for loop). The image pyramid method mitigates that by making the photos smaller and of a lower resolution (i.e. condensing the original but keeping only the essential information of the original photo when doing so) when calling the exaustive align function. To recursively move down the pyramid (i.e. account for larger versions of the photo), I upsampled the x and y alignment of the small image (i.e. multiplied by 2) because theoretically, if the size of the image doubles, the shift also doubles. According to the Stanford pdf (again, sources below), the pyramid should have no more than 4 levels because having more than 4 levels would make the top layer too small. Then I calculated the total shift, which is just the original shift + shift for each scale/level of the pyramid. Similar to the exaustive search, some images had a problem where the border screwed up the alignments, so I cropped the border of the image. This time the .tif images were larger (~3200 x ~3650), so I cropped the image to about (~2550, ~2850) using the same approach as the 300x300 case. Although this improved aligment for a majority of the images that didn't turn out well without the crop, the only one that was still a bit stretched was emir.tif (especially for the red channel). However, when I got rid of the upsampling portion (i.e. didn't multiply the scales by 2), emir.tif became very aligned. I'm not sure why not upsampling makes emir.tif better, but I figure it has something to do with the initial shifts at the smallest level not being accurate enough (magnifying a bad initial shift will only make the shift worse).

Sources

  • https://youtu.be/Q372k8Xg_ds
  • section 3.5 of the Szeliski textbook
  • http://robots.stanford.edu/cs223b04/algo_tracking.pdf
  • https://www.youtube.com/watch?v=ZAtt4M6JAug
  • https://youtu.be/pdBeqMGdGiU
  • Cathedral (Example of L2 vs. NCC)

    left: result using L2 norm. right: result using NCC. Used L2 Norm for all photos. Offset: red: (7, -1), green: (1, -1).

    Monastery (Example of Cropping)

    left: cropped L2 norm. right: uncropped L2 norm. Used cropped L2. Offset: red: (3, 2), green: (-3, 2)

    Tobolsk

    Offset: red: (6, 3), green: (3, 3)

    Church

    Offset: red: (52, -6), green: (0, -5)

    Emir (Example of no upscale)

    from left to right: cropped L2 norm, non cropped L2 norm, noncropped L2 norm with no upscale. Used noncropped L2 norm with no upscale. Offset: red: (57, 17), green: (-3, 7)

    Harvesters

    Offset: red: (124, 13), green: (59, 16)

    Icon

    Offset: red: (52, 21), green: (36, 16)

    Lady

    Offset: red: (112, 11), green: (51, 8)

    Melons

    Offset: red: (176, 7), green: (83, 4)

    Onion Church

    Offset: red: (108, 35), green: (52, 22)

    Sculpture

    Offset: red: (140, -26), green: (33, -11)

    Self Portrait

    Offset: red: (130, -5), green: (50, -2)

    Three Generations

    Offset: red: (108, 7), green: (52, 5)

    Train

    Offset: red: (87, 32), green: (42, 5)