It has been observed that one's nose is never so happy as when it is thrust into the affairs of another, from which some physiologists have drawn the inference that the nose is devoid of the sense of smell. –– Ambrose Bierce, "The Devil's Dictionary"

Vision, Image Processing and Graphics

Introduction

During the summer I worked on several computer vision projects, most of which are still ongoing. Nevertheless, I found time to work on an example that would let me introduce myself and my work to you, my students.

As an introduction to my classes, I wanted to give my students some useful hints about when it would be prudent to approach me with a question or comment. If you are observant, you will notice that the alertness of your humble professor is inversely proportional to how far his glasses have slid down the curve of his nose. If they slide too low, there is not much sense in seeking help from me, because I might then be in more dire need of help myself. To help you visualize what I mean, I modified a portrait of my profile to display a graded scale on my nose in the risky zone. You will see it at the end of this document.

Computer science is very much like cooking for yourself. You have a more or less precise idea of something you would like to eat, and you must be hungry. Then you look inside your refrigerator and your pantry, fetch the ingredients that suit you best, and see which pots, pans and utensils are still clean. You then apply a mixture of creativity, past experience, and good luck. After some time you get the results, and you enjoy them.

This time, my only ingredient is a single picture of my profile. My kitchen is my computer, and my pots, pans and utensils are Mathematica 6, Xfig, the GIMP, high school algebra, elementary calculus, and geometry. Of course, I am also using my experience and whatever common sense I have left. My hunger is my eagerness to get this application done in a reasonable amount of time and presented with some decency. I hope you enjoy it!

Acquiring data

We start by reading in the picture whose nose profile will receive the graded scale. Ginger took this picture with my little camera.

In[2]:=

"nariz_2.gif"

Out[2]=

"nariz_3.jpg"

The picture above is shown at 30% of its original size, which is quite large.

The graded scale was quickly drawn with the open-source program Xfig and then imported.

In[3]:=

"nariz_4.gif"

Out[3]=

"nariz_5.gif"

The drawing's data look like this. Notice that they are given as RGB triples, with 255 coding for white and 0 for black.

In[4]:=

"nariz_6.gif"

Out[4]=

"nariz_7.gif"

In[5]:=

"nariz_8.gif"

Out[5]=

"nariz_9.gif"

In[6]:=

"nariz_10.gif"

Out[6]=

"nariz_11.gif"

The imported picture contains much more than the numerical data: it also carries rules (options) and other metadata. Here we keep the RGB numbers and discard the rest.

Similarly, we separate the numerical data from the drawing.

In[7]:=

"nariz_12.gif"

Out[7]//Shallow=

"nariz_13.gif"

In[8]:=

"nariz_14.gif"

Out[8]=

"nariz_15.gif"

In[9]:=

"nariz_16.gif"

This is a plot of the numerical data corresponding to the first plane of the data matrix.

In[10]:=

"nariz_17.gif"

Out[10]=

"nariz_18.gif"

As you can see, we now have single numbers instead of RGB triples.
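For readers following along without Mathematica, here is a minimal Python sketch of the same reduction. It assumes the image has already been read into a nested list of (R, G, B) tuples. The tiny `drawing` below is made-up test data, not the actual scale.

```python
# Hypothetical 2x3 black-and-white drawing: 255 codes white, 0 codes black.
drawing = [
    [(255, 255, 255), (0, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (0, 0, 0), (255, 255, 255)],
]


def first_plane(rgb_rows):
    """Return the first (red) component of every RGB triple.

    In a pure black-and-white drawing R == G == B, so this single
    plane carries all the information contained in the triples."""
    return [[pixel[0] for pixel in row] for row in rgb_rows]


print(first_plane(drawing))  # [[255, 0, 255], [0, 0, 255]]
```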

In[11]:=

"nariz_19.gif"

Out[11]//MatrixForm=

"nariz_20.gif"

In[12]:=

"nariz_21.gif"

Out[12]=

"nariz_22.gif"

In[13]:=

"nariz_23.gif"

In[14]:=

"nariz_24.gif"

Out[14]=

"nariz_25.gif"

Mapping the graded scale on a cylinder

We would like to map the scale data onto a cylinder of radius one. The computation is elementary, but it helps to draw a figure first.

In[15]:=

"nariz_26.gif"

Out[15]=

"nariz_27.gif"

Imagine we are looking at the cylinder from above. The plane containing the scale drawing then appears as the blue line. From the figure above it is easy to see that, for a given value of z, a point (X,Y) of the image plane projects onto the cylinder at the coordinates given by
"nariz_28.gif" and "nariz_29.gif", where r is the radius of the cylinder and X is the distance from the image plane to the origin. In Mathematica this is coded as follows:

In[16]:=

"nariz_30.gif"

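The two projection formulas survive only as images on this page, so the following Python sketch rests on an assumption. It takes each point to be projected along the ray from the cylinder's axis (the z axis) through the point, which leaves z unchanged. The function name `project_to_cylinder` is hypothetical.

```python
import math


def project_to_cylinder(X, Y, z, r=1.0):
    """Project the point (X, Y, z) of the plane x = X onto the cylinder
    x**2 + y**2 = r**2 whose axis is the z axis.

    Assumption: the projection runs along the horizontal ray from the
    axis through the point, so the height z is left unchanged."""
    d = math.hypot(X, Y)  # distance from the axis, seen from above
    if d == 0:
        raise ValueError("the point lies on the cylinder axis")
    return (r * X / d, r * Y / d, z)


# A point straight in front of the axis lands on the cylinder at x = r.
print(project_to_cylinder(2.0, 0.0, 5.0))
```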
Once mapped, the points look like this.

In[17]:=

"nariz_31.gif"

Out[17]=

"nariz_32.gif"

This looks all right, but we would like to join the points with lines to get a more continuous effect.

In[18]:=

"nariz_33.gif"

Out[18]=

"nariz_34.gif"

This almost works, but the points that come from the numerals produce many undesired lines. We therefore remove the numerals from the data by keeping only the region to their right.

In[19]:=

"nariz_35.gif"

In[20]:=

"nariz_36.gif"

Out[20]=

"nariz_37.gif"

Mapping these data onto the cylinder, we get the following:

In[21]:=

"nariz_38.gif"

In[22]:=

"nariz_39.gif"

Out[22]=

"nariz_40.gif"

There are still many undesired segments. Notice that they are all long, whereas the segments we are interested in are short, going from neighbor to neighbor. Mathematica's Split function partitions a list into runs of consecutive elements according to a given criterion. We split the list wherever the distance between two consecutive points is greater than or equal to 0.5.

We then take those lists and prepend the head Line to each of them, so that Graphics3D interprets them as the vertices of polygonal lines.

In[23]:=

"nariz_41.gif"

Out[23]=

"nariz_42.gif"

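The splitting criterion can be sketched in Python as follows. This illustrates the idea behind Split with a distance test and is not the notebook's code. The names `split_into_chains` and `as_polylines` are hypothetical, and the threshold 0.5 is the one quoted in the text.

```python
import math


def split_into_chains(points, max_gap=0.5):
    """Split a list of 3D points into runs of near neighbors.

    A new run starts whenever two consecutive points are at distance
    >= max_gap, so long jumps never become drawn segments."""
    chains = []
    for p in points:
        if chains and math.dist(chains[-1][-1], p) < max_gap:
            chains[-1].append(p)
        else:
            chains.append([p])
    return chains


def as_polylines(chains):
    """Tag each chain with "Line", mimicking prepending the Line head."""
    return [("Line", chain) for chain in chains]


points = [(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5, 0, 0), (5.1, 0, 0)]
lines = as_polylines(split_into_chains(points))
# Two polylines: the jump from x = 0.2 to x = 5 starts a new chain.
```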
We now look at the data from the side and use thicker, colored segments.

In[24]:=

"nariz_43.gif"

Out[24]=

"nariz_44.gif"

This is good, although we can see some gaps between segments of the vertical axis, and my nose does not really look like a cylinder. Is there a way to map the scale onto a shape that looks more like a nose? The answer is yes!

Finding the nose profile

We would like to extract the shape of my nose from the picture. To do that we will have to segment, or separate, the nose from the rest of the picture.

One way to do this is to classify regions of the image by color. Since my nose has more or less the same color everywhere, with slight variations in intensity and saturation, it is reasonable to work with the hue.

The original picture encodes color in the RGB (Red, Green, Blue) system. We need to convert these values to another color system called HSI (Hue, Saturation, Intensity). The following function translates RGB to HSI data. The algorithm is given in Digital Image Processing, by Gonzalez and Woods.

In[25]:=

"nariz_45.gif"

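Below is a Python sketch of the RGB-to-HSI conversion. It is written from the standard formulas in Gonzalez and Woods rather than from the notebook's own function, which appears only as an image. Components are assumed to be 8-bit (0 to 255). Hue is returned as a fraction of a full turn, so red sits at 0 (modulo 1). The handling of black and gray pixels, where hue is undefined, is an assumption of this sketch.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert an 8-bit RGB triple to (hue, saturation, intensity) in [0, 1].

    Hue is the angle theta from the Gonzalez-Woods formula, taken as
    2*pi - theta when blue exceeds green, then divided by 2*pi."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    total = r + g + b
    if total == 0:  # pure black: hue and saturation are undefined
        return 0.0, 0.0, 0.0
    intensity = total / 3.0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # gray pixel (r == g == b): hue undefined, report 0
        return 0.0, saturation, intensity
    # Clamp to guard acos against tiny rounding errors outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, num / den)))
    hue = theta if b <= g else 2.0 * math.pi - theta
    return hue / (2.0 * math.pi), saturation, intensity
```

With this convention, pure green maps to a hue of about 1/3 and pure blue to about 2/3.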
We will work with a subimage.

In[26]:=

"nariz_46.gif"

In[27]:=

"nariz_47.gif"

Out[27]=

"nariz_48.gif"

In[28]:=

"nariz_49.gif"

Out[28]=

"nariz_50.gif"

The subimage is still too big, so we take a portion of it. Notice the structure of the RGB data.

In[29]:=

"nariz_51.gif"

Out[29]=

"nariz_52.gif"

In[30]:=

"nariz_53.gif"

Out[30]=

"nariz_54.gif"

We now apply the RGB to HSI function described above.

In[31]:=

"nariz_55.gif"

In[32]:=

"nariz_56.gif"

Out[32]=

"nariz_57.gif"

Displaying the results, we get the following figure. As expected, the nose stands out by its reddish hues.

In[33]:=

"nariz_58.gif"

Out[33]=

"nariz_59.gif"

To segment the nose we need only apply a couple of thresholds, since in HSI the reddish hues lie near 0 (modulo 1). We use Mathematica's Manipulate function to build an interactive interface for experimenting until we find the right values.

In[34]:=

"nariz_60.gif"

Out[34]=

"nariz_61.gif"

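The two-threshold test can be sketched in Python as below, assuming a hue plane with values in [0, 1). Red wraps around at 0 (modulo 1), so a pixel counts as reddish if its hue is below an upper cutoff near 0 or above a lower cutoff near 1. The default cutoffs are placeholders, not the values found with Manipulate.

```python
def reddish_mask(hue_plane, upper=0.05, lower=0.95):
    """Return 1 where the hue is near 0 (mod 1), else 0.

    upper and lower are hypothetical cutoffs: hues <= upper or
    >= lower are treated as red, on either side of the wrap-around."""
    return [[1 if h <= upper or h >= lower else 0 for h in row]
            for row in hue_plane]
```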
To find the edge we apply Sobel operators. These are difference operators that behave much like the horizontal and vertical derivatives of the image. They measure change: the intensity of the thresholded image jumps from 0 in the background to 1 on the nose and is constant (0 or 1) everywhere else, so only the boundary produces a response.

In[68]:=

"nariz_62.gif"

In[69]:=

"nariz_63.gif"

In[70]:=

"nariz_64.gif"

Out[70]=

"nariz_65.gif"

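For reference, here is a plain-Python sketch of the Sobel gradient magnitude on a 2D list, using the standard 3x3 kernels. The notebook does not show whether it combines the two responses into a magnitude, as done here, or treats them separately, so take this as an illustration.

```python
import math

# Standard 3x3 Sobel kernels: horizontal and vertical differences.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a 2D list of numbers.

    The kernels are applied by correlation; flipping them would only
    change the signs of gx and gy, not the magnitude. Border pixels,
    where the 3x3 window does not fit, are set to 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = gy = 0
            for di in range(3):
                for dj in range(3):
                    v = image[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * v
                    gy += SOBEL_Y[di][dj] * v
            out[i][j] = math.hypot(gx, gy)
    return out


# A vertical step from 0 to 1 responds only next to the boundary.
step = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_magnitude(step)
```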
We can now apply another threshold to this image. To select the best value we need to experiment, so we again build an interactive interface.

In[38]:=

"nariz_66.gif"

Out[38]=

"nariz_67.gif"

It is now easy to get the coordinates of the edge points with Mathematica's Position function, which returns the positions of all elements equal to 1.

In[71]:=

"nariz_68.gif"

In[72]:=

"nariz_69.gif"

Out[72]=

"nariz_70.gif"

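Together, the thresholding and the Position lookup amount to the following Python sketch. The threshold is a placeholder for the value chosen interactively. Coordinates are numbered from 1 to mirror Mathematica's Position.

```python
def edge_positions(magnitude, threshold):
    """Binarize an edge image and list the (row, column) positions of 1s.

    Pixels with magnitude >= threshold become 1; positions are
    1-based, as in Mathematica's Position."""
    binary = [[1 if v >= threshold else 0 for v in row] for row in magnitude]
    return [(i + 1, j + 1)
            for i, row in enumerate(binary)
            for j, v in enumerate(row) if v == 1]
```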
Now that we know the procedure, we clear a few unnecessary arrays from memory and bring in a bigger subimage containing my nose.

In[73]:=

"nariz_71.gif"

In[74]:=

"nariz_72.gif"

Out[74]=

"nariz_73.gif"

In[75]:=

"nariz_74.gif"

In[76]:=

"nariz_75.gif"

Out[76]=

"nariz_76.gif"

We apply the same steps as above, this time using the threshold values found in the previous experiments.

In[77]:=

"nariz_77.gif"

In[78]:=

"nariz_78.gif"

In[79]:=

"nariz_79.gif"

This is the extracted edge.

In[80]:=

"nariz_80.gif"

Out[80]=

"nariz_81.gif"

In[81]:=

"nariz_82.gif"

In[82]:=

"nariz_83.gif"

Out[82]=

"nariz_84.gif"

It has too many points, so we drop the 100 points at the top, which do not follow the real profile very closely.

In[83]:=

"nariz_85.gif"

We now fit a continuous function to the data. I like the equation
"nariz_86.gif",
because I can control how sharply it curves at the bottom end by using a number other than 40 in the second term, and it has a nice, gentle curvature at the other end.

In[84]:=

"nariz_87.gif"

Out[84]=

"nariz_88.gif"

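The fitted equation appears only as an image, and the notebook cell does not show which fitting function was used. As a hedged illustration, here is a Python sketch of linear least squares over a list of basis functions, which is what Mathematica's Fit does. The quadratic basis in the usage example is hypothetical and is not the nose equation.

```python
def fit(data, basis):
    """Least-squares coefficients c for y ~ sum(c[k] * basis[k](x)).

    Builds the normal equations (A^T A) c = A^T y from (x, y) pairs and
    solves them by Gaussian elimination with partial pivoting."""
    rows = [[f(x) for f in basis] for x, _ in data]
    ys = [y for _, y in data]
    n = len(basis)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    m = [ata[i] + [aty[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


# Hypothetical data from y = 2 + 3x - 0.5x^2, fitted with the basis 1, x, x^2.
data = [(x, 2 + 3 * x - 0.5 * x * x) for x in range(6)]
coeffs = fit(data, [lambda x: 1.0, lambda x: x, lambda x: x * x])
```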
This is the "nose" polynomial.

In[85]:=

"nariz_89.gif"

Out[85]=

"nariz_90.gif"

We need this curve to be consistent with the known ranges of x, y and z. Its variables range from 15 to about 750 and from 1 to 220, whereas we would like them to range from 0 to 1 and from 0 to 160, respectively.

We use a simple linear relation between both domains. It can be visualized as a segment of a straight line.

In[86]:=

"nariz_91.gif"

Out[86]=

"nariz_92.gif"


This line's equation is "nariz_93.gif" = "nariz_94.gif" + "nariz_95.gif", where

In[87]:=

"nariz_96.gif"

In[88]:=

"nariz_97.gif"

Out[88]=

"nariz_98.gif"

The other normalizing relation is "nariz_99.gif", where

In[89]:=

"nariz_100.gif"

In[90]:=

"nariz_101.gif"

In[91]:=

"nariz_102.gif"

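Both normalizing relations are straight lines through two known points. The small Python sketch below computes their slope and intercept. The text gives only the ranges, so which end of each range maps to which end of the target range is an assumption here, and 750 is the approximate upper value quoted above. The name `linear_map` is hypothetical.

```python
def linear_map(a0, a1, b0, b1):
    """Return (m, c) such that x -> m*x + c sends a0 to b0 and a1 to b1."""
    m = (b1 - b0) / (a1 - a0)
    return m, b0 - m * a0


m, c = linear_map(15, 750, 0, 1)     # 15 -> 0, 750 -> 1
m2, c2 = linear_map(1, 220, 0, 160)  # 1 -> 0, 220 -> 160
```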
The following code finds the coordinates of the black pixels in the scale data, which are the ones we want to map onto the nose.

In[92]:=

"nariz_103.gif"

Now we use the normalizing relations in our mapping equation. The trick is to multiply the x coordinate from the cylindrical mapping by the shape information from the "nose" polynomial, evaluated at the vertical coordinate of the scale data. You could say we are building a separable polynomial surface.

In[93]:=

"nariz_104.gif"

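Below is a Python sketch of this separable mapping, resting on two assumptions. First, the cylindrical step is taken to be a radial projection onto a cylinder of radius r about the z axis, since the original formulas are only images. Second, `profile` stands for the fitted nose curve, already expressed in normalized units. Both `map_to_nose` and the profiles used in the test are hypothetical.

```python
import math


def map_to_nose(X, Y, z, profile, r=1.0):
    """Map a point of the scale plane onto a nose-shaped surface.

    Assumed steps: project (X, Y) radially onto the cylinder of radius r,
    then scale the resulting x by profile(z), the nose curve evaluated at
    the point's vertical coordinate. y and z are kept as they are."""
    d = math.hypot(X, Y)
    if d == 0:
        raise ValueError("the point lies on the cylinder axis")
    x, y = r * X / d, r * Y / d
    return (profile(z) * x, y, z)
```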
We partition the points list as above, leaving only chains of near neighbors.

In[94]:=

"nariz_105.gif"

To get a continuous central axis for the scale (remember, we did not like the gaps), we find all points with y = 33 in the scale image data.

In[95]:=

"nariz_106.gif"

We then define an uninterrupted polygonal line for the axis.

In[96]:=

"nariz_107.gif"

We now map all these lines into 3D space and project them onto the screen as seen from a viewpoint on the right, looking at a point slightly above the center of the nose. We clip the segments on the far side, since my nose is not transparent.

In[113]:=

"nariz_108.gif"

Out[113]=

"nariz_109.gif"

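The notebook's clipping rule is not visible. One simple way to hide the far side, sketched in Python below, is to keep only vertices whose outward normal points toward the viewpoint and to break each polyline wherever it disappears. The normal is approximated by the horizontal direction (x, y, 0) from the vertical axis. This is exact on a cylinder and only approximate on the nose-shaped surface. The names `visible` and `clip_far_side` are hypothetical.

```python
def visible(point, viewpoint):
    """True if a surface point faces the viewer.

    Assumption: the outward normal is approximated by (x, y, 0), the
    horizontal direction away from the vertical axis."""
    x, y, _z = point
    vx, vy, _vz = viewpoint
    return x * (vx - x) + y * (vy - y) > 0


def clip_far_side(polyline, viewpoint):
    """Split a polyline into runs of visible vertices, dropping the rest."""
    runs, current = [], []
    for p in polyline:
        if visible(p, viewpoint):
            current.append(p)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```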
This is it! We write the figure to a file.

In[114]:=

"nariz_110.gif"

Out[114]=

"nariz_111.gif"

Using the GIMP, an open-source image-manipulation program, we blend the two images to get the final result. Now you can assess your professor's alertness by reading the position of his glasses on the scale on his nose.

In[115]:=

"nariz_112.gif"

Out[115]=

"nariz_113.jpg"


Created by Wolfram Mathematica 6.0 (26 July 2007)