ALGORITHMS FOR STEREOSCOPIC IMAGING

True 3-D is easier than it looks

This article contains the following executables: STEREO.ARC

Victor J. Duvanenko and W.E. Robbins

Victor is a member of the technical staff at Truevision in Indianapolis, Indiana. He can be contacted at victor@truevision.com.


Humans, as well as many other animals, have two eyes that enable us to detect a certain range of frequencies reflected off or transmitted from surrounding objects. In other words, we have two sensors, offset horizontally by about 2.5 inches, that allow us to see and judge distances (depth) fairly precisely.

The ability to precisely judge depth comes in handy when manipulating objects with our hands. It's also beneficial to have a bit of redundancy in the visual system. Thus, our brain sees the world through our two eyes from two slightly offset perspectives. We learn to merge the two disparate views into a single image with added depth detail via triangulation.

The mechanism for extraction of depth information from two offset inputs is illustrated in Figure 1. It's easy to see that the further away an object is, the less disparate the two views (one from each eye) become. This is easy to verify by placing a finger close to your eyes, and then quickly opening and closing your left and right eyes in alternating fashion. The finger appears to move a lot. However, when you fully extend your arm, moving the finger farther away, and repeat the procedure, the finger doesn't seem to move as much. Our brain learns to interpret the amount of disparity between the two views as distance (or depth). Thus, only objects that are very far away will be perceived to have little or no disparity. This ability is commonly referred to as "stereoscopic vision" (or, more formally, "binocular disparity").

The ability to present distance or depth information in a natural fashion is absent from most current-generation computer-graphics environments, which usually present only a single view of the world, as if the viewer had only one eye. Thus, the visual sense is not fully exploited, and stereoscopic skills are not utilized. The amount of information that can be effectively communicated is reduced, although many applications would greatly benefit from an increased bandwidth.

Depth information is especially useful when dealing with three-dimensional objects. By presenting depth information, a more natural human-computer interaction occurs, with higher communication bandwidth and improved understanding. In fact, the need for object rotation and movement becomes less critical, since it's no longer necessary for understanding of depth relationships, but is only needed to reveal obscured sections. Depth is an integral part of the three-dimensional space we live in. Only when depth is presented to the viewer will the visual experience offered by computers be indistinguishable from that of the real world.

True 3-D vs. Conventional 3-D

Stereoscopic, or true 3-D image-generation methods differ from conventional 3-D techniques. The 3-D techniques most computer-graphics books describe are really less than 3-D, because much depth information is lost when the 3-D image is projected onto a 2-D screen. The only depth cues that remain are the relative order of objects from the viewer (that is, one object obscures another that is behind it), limited depth information in the perspective sizing (for example, objects and their features get smaller the further they are from the viewer) and shadows. These cues provide limited depth information, which in many cases must be augmented by motion to furnish more detail about the objects. The viewer then gathers depth information from movement of the objects and constructs a mental picture of the presented scene.

For example, if you're presented with several nonoverlapping objects of various sizes, you can't tell whether the smaller ones are further away or are simply smaller. It's also impossible to tell whether the objects are 20 inches or 20 feet away from you. This effect is especially pronounced with spherical objects (planets of the solar system), since there's little, if any, perceived perspective sizing. The only way to understand such a scene is to view it in motion from an angle. For complex scenes, motion is difficult to achieve and, unless real-time motion is used, the results aren't very useful or interesting.

True 3-D, on the other hand, provides more-detailed depth information, augmenting perspective sizing and object occlusion. When several objects are presented, you immediately, and almost instinctively, surmise the depth relationships. This additional insight is particularly useful when viewing objects not common in everyday life, or when objects exhibit uncommon, and possibly confusing, relationships. In other words, if what is being viewed is common or easily understood, monoscopic techniques may be sufficient. More complex objects call for more complex viewing techniques to minimize confusion and maximize comprehension. In any case, monoscopic 3-D techniques are less than true 3-D, and are referred to as "2 1/2-D" by stereoscopic enthusiasts.

In this article, we'll explore stereoscopic presentation, providing an overview of both the hardware and software requirements. We'll then present algorithms--and implement them in C--for generating the left-eye and right-eye views fundamental to stereoscopic viewing. This article is based on our work with the scanning-tunneling microscope (STM) data from North Carolina State University's Precision Engineering Center. The STM is capable of measuring properly prepared samples of materials at atomic levels and can sense differences in vertical height of 0.01 angstrom. A sample of material is scanned by the STM, providing the vertical-height information at every regularly spaced sample point. The result is an array of 3-tuples (x,y,z) that represents the surface of a sample at the atomic level.

The application that reads and displays an STM file was written in mid-1990 on a uVAX under VMS, displayed images on a LEX/90 24-bit/pixel stereo-capable display/accelerator, and used a menu-driven, text user interface. The complete source code that implements the system is available electronically; see "Availability," page 5.

Stereoscopic Hardware

Stereoscopic hardware is necessary for the proper delivery of the two disparate (left- and right-eye) images to the respective eyes. One approach time multiplexes the left- and right-eye views (that is, alternates between them on every frame) on a single display. The views are demultiplexed (separated) through a pair of shutter glasses worn by the user. While the left-eye image is shown on the display, the left-eye lens is transparent, and the right-eye lens is opaque. When the right-eye image is shown, the opposite occurs. This technique requires the glasses to be synchronized with the computer display. StereoGraphics (San Rafael, California) manufactures infrared-controlled LCD shutter glasses that allow several people to view the display simultaneously.

Another technique is to place a polarizing LCD panel in front of the screen. This polarizes the right-eye frame in one direction and the left-eye frame 90 degrees off. The user wears passive glasses that have the left-eye lens polarized 90 degrees from the right-eye lens. Tektronix (Beaverton, Oregon) manufactures such LCD panels.

With both techniques, the user's left eye sees only the left-eye image, whereas the right eye sees only the right-eye image. If the images are displayed fast enough, the human brain fuses the two views, and the depth information becomes naturally evident.

Both techniques use a single monitor capable of displaying frames at rates of 90 to 120 frames per second. This ensures that each eye is presented with a refresh rate high enough to avoid flicker, since each eye sees its corresponding image at half the total rate. However, by running at higher vertical rates, you lose resolution, because most display hardware has an upper limit on pixel-output bandwidth. Thus, when the vertical refresh rate doubles for stereoscopic viewing (from 60 to 120 Hz), the resolution has to be halved to keep the pixel-output bandwidth constant. Also, the apparent brightness of the image is reduced due both to passing through the glasses and to each eye seeing an image for only half of the time and seeing nothing for the other half.
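As a hypothetical example, display hardware with enough pixel bandwidth for 1280x1024 at 60 Hz (about 79 million pixels per second) can drive only about 1280x512 per eye at 120 Hz, since 1280x512x120 works out to the same 79 million pixels per second.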

Another approach uses a head-mounted display (HMD) consisting of active LCD display glasses that display the left-eye view on the left lens of the glasses and right-eye view on the right lens.

One low-cost hardware configuration for stereoscopic viewing consists of a display capable of high vertical-refresh rates (preferably 120 frames/second), a pair of active LCD shutter glasses with synchronization hardware, and a stereo-ready graphics card capable of displaying 120 frames per second, switching between two images during every vertical screen retrace, and sending a synchronization signal to the controller of the LCD glasses. Stereo-ready graphics cards are available from Truevision (Indianapolis, Indiana) and other vendors.

Stereoscopic Software

Many rendering software packages are currently available. As long as the software is capable of generating a single-perspective view of a scene, it is sufficient for generating stereoscopic views. However, a particular package may not provide enough information or have enough flexibility to generate proper stereoscopic views. Then the process becomes that of trial and error, which may lead to improper results. There's actually no need to create or use a full rendering package--wire-frame images produce very dramatic results--since the addition of depth information enhances the outcome. (If you succeed at displaying wire-frame images, try closing one eye to see what you have been missing!)

Once the two views have been rendered, both must be loaded into display memory, but only one of them should be visible at a time. The software must then be capable of toggling from one image to the other during the vertical blanking period. Most graphics cards allow for two images to be loaded at once and for fast toggling between them. The software must be capable of detecting the vertical retrace interval. This can be accomplished by either servicing a vertical retrace interrupt or by polling a line counter in the display controller. During the vertical blanking period, the software must flip to the other-eye image and synchronize the glasses. The software must know which is the left-eye image and which is the right-eye image, and allow you to see only the left-eye image when the left-eye lens is transparent. If the two images are displayed backwards, you'll most likely be incapable of fusing them into one and be very uncomfortable. (The effect is equivalent to swapping the left and right eye on your head.)
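The core of the display loop is small. The sketch below is only an illustration of the idea; wait_for_vertical_blank(), show_display_buffer(), and sync_glasses() are hypothetical stand-ins for whatever routines the graphics-card and shutter-glasses vendors actually provide.

/* Stereo page-flipping loop (sketch).  The three helpers are hypothetical;
   the essential point is that the buffer flip and the glasses switch both
   happen during the vertical blanking period, so the left-eye image is
   visible only while the left-eye lens is transparent. */
#define LEFT_EYE   0
#define RIGHT_EYE  1

void stereo_display_loop()
{
   int eye = LEFT_EYE;

   for ( ;; )
   {
      wait_for_vertical_blank();   /* poll the line counter or wait on the interrupt */
      show_display_buffer( eye );  /* make this eye's image visible */
      sync_glasses( eye );         /* open this eye's shutter, blank the other */
      eye = ( eye == LEFT_EYE ) ? RIGHT_EYE : LEFT_EYE;
   }
}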

Stereoscopic Algorithms

At this point, we'll discuss the steps, algorithms, and mathematics necessary for generating stereoscopic images. Most of the methods are based on those described by Larry Hodges in his PhD thesis, "Technologies, Applications, Hardcopy and Perspective Transformations for True Three-Dimensional CRT-Based Display" (North Carolina State University, 1988). Keep in mind that the data set we're working with is a uniform 2-D 200 x 200 grid with a Z value (height) given at each grid point.

To generate perspective stereoscopic views of the scene, we'll perform these steps:

  1. Make the viewing transformation matrix.
  2. Compute shading values and normals of each triangle.
  3. Transform interocular distance from the physical coordinate system to the viewing coordinate system.
  4. Make the left- and right-eye transformation matrices.
  5. Concatenate all matrices needed for the left-eye transformation into a single matrix.
  6. Transform, homogenize, clip, and render the left-eye view.
  7. Repeat steps #5 and #6 for the right-eye view.
  8. Display alternating views during successive frames.

Viewing Transformation

To simplify perspective projection, clipping, depth sorting, and stereoscopic-image generation, you need a viewing transformation that places the viewer reference point (VRP) at the origin, viewing down the negative Z-axis. In other words, the first step is to compute the transformation matrix necessary to move the viewer down to the origin, looking down the negative Z-axis. This transformation is explained as a four-step process by Foley and van Dam in Fundamentals of Interactive Computer Graphics (Addison-Wesley, 1984, pp. 258-261, 274-276). The technique is not efficient, but it is clear and performed only once per image. (See make_composite_viewing_matrix() in Listing Two (page 76) for implementation details.) This computation need be performed only when the viewer moves.

Homogeneous matrix form (4x4) was used for simplicity and efficiency. This method is beautiful, as it allows for accumulation of a series of transformations into a single matrix. For example, it's possible to construct a single matrix that not only performs the viewing transformation, but also the scaling to the device/display coordinates. Thus, each sample point in the original image needs to be multiplied by a 4x4 matrix only once to transform it all the way to the device/display coordinates.

Some computational optimizations were performed here. For instance, instead of handling each triangle separately, the entire image was transformed. This reduced the amount of computation by a factor of six, since every sample point generates two triangles, and all of the vertices are shared between neighboring triangles. The multiplication of a vector (the sample point, 1x4) by a matrix (4x4) is computationally intensive: 16 multiplications and 12 additions per sample point (a total of 640,000 multiplications and 480,000 additions). Also, the three divisions necessary to bring a point back to 3-D from the homogeneous space have been replaced by one reciprocal and three multiplications. This should execute faster on most floating-point platforms.
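The inner loop therefore reduces to one vector-by-matrix multiply plus one reciprocal per sample point; the following fragment is essentially the body of transform_and_homogenize_image() in Listing Four:

   /* p[] is the sample point as a row vector (x, y, z, 1); Tm is the single
      composite 4x4 matrix; t[] receives the transformed homogeneous point. */
   matrix_mult( p, 1, 4, Tm, 4, 4, t );
   if ( t[3] != 1.0 )              /* divide by W (homogenize) */
   {
      inv_W = 1.0 / t[3];          /* one reciprocal...            */
      t[0] *= inv_W;               /* ...and three multiplications */
      t[1] *= inv_W;
      t[2] *= inv_W;
   }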


Shading

Every triangle in the uniform STM grid was flat shaded (but other shading algorithms can easily be used). For every sample/grid point (the grid is 200x200), two triangles are created. The intensity/color of each triangle is based on the angle between its normalized normal and the normalized negative light-source direction vector (a normalized vector from the origin to the light-source position). Since both vectors have been normalized (and thus have a magnitude of unity), the angle between them is obtained by computing the dot product. The result will be between -1 and 1. For simplicity, all negative values were set to 0, implying that all triangles facing away from the light source were rendered as dark as possible. Thus, the resulting triangle shade values were between 0 and 1. The range of possible display intensities was known to be between 0 and 255. Therefore, the final triangle-shading value was obtained by multiplying the shade value by 255.
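In code, the per-triangle shading reduces to a clamped dot product scaled to the display range. A minimal sketch, assuming n and light are unit-length point_3D_t vectors as in Listing Three:

   /* Flat shading of one triangle: the dot product of two unit vectors is
      cos(theta); clamp it to [0,1] and scale to the 0..255 display range. */
   double cos_theta = n.x * light.x + n.y * light.y + n.z * light.z;
   int    intensity;

   if ( cos_theta < 0.0 )          /* triangle faces away from the light source */
      cos_theta = 0.0;
   intensity = (int)( cos_theta * 255.0 );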

For a 200x200 grid of sample points (a total of 40,000 samples), 80,000 triangle normals must be computed. Each triangle normal is obtained by computing the cross product of two adjacent edge vectors. (This is how a normal to a plane is found, and triangles are guaranteed to be planar, since it takes a minimum of three points to define a plane.) For example, for a grid point P[i,j], the normal to the first triangle is obtained by computing (P[i,j]P[i+1,j]) X (P[i,j]P[i+1,j+1]), and the normal to the second triangle by computing (P[i,j]P[i+1,j+1]) X (P[i,j]P[i,j+1]). A general cross-product formula would require performing three subtractions per edge to compute the edge vectors and then six multiplications and three subtractions, for a total of nine subtractions and six multiplications. Thus, for 80,000 triangles, 720,000 subtractions and 480,000 multiplications are required, which would take a noticeable amount of time even on the best of current workstations.
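For reference, a fully general normal computation looks like the sketch below (written against the point_3D_t type used in Listing Three); the grid-specific optimizations described next eliminate most of this work.

   /* General triangle normal: two edge vectors (three subtractions each),
      then their cross product (six multiplications, three subtractions). */
   void general_normal( point_3D_t p0, point_3D_t p1, point_3D_t p2,
                        point_3D_t *n )
   {
      point_3D_t u, v;

      u.x = p1.x - p0.x;   u.y = p1.y - p0.y;   u.z = p1.z - p0.z;
      v.x = p2.x - p0.x;   v.y = p2.y - p0.y;   v.z = p2.z - p0.z;

      n->x = u.y * v.z - u.z * v.y;
      n->y = u.z * v.x - u.x * v.z;
      n->z = u.x * v.y - u.y * v.x;
   }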

Several computational optimizations are possible due to the restrictive structure of the STM database. The grid of samples is uniform in both X and Y, and is X- and Y-axis aligned. Thus, the X and Y components of the vectors are known in advance, and need not be computed for each triangle. Only the Z component has to be computed. Also, for vectors between successive-row grid points (for instance, P[i,j]P[i+1,j]), the X component is 0; for vectors between successive-column grid points (for example, P[i,j]P[i,j+1]), the Y component is 0. This leads to several simplifications in the cross-product computations, as the 0 components do not contribute to the result and can be eliminated. The result is that only five subtractions and four multiplications are performed for every pair of triangles (or a total of 200,000 subtractions and 160,000 multiplications--a 3.6-times reduction in subtractions and 3.0-times reduction in multiplications). See procedure compute_normals() in Listing Three, page 76. However, these optimizations produce results that are not general (that is, specific for the STM database structure).

Several floating-point operations per triangle are also necessary to normalize the normal vectors: three multiplications, two additions, three divisions, and one square root. Computing the dot product between the normalized light-source direction vector and the normalized triangle normal requires three multiplications and two additions. It would be an interesting exercise in vector mathematics to try to reduce the number of operations required.

Transforming Interocular Distance

Once a scene can be generated from a single viewpoint, the move to stereoscopic scene generation is not difficult--two scenes must be generated from two slightly different perspectives. The trick is to decide how far apart to place the two viewpoints--the two eyes.

On average, the distance between human eyes (interocular distance) is 2.5 inches. This implies that when you look at any object, the views that each eye sees will differ by no more than 2.5 inches. This interocular distance stays nearly constant for each of us. We get used to it and derive depth information based on it. The further away an object is, the more similar the two views are; the closer an object is, the more disparate the two views.

Most computer-graphics books describe the rendering pipeline as a sequence of steps that ends with drawing a pixel on the display device. Pixel aspect ratio, gamma, and the color/shade capabilities of the display device may be the only other parameters considered. Several additional parameters must be considered when generating stereoscopic images. These parameters move beyond the pixels of the display and into your physical world.

Two images (a "stereopair") must be generated, one for each eye. However, your interocular distance (eye-to-eye spacing) is known only in the physical-coordinate system (about 2.5 inches). This distance needs to be transformed into (brought into) the viewing-coordinate system where the object being viewed lies. The result will be the interocular distance in the viewing-coordinate system, which will correspond to 2.5 inches in the physical-coordinate system.

Foley and van Dam refer to the transformation from the viewing-coordinate system to the device/display-coordinate system as "window-to-viewport scaling." The transformation from the device-coordinate system to the physical-coordinate system depends on the pixel size of the physical display. Since the interocular distance is measured along the X-axis, only the pixel width is significant (and this is obviously monitor dependent).

To map the interocular distance from the physical-coordinate system to the viewing-coordinate system, it must first be mapped to the device-coordinate system (pixels or dots). In other words, inches are mapped to pixels (and the dot pitch, the physical width of a pixel, varies between monitors). For instance, the dot pitch of the LEX/90 monitor is 0.021 inches/dot. Thus, 2.5 inches are equivalent to about 119 dots (pixels). These 119 pixels are the interocular distance in the device-coordinate system, and they must now be mapped to the viewing-coordinate system. The horizontal scale factor is the window width divided by the viewport width. Multiplying the 119 pixels by this scale factor transforms the 2.5 inches in the physical-coordinate system to the viewing-coordinate system. This computation is shown in set_per_projection_plane() in Listing Six, page 78.
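Numerically, the mapping is just two scale factors; the computation below is the one that appears (as e_v and e_w) in Listing One:

   /* Physical -> device: 2.5 inches / 0.021 inches-per-pixel = about 119 pixels. */
   e_v = INTEROCULAR_DISTANCE / PIXEL_WIDTH;

   /* Device -> viewing: divide by the window-to-viewport scale factor
      (viewport width in pixels over window width in viewing coordinates). */
   e_w = e_v / (( x_right_d - x_left_d ) / ( x_right - x_left ));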

Of course, your physical distance from the display needs to be transformed into the viewing-coordinate system as well. This is done in set_per_projection_plane(). These two transformations bring you from the physical world into the viewing-coordinate system (once again, where the object is). The set_per_projection_plane() procedure suggests the distance of the projection plane in the viewing-coordinate system (which corresponds to 18 inches in the physical world).

Now that we have the interocular distance and your distance from the display transformed into the viewing-coordinate system, the left- and right-eye view transformation matrices can be formed. Larry Hodges presents a simple technique whereby you first look down the Z-axis, and your eyes are centered around the origin (along the X-axis). Thus, the left eye is at coordinate (e/2, 0, 0) and the right eye is at (-e/2, 0, 0), where e is the interocular distance in the viewing-coordinate system. Second, the two views are generated using the TPT^-1 model (translate, project, translate back). By using homogeneous (4x4) matrices, a single (4x4) transformation matrix results. The left-eye view transformation consists of a concatenation of three operations: translation to the right by e/2, projection, and translation to the left by the same amount (T[R]PT[L]); see Figure 2. The display_3D_data() procedure (see Listing One, page 76) follows this model by first creating a left-eye (4x4) transformation matrix and then a right-eye matrix.

Concatenating All Matrices into One

The display_3D_data() procedure then combines the left-eye matrix with the viewing-transformation (4x4) matrix to produce a single (4x4) matrix that contains the complete transformation that each vertex of the 200x200 grid must undergo. Once a single (4x4) transformation matrix has been obtained, transform_and_homogenize_image() (Listing Four, page 78) is called, transforming the 200x200 grid (or 40,000 vertices). The result is homogenized (X, Y, and Z are divided by W) to bring the points back into 3-D from the homogeneous space.

At this point, render_image() (Listing Five, page 78) is called. It takes the transformed grid with shading information for each of the 80,000 triangles, clips the triangles to the viewport, and renders them (using the LEX/90 polygon command). Clipping is done very simplistically--a triangle is rendered only if it is entirely inside the viewport.
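The clip_to_viewport() routine itself isn't reproduced in the listings, but an all-or-nothing test of this kind takes only a few lines; a minimal sketch, with hypothetical viewport-bound names, might look like this:

   /* Trivial accept/reject (sketch): ACCEPT only if every vertex of the
      already-transformed triangle lies inside the viewport.  v[] holds x,y
      pairs, n is the number of coordinates; vp_x_min..vp_y_max are assumed
      per-eye viewport bounds in device coordinates (the real routine
      presumably offsets the Y bounds for the right-eye half of the buffer). */
   int clip_to_viewport( short v[], int n, int eye )
   {
      int i;

      for ( i = 0; i < n; i += 2 )
         if ( v[i]     < vp_x_min || v[i]     > vp_x_max ||
              v[i + 1] < vp_y_min || v[i + 1] > vp_y_max )
            return( REJECT );
      return( ACCEPT );
   }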

Displaying Alternating Views

The LEX/90 has two display buffers (double-buffered). It is capable of displaying either of the two buffers and of alternating between the buffers on each frame. The left-eye image was rendered into display-buffer one, and the right-eye image was rendered into display-buffer two. Then the LEX/90 is set into alternating-buffer mode, and--stereo! The LEX/90 displays the left-eye image in one frame, then displays the right-eye image, then the left-eye image, and so on, 60 times per second.

To view this image, we used a crusty pair of piezoelectric shutter glasses synchronized with the vertical sync pulse of the LEX/90 display. Thus, the left eye saw only the left-eye image and the right eye saw only the right-eye image. The display flickered a little, since each eye saw a picture at 30 Hz (frames/second), but the result was sufficient to get quite a few "Wow!"s.

Rolling Your Own Stereopair Images

Barr E. Bauer

Barr is a research scientist at Arris Pharmaceutical and can be contacted at 385 Oyster Point Blvd., South San Francisco, CA 94080.

In my work as a pharmaceutical researcher, I find stereo imaging to be not only useful, but essential because true 3-D presentation is often the only way to clearly present results. In other words, when viewed in conventional 3-D or 2-D, DNA structures (such as that on the cover of this magazine) can lose much of their structural content.

Workstations like Silicon Graphics' (Mountain View, California) Indigo provide real-time 3-D effects directly on the monitor. Vector objects can be displayed depth-cued, whereby uniform fading along the screen Z-axis gives a good sense of front to back. The pseudo-3-D effect is magnified when the object is rotated or when the fading is dramatic. Interior objects can be highlighted by making them less bright than objects in front, maintaining a clear what's-in-front distinction. The depth-cued display sometimes inverts in the mind's eye, making the bright part of the display appear to be in the rear. This curious side effect occurs at random and can complicate presentations. The Indigo can display solid-shaded objects, too, which also gives a good pseudo-3-D effect.

CrystalEyes from StereoGraphics (San Rafael, California) uses a special monitor that alternates the display of left- and right-eye views to achieve real-time stereo. Liquid-crystal goggles synched to the monitor through an infrared emitter separate the displays so that each eye sees only the image appropriate for it. The stereo effect is startling: Objects appear to float in front of the screen and can be rotated in real time. On the downside, the hardware is costly, significant programming must be done to support the graphics, the picture quality is degraded due to fewer display lines, and the manipulation of 2-D menus presents a programming challenge. Despite all this, CrystalEyes can show stereo to up to four people in front of a monitor and up to 20 people with projection.

Stereo slides enable stereo presentation to larger crowds or when maximum picture quality is desired. Separate left- and right-eye views are photographed, then projected through polarized filters and viewed with polarized glasses, one eye at 90 degrees with respect to the other. The images are rotated six degrees with respect to each other along the screen's Y-axis. Internal coordinate frames must be normalized to the screen frame before the two images are rotated and photographed, otherwise the resulting projected stereo images can be very distressing to look at. The result is a static image that looks good but requires careful image alignment, and it often takes a moment for the audience to "see" the stereo effect. The polarizing filters steal light intensity the farther you sit from the central axis of the screen, making the image appear dim to those sitting in the front corners of the room. The screen can fool camera exposure control often enough that multiple exposures and careful intensity matching give the best stereo image. If one of the pair is noticeably different in exposure, the stereo effect becomes more difficult to see. Regardless, this method is inexpensive and portable. I combine depth-cuing with stereo slides for a double effect that accentuates the stereo.
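The two views for the slides can be produced by rotating the normalized model about the screen Y-axis by plus and minus three degrees before rendering each eye's image. Here is a sketch of building that rotation in the 4x4 homogeneous form used in the main article (the helper name is mine):

   #include <math.h>

   /* Build a 4x4 rotation about the Y-axis; +3 degrees for one view and
      -3 degrees for the other gives the six-degree total separation. */
   void make_y_rotation( double degrees, double R[4][4] )
   {
      double a = degrees * 3.14159265358979 / 180.0;
      int    i, j;

      for ( i = 0; i < 4; i++ )                /* start from the identity */
         for ( j = 0; j < 4; j++ )
            R[i][j] = ( i == j ) ? 1.0 : 0.0;

      R[0][0] =  cos( a );   R[0][2] = -sin( a );
      R[2][0] =  sin( a );   R[2][2] =  cos( a );
   }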

Finally, stereo images can be put in print as stereo pairs, even with line art. The simplest picture that minimizes unnecessary overlap provides the clearest 3-D image. The pairs can be viewed with glasses, or by defocusing your eyes and merging the two images into one. Figure 3, for instance, is a stereo triplet from AI software developed at Arris Pharmaceutical Corp. to recognize 3-D molecular shape. To see it in stereo, most people cross their eyes slightly until two adjacent images fuse into a single image with depth; if you are "wall-eyed" (eyes turned outward), look instead at the left and center images. Beware, however, that this can give you a severe headache if you look long enough.



_ALGORITHMS FOR STEREOSCOPIC IMAGES_
by Victor Duvanenko and W.E. Robbins


[LISTING ONE]


display_3D_data( proj )
int proj;
{
   double  Tw[4][4], S[4][4], Td[4][4], Ry[4][4], Per[4][4], tmp[4][4];
   double  Left[4][4], Right[4][4];
   double  e_v,      /* interocular distance mapped back to the view  coord. */
           e_w;      /* interocular distance mapped back to the world coord. */
   printf( "Computing normals for shading.  " );
   compute_normals();      /* and the dot products for each triangle */
   printf( "Done.\n" );
   /* Perspective projection must use three steps:
      1)  Compute normals and project,
      2)  Divide by W (homogenize)
      3)  Transform into the device coordinates. */
   if ( proj == PERSPECTIVE )
   {
      /* map physical interocular distance into the view port coordinates. */
      e_v = INTEROCULAR_DISTANCE / PIXEL_WIDTH;      /* e_v == pixels */
      /* map from the viewport coordinate system to the world. */
      e_w = e_v / (( x_right_d - x_left_d ) / ( x_right - x_left ));
      set_to_identity( Left,  4 );
      set_to_identity( Right, 4 );

      /* Use the Translate, Project, Translate back model. */

      /* Create the Left eye transformation matrix. */
      set_to_identity( Tw, 4 );    /* translate the left eye to the origin */
      Tw[3][0] = -e_w / 2.0;
      matrix_mult( Left, 4, 4, Tw, 4, 4, tmp );
      /* Create the perspective projection matrix. */
      set_to_identity( Per, 4 );
      Per[2][3] = 1.0 / proj_plane;          /* 1/d */
      Per[3][3] = 0.0;
      matrix_mult( tmp, 4, 4, Per, 4, 4, Left );

      Tw[3][0] = e_w / 2.0;             /* translate back */
      matrix_mult( Left, 4, 4, Tw, 4, 4, tmp );
      copy_matrix( tmp, Left, 4, 4 );

      /* Create the Right eye transformation matrix. */
      set_to_identity( Tw, 4 );    /* translate the right eye to the origin */
      Tw[3][0] = e_w / 2.0;
      matrix_mult( Right, 4, 4, Tw, 4, 4, tmp );
      /* Create the perspective projection matrix. */
      set_to_identity( Per, 4 );
      Per[2][3] = 1.0 / proj_plane;          /* 1/d */
      Per[3][3] = 0.0;
      matrix_mult( tmp, 4, 4, Per, 4, 4, Right );

      Tw[3][0] = -e_w / 2.0;             /* translate back */
      matrix_mult( Right, 4, 4, Tw, 4, 4, tmp );
      copy_matrix( tmp, Right, 4, 4 );
#if 0
      printf( "Transforming, projecting and homogenizing the image.  " );
      transform_and_homogenize_image( image, tr_image, Tm );
      printf( "Done.\n" );
#endif
   }
   /* Create the world to device transformation matrix. */
   /* Create the translation matrix. Translate only in X and Y. */
   set_to_identity( Tw, 4 );
   Tw[3][0] = -x_left;   Tw[3][1] = -y_bottom;   Tw[3][2] = 0.0;
   /* Create a uniform scale matrix. */
   set_to_identity( S, 4 );
   S[0][0] = ( x_right_d - x_left_d   ) / ( x_right - x_left   );
   S[1][1] = ( y_top_d   - y_bottom_d ) / ( y_top   - y_bottom );
   S[2][2] = ( z_back_d  - z_front_d  ) / ( z_back  - z_front  );

   matrix_mult( Tw, 4, 4, S, 4, 4, tmp );
   copy_matrix( tmp, Tw, 4, 4 );

   /* Create the translation matrix. Translate only in X and Y. */
   set_to_identity( Td, 4 );
   Td[3][0] = x_left_d;   Td[3][1] = y_bottom_d;   Td[3][2] = 0.0;
   matrix_mult( Tw, 4, 4, Td, 4, 4, tmp );
   copy_matrix( tmp, Tw, 4, 4 );

   /* Since device/screen origin on LEX/90 is in upper left, we need to reflect
      Y and translate by screen height to place device origin in bottom left.*/
   set_to_identity( Ry, 4 );
   Ry[1][1] = -1.0;
   matrix_mult( Tw, 4, 4, Ry, 4, 4, tmp );
   copy_matrix( tmp, Tw, 4, 4 );

   set_to_identity( Td, 4 );
   Td[3][1] = SCREEN_HEIGHT;
   matrix_mult( Tw, 4, 4, Td, 4, 4, tmp );
   copy_matrix( tmp, Tw, 4, 4 );
   /* Now, Tw has the world to device/screen transformation matrix. */
   if ( proj == PARALLEL )
   {
      /* Beautiful!!!  Perform a single transformation of the image (for
         parallel projection). */
      printf( "Transforming, projecting and mapping the image onto screen." );
      matrix_mult( Tm, 4, 4, Tw, 4, 4, tmp );
      copy_matrix( tmp, Tm, 4, 4 );
      show_matrix( Tm, 4, 4 );
      transform_and_homogenize_image( image, tr_image, Tm );
      printf( "Done.\n" );
      printf( "Rendering the image.  " );
      render_image( LEFT_EYE );        printf( "Done.\n" );
   }
   if ( proj == PERSPECTIVE )
   {
      printf( "Transforming, projecting, homogenizing and mapping the " );
      printf( "Left image onto screen.  " );
      matrix_mult( Tm, 4, 4, Left, 4, 4, tmp );
      matrix_mult( tmp, 4, 4, Tw, 4, 4, Left );
      printf( "Left eye transformation matrix.\n" );
      show_matrix( Left, 4, 4 );
      transform_and_homogenize_image( image, tr_image, Left );
      printf( "Done.\n" );
      printf( "Rendering the Left eye image.  " );
      render_image( LEFT_EYE );        printf( "Done.\n" );

      printf( "Transforming, projecting, homogenizing and mapping the " );
      printf( "Right image onto screen.  " );
      matrix_mult( Tm, 4, 4, Right, 4, 4, tmp );
      matrix_mult( tmp, 4, 4, Tw, 4, 4, Right );
      /* Move the right eye view into the lower half of the buffer. */
      set_to_identity( Tw, 4 );
      Tw[3][1] = SCREEN_HEIGHT;                 /* move in device coord */
      matrix_mult( Right, 4, 4, Tw, 4, 4, tmp );
      copy_matrix( tmp, Right, 4, 4 );
      printf( "Right eye transformation matrix.\n" );
      show_matrix( Right, 4, 4 );
      transform_and_homogenize_image( image, tr_image, Right );
      printf( "Done.\n" );
      {  int dz0 = 0, dz1 = 512, dz2 = 1;      /* dszom() takes its arguments by reference */
         dszom( &dz0, &dz1, &dz2 );            /* look at the lower half */
      }
      printf( "Rendering the Right eye image.  " );
      render_image( RIGHT_EYE );        printf( "Done.\n" );
   }
}





[LISTING TWO]


make_composite_viewing_matrix( proj )
int proj;      /* PARALLEL or PERSPECTIVE */
{
   double p1[4], p2[4], p3[4], p_tmp[4];
   double T[4][4], Rx[4][4], Ry[4][4], Rz[4][4], C_tmp1[4][4], C_tmp2[4][4];
   double Per[4][4], d1, d2, d12, cos_ang, sin_ang;
   /* Initialize the three points */
   p1[0] = eye_pt.x;  p1[1] = eye_pt.y;  p1[2] = eye_pt.z;  p1[3] = 1.0;
   p2[0] = p2[1] = p2[2] = 0.0;  p2[3] = 1.0;
   p3[0] = p1[0] + vup.x;  p3[1] = p1[1] + vup.y;  p3[2] = p1[2] + vup.z;
                           p3[3] = 1.0;
   /* Magnitude of vector p1->p2 */
   d12 = sqrt( p1[0] * p1[0] + p1[1] * p1[1] + p1[2] * p1[2] );
   /* Create the translation matrix. */
   set_to_identity( T, 4 );
   T[3][0] = -p1[0];   T[3][1] = -p1[1];   T[3][2] = -p1[2];
   /* Translate the three points p1, p2, and p3 to the origin. */
   matrix_mult( p1, 1, 4, T, 4, 4, p_tmp );
   copy_matrix( p_tmp, p1, 1, 4 );
   matrix_mult( p2, 1, 4, T, 4, 4, p_tmp );
   copy_matrix( p_tmp, p2, 1, 4 );
   matrix_mult( p3, 1, 4, T, 4, 4, p_tmp );
   copy_matrix( p_tmp, p3, 1, 4 );

   d1 = sqrt( p2[0] * p2[0] + p2[2] * p2[2] );      /* length of projection */
   cos_ang = -p2[2] / d1;
   sin_ang =  p2[0] / d1;
   /* Create the rotation about Y-axis matrix. */
   set_to_identity( Ry, 4 );
   Ry[0][0] = cos_ang;  Ry[0][2] = -sin_ang;
   Ry[2][0] = sin_ang;  Ry[2][2] =  cos_ang;
   /* Rotate the three points p2, and p3 about the Y-axis. */
   /* p1 is at the origin after translation => no need to rotate. */
   matrix_mult( p2, 1, 4, Ry, 4, 4, p_tmp );
   copy_matrix( p_tmp, p2, 1, 4 );
   matrix_mult( p3, 1, 4, Ry, 4, 4, p_tmp );
   copy_matrix( p_tmp, p3, 1, 4 );

   cos_ang = -p2[2] / d12;
   sin_ang = -p2[1] / d12;

   /* Create the rotation about X-axis matrix. */
   set_to_identity( Rx, 4 );
   Rx[1][1] = cos_ang;   Rx[1][2] = sin_ang;
   Rx[2][1] = -sin_ang;  Rx[2][2] = cos_ang;
   /* Rotate the three points p2, and p3 about the X-axis. */
   matrix_mult( p2, 1, 4, Rx, 4, 4, p_tmp );
   copy_matrix( p_tmp, p2, 1, 4 );
   matrix_mult( p3, 1, 4, Rx, 4, 4, p_tmp );
   copy_matrix( p_tmp, p3, 1, 4 );
   /* Sanity check. */
   printf( "The view vector should be [ %lf %lf %lf %lf ]\n", 0.0, 0.0, -d12,
                                                              1.0 );
   printf( "The view vector is [ %lf %lf %lf %lf ]\n", p2[0], p2[1],
                                                       p2[2], p2[3] );
   d2 = sqrt( p3[0] * p3[0] + p3[1] * p3[1] );
   cos_ang = p3[1] / d2;
   sin_ang = p3[0] / d2;
   /* Create the rotation about Z-axis matrix. */
   set_to_identity( Rz, 4 );
   Rz[0][0] = cos_ang;   Rz[0][1] = sin_ang;
   Rz[1][0] = -sin_ang;  Rz[1][1] = cos_ang;
   /* At this point the translation, and all rotation matrices are known
      and need to be combined into a single transformation matrix. */
   matrix_mult( T, 4, 4, Ry, 4, 4, C_tmp1 );
   matrix_mult( C_tmp1, 4, 4, Rx, 4, 4, C_tmp2 );
   matrix_mult( C_tmp2, 4, 4, Rz, 4, 4, C_tmp1 );
   copy_matrix( C_tmp1, Tm, 4, 4 );
}






[LISTING THREE]


void compute_normals()
{
   register i, j;
   point_3D_t p11_p21, p11_p22, p11_p12, cross_a, cross_b;
   double d, dx;              /* stepping distance (dx = dy) */
   double dx_sqrd;            /* dx^2 */
   int num_lines;
   point_3D_t na, nb;         /* normals to triangle A and B */
   dx = scan_sz / num_samples;

   num_lines = MIN( num_samples, MAX_IMAGE_SIZE );
   num_lines--;
   dx_sqrd = dx * dx;
   for( i = 0; i < num_lines; i++ )
      for( j = 0; j < num_lines; j++ )
      {
#if DEBUG
         printf( "P11 %f %f %f\n", image[i][j].x, image[i][j].y,
                                   image[i][j].z );
         printf( "P21 %f %f %f\n", image[i+1][j].x, image[i+1][j].y,
                                   image[i+1][j].z );
         printf( "P12 %f %f %f\n", image[i][j+1].x, image[i][j+1].y,
                                   image[i][j+1].z );
         printf( "P22 %f %f %f\n", image[i+1][j+1].x, image[i+1][j+1].y,
                                   image[i+1][j+1].z );
#endif
         p11_p21.z = image[ i + 1 ][ j ].z - image[ i ][ j ].z;
         p11_p22.z = image[ i + 1 ][ j + 1 ].z - image[ i ][ j ].z;
         p11_p12.z = image[ i ][ j + 1 ].z - image[ i ][ j ].z;
#if DEBUG
         printf( "dz11_21 = %f,  dz11_22 = %f,  dz11_12 = %f\n",
                 p11_p21.z, p11_p22.z, p11_p12.z );
#endif
         /* It's possible to eliminate one more multiplication in the
            computations below. */
         na.x = dx * ( p11_p21.z - p11_p22.z );
         na.y = dx * p11_p21.z;
         na.z = dx_sqrd;
#if DEBUG
         printf( "Na %f %f %f\n", na.x, na.y, na.z );
#endif
         nb.x = (-dx) * p11_p12.z;
         nb.y = dx * ( p11_p22.z - p11_p12.z );
         nb.z = dx_sqrd;
#if DEBUG
         printf( "Nb %f %f %f\n", nb.x, nb.y, nb.z );
#endif
         /* Normalize the normal vectors, since the intensity will be
            proportional to the angle between light source and the normal. */
         d = sqrt((double)( na.x * na.x + na.y * na.y + na.z * na.z ));
         na.x = na.x / d;
         na.y = na.y / d;
         na.z = na.z / d;
#if DEBUG
         printf( "Na %f %f %f\n", na.x, na.y, na.z );
#endif
         d = sqrt((double)( nb.x * nb.x + nb.y * nb.y + nb.z * nb.z ));
         nb.x = nb.x / d;
         nb.y = nb.y / d;
         nb.z = nb.z / d;
#if DEBUG
         printf( "Nb %f %f %f\n", nb.x, nb.y, nb.z );
#endif
         /* Compute the dot product between the light source vector and
            the normals (== to the angle between two unit vectors ).
            -1 <= cos( theta ) <= 1, which will be very useful. */
         image[ i ][ j ].sha = light_source.x * na.x + light_source.y * na.y +
                               light_source.z * na.z;
         image[ i ][ j ].shb = light_source.x * nb.x + light_source.y * nb.y +
                               light_source.z * nb.z;
      }
}






[LISTING FOUR]


transform_and_homogenize_image( s, d, tm )
point_3D_ex_t  s[][ MAX_IMAGE_SIZE ], d[][ MAX_IMAGE_SIZE ];
double *tm;         /* transformation matrix */
{
   register i, j;
   int  num_lines;
   double  p[4];      /* the point to be transformed */
   double  t[4], inv_W;

   num_lines = MIN( num_samples, MAX_IMAGE_SIZE );
   for( i = 0; i < num_lines; i++ )
      for( j = 0; j < num_lines; j++ )
      {
         p[0] = s[i][j].x;   p[1] = s[i][j].y;
         p[2] = s[i][j].z;   p[3] = 1.0;
         matrix_mult( p, 1, 4, tm, 4, 4, t );
         if ( t[3] != 1.0 )        /* divide by W (homogenize) */
         {
            inv_W = 1.0 / t[3];
            t[0] *= inv_W;   t[1] *= inv_W;  t[2] *= inv_W;
         }
         d[i][j].x = t[0];   d[i][j].y = t[1];   d[i][j].z = t[2];
         d[i][j].sha = s[i][j].sha;    d[i][j].shb = s[i][j].shb;
      }
}






[LISTING FIVE]


render_image( y )
int  y;                      /* LEFT_EYE or RIGHT_EYE */
{
   register i, j;
   int    num_lines;
   short  v[6], intensity;
   int    n_coords = 6;      /* coordinate count; dspoly() takes its arguments by reference */

   num_lines = MIN( num_samples, MAX_IMAGE_SIZE );
   num_lines--;
   for( i = 0; i < num_lines; i++ )
      for( j = 0; j < num_lines; j++ )
      {
         v[0] = ROUND( tr_image[ i     ][ j     ].x );
         v[1] = ROUND( tr_image[ i     ][ j     ].y );
         v[2] = ROUND( tr_image[ i + 1 ][ j     ].x );
         v[3] = ROUND( tr_image[ i + 1 ][ j     ].y );
         v[4] = ROUND( tr_image[ i + 1 ][ j + 1 ].x );
         v[5] = ROUND( tr_image[ i + 1 ][ j + 1 ].y );
         /* Render triangle A */
         intensity = ROUND( tr_image[ i ][ j ].sha * (double)(NUM_SHADES - 1));
         if ( intensity > ( NUM_SHADES - 1 ))
            intensity = NUM_SHADES - 1;      /* saturate */
         if ( intensity < 0 )
         {
#if 0
            printf( "Triangle A,  intensity = %d\n", intensity );
            printf( "v11.x = %f, v11.y = %f, v11.z = %f\n",
                    image[i][j].x, image[i][j].y, image[i][j].z );
            printf( "v21.x = %f, v21.y = %f, v21.z = %f\n",
                    image[i+1][j].x, image[i+1][j].y, image[i+1][j].z );
            printf( "v22.x = %f, v22.y = %f, v22.z = %f\n",
                    image[i+1][j+1].x, image[i+1][j+1].y, image[i+1][j+1].z );
#endif
            intensity = 0;
         }
         if ( clip_to_viewport( v, 6, y ) == ACCEPT )
            dspoly( &intensity, v, &n_coords );

         v[2] = ROUND( tr_image[ i     ][ j + 1 ].x );
         v[3] = ROUND( tr_image[ i     ][ j + 1 ].y );
         /* Render triangle B */
         intensity = ROUND( tr_image[ i ][ j ].shb * (double)( NUM_SHADES-1));
         if ( intensity > ( NUM_SHADES - 1 ))
            intensity = NUM_SHADES - 1;      /* saturate */
         if ( intensity < 0 )   intensity = 0;
         if ( clip_to_viewport( v, 6, y ) == ACCEPT )
            dspoly( &intensity, v, &n_coords );
      }
}





[LISTING SIX]
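
The set_per_projection_plane() routine is not reproduced here; the following is only a minimal sketch of what the article describes it doing, written against the globals used in Listing One (PIXEL_WIDTH, x_left, x_right, x_left_d, x_right_d, proj_plane), with VIEWER_DISTANCE as an assumed constant.

#define VIEWER_DISTANCE  18.0      /* inches from viewer to screen (assumed) */

/* Sketch only -- not the original listing.  Maps the viewer's physical
   distance from the screen through device coordinates (pixels) into the
   viewing coordinate system, and uses the result as the suggested
   projection-plane distance.  The interocular distance is mapped through
   the same two scale factors (see e_v and e_w in Listing One). */
set_per_projection_plane()
{
   double d_pixels;      /* viewer distance in device coordinates (pixels) */
   double x_scale;       /* window-to-viewport scale factor along X        */

   x_scale = ( x_right_d - x_left_d ) / ( x_right - x_left );

   d_pixels   = VIEWER_DISTANCE / PIXEL_WIDTH;   /* inches -> pixels          */
   proj_plane = d_pixels / x_scale;              /* pixels -> viewing coord.  */
}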










Copyright © 1993, Dr. Dobb's Journal
