Arguably the most important aspect of an image acquisition system is how images are represented. The SANE approach is to define a simple yet powerful representation that is sufficient for the vast majority of applications and devices. While the representation is simple, the interface has been defined carefully to allow extending it in the future without breaking backwards compatibility. Thus, it will be possible to accommodate future applications or devices that were not anticipated at the time this standard was created.
A SANE image is a rectangular area. The rectangular area is subdivided into a number of rows and columns. At the intersection of each row and column is a square pixel. A pixel consists of one or more sample values. Each sample value represents one channel (e.g., the red channel). Each sample value has a certain bit depth. The bit depth is fixed for the entire image and can be as small as one bit. Valid bit depths are 1, 8, or 16 bits per sample. If a device's natural bit depth is something else, it is up to the driver to scale the sample values appropriately (e.g., a 4-bit sample could be shifted left by four bits, i.e., multiplied by 16, to represent a sample value of depth 8).
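As an illustration of the scaling step, the following Python sketch rescales a sample from a device's natural bit depth to one of the valid depths. The standard leaves the exact method to the driver; this sketch assumes a linear mapping in which the largest source value maps to the largest target value. The function name `scale_sample` is hypothetical and not part of the SANE API.

```python
def scale_sample(value: int, src_bits: int, dst_bits: int) -> int:
    """Linearly rescale a sample from src_bits to dst_bits of depth.

    Assumption: full-range mapping, so 0 stays 0 and the largest
    source value becomes the largest target value.
    """
    if not 0 < src_bits <= dst_bits:
        raise ValueError("expected 0 < src_bits <= dst_bits")
    src_max = (1 << src_bits) - 1
    dst_max = (1 << dst_bits) - 1
    if not 0 <= value <= src_max:
        raise ValueError("sample value out of range for src_bits")
    # Integer division with rounding to the nearest target value.
    return (value * dst_max + src_max // 2) // src_max
```

For a 4-bit device and a target depth of 8, this amounts to multiplying by 17, so the brightest device value 15 becomes 255. A plain left shift is cheaper but leaves the top of the target range unused.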
The SANE API transmits an image as a sequence of frames. Each frame covers the same rectangular area as the entire image, but may contain only a subset of the channels in the final image. For example, a red/green/blue image could either be transmitted as a single frame that contains the sample values for all three channels or it could be transmitted as a sequence of three frames: the first frame containing the red channel, the second the green channel, and the third the blue channel.
Conceptually, each frame is transmitted a byte at a time. Each byte may contain 8 sample values (for an image bit depth of 1), one full sample value (for an image bit depth of 8), or a partial sample value (for an image bit depth of 16). In the last case, the bytes of each sample value are transmitted in the machine's native byte order. For depth 1, the leftmost pixel is stored in the most significant bit, and the rightmost pixel in the least significant bit.
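The byte-level packing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `unpack_bits` and `decode_depth16` are hypothetical, and the data is assumed to be a single-channel row already held in memory.

```python
import sys


def unpack_bits(row: bytes, pixels_per_line: int) -> list:
    """Unpack a depth-1 row into one 0/1 sample per pixel.

    The leftmost pixel of each group of 8 sits in the most
    significant bit of its byte.
    """
    samples = []
    for x in range(pixels_per_line):
        byte = row[x // 8]
        samples.append((byte >> (7 - x % 8)) & 1)
    return samples


def decode_depth16(data: bytes, byteorder: str = sys.byteorder) -> list:
    """Decode depth-16 samples, two bytes each.

    By default the bytes are read in the native byte order of the
    machine running this code.
    """
    if len(data) % 2:
        raise ValueError("depth-16 data must have an even length")
    return [int.from_bytes(data[i:i + 2], byteorder)
            for i in range(0, len(data), 2)]
```

Passing `pixels_per_line` explicitly lets the caller ignore unused bits in the final byte of a row whose width is not a multiple of 8.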
Backend Implementation Note: A network-based meta backend will have to ensure that the byte order in image data is adjusted appropriately if necessary. For example, when the meta backend attaches to the server proxy, the proxy may inform the backend of the server's byte order. The backend can then apply the adjustment if necessary. In essence, this implements a "receiver-makes-right" approach.
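A minimal Python sketch of the receiver-makes-right idea follows. It assumes the receiver has already learned the sender's byte order as the string "little" or "big"; how that information is exchanged is not specified here, and the function name `make_right` is hypothetical.

```python
import sys


def make_right(data: bytes, sender_byteorder: str, depth: int) -> bytes:
    """Return image data in the receiver's native byte order.

    Only depth-16 samples span more than one byte, so other depths
    pass through unchanged. Nothing is swapped when both sides
    already agree on the byte order.
    """
    if depth != 16 or sender_byteorder == sys.byteorder:
        return data
    if len(data) % 2:
        raise ValueError("depth-16 data must have an even length")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each sample moves first
    swapped[1::2] = data[0::2]  # first byte of each sample moves second
    return bytes(swapped)
```

Because the sender never converts anything, the cost of swapping is paid only when the two machines actually differ.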
The order in which the sample values in a frame are transmitted is illustrated in Figure 2. As can be seen, the values are transmitted row by row and each row is transmitted from left-most to right-most column. The left-to-right, top-to-bottom transmission order applies when the image is viewed in its normal orientation (as it would be displayed on a screen, for example).
If a frame contains multiple channels, then the channels are transmitted in an interleaved fashion. Figure 3 illustrates this for the case where a frame contains a complete red/green/blue image with a bit depth of 8. For a bit depth of 1, each byte contains 8 sample values of a single channel. In other words, a bit depth 1 frame is transmitted in a byte-interleaved fashion.
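The transmission order can also be expressed as an offset calculation. The Python sketch below locates a sample inside a frame buffer for bit depths 8 and 16; the function name `sample_offset` is hypothetical, and `bytes_per_line` is assumed to be known to the caller (for example, as reported by the backend).

```python
def sample_offset(x: int, y: int, channel: int, channels: int,
                  depth: int, bytes_per_line: int) -> int:
    """Byte offset of the first byte of one sample in a frame buffer.

    Rows are stored top to bottom, pixels left to right, and the
    channels of a pixel are interleaved (for RGB: 0 = red,
    1 = green, 2 = blue). Only depths 8 and 16 are handled.
    """
    if depth not in (8, 16):
        raise ValueError("this sketch handles depth 8 or 16 only")
    if not 0 <= channel < channels:
        raise ValueError("channel index out of range")
    bytes_per_sample = depth // 8
    return y * bytes_per_line + (x * channels + channel) * bytes_per_sample
```

For an 8-bit RGB frame that is 4 pixels wide with 12 bytes per line, the green sample of the first pixel in the second row sits at offset 13.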
When transmitting an image frame by frame, the frontend needs to know what part of the image a frame represents (and how many frames it should expect). For that purpose, the SANE API tags every frame with a type. This version of the SANE standard supports the following frame types:
- SANE_FRAME_GRAY:
- The frame contains a single channel of data that represents sample values from a spectral band that covers the human visual range. The image consists of this frame only.
- SANE_FRAME_RGB:
- The frame contains three channels of data that represent sample values from the red, green, and blue spectral bands. The sample values are interleaved in the order red, green, and blue. The image consists of this frame only.
- SANE_FRAME_RED:
- The frame contains one channel of data that represents sample values from the red spectral band. The complete image consists of three frames: SANE_FRAME_RED, SANE_FRAME_GREEN, and SANE_FRAME_BLUE. The order in which the frames are transmitted is chosen by the backend.
- SANE_FRAME_GREEN:
- The frame contains one channel of data that represents sample values from the green spectral band. The complete image consists of three frames: SANE_FRAME_RED, SANE_FRAME_GREEN, and SANE_FRAME_BLUE. The order in which the frames are transmitted is chosen by the backend.
- SANE_FRAME_BLUE:
- The frame contains one channel of data that represents sample values from the blue spectral band. The complete image consists of three frames: SANE_FRAME_RED, SANE_FRAME_GREEN, and SANE_FRAME_BLUE. The order in which the frames are transmitted is chosen by the backend.
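Because the backend chooses the order of the red, green, and blue frames, a frontend has to place each frame by its type rather than by its arrival position. The Python sketch below shows one way to merge three single-channel frames into interleaved RGB data. The frame types are represented by their names as strings purely for illustration, and `merge_frames` is a hypothetical helper, not part of the SANE API.

```python
# Position of each channel within an interleaved RGB pixel.
CHANNEL_INDEX = {
    "SANE_FRAME_RED": 0,
    "SANE_FRAME_GREEN": 1,
    "SANE_FRAME_BLUE": 2,
}


def merge_frames(frames: dict, depth: int = 8) -> bytes:
    """Interleave three single-channel frames into RGB data.

    `frames` maps frame-type names to frame bytes; the order in
    which the entries were added does not matter.
    """
    if depth not in (8, 16):
        raise ValueError("this sketch handles depth 8 or 16 only")
    if set(frames) != set(CHANNEL_INDEX):
        raise ValueError("need exactly one red, green, and blue frame")
    sizes = {len(data) for data in frames.values()}
    if len(sizes) != 1:
        raise ValueError("all three frames must have the same size")
    size = sizes.pop()
    bytes_per_sample = depth // 8
    if size % bytes_per_sample:
        raise ValueError("frame size is not a whole number of samples")
    out = bytearray(size * 3)
    for name, data in frames.items():
        channel = CHANNEL_INDEX[name]
        for b in range(bytes_per_sample):
            # Copy byte b of every sample into its interleaved slot.
            start = channel * bytes_per_sample + b
            out[start::3 * bytes_per_sample] = data[b::bytes_per_sample]
    return bytes(out)
```

Keying the frames by type means the same code works whether the backend sends red first, blue first, or any other order.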
In frames of type SANE_FRAME_GRAY with a bit depth of 1, only two sample values are possible: 1 represents minimum intensity (black) and 0 represents maximum intensity (white). For all other bit depth and frame type combinations, a sample value of 0 represents minimum intensity and larger values represent increasing intensity.
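The inverted meaning of depth-1 gray samples is easy to get wrong, so the following Python sketch converts one such row into 8-bit intensities, where 0 is black and 255 is white. The function name `lineart_to_gray8` is hypothetical, and the row is assumed to be packed with the leftmost pixel in the most significant bit.

```python
def lineart_to_gray8(row: bytes, pixels_per_line: int) -> bytes:
    """Convert a depth-1 SANE_FRAME_GRAY row to 8-bit intensities.

    At depth 1, a sample of 1 means black and 0 means white, which
    is the reverse of the convention used at all other depths.
    """
    out = bytearray(pixels_per_line)
    for x in range(pixels_per_line):
        bit = (row[x // 8] >> (7 - x % 8)) & 1
        out[x] = 0 if bit else 255
    return bytes(out)
```

After this conversion the data follows the usual rule that larger values mean higher intensity.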
The combination of bit depth 1 and SANE_FRAME_RGB (or SANE_FRAME_RED, SANE_FRAME_GREEN, SANE_FRAME_BLUE) is rarely used and may not be supported by every frontend.