1.3 Data Types and Host Representations
Different host computers represent data internally in different ways. Some
machines are “big-endian”, meaning the high-order byte of an integer is stored
first in memory, while others are “little-endian”, meaning the low-order byte
is stored first. Integer data transferred between such machines must be
byte-swapped. Most machines use the IEEE floating point standard, but DIGITAL
VAXes and Alphas running the VMS operating system have their own standard. Some
of the IEEE-format machines are byte-swapped relative to each other.
Floating-point data transferred between these machines must be converted as
well.
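The difference between the two byte orders can be made concrete with a short C sketch. The function names here (store_big_endian, store_little_endian, host_is_little_endian) are hypothetical and exist only for illustration; they are not part of VICAR.

```c
#include <stdint.h>
#include <string.h>

/* Write v with the high-order byte first, as a big-endian host stores it. */
void store_big_endian(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Write v with the low-order byte first, as a little-endian host stores it. */
void store_little_endian(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Returns 1 if this host stores the low-order byte of an integer first,
   0 otherwise, by inspecting the in-memory layout of a known value. */
int host_is_little_endian(void)
{
    uint32_t value = 0x01020304u;
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    return bytes[0] == 0x04;
}
```

For the value 0x01020304, the big-endian layout is the byte sequence 01 02 03 04 and the little-endian layout is 04 03 02 01. A file written on one kind of host and read unchanged on the other would yield 0x04030201 instead.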
1.3.1 VICAR File Representations
Conversion among hosts would be greatly simplified if all data were stored in
ASCII instead of binary. However, that is inefficient in both time and space
for image data, so image data must be stored in a binary representation. The
question is, which one?
A standard, canonical representation could be chosen, such as Sun format:
big-endian, IEEE floating point. That would simplify the file format, but would
lead to inefficient operation on machines with different formats. When
processing locally on a VAX, every pixel would have to be converted to or from
Sun format every time it was read in or written out, at every processing step.
There wouldn't be enough coffee in the world to keep you awake while waiting.
Moreover, due to the huge quantity of existing images written in VAX format,
the canonical representation would have to be VAX format, which is not
desirable in the long run.
Since most processing is done locally on one machine, and transfers between
machine architectures are comparatively infrequent, the solution is to use the
native format of whatever machine you are running on, and to identify that
machine in the image label. That way, local operations are done efficiently,
and conversion is done only when switching machines.
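The policy can be modeled in a few lines of C. This is only a conceptual sketch of "convert on read, and only when the file's format differs from the native one"; it is not the RTL's implementation, and the names int_format, native_int_format, and decode_halfwords are hypothetical. Real applications get this behavior from the Run-Time Library and should not reproduce it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags for the integer byte order recorded with a file. */
typedef enum { INT_LOW_BYTE_FIRST, INT_HIGH_BYTE_FIRST } int_format;

/* Determine the byte order of the machine this code is running on. */
int_format native_int_format(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1 ? INT_LOW_BYTE_FIRST : INT_HIGH_BYTE_FIRST;
}

/* Turn n 16-bit pixels, given as raw file bytes, into native int16_t values.
   file_format stands in for the host information kept in the image label. */
void decode_halfwords(const unsigned char *raw, int_format file_format,
                      int16_t *out, size_t n)
{
    size_t i;

    if (file_format == native_int_format()) {
        /* Common case: file written on the same kind of host, no conversion. */
        memcpy(out, raw, n * sizeof *out);
        return;
    }
    /* File came from the other kind of host: reverse the bytes of each pixel. */
    for (i = 0; i < n; i++) {
        unsigned char swapped[2];
        swapped[0] = raw[2 * i + 1];
        swapped[1] = raw[2 * i];
        memcpy(&out[i], swapped, sizeof swapped);
    }
}
```

In the common local case the data is copied untouched, and the per-pixel conversion cost is paid only for files that came from a different architecture.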
Applications must be able to do data format translations automatically. To ease
the burden, the following conventions have been adopted:
- Applications shall be able to read files from any host representation.
- Applications shall normally write files in the native host representation of
the machine on which they are currently running.
Placing the burden only on reading greatly simplifies writing, while still
ensuring that translation takes place in all cases. Some special-purpose
applications may choose to write in a non-native format on occasion; however,
all applications must be able to read all formats, without exception.
The Run-Time Library relieves most of this burden. When the standard I/O
routines (x/zvread and x/zvwrit) are called, the translations described above
are performed automatically for the image data. The application merely calls
x/zvread and receives the data in the native format, ready for processing. It
calls x/zvwrit, and the data is written out in the native format (which is the
format the buffer is already in).
There are three cases where applications will have to do their own conversion:
- Binary labels: both headers and prefixes must be converted. See Using Binary
Labels.
- Array I/O: any program using Array I/O will get the data as it exists in the
file, without any translation. Applications using Array I/O are responsible for
doing their own data format translations on the data they read.
- Convert OFF: it is possible for an application to turn off the RTL's
automatic conversion. This should not normally be done, but is available for
special cases. If this option is selected, the application must do its own
translation.
The x/zvtrans family of RTL routines is used to translate. Do not attempt to
write your own data format conversion routines, even if you think it's only
byte-swapping. Although at present byte-swapping is the only integer
conversion, this may not always be the case. Other integer representations
exist, such as one's-complement and sign-magnitude, that cannot be translated
by a simple byte swap. Having only one set of conversion routines also makes
porting to a new platform with a different data format easier. The x/zvtrans
translations are standardized and thoroughly debugged. They are coded to be
efficient, especially for simple byte-swapping.
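To see why a byte swap is not a general integer conversion, consider how the same 16-bit pattern decodes under the three representations mentioned above. The functions below are a hypothetical illustration, assuming the bytes are already in the right order; they are not RTL routines and are not meant to be used in place of x/zvtrans.

```c
#include <stdint.h>

/* Value of a 16-bit pattern read as two's-complement. */
int32_t from_twos_complement16(uint16_t bits)
{
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Value of a 16-bit pattern read as one's-complement: a negative number is
   the bitwise complement of its magnitude. */
int32_t from_ones_complement16(uint16_t bits)
{
    return (bits & 0x8000u) ? -(int32_t)(uint16_t)~bits : (int32_t)bits;
}

/* Value of a 16-bit pattern read as sign-magnitude: the top bit is the sign,
   the remaining 15 bits are the magnitude. */
int32_t from_sign_magnitude16(uint16_t bits)
{
    int32_t magnitude = (int32_t)(bits & 0x7FFFu);
    return (bits & 0x8000u) ? -magnitude : magnitude;
}
```

The pattern 0xFFFE means -2 in two's-complement, -1 in one's-complement, and -32766 in sign-magnitude. Non-negative values agree in all three, so a home-grown byte-swapping routine would appear to work until the first negative pixel arrived from such a machine.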