3 Porting IBIS to Unix

Table of Contents


3 Porting IBIS to Unix
      3.1 IBIS-1 File Format
            3.1.1 Platform Dependence
            3.1.2 Intelligibility
            3.1.3 File Format Rigidity
      3.2 IBISFIL Subroutine Library - Issues
            3.2.1 Robustness for Large Files
            3.2.2 Subroutine Interface
      3.3 New IBIS-2 File Format
            3.3.1 IBIS-2 System Label
            3.3.2 IBIS-2 Property Label
            3.3.3 The New IBIS-2 Subroutine Library
      3.4 Porting IBIS Programs
            3.4.1 Efficiency Considerations
            3.4.2 Obsolete Assumptions
            3.4.3 Stages of Porting an IBIS Program
            3.4.4 Making code "GRAPHICS-1 Friendly"

3 Porting IBIS to Unix

There are a number of good reasons why the current IBIS system and its file formats will need some revision prior to porting the IBIS programs within VICAR.

3.1 IBIS-1 File Format

The old IBIS file format (which we shall refer to as IBIS-1) itself poses a number of problems for the port to Unix:

3.1.1 Platform Dependence

The format is not currently platform-independent. The problem is similar to the "binary label" problem for VICAR images. The number-of-columns information is stored in INTEGER*4 format in the first line of data, even though the image label says that the data is BYTE, and most of the columns are to be read as REAL*4. Thus, if the file were VICAR COPY'ed on a foreign host machine, the data would not be translated, even though the HOST has now become NATIVE. The same problem holds for GRAPHICS-1 files.
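The effect of leaving the column count untranslated can be illustrated with a minimal sketch. It assumes a hypothetical column count of 7 written by a little-endian host and read back unchanged on a big-endian host; it is an illustration of the byte-order problem only, not of any IBIS routine.

```python
import struct

# Hypothetical column count, written as INTEGER*4 by a little-endian host.
ncol_written = 7
raw = struct.pack("<i", ncol_written)

# A little-endian reader recovers the value. A big-endian reader that
# receives the same four bytes untranslated sees a very different number.
ncol_little = struct.unpack("<i", raw)[0]
ncol_big = struct.unpack(">i", raw)[0]
```

Because the label declares the data as BYTE, a format-aware copy has no reason to swap these four bytes, which is why the host information must be recorded explicitly.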

In addition, through the use of EQUIVALENCE statements, a lot of programs try to manipulate the same column space as INTEGER*4 data. At the very least, there needs to be some information in the label telling the subroutine library how to convert the binary contents of each column into the format of the local host.

3.1.2 Intelligibility

Let us suppose you run across some file, and looking at the VICAR label you see that it is a tabular file. What sort of file is it: a VIDS lookup table, a BIDR, or perhaps a set of tiepoints? For that matter, what format should the data be interpreted as, and if a column has REAL*4 values which represent a measurement of something (say, distance), what units is it using (meters, feet, yards...)? Currently no standard mechanism exists by which a user or a program could determine the format and nature of the contents of each column.

GRAPHICS-1 files have the additional problem that the dimension (NC) of the file is not stored in the file, so even the structure of the data is ambiguous. Theoretically, the "DIM" system label could have been used for this purpose, but this was never implemented.

3.1.3 File Format Rigidity

The column elements of an IBIS file are all 4 bytes long, and by default are interpreted by the subroutine library as REAL*4. The limitations inherent within the current IBIS file format design have over the years encouraged some bad programming practices in some IBIS programs. These occur primarily as work-arounds. For example, many of the FORTRAN IBIS programs use EQUIVALENCE statements to convert between REAL*4 arrays and INTEGER. Also, a common practice is to try to stuff a long ASCII string into several successive 4-byte columns to fake a "text label" column. This should really be a single physical column, and other users that look at the file should not have to figure out what the strange REAL*4 values in those columns mean.
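Both work-arounds can be imitated in a few lines. This is only a sketch: it assumes IEEE little-endian REAL*4 values (a VAX uses a different floating-point layout), and the label text is a made-up example.

```python
import struct

# Reinterpret the bits of REAL*4 1.0 as INTEGER*4, which is what an
# EQUIVALENCE between a REAL*4 array and an INTEGER array amounts to.
bits_as_int = struct.unpack("<i", struct.pack("<f", 1.0))[0]

# Stuff a 12-character label into three successive 4-byte "columns".
label = b"image1.red  "
chunks = [label[i:i + 4] for i in range(0, len(label), 4)]

# What another user sees when those columns are read as REAL*4.
as_reals = [struct.unpack("<f", chunk)[0] for chunk in chunks]
```

The values in `as_reals` are valid floating-point numbers, but they carry no hint that they were ever text.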

The expansion of the data format capability would contribute to better data representation in applications. If a particular column is to represent only byte pixel values (for example, in a color lookup table), it makes no sense to store the values as REAL*4, when BYTE columns would work just as well, and be more efficient (both in I/O and disk space) besides.

A number of IBIS applications work with rows more often than columns. Some users have expressed interest in the ability to continuously append more and more rows onto the end of a pre-existing IBIS-format archive. It would be useful if the file format could be extended so that, for example, data could be stored in "row-contiguous" order, allowing for a fast way to concatenate two IBIS files by row without taking a large I/O hit.

There is also no current ability for an IBIS program to operate on a GRAPHICS-1 file, even though both file formats represent two-dimensional rectangular arrays of data, and both share the "record" concept. The old VICAR programs PCOPIN and PCOPOUT are often called numerous times within an IBIS procedure to address this limitation, and their only function is to convert between "row-oriented" GRAPHICS-1 format and the analogous "column-oriented" IBIS-1 file format.

3.2 IBISFIL Subroutine Library - Issues

Since the IBIS-1 file format will need some work, it is natural that the "IBISFIL" and "IBISGR" subroutine libraries will also need to be redesigned. Independent of this, however, there are some other features of the subroutine library which require some redesign prior to porting the existing programs.

3.2.1 Robustness for Large Files

Many IBIS programs attempt to manipulate columns of data by reading the entire column or file into memory. If the IBIS file has, say, 300,000 rows of data (this is not uncommon), this step would require multiple megabytes of memory, just to (perhaps) swap a couple of columns.

This is because the IBIS subroutines which access columns will currently only read in whole columns at once. Users that have their own VAX platforms have been known to log on as SYSTEM, give themselves huge page file quotas and allocate all available virtual memory, just to be able to run a simple procedure on a single enormous IBIS file. This is not an intrinsic limitation of the IBIS file format, and should not be necessary.
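The alternative to whole-column reads is to process a column in fixed-size pieces. The sketch below uses an in-memory sequence as a stand-in for the file; the function names and the chunk size are illustrative assumptions, not part of any IBIS library.

```python
from typing import Iterator, List, Sequence

BYTES_PER_VALUE = 4  # every IBIS-1 column element is 4 bytes


def column_bytes(nrows: int) -> int:
    """Memory needed to hold one whole column at once."""
    return nrows * BYTES_PER_VALUE


def read_in_chunks(column: Sequence[float], chunk_rows: int) -> Iterator[List[float]]:
    """Yield successive slices of a column, chunk_rows values at a time."""
    for start in range(0, len(column), chunk_rows):
        yield list(column[start:start + chunk_rows])


def column_sum(column: Sequence[float], chunk_rows: int = 1024) -> float:
    """Sum a column while holding at most chunk_rows values in memory."""
    total = 0.0
    for chunk in read_in_chunks(column, chunk_rows):
        total += sum(chunk)
    return total
```

For the 300,000-row file mentioned above, `column_bytes(300000)` gives 1,200,000 bytes per column, whereas the chunked version needs only `chunk_rows * 4` bytes of buffer at any moment.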

3.2.2 Subroutine Interface

The old IBISFIL subroutine library is written in FORTRAN. There is nothing wrong with that, but owing to a lack of dynamic allocation facilities in FORTRAN, the file-I/O routines required that the client program provide and maintain a "working buffer" for reading and writing records. These implementation-specific structures should really be handled by the subroutines themselves, and the programs have no need to see them.

The ported subroutines will also require separate entry points for the C and FORTRAN calling sequences, and so since the new P2 subroutine libraries are distinct from the old P2SUB, we might as well clean up the interfaces while we're at it, to reflect the abstract model of the new IBIS-2 file format.

3.3 New IBIS-2 File Format

Without further ado, here is the proposed design specification for the new IBIS-2 file format:

An IBIS-2 file consists of one or more columns of various data types. The file itself may have a subtype, such as TABULAR, GRAPHICS, TIEPOINT, LUT, BIDR, or other types TBD.

The information describing each column is stored in the file's "IBIS" Property label. There is no limit on the number of rows, other than the amount of addressable disk space available. The number of columns is constrained only by the limitations imposed by VICAR multi-valued label lengths. For current purposes, the number of columns is restricted to be no more than 1024, but programs should not assume this limitation.

The column data format may be BYTE, HALF, FULL, REAL, DOUB, COMP, or an ASCII format specified by the strings "A1" through "A256". These last refer to CHARACTER*N data, where N is between 1 and 256. The ASCII data is stored in null-delimited form in the file (i.e., so that A1 requires two bytes, etc.). The data is stored in the VICAR binary header, allowing for the possibility of appending image data to the file.
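The storage rule for each format can be written down as a small lookup. This is a sketch under stated assumptions: the numeric sizes are the usual VICAR ones, with COMP taken to be a pair of REAL*4 values, and `element_size` is a hypothetical helper name.

```python
import re

# Assumed element sizes in bytes; COMP is taken to be two REAL*4 values.
NUMERIC_SIZES = {"BYTE": 1, "HALF": 2, "FULL": 4, "REAL": 4, "DOUB": 8, "COMP": 8}


def element_size(fmt: str) -> int:
    """Bytes occupied in the file by one element of the given column format."""
    fmt = fmt.upper()
    if fmt in NUMERIC_SIZES:
        return NUMERIC_SIZES[fmt]
    match = re.fullmatch(r"A(\d+)", fmt)
    if match and 1 <= int(match.group(1)) <= 256:
        # N characters plus one byte for the null delimiter.
        return int(match.group(1)) + 1
    raise ValueError("unknown column format: " + fmt)
```

With this rule an A1 column costs two bytes per element and an A256 column costs 257.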

In addition to having a format, a column may belong to several other defined column GROUPS, including one indicating the UNITs of measurement of the column values. If the format of a column is not specified by a FMT_xxx label, it will be in the format specified by the FMT_DEFAULT parameter.

The data may be stored by contiguous column values, or by contiguous row values. The location of the column data is indicated indirectly using byte-offsets.

The host format used for the column entries will be determined by the BHOST, BINTFMT and BREALFMT label entries of the VICAR SYSTEM label. In addition, the BLTYPE of the file will be set to "IBIS" to indicate the type of binary header data.

The IBIS file subtype indicates the presence or absence of various properties, formats or groups, which will evolve as new sub-types become defined. For example, a possible definition of the features common to the current GRAPHICS-1 file would be the following:

GRAPHICS file: An IBIS-2 file which possesses (possibly named) columns containing line-sample coordinate values, commonly though not necessarily with contiguous row organization.

Notice that by this definition, many TABULAR files may be used as GRAPHICS files without conversion (i.e. the VICAR programs PCOPIN and PCOPOUT will be unnecessary).

TABULAR files need no longer be internally organized solely by contiguous column values, so that, for example, if a data table will primarily require row operations, it may be created with initially contiguous row values for more efficient disk I/O. In addition, the physical order of the columns may not coincide with the logical order; this information is stored in the label, but should not cause too many difficulties, as the subroutine libraries will make logical column access transparent to the program and the user.

Note that GRAPHICS files, as special cases of TABULAR, may now be manipulated with the usual IBIS routines, such as MF, SORT, etc.

Remark: When the time comes to integrate the GRAPHICS-1 file format into the IBIS-2 configuration, consideration should be given to the manner in which polygons are delimited. Currently this is indicated by a set of zero coordinate values within the list, but if non-line-sample coordinate spaces are also desired (i.e., zero-zero based), it may be necessary to replace the (0,0) delimiter in the new format with an additional "CONTROL_CODE" column, which would indicate whether a given set of coordinates is a VERTEX or simply an END-OF-DATA delimiter (other extensible codes may also be defined). In the process, this would automatically yield the capability of merging in the GRAPHICS-2 format, which contains more object-based information (e.g., there could be a CONTROL_CODE for ELLIPSE, with the next values being the major-minor axes and orientation). This is all TBD.
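The current zero-delimiter convention, and the reason it breaks down for zero-based coordinate spaces, can be seen in a short sketch. It assumes two-dimensional (line, sample) pairs held in a Python list; `split_polygons` is a hypothetical helper, not a library routine.

```python
def split_polygons(points):
    """Split (line, sample) pairs into polygons at each (0, 0) delimiter."""
    polygons, current = [], []
    for line, samp in points:
        if line == 0 and samp == 0:
            # A zero pair ends the current polygon; it is never a vertex.
            if current:
                polygons.append(current)
            current = []
        else:
            current.append((line, samp))
    if current:
        polygons.append(current)
    return polygons
```

Since every (0, 0) pair is consumed as a delimiter, a polygon with a genuine vertex at the origin cannot be represented, which is the case a separate CONTROL_CODE column would resolve.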

For an example of a generic new-version IBIS file, refer to figure 3.3.1.

Col# 1    Col#2   Col#3          Col#4         ... Col # 1022
REAL*4    BYTE    A10            DOUB              COMP_____________
3.0000    012    'image1.red'    2.345978454447    4.56 + i * 1.94
2.5800    128    'image1.grn'    6.392209944e+2    0.00 + i * 0.00
1.e-10    256    '@@@@@@@@@@'    0.000000000e+0    0.00 - i * 3.45
0.0000    000    'image3.blu'    0.0033040e-123    33.0 + i * 2.71
3.0000    012    'image1.red'    2.345978454447    4.56 + i * 1.94
2.5800    128    'image1.grn'    6.392209944e+2    0.00 + i * 0.00
1.e-10    256    '@@@@@@@@@@'    0.000000000e+0    0.00 - i * 3.45
0.0000    000    'image3.blu'    0.0033040e-123    33.0 + i * 2.71

Figure 3.3.1: A typical new IBIS file

3.3.1 IBIS-2 System Label

As the column data is stored in the binary header portion of the file, the host representation of the data will be indicated by the "BHOST", "BINTFMT", and "BREALFMT" label items in the VICAR SYSTEM label.

In addition, the BLTYPE of the file will be set to "IBIS" to indicate the format of the binary header.

Since the IBIS data is stored in the binary header, this opens the possibility of creating VICAR images with arbitrary image data formats and sizes, which also contain IBIS-2 information in the binary header. This may be useful for installing IBIS-format lookup tables into an image, creating GRAPHICS overlays for the image, etc. However, once an image has been appended to an IBIS file, the IBIS data may not be expanded (by adding columns, etc) beyond its current boundaries.

3.3.2 IBIS-2 Property Label

The internal structure of IBIS files will be fully specified by the "IBIS" Property label of the VICAR file. Any file which does not have an IBIS property label will be assumed by the subroutine libraries to be an old-format IBIS file and handled accordingly.

Here are the REQUIRED IBIS Properties for TABULAR and GRAPHICS1 type.

Property       Values              Descriptions

NC             Integer             Number of Active Columns
NR             Integer             Number of Active Rows
ORG            "COLUMN","ROW"      Contiguous Data Organization
FMT_DEFAULT    'BYTE','HALF'...    Default Column Data Format
SEGMENT        Integer             Smallest complete data group
BLOCKSIZE      Integer             # Usable bytes of each line
COFFSET        (OFF1,...OFFN)      Offset to Logical Column

Property NC: Number of Active Columns

The number of columns in which data is currently defined. There may be more unused space in the file, perhaps from a deleted column left in place.

Property NR: Number of Active Rows

The number of rows in which valid data is currently defined. Note that there may be space available in the file for additional rows, which is currently unused.

Property ORG: Physical organization of contiguous data

In old IBIS files, the data was organized so that the elements of a column were contiguous, whereas in GRAPHICS-1 files the row (coordinate) elements were contiguous. This label item allows the specification of either organization for an IBIS file. In some applications where most operations involve rows, ORG='ROW' files may be more efficient than ORG='COLUMN'.
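The difference between the two organizations comes down to how the byte offset of an element is computed. The sketch below assumes a deliberately simplified layout with no padding, segments or block boundaries; the actual on-disk representation uses the COFFSET, SEGMENT and BLOCKSIZE properties and is not reproduced here.

```python
def element_offset(org, row, col, nr, sizes):
    """Byte offset of element (row, col), both zero-based.

    sizes holds the element size in bytes of each column, and nr is
    the number of rows. Simplified layout: no padding of any kind.
    """
    if org == "ROW":
        # All elements of a row are adjacent.
        return row * sum(sizes) + sum(sizes[:col])
    if org == "COLUMN":
        # All elements of a column are adjacent.
        return sum(size * nr for size in sizes[:col]) + row * sizes[col]
    raise ValueError("ORG must be 'ROW' or 'COLUMN'")
```

In the ROW case the elements of one row span `sum(sizes)` consecutive bytes, so a row operation touches one small region; in the COLUMN case the same row is scattered across the file, one element per column block.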

Property FMT_DEFAULT: Default Column Format

This is an ASCII label item, indicating the format of all columns whose data formats are not explicitly set by a FMT_XXX label item. Its values may be either BYTE, HALF, ... or a string value of the form "An", where <n> is an integer between 1 and 256, indicating that any defaulted column will be ASCII, of size <n>.

Property COFFSET: This is an internal property used by the subroutine library, and usually won't be needed by the user. For more information on the internal representation of the IBIS-2 file format, see Appendix B.

Property BLOCKSIZE: This is an internal property used by the subroutine library, and usually won't be needed by the user. For more information on the internal representation of the IBIS-2 file format, see Appendix B.

Property SEGMENT: This is an internal property used by the subroutine library, and usually won't be needed by the user. For more information on the internal representation of the IBIS-2 file format, see Appendix B.

The following values are CONDITIONAL, and may or may not appear, depending on the data contents:

Property     Values           Descriptions

FMT_BYTE     Integer Array    List of Columns using BYTE data.
FMT_HALF     Integer Array    List of Columns using HALF data.
FMT_FULL     Integer Array    List of Columns using FULL data.
FMT_REAL     Integer Array    List of Columns using REAL data.
FMT_DOUB     Integer Array    List of Columns using DOUB data.
FMT_COMP     Integer Array    List of Columns using COMP data.
FMT_ASCII    Integer Array    List of Columns using ASCII data.
ASCII_LEN    Integer Array    Length of ASCII columns

Property FMT_BYTE,...FMT_ASCII: Column Formatting Declaration

These optional labels specify an array of integers, consisting of the column numbers using the corresponding format. If a column is not explicitly set by one of these label items, it is in the format defined by FMT_DEFAULT, above.

Property ASCII_LEN: ASCII character-length declaration

For the set of columns specified in the FMT_ASCII array, if any, this integer label item specifies the character length of each column; that is, it is the "CHARACTER*N" value of the column. If the FMT_DEFAULT is a particular ASCII format (e.g., FMT_DEFAULT='A200'), then the defaulted ASCII columns do not appear in this list. Note that the file size of an N-character string is N+1, to allow for the null delimiter.
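The way these label items combine into a per-column format can be sketched as follows. The property label is modelled as a plain dictionary, and the sketch assumes that the entries of ASCII_LEN correspond one-to-one, in order, with the entries of FMT_ASCII; `column_formats` is a hypothetical helper.

```python
def column_formats(props):
    """Return {column number: format string} for columns 1..NC."""
    # Every column starts out in the default format.
    formats = {col: props["FMT_DEFAULT"] for col in range(1, props["NC"] + 1)}

    # Explicit FMT_xxx lists override the default.
    for name in ("BYTE", "HALF", "FULL", "REAL", "DOUB", "COMP"):
        for col in props.get("FMT_" + name, []):
            formats[col] = name

    # Assumption: ASCII_LEN[i] is the length of column FMT_ASCII[i].
    for col, length in zip(props.get("FMT_ASCII", []), props.get("ASCII_LEN", [])):
        formats[col] = "A%d" % length
    return formats


example = {
    "NC": 5,
    "FMT_DEFAULT": "REAL",
    "FMT_BYTE": [2],
    "FMT_DOUB": [4],
    "FMT_ASCII": [3],
    "ASCII_LEN": [10],
}
```

Here `column_formats(example)` leaves columns 1 and 5 in the default REAL format and reports column 3 as A10.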

User-definable IBIS Properties for TABULAR and GRAPHICS1 files: these are OPTIONAL and do not affect the file structure or data formats.

Property     Values                  Descriptions

TYPE         'LUT','TIEPOINT'..      IBIS subtype 
UNITS        (String1,...StringN)    degrees, acres, kg*m/sec^2,  etc...
GROUPS       (String1,...StringN)    Line, Samp, Average,...
UNIT_<N>     (Col1, ...ColM)         Columns using unit UNITS[N]
GROUP_<N>    (Col1, ...ColM)         Columns belonging to GROUPS[N]
 

Property TYPE: IBIS Subtype Declaration

This is an ASCII value indicating the subtype of the IBIS file, and may be useful in setting up subclasses of IBIS files. For example, a lookup table may be defined as having certain named columns and formats, whose existence will be indicated by the TYPE property being set to "LUT".

Property GROUPS and UNITS: These properties allow client-defined subsets of columns, collectively known as groups. For the <N>th name in the GROUPS (or UNITS) list there is a corresponding integer-valued list called GROUP_<N>, (respectively UNIT_<N>). For example, a client program may wish to define column #1 as "LINE" and #2 as "SAMPLE", and then put both into a group "POSITION". To do this, the following IBIS properties may be added:

GROUPS=("LINE", "SAMPLE", "POSITION")
GROUP_1=1
GROUP_2=2
GROUP_3=(1,2)

Similarly, to define the units of some columns, the properties added could be:

UNITS=("METERS", "MILES_PER_HOUR")
UNIT_1=(3,4,5)
UNIT_2=6

Units differ from groups only in that a column is only allowed to have at most one unit, but may belong to many groups.

As will be seen, these group names may be used in place of explicit column numbers to access data within an IBIS-2 file. For example, instead of having to remember that the line coordinate is in, say, column number 23 of a file, the client program can simply query the file to see which column is in the group "LINE", and then read data from that column.
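The lookup from a group or unit name to its columns follows directly from the property layout. In this sketch the property label is a plain dictionary holding the example values given above, and `columns_in_group` is a hypothetical helper rather than a library call.

```python
def columns_in_group(props, name, list_key="GROUPS", prefix="GROUP_"):
    """Return the column numbers attached to a group (or unit) name."""
    names = list(props.get(list_key, ()))
    if name not in names:
        return []
    # The <N>th name in the list owns the property GROUP_<N> (1-based).
    value = props.get(prefix + str(names.index(name) + 1), ())
    return [value] if isinstance(value, int) else list(value)


props = {
    "GROUPS": ("LINE", "SAMPLE", "POSITION"),
    "GROUP_1": 1,
    "GROUP_2": 2,
    "GROUP_3": (1, 2),
    "UNITS": ("METERS", "MILES_PER_HOUR"),
    "UNIT_1": (3, 4, 5),
    "UNIT_2": 6,
}
```

For example, `columns_in_group(props, "POSITION")` gives columns 1 and 2, and the same function serves units when called with `list_key="UNITS"` and `prefix="UNIT_"`.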

The name given to a group or a unit must begin with an alphanumeric character; the characters following the first may be ANY printable character except a colon, up to 32 characters total. The complete list of characters that may be used is:

[a-z][A-Z][0-9] ~!@ # $ % ^ & * _ + - = (){}[]<> | ? ; , . " ' ` \/ 

Here are some valid examples:

kg*m/sec^2   
2  
a_long_Long_LOng_LONg_LONG_name! 
has_[brackets]_and_{braces}(etc) 
!@#$%^&*}\/?=
A   name    with  spaces

Here are some invalid examples:

%starts_with_NON_alphanumeric
has:colons:in:its:name
a_long_Long_LOng_LONg_LONG_Whoops!_TOO_long_name!
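The naming rules can be expressed as a short check. This sketch applies the three stated rules literally (alphanumeric first character, printable ASCII without colons, at most 32 characters), so a name that begins with a punctuation character is rejected; `is_valid_name` is a hypothetical helper.

```python
MAX_NAME_LENGTH = 32


def is_valid_name(name):
    """Check a group or unit name against the stated naming rules."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    first = name[0]
    if not (first.isascii() and first.isalnum()):
        return False
    # Any printable ASCII character, including a space, except a colon.
    return all(ch.isascii() and ch.isprintable() and ch != ":" for ch in name)
```

Note that the two longest valid examples above are exactly 32 characters, the maximum allowed.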
 

3.3.3 The New IBIS-2 Subroutine Library

In support of the IBIS-2 file format, an extended subroutine library has been written. This new library, which is called IBIS2, is designed to be a complete implementation of the file format, and should permit transparent access to the elements and subgroups of an IBIS-2 file. As will be described below, the old IBISFIL routines will be re-written as a front-end to this library, but will not be able to handle the full range of new files possible. The IBIS2 routines will fully support the old format in "read" mode for backward compatibility, but may return an error status in other modes if the operation requested is not permitted within the "old IBIS" paradigm (for example, attempting to set the data format of an old IBIS column to "COMP" or the like). The new library also has the ability to create and update old VAX-VMS formatted IBIS-1 files, but again, the library will not allow operations to be performed on the file which are not supportable by the IBIS-1 format.

3.4 Porting IBIS Programs

3.4.1 Efficiency Considerations

There are basically three different ways in which an IBIS file may be manipulated while in update or write mode:

mode 1: Modifying file architecture - ncols, nrows, or col. formats using IBISRow, IBISColumn Delete/New routines

mode 2: Modifying file values using the IBISRow,IBISColumn Read/Write routines

mode 3: Modifying file values using the IBISRecord routines

Operations performed within each mode will not cause substantial degradation of performance; however, switching back and forth between modes will likely force the subroutine library to perform file-I/O intensive operations to keep the record buffers in sync. This is the price for not storing the whole file in memory at once.

If speed is at a premium, there are several things one can do to improve performance. First, if you are creating a brand new IBIS file, by default the IBISFileUnitOpen routine will initialize all of the column values of the file to zero; if you intend to set all the values in your program, you may turn off this feature by changing the AUTO_INIT attribute of the file to 'OFF'.

Also, if this is a special-purpose application, involving the development of an IBIS subtype, consideration should be given to whether the file is to be organized by rows or columns:

If the application involves intense manipulation between a relatively small (less than 500, say) number of separate sets of data, and the data is all numeric, it is probably better to put each set of data in a separate column, and organize the file by COLUMNS. It is not expensive in I/O to create or delete additional columns for column-oriented files, and you have the ability to access each column by name, by putting them in separate GROUPs. The weakness of this approach is that appending additional rows will often be an I/O intensive operation. If for some reason you anticipate extending the length of a column at a later date, you can allow for this when your application creates the file by using IBISFileSet to set the NR attribute to a large number, opening the file, and then using IBISFileSet to decrease the NR attribute back to the current value. This has the effect of allocating a much larger padded space in the file for each column, so that appending new rows later may be done "in file". For column-oriented files the IBISColumn Read/Write routines are the most efficient, although the IBISRecord routines are also very efficient.

If the application is intended to be a static index database, or for some other reason consists of a very large (more than 2000, say) number of data sets, the elements of the data sets are of different types (mixed ASCII, real, and integer), and are rarely going to be numerically combined with each other, then you should organize the file by ROWS. It is not expensive in I/O to append additional rows to a file of this type, and if you need to have a 'GROUP' capability, you can accomplish this by making one of the columns of the file an ASCII column, containing the names to be attached to each row. The weakness of this approach is that creating new columns will often be an I/O intensive operation. As with the column-oriented files, if you anticipate having to change the number of columns, you can expand the file with a larger 'NC' value first, and then decrease it after opening the file. However, it is not advisable to do this if your database file will contain a huge (more than 500,000, say) number of rows, as the file will be wasting disk space for each row. For row-oriented files the IBISRecord and IBISRow routines are the most efficient.
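The disk cost of reserving spare rows or columns in advance is simple arithmetic. The sketch below assumes a simplified model in which the reserved space is exactly the spare elements times their size, ignoring any block padding; the function names and the figures used are illustrative only.

```python
def padding_bytes_column_org(column_sizes, spare_rows):
    """Reserved bytes when every column is padded by spare_rows elements."""
    return spare_rows * sum(column_sizes)


def padding_bytes_row_org(nrows, spare_column_sizes):
    """Reserved bytes when every row is padded by the given spare columns."""
    return nrows * sum(spare_column_sizes)
```

Reserving ten spare 4-byte columns in a row-oriented file of 500,000 rows costs `padding_bytes_row_org(500000, [4] * 10)`, or 20,000,000 bytes, which is why the technique is discouraged for very long row-oriented files.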

3.4.2 Obsolete Assumptions

Here is a list of some of the properties of old IBIS files, which the programmer may no longer depend upon:

The column data types will be INTEGER*4, REAL*4, or CHARACTER*4, whose format is not specified in the file, but imposed by the user or the program.

The data is stored by listing all values of column 1, padded out to 512-byte records, followed by all values of column 2, etc.

The data is organized into 512-byte records, and is stored in the VICAR image data portion of the file.

The number of columns is no more than 40, and is stored in the first VICAR line of the file.
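For reference, the old layout described by these assumptions can be put in numbers. This sketch assumes that column data begins in the record immediately after the first line and that records are numbered from 1; both are assumptions made for illustration, and the helper names are hypothetical.

```python
RECORD_BYTES = 512
VALUE_BYTES = 4
VALUES_PER_RECORD = RECORD_BYTES // VALUE_BYTES  # 128 values per record


def records_per_column(nrows):
    """Number of 512-byte records one column occupies, including padding."""
    return -(-nrows // VALUES_PER_RECORD)  # ceiling division


def first_record_of_column(col, nrows):
    """1-based record number where column col (1-based) starts.

    Assumes record 1 is the first line and column 1 starts at record 2.
    """
    return 2 + (col - 1) * records_per_column(nrows)
```

A column of 300,000 rows therefore occupies 2344 records, the last of which is only partly filled.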

3.4.3 Stages of Porting an IBIS Program

The first level of support for the new file format will be provided automatically for most IBIS programs with a planned new release of the IBISFIL subroutine library. The new release will essentially be a front-end to the IBIS2 subroutines, and will allow the old IBIS programs to handle, in addition to the old IBIS format, at least those new IBIS files which are in turn created by other old programs linked to the new libraries. The IBISFIL calling sequences will not be changed, and so all of the currently existing programs should be able to link to the new libraries without problem. The effect of all this should be that old IBIS programs and procedures will still work.

Initially, the IBISFIL front-end will not create IBIS-2 format files, but will be able to read them. This is because several IBIS programs currently try to get around the limitations of the old IBISFIL routines by directly manipulating the IBIS files using VICAR xvread and xvwrit calls, and so will have to be re-written before they will be able to handle the new formats. Until these programs have been made forward-compatible, the IBISFIL front-end will not create IBIS-2 files by default.

The second stage of porting IBIS programs will involve actually re-writing them to use the IBIS2 calls directly; there will be no IBISFIL routines in the p2sub library. At the very least, the new program should take advantage of the IBIS2 buffering routines which free the program from having to store an entire IBIS file in memory, and should exit gracefully if one of the extended type IBIS files is encountered (for example, IBIS files with non-4-byte columns). This step will, of course, also include the usual steps for porting any VICAR program to UNIX. Note that this means that EQUIVALENCE statements for converting between real and integer columns are not permitted (they may be used only to conserve memory).

The final stage of porting an IBIS program should include the extension of the program to handle the full range of new IBIS files, such as DOUBle formats and A256 columns. Some examples of such programs will be provided to get the programmer started.

3.4.4 Making code "GRAPHICS-1 Friendly"

Special consideration must be given to code that will or might be passed old GRAPHICS-1 format files. The reason for this is that the GRAPHICS-1 file contains no information about its actual dimension (the NC attribute), and there is no default dimension for these files (both two and three dimensional files are very common) and so the dimension parameter must be supplied by the user.

To make IBIS programs "GRAPHICS-1 Friendly" it is suggested that, when porting IBIS-1 code written for interface files only or when writing brand-new code, a "G1DIM" parameter be provided in the PDF file, so that users may specify the dimension of the graphics files they are processing. Old GRAPHICS-1 programs already have a "DIM" parameter for this purpose, and so they may retain this parameter name. This parameter value may then be passed in to IBISFileUnit or IBISFileOpen to tell the library the NC size of the file.
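The ambiguity that the dimension parameter resolves can be demonstrated with a flat list of coordinate values. This is a sketch only: the values are made up, and `reshape_records` is a hypothetical helper standing in for what the library does once it has been told the NC size.

```python
def reshape_records(values, dim):
    """Group a flat list of coordinate values into records of dim values."""
    if dim < 1 or len(values) % dim != 0:
        raise ValueError("value count is not a multiple of the dimension")
    return [tuple(values[i:i + dim]) for i in range(0, len(values), dim)]


# Twelve stored values with no dimension recorded in the file.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

as_2d = reshape_records(flat, 2)  # six (line, sample) points
as_3d = reshape_records(flat, 3)  # four three-dimensional points
```

Both readings are structurally valid, so nothing in the file itself can tell the program which one the author intended.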