Help for MARSRFILT

PURPOSE:
MARSRFILT does range-based filtering on an XYZ image, returning a new XYZ
image.  This is intended to compensate for XYZ's having an error (in the
range direction) that is often much greater than the cross-range resolution.
This difference causes "spikes" in the XYZ data.  These spikes are greadly
reduced or eliminated by this program

EXECUTION:
marsrfilt inp=xyz_data.img out=filtered_xyz_data.img
where:
xyz_data.img is an input 3-band image of type REAL with the X,Y and Z values
at that pixel in meters.  The program also acceps separate X, Y, and Z images,
although that's not the normal usage mode.
filtered_xyz_data.img is the output 3-band image in the same format.  Multiple
one-band images are not supported.


BACKGROUND:

The stereo range error equation is:

error = range^2 * ifov * corr / baseline

where range is the distance from the camera to the object, ifov is the
instantaneous field of view (angular size of one pixel, generally at the
center of the image), corr is the correlation accuracy (typically 0.25 to 0.5),
and baseline is the stereo baseline (separation between the cameras, in meters).

Note that actually, the above is correct for perpendicular range (the distance
from the camera plane to the point), not the actual range (the distance from
the lens to the point).  This program works with actual range.  Dividing by
the sine of the angle between the view ray and the baseline converts the
error to actual range error.  However, this sin term cancels out in the ratio
equation (below) and so we ignore the difference in this program.  The only
place it might matter is in the error computed for the proximity filter, where
it's a second order effect at best.

The size of a pixel (measured horizontally, cross-range) is:

size = range * ifov

(really, tan(ifov) but the small angle approximation applies).

For most real stereo cameras, the stereo range error is much bigger than
the cross-range pixel size.  For example, on MSL the navcam has ifov=0.00082
radians and baseline=0.424 meters, while the Mastcam (M34) has ifov=0.00022
radians and baseline=0.245 meters.  If we plug in those values (assuming
0.25 correlation accuracy, a very aggressive assumption to minimize the
error), we get:

Range	Navcam Error	Navcam Size	Mastcam Error	Mastcam Size
1	0.0483 cm	0.082 cm	0.022 cm	0.022 cm
5	1.21 cm		0.41 cm		0.56 cm		0.11 cm
10	4.83 cm		0.82 cm		2.24 cm		0.22 cm
50	121 cm		4.1 cm		56.1 cm		1.10 cm

While these are comparable close in (1m), the ratio error/size quickly gets
quite large; up to 51 for the Mastcam at 50m.

This ratio is key.  It says that the along-range resolution is 51 times worse
than the cross-range resolution.  In practical terms, it means the noise in
the along-range direction is 51 times the noise in the cross-range direction.

This discrepancy leads to pronounced "spikes" in the XYZ data.  These are
most visible when looking at a terrain mesh.  The farther out you go, the
more prevalent these spikes (pointing back at the camera, or away from it) are.

However, the spikes are pure noise, caused by correlation uncertainty.

This program significantly reduces this noise by converting the XYZ to range
data (thus separating the noisy along-range data from the non-noisy cross-range
data), filtering the range, and converting it back to XYZ.  Filtering is
based on the error formula, and is adaptive, so the proper amount of filtering
is done at each point.


METHOD:

In a nutshell, the program implements a weighted plane fit to the range
data around the given pixel, evaluating the plane at the given pixel.
This has a similar effect to averaging, but gets around issues with biasing
the results as the window slides off the edge of the image.

First, the program reads the XYZ data and converts it (if necessary) to the
specified coordinate system.

It then does a lowpass filter (standard boxcar filter, excluding 0s) on the
range data.  This is used as the range for determining the window size later.
The boxcar filter prevents the window size from gyrating wildly from pixel to
pixel, as it would do if we used the raw range.  This filter does suffer from
edge effects and other issues, but having an accurate range here is not that
critical, as it is only used to size the filter, and small inaccuracies in
the filter size have negligible effect.

The range filter itself proceeds as follows.  Each pixel is computed
independently, and if OMP_ON is enabled, it is parallelized on a line-by-line
basis.  (see the help for e.g. marsmap for more on OMP).

First, the nominal size of the filter half-window is determined.  This is based
on the ratio of error to size, and ends up being simply:

size = 0.5 * range * corr / baseline

The correlation quality defaults to 0.25 so as to err on the side of less
filtering.  The range in this equation is the smoothed range discussed above.

The stereo baseline is obtained from the image label (it is output from
marsxyz, in STEREO_BASELINE), and the ifov (used in the proximity filter,
below) is obtained from the camera model.  Both can be overridden on the
command line.

The actual half-window width is computed by multiplying the nominal width
by WFACTOR.  This allows larger or smaller windows to be selected.

The half-window height is derived from the width using the ASPECT_RATIO
parameter.  This allows the kernel to be shrunk vertically (e.g. with a ratio
of 0.5, the default), because there is typically much more range variation
in the vertical direction than horizontal, for most in-situ scenes.

If the computed window height or width is less than the MIN_WINDOW parameter
(really, half size < (MIN_WINDOW-1)/2 ), the half-size is set to that.  This
means that some filtering is always done.  If MIN_WINDOW is 0, anything
smaller than 3 pixels (half size of 1) will be passed through unchanged, with
no filtering.

The width of the Gaussian itself is determined by the NUM_SIGMA parameter.
The default is 1.0, meaning 1.0 sigmas fit into the half-window.  A larger
value will use a steeper Gaussian (thus favoring the center more); a value
close to 0 will emulate a box filter.

The program then loops over the window.  For each pixel in the window, the
weight is computed.  This weight is a combination of two factors: the
Gaussian kernel, and a proximity filter.  These two together implement
a form of bilateral filter.

The Gaussian kernel is simply:

weight = e ^ (-(x^2 / 2G_x^2 + y^2 / 2G_y^2))

where x and y are the pixel coordinates within the window (0,0 being the
center), and G_x, G_y are the Gaussian sigmas determined above.

The proximity filter is used to reject values that are too far away in
range space.  This helps particularly when there are gaps due to occlusions.
Without the proximity filter, the data on the far side of the occlusion (which
could be tens of meters away, say due to a ridgeline) is included in the plane
fit, which can bias the results, with the effect of the terrain "bending" to
fill the gap.  The proximity filter prevents this by de-weighting and
eventually eliminating data that is too far away.

The proximity filter is controlled by the PROX_MIN and PROX_MAX parameters.
The actual expected error in meters (not the ratio) is determined using the
actual central pixel (as mentioned in the introduction, the sin term related
to the angle the pixel makes with the baseline is ignored).  This error is
then multiplied by the PROX_MIN and PROX_MAX parameters (in other words, the
parameters are factors, so the actual min and max values vary dynamically
with range).  The range of the pixel being considered in the window is then
compared to the (actual) range of the central pixel.  If the difference is
within +/- of the minimum value, the Gaussian weight is passed through
unchanged.  If it is bigger than the maximum value, the pixel is discarded.
If it is in between, a proximity weight is applied, which is linearly scaled
from 1.0 at the min distance to 0.0 at the max distance.  This has the effect
of gradually rolling off the proximity window, which helps to avoid
discontinuities in the data.

Once the final weight is determined for each pixel in the window, a plane fit
is performed.  The plane equation is:

z = Ax + By + C

and the solution matrix equation is:

[  xx  xy  x  ] [ A ]   [ xz ]
[  xy  yy  y  ] [ B ] = [ yz ]
[  x   y   n  ] [ C ]   [ z  ]

where each cell is the sum of xx, or xy, etc. throughout the window.  The
weight is applied to each element before summing (i.e. the upper left corner
is the sum of wt * (x*x) not (wt*x) * (wt*x) ), and "n" is the sum of all
the weights.

The matrix is solved using Cramer's rule, and then evaluated at the center
to get the final range.  Because the center is x=0 y=0, we actually only have
to solve for C, so much of the plane fit solution is commented out in the
code.

After the range filter, an optional spike filter removes "spikes" in the
same way that the spike filter in marsxyz does.  It computes an average range
(after filtering; not the same as the initial average range discussed above)
for an area of SPIKE_WINDOW centered on the pixel.  It then takes the
difference between the range value and the average range, and compares it to
a threshold.  If the value exceeds the threshold, the pixel is rejected as
a spike.  The threshold is determined by the SPIKE_FACTOR parameter.  This
is used to calculate the spike per range squared as follows:

spike_per_rangesq = spike_factor * ifov * corr / baseline

which is closely related to the range error equation.  This value is multiplied
by the range squared to get the threshold to use.

This final range value is then projected back into XYZ using the camera model.

There is one subtlety in this final projection.  The marsxyz program works by
projecting rays out of both eyes of the stereo camera models.  Where they
intersect, is the XYZ position.  However, due to noise and other errors,
they rarely actually intersect.  Instead, the reported XYZ position is the
midpoint at closest approach of the rays.  This means the XYZ position is
in general not on the actual ray projected from the center of the pixel.

When we reconstruct XYZ from range in this program, there are two options
on what to do.  The default option (-CMOD) uses the nominal projected ray from
the camera model - not the slightly skewed value from the XYZ.  This is the
default because for good camera models, the deviation from the nominal ray
is largely based on noise in the results - noise that is significantly
reduced by this program.  It is therefore more likely that the true value
is along the ray, than it is along the noise-induced skewed ray.

However, for poor-quality camera models, this is not necessarily the case.
Poor models can have significant distance (called the miss distance) between
the left and right rays.  Furthermore, this miss distance has a large
systematic component due to the camera model errors.  In these cases, the
systematic component can far outweigh the noise.  The -RAY option will
cause it to project along the original vector (which generally points to the
midpoint at closest approach) instead.

The practical upshot of -CMOD is that XYZ values run using the left as a
reference and the right as a reference will not necessarily match, unlike
the raw marsxyz results.  With -RAY, they may match much closer.

The minimum, maximum, and average of the nominal window widths throughout the
program are saved in labels.  This allows a mesh program to use downsampling
in cases where there is no full-res (unmodified) data.

It should be noted that there is one potential bias that is not accounted
for in this program.  As you get closer to the horizon in a typical in-situ
scene, pixels get farther and farther apart, as the tangent of the view
angle.  This means that typically, pixels in the window below the central
pixel will be closer to the central pixel in range space than pixels an
equal number of lines above the central pixel.  This will have the effect
of biasing the results slightly away from the camera.  However, this is a
second-order effect that is not expected to be an issue for real images.
It is partially mitigated for two reasons: the proximity filter will be
more likely to cut off the distant pixels, and correlation tends not to
work well when the view angle is close to tangent anyway.


HISTORY:
2019-10-04 rgd	Initial version
COGNIZANT PROGRAMMER: Bob Deen

PARAMETERS:

INP

Input images. Must be 1 3-band file or (x,y,z) triplet.

OUT

Output files Will be 1 3-band file.

NAVTABLE

Nav file for pointing correction.

ORIGIN

Override of 3D Point from which to compute range.

CORR

Correlation accuracy.

BASELINE

Stereo baseline override (meters).

ASPECT_RATIO

Controls roundness of Gaussian kernel.

NUM_SIGMA

Controls steepness of Gaussian function

WFACTOR

Controls size of window.

WINDOW

Window size for initial range smoothing.

IFOV

IFOV override (radians).

PROX_MIN

Min factor for proximity filter.

PROX_MAX

Max factor for proximity filter.

CAST

How to reconstruct the XYZ.

MIN_WINDOW

Min window size for range filter.

SPIKE

Turns on spike filter

SPIKE_WINDOW

Window size for spike filter

SPIKE_RANGESQ

Threshold for spike filter.

CONFIG_PATH

Path used to find configuration/calibration files.

POINT_METHOD

Specifies a mission- specific pointing method to use

NOSITE

Disables coordinate system sites.

RSF

Rover State File(s) to use.

DEBUG_RSF

Turns on debugging of RSF parameter.

COORD

Coordinate system to use.

COORD_INDEX

Coordinate system index for some COORD/mission combos.

FIXED_SITE

Which site is FIXED for rover missions.

SOLUTION_ID

Solution ID to use for COORD_INDEX

DATA_SET_NAME

Specifies the full name given to a data set or a data product.

DATA_SET_ID

Specifies a unique alphanumeric identifier for a data set or data product.

RELEASE_ID

Specifies the unique identifier associated with the release to the public of all or part of a data set. The release number is associated with the data set, not the mission.

PRODUCT_ID

Specifies a permanent, unique identifier assigned to a data product by its producer.

PRODUCER_ID

Specifies the unique identifier of an entity associated with the production a data set.

PRODUCER_INST

Specifies the full name of the identity of an entity associated with the production of a data set.