Sunday 5 February 2023

SVD-based inversion of geological surfaces from topographic traces: preliminary assessment of reliability via eigenvalue log ratios


Topographic traces of geological surfaces store information about the surface attitudes. When considering the local scale, in the majority of the cases the surface can be approximated by a geometric plane.

Singular Value Decomposition (SVD) is one of the available methods for inverting the topographic traces, i.e., deriving the local, geological surface planar attitude.

SVD input and output matrices

Input data for the SVD methodology are expressed as an m x n matrix, here named A. When successful, the SVD method produces 3 matrices, named U, S and V* (* means transposed), so that A = U x S x V*. In the general case, U has dimensions m x m, S is an m x n diagonal matrix (i.e., all values are zero except for those on the main diagonal), while V* is an n x n matrix.

Input and output data for topographic point inversion

For topographic points inversions, m, i.e.. the number of matrix rows, is equal to the number of input points while n, the number of matrix columns, is equal to 3, since we are considering a 3D Cartesian space. The U matrix dimensions, i.e., m x m, are equal to the number of input points. S is an m x 3 diagonal matrix, whose main diagonal values stores the eigenvalues in descending magnitude order (s1 >= s2 >= s3).
V* is a 3 x 3 matrix that stores the eigenvectors of the A matrix.
In particular, the last row of the V* matrix, that is associated with the minimum value eigenvalue s3, contains the Cartesian components of the vector that is normal to the potential best-fit-plane.

For our purpose, only V* and S are needed for getting the estimated best-fit-plane (by using the last V* matrix row) and evaluating its reliability (by considering S matrix diagonal values).

What about the results reliability?

Some questions are: how can we be sure that the input points can be approximated by a plane? How can we evaluate how good is the fit between the computed result and the real plane? Can a quantitative metric of the result reliability therefore be defined?

To try to answer these questions, at least in a preliminary way, I've created a few simulations of points with different spatial distributions, in order to invert them and check the degree of concordance between the computed plane-normal eigenvector and the theoretical surface attitude. The eigenvalues has been used to try to infer the degree of result reliability.

To better characterize the eigenvalues, their negative log ratios were calculated:

nlr21 = - log (s2 / s1)
nlr31 = - log (s3 / s1)
nlr32 = - log (s3 / s2)


The negative signs before the log operators are used to avoid dealing with negative values. Since the numerator is always equal or lower than the denominator, the resulting negative log ratios are between 0 (for numerator equal to denominator) to infinity (for numerator equal to zero). Nan results can arise when both numerator and denominator eigenvalues are zero.

Software used for simulations and inversions

Both the simulations and the inversions were performed using the programs in the geoSurfDEM repository (currently available only in the dev branch). The SVD inversions are mainly dealt with using FORTRAN and linked Blas/Lapack libraries. C++ code is wrapped around the inversion FORTRAN routines and takes care of data input and output for both the simulations and the inversions.

Basic simulations

To start with, two very simple cases were considered, in order to get a feeling for the resulting eigenvalue distributions.

For these simple distributions, the specific points were directly written into the input files, as rows of x-y-z values (files in 'geoSurfDEM/test_data/InvertPoints' folder, named 'points_01.csv' to 'points_04.csv'). The different cases were then inverted with the code in the 'geoSurfDEM/InvertPoints' folder, specifically the compiled 'InvertPoints.cpp' program).

Co-planar points

These simulations consider perfectly co-planar points (just 4 for each plane dip subcase), lying in a horizontal (a), vertical (b) or 45°-dipping (c) plane.

Results 

The third eigenvector in the resulting V* matrix is always perfectly normal to the simulated plane. The s1 and s2 eigenvalues are equal or of the same order of magnitude (e.g., s1 = 2 and s2 = 1.41), while the s3 eigenvalue is always zero.
The nlr21 values are between 0 and 0.15 for all three cases (i.e., s1 and s2 equal or almost equal). Both nlr31 and nlr32 indices are infinity (since s3 is zero).
So for co-planar points we expect s1 = s2 >> s3, with nlr12 values around zero and very high values of both nlr31 and nlr32.

Collinear points

Three perfectly collinear, horizontal points were used as input data. Obviously no meaningful best-fit-plane exists.

Results

s1 is 1.4142 while s2 and s3 are both zero. So both nlr21 and nlr31 are infinity while nlr32 is Nan. 

Collinear points produces eigenvalues where s1 >> s2 = s3 and very large values of nlr21 and nlr31. The nlr32 index should be around 0 (apart when s2 and s3 are both zero).

Collinear and co-planar log ratios summary

The current results for the co-planar and collinear cases can be summarized as in the following diagram:

  • co-planar data have very high nlr31 and nlr32 indices. 
  • collinear points have very high nlr31 index and near zero nlr32 index.

 

Going uniform

When points have no preferred orientation, what are the resulting eigenvalues?

To answer this question, I've created uniform spherical distributions of point. I used a slightly modified C++ code published by Cory Simon in http://corysimon.github.io/articles/uniformdistn-on-sphere/. As before, the code is available in the geoSurfDEM repository (currently only in the dev branch and I have to check that code again since I've made modifications to the FORTRAN inversion routine...).

The simulated data to invert were created as such: 1000 random simulations were generated for 6 set of generated point number: 3, 4, 5, 10, 100 and 1000 points.
The files storing 3 points correspond in general to a unique well-defined inverted plane, apart from the cases where the points tend to collinearity (the case of three coincident points was excluded in the code). 

In the Addendum, all statistical plots for these simulations are presented. Here I show only the most significant plot, that complements the previous results.

When considering the nlr31 vs. nlr32, we see that perfectly co-planar data (n = 3) have high values of both these indices.

Spatially uniform data (simulations with n >> 3) have almost equal eigenvalues, so in the graph they are clustered towards the axes origin.

 

Conclusions

We can add together all the previous results in this summary sketch:


Where is the exact limit between almost co-planar and random points? And between collinear and co-planar points (undulating blue line)?

Unfortunately these simulations do not give us a quantitative answer. They say us when it is surely random, when surely co-planar or collinear, but further analyses where a quantitative measure of adherence of the empirical data to the theoretical (i.e., planar) model are needed. 

However, prior to applying other models, a check that our data are not random or collinear could help us to avoid producing false-positive results.



Addendum: Statistical properties of spherically uniform random points.

These are all the statistical plots for the spherical simulations.

The plots were generated with R and ggplot2. 

Simulations are subdivided by number of simulated points (from 3 to 1000).

The plots are not commented.