Conversions from points to lines: which methods?

(english version of post Conversione da punti a linee: come?”, published 2010-11-09)

One of the first concepts presented in GIS courses is that vector data have point, line and polygon geometries. And we can switch between them without significant problems. However, such changes can be problematic in practice.

The actual case? I have a polygon layer of the geology in an area of the Nera Valley (Umbria, east of Foligno). I want to extract faults into a new line layer, using the contacts that already exist between the geological formation and geological formation (I ignore the faults developed entirely inside a formation). By converting polygon to line geometries, I face two main drawbacks. Line editing to delete unwanted stratigraphic limits is error prone. The latter is that line orientations are often inconsistent along a same fault.

In practice, it is more convenient to convert the lines into points, clear all unnecessary points, assign a common identifier to all the points representing a single fault, and then convert the geometry from point to line, based on fault identifiers.

While point editing is just boring, the last step can be tricky. I try with some software tools, and the result is not the expected one. There is an abundance of redundant lines, connecting points that should not be connected. It may depend on the inconsequential order of the original lines and also the presence of duplicated points generated by the polygon-to-line conversion.

Good opportunity for programming with Python. I try to create a tool ensuring proper processing ofpoint ? line conversion. Apart from removing duplicated points, we need a successful logic of conversion point ? line. It could be quite straightforward, considering that fault traces have an almost straight alignment. Therefore, a tentative algorithm for a single fault could consist in finding the two extremes of the fault trace, then, starting from an extreme, connect to each point the closest, unused one.

Can a straightforward strategy as the one depicted convert effectively points to elementary linear geometries? How to find quickly, from the execution time point of view, the most distant points, or vice versa the closest ones?

The second question can be approached via a distance matrix, which stores the distances between each point couple. So the largest distance can be easily picked, or alternatively the shortest distance between a point and a set of other points. Assuming that the largest distance is due to the two fault ends, we can find them. From one of the ends, we can then select the shortest distance with another point and connect the two points. Then, we mark the starting point as used (i.e., set all its distances to infinity) and re-apply the same logic until we use all the points. For this processing, the numpy Python module is suitable.

The answer to the first question is positive, at least in this case. This simple strategy allows an accurate conversion to line geometry. Obviously for more complex and general cases, a more sophisticated strategy would be needed.

The Python script and the used data are available in this zipped file. Input and output data format is shapefile. Required Python modules are numpy and ogr: I used Python (x, y), which already incorporates these modules. The FWTools software has the older Numeric module instead of the recent numpy. Therefore, the script should not work in FWTools, or you should modify from numpy to Numeric terminology. Other Python distributions with numpy ogr and could work without problems. The program run from the shell with the following command line:

python par.txt

where is the script name and par.txt is the text file that contains the parameters (name of the input point shapefile, name of the output line shapefile and name of the field that contains the numeric identifier of each line). An example is: