Segmentation and graph-based region growing


Region growing is a powerful method for clustering objects in spatial data, and in image analysis this methods have been used in many applications. Because of the prohibitive data quantities in remote sensing, however, up to now region-based have rarely reached end-user analysis systems. With the rapidly advancing computer technology there has been a renewed interest in segmentation methods and there are now commercial systems that uses stochastic optimization (e.g. markov fields) and other more advanced methods for information extraction and image enhancement (note that "regions" and "segments" are used interchangeably here. In terms of graph theory, the "regions" are nodes and edges are "neighbours")

The method for region growing described here is based on the following idea, inspired by the work of Olle Hagner and Bengt Näsholm (now at Foresys) at the Swedish university of agricultural sciences in Umeå. Given an initial set of regions, where each region has adjacent neighbouring regions, merge neighbours that are "close" to the center region.The closeness criterion used in this work is a simple version of the classical statistical t-ratio test, extended for several variables [Sqrt(Sum(Power(t(i),2))), where t(i) are t-ratios for each band. This degenerates to the Euclidian distance if the variance is zero]. The features used in practice are the bands of the data set (e.g. RGB + texture), but data from different sensors can also be used. The regions are merged in descending order of statistical similarity (i.e.low t-ratios are merged forst), new feature vectors are derived from the new (and larger) regions, and the process is repeated. The halting criterion is set by a threshold or by the maximum number of iterations. In a recent paper by Agustin Lobo in size="2">IEEE trans. geosc. rem. sens , "Image segmentation and Discriminant Analysis for Identification of Landcover Units in Ecology", sept. 1997, vol. 35, no. 5, pp. 1136-1145 , there is a description of a segmentation algorithm almost identical to this one and the one by Hagner/Näslund (their implementation is adapted for personal computers). This work has now been nicely extended by Anna Rydberg (formerly at Centre for Image Analysis), who uses information about region shape and line-segments to extract regions. The extension works well even for aerial imagery, and I'm considering implementing some of her ideas here.

In the current implementation, one first extracts an edge image from the multiband image. This can be done in many ways, here the image bands are wimply averaged, weighted uniformly. This is a rather simplistic method, and depending on the image bands other methods should probably be used. Secondly, the gradient of the resulting is extracted by using crossed 1-D gaussian filter kernels, usuxdally between scaled 0.7 - 1.5. This is used as input to the classical Vincent/Soille Watershedding algorithm, the output of which is the starting point for the region growing as follows:

ImageI(nbands);
FImage aver = (Im(1)+Im(2)+Im(3))/3.0; // For a 3-band image
FImage S = WaterShedding(GradMag(((aver).GaussRescaleU8()));
S.NearestClean()
While (!done) {
Features F(I,S);
Graph G(F,S);
Merge M(G,S,crit);
done = (G.Size(n)==G.Size(n-1)) OR (n>maxiter);
}

The feature vectors "F(i)",i=1,2..#regions are calculated from a segmented image "S" , a straightforward and fast operation. The segments in S are represented both as a raster image and as a sorted list with region number i as key {x,y,i} where i = S(x,y). This duplicates the amount of data which needs to be stored, but enables immediate access to individual regions S(x,y). The "Graph G" operation is the computationally most expensive operation and amounts to tracking the border of each region, counting its neighbours, and calculating the features and t-ratios for the region and its neighbours. The adjacency graph is represented as a list whose members are sorted in descending t-ratio order:

for (i=1 to F.Size()) {
Clique q = GetNeighbours(trackBorde(S,region(i)));
if (q.t-ratio(neigh=0)>threshold) G.push_back(q);
}
sort(G.begin(),G.end(),T-ratio);

"Clique q" denotes an element in the adjacency graph, which consists of a region with its neighbours and computed feature vectors. The "Merge M" step amounts to iterating over the regions in the graph, tag all merged regions,and physcially renumber the regions in S:


for (i=1 to G.size()) {
for (n=1 to G(i).nofNeighs())
if (merge(region(i),n,Params)
renumber(S,i.n)
}
renumberAllRegion(S);


The most important input parameters to the region growing are the initial t-ratio threshold, final t-ratio (changed linearly with each iteration), the maximum number of iterations, and the the size of the smallest allowable regions. A critical value for the speed of the algorithm is the number of regions, and of course the size of the image. The number of regions is controlled by the size of the filter kernel used as input to the watershedding segmentation; the higher the variance, the less the number of regions. Unless a segmentation of the whole image is desired, a good strategy is to threshold interesting segments with signatures or a simple greyscale interval for a given band, and only build the graph for these segments.

The total processing time is heavily dependent on the number of initial segments from the wathershedding. To reduce the number of calculations the parameter "minarea" in the parvec0 file can be used, setting this to 5-10 excludes areas < in the merging. In a post-processing step the remaining smaller areas are merged with the resulting segments saving both memory (above all) and some time.
Important parameters are:
mergetype:

AND if the resulting T-value < start_t = final_t
minarea: 5-10 pixels)
start_t: 5-15 (relevant for T-test)
stop_t: 5-15 (must be > start_t. Recommended use: same value as start_t)
The region merging is implemented as a graph and the regions with the highest statistical likeness (low T-values) are merged first.
I usually use the segmerge program in 3 steps:
1. Watershedding segmentation (parameter file "parvec0").
2. Region merging using the "pure" T-test criterion and area pruning ("parvec1").
3. A generalisation test that produces rounder areas ("mergetype 5" in "parvec2").

With these steps, a 1000x1000 image on my 2.66 Ghz Pentium IV (1.5 Gb RAM) was processed in
1. 15 s
2. 2 min 36 s
3. 1 min 7 s
which sums up to approximately 4 minutes.
Here is a visual caption of the results for an optical image (SPOT-XS, 3bands) and a multitemporal synthetic aperture radar image (ERS-2, 3 images taken with approx. 1 month intervals)