Clustering methods are exploratory data analysis techniques, mostly applied to multivariate data sets. For prospect appraisal, discriminant analysis is useful, especially if probability estimates can be obtained from a set of geological input variables. As there are more textbooks on the subject than one could read in a lifetime, we discuss here only one method that is fairly easy to understand. While clustering of data in m-dimensional space may start from an assumed fixed number of k clusters, it is also possible not to specify the number of clusters in advance. One such method is non-linear mapping.

Non-linear mapping is a technique for viewing complicated multi-dimensional data structures in a two-dimensional graph. The earth is roughly a sphere in three dimensions; maps on a sheet of paper are two-dimensional and therefore necessarily distort, a little or even significantly, the inter-distances between points. Many projections have been proposed, of which the Mercator projection is probably the most widely used. Distortions are minimized as far as possible, in a way that depends on the intended use of the resulting map. The same principle can be extended to more dimensions: non-linear mapping is a useful data analysis technique when the number of selected variables exceeds two, as it allows one to "take a 2-D picture" of a multidimensional space.

The way the non-linear mapping is done involves the following steps:

- The X-variables are normalized by subtracting their mean and dividing by their standard deviation.
- The correlation matrix of the X-variables is used to find the pair of X-variables that have the smallest (absolute) correlation coefficient. This pair is assumed to give the best starting point for the iterative mapping.
- The data are mapped onto the plane determined by the two chosen variables. The total mapping error is calculated as the sum, over all ½n(n−1) inter-distances, of the squared differences between the 2-D map distances and the m-dimensional Euclidean distances.
- The method of steepest descent is used to iteratively reduce the total mapping error by small changes to the point coordinates on the map.
- A satisfactory map is usually obtained in fewer than 100 iterations. Convergence is declared when a single step can no longer reduce the error by more than the "map tolerance", usually 0.000001. The algorithm is described by Sammon (1969).
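The steps above can be sketched as follows. This is a minimal illustration, not Sammon's original implementation: for simplicity the starting plane is taken as the first two normalized variables (rather than the least-correlated pair), and the error is the plain sum of squared distance differences described above.

```python
import numpy as np

def sammon_map(X, n_iter=100, lr=0.05, tol=1e-6, seed=0):
    """Iterative 2-D mapping that minimises the sum of squared differences
    between 2-D map distances and m-dimensional Euclidean distances."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # normalize variables
    n = X.shape[0]
    # m-dimensional Euclidean inter-distances (all 0.5*n*(n-1) pairs)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # start from the plane of the first two normalized variables,
    # with tiny noise to break exact ties (a simplification: the text
    # chooses the least-correlated pair of variables instead)
    rng = np.random.default_rng(seed)
    Y = X[:, :2] + 1e-4 * rng.normal(size=(n, 2))
    prev_err = np.inf
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1)) + np.eye(n)  # avoid /0 on diagonal
        err = ((D - d) ** 2)[np.triu_indices(n, 1)].sum()
        if prev_err - err < tol:                      # the "map tolerance" test
            break
        prev_err = err
        # steepest-descent step on the squared-difference error
        grad = (2 * ((d - D) / d)[:, :, None] * diff).sum(axis=1)
        Y = Y - lr * grad / n
    return Y, min(err, prev_err)
```

Running more iterations can only leave the reported mapping error equal or lower, since the descent stops as soon as a step fails to improve it by the tolerance.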

If the data structure extends in more than two dimensions, there is always an irreducible mapping error. The remaining mapping error can give an indication of the dimensionality of the data set. The dimensionality may be less than the number of X-variables if the data lie on a hyperplane of dimension less than m in the m-dimensional space.

This example shows a perfect separation of the classes: there is only one join between the groups in the minimum spanning tree (MST). The number and types of joins in the MST can be used as a significance test for the separation of classes, but parametric tests, which are based on all the Euclidean distances between points, are more powerful in detecting a significant classification.
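The source does not spell out how the MST and its between-class joins are obtained; a minimal sketch, assuming Euclidean distances and using Prim's algorithm (the function names are illustrative, not from the text):

```python
import numpy as np

def mst_edges(D):
    """Prim's algorithm on a full symmetric distance matrix.
    Returns the n-1 edges (i, j) of the minimum spanning tree."""
    n = D.shape[0]
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = D[0].copy()              # cheapest link from the tree to each node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        # pick the unvisited node closest to the growing tree
        j = int(np.argmin(np.where(visited, np.inf, best)))
        edges.append((int(parent[j]), j))
        visited[j] = True
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges

def cross_class_joins(X, labels):
    """Count MST edges that join points belonging to different classes."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return sum(int(labels[a] != labels[b]) for a, b in mst_edges(D))
```

For two well-separated groups, every within-group distance is smaller than every between-group distance, so the MST contains exactly one cross-class join, as in the example above.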