23º SINAPE - Simpósio Nacional de Probabilidade e Estatística

Paper Details


Title

A KERNEL K-MEANS CLUSTERING ALGORITHM BASED ON AN ADAPTIVE MAHALANOBIS KERNEL

Abstract

Clustering methods are useful tools for exploring structure in datasets and have been widely used for unsupervised pattern recognition. Clustering means organizing a set of observations (objects, individuals, genes, pixels, etc.) into clusters so that observations belonging to the same cluster have a high degree of similarity, whereas observations belonging to different clusters have a high degree of dissimilarity. The Euclidean distance is the most commonly used distance in clustering methods. However, methods based on this distance perform well only on data whose clusters are approximately hyperspherical and linearly separable. Because of this limitation, several methods capable of dealing with data of complex structure have been proposed, among them kernel-based clustering methods, whose essence is an arbitrary nonlinear mapping $\Phi$ from the original $p$-dimensional space $X\subset\mathbb{R}^p$ to a higher-dimensional (possibly infinite-dimensional) space called the feature space, $\mathcal{F}$. The most commonly used kernel function is the Gaussian kernel. Despite its good properties, this kernel is based on the Euclidean distance; that is, it assumes that the observations are more likely to be distributed in a hyperspherical region (equal variances and zero covariances). However, observations in two different clusters are more likely to be distributed inside two different hyperellipsoidal regions. The Mahalanobis distance, which takes into account the correlations between variables and is scale-invariant, is a better choice for dealing with hyperellipsoidal regions. Under the kernelization-of-the-metric approach, we propose a kernel K-means algorithm based on an adaptive Mahalanobis kernel. The effectiveness of the proposed algorithm was demonstrated by experiments with simulated data.
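The contrast between a Euclidean-based Gaussian kernel and a Mahalanobis-based one can be illustrated with a minimal sketch. This is not the authors' adaptive algorithm: the function names, the fixed covariance matrix `cov`, and the helper `inverse_2x2` are assumptions introduced only for illustration, and the covariance is given rather than estimated adaptively per cluster.

```python
import math

def gaussian_kernel(x, y, sigma2=1.0):
    """Gaussian kernel based on the squared Euclidean distance."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma2))

def mahalanobis_kernel(x, y, inv_cov):
    """Gaussian-type kernel based on the squared Mahalanobis distance.

    inv_cov is an inverse covariance matrix given as a list of rows.
    It is a fixed, assumed input here, not an adaptive estimate.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Quadratic form d^T * inv_cov * d
    sq = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.exp(-sq / 2.0)

def inverse_2x2(m):
    """Inverse of a 2x2 matrix (hypothetical helper for this example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed covariance of an elongated (hyperellipsoidal) cluster:
# large variance along the first axis, small along the second.
cov = [[4.0, 0.0], [0.0, 0.25]]
inv_cov = inverse_2x2(cov)

x = (0.0, 0.0)
y_long = (2.0, 0.0)   # 2 units away along the high-variance axis
y_short = (0.0, 2.0)  # 2 units away along the low-variance axis

# The Euclidean-based kernel cannot tell the two directions apart.
print(gaussian_kernel(x, y_long), gaussian_kernel(x, y_short))
# The Mahalanobis-based kernel rates y_long as far more similar to x.
print(mahalanobis_kernel(x, y_long, inv_cov), mahalanobis_kernel(x, y_short, inv_cov))
```

Both points lie at the same Euclidean distance from `x`, so the Gaussian kernel assigns them the same similarity, while the Mahalanobis-based kernel takes the shape of the cluster into account and favors the point along the direction of larger variance.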

Keywords

Clustering, Kernel, Mahalanobis.

Area

Functional Data, High-Dimensional Data, and Statistical Machine Learning

Authors

Fernanda Florencio Costa, Marcelo Rodrigo Portela Ferreira