Introduction to the evclust package

Thierry Denoeux

2016-05-21

The evclust package contains methods for evidential clustering, in which uncertainty about cluster membership is represented by Dempster-Shafer mass functions. Readers are invited to consult the papers cited in the package documentation to become familiar with the main concepts underlying evidential clustering. These papers can be downloaded from the author’s web site, at https://www.hds.utc.fr/~tdenoeux. This document provides a guided tour of the main functions in the evclust package. Once the package is installed, load it as follows:

library(evclust)

Evidential c-means (ECM) algorithm

The Evidential \(c\)-means (ECM) algorithm is a \(c\)-means-like algorithm that minimizes a cost function by alternately searching the space of prototypes and the space of credal partitions. Unlike the hard and fuzzy \(c\)-means algorithms, ECM associates a prototype not only with each cluster, but also with each set of clusters. The prototype associated with a set of clusters is defined as the barycenter of the prototypes of the single clusters in the set. The cost function to be minimized ensures that objects close to a prototype have a high mass assigned to the corresponding set of clusters.
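
To make the barycenter rule concrete, here is a minimal base-R sketch that computes the prototype of a set of clusters as the mean of the prototypes of its members. The prototype matrix `V` and the helper `set_prototype` are hypothetical names introduced for illustration; they are not part of evclust, and the coordinates are made up.

```r
# Hypothetical prototypes of c = 3 single clusters in two dimensions
# (one row per cluster); the values are illustrative only.
V <- matrix(c(0, 0,
              4, 0,
              0, 4), ncol = 2, byrow = TRUE)

# Barycenter of the prototypes of the clusters listed in 'members'
set_prototype <- function(V, members) colMeans(V[members, , drop = FALSE])

v12  <- set_prototype(V, c(1, 2))     # prototype of the set {omega_1, omega_2}
v123 <- set_prototype(V, c(1, 2, 3))  # prototype of the set of all three clusters
print(v12)
print(v123)
```

An object lying midway between the prototypes of two clusters is therefore close to the prototype of the corresponding pair, which is how it can receive a high mass on that pair rather than on either single cluster.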

Consider, for instance, the fourclass data. This dataset consists of four clusters in a two-dimensional space.

data(fourclass)
x<-fourclass[,1:2]  # the two attributes
y<-fourclass[,3]    # true class labels
plot(x[,1],x[,2],pch=y,col=y,xlab=expression(x[1]),ylab=expression(x[2]))

We can run ECM with c=4 clusters on this data as follows:

clus<-ecm(x,c=4,type='full',alpha=1,beta=2,delta=sqrt(20),disp=FALSE)

The option type='full' is actually the default option. It means that mass functions in the credal partition will have \(2^c\) focal sets (here, \(2^4=16\)). You can get basic information about the credal partition using the summary method:

summary(clus)
## ------ Credal partition ------
## 4 classes,400 objects
## Generated by ecm
## Focal sets:
##       [,1] [,2] [,3] [,4]
##  [1,]    0    0    0    0
##  [2,]    1    0    0    0
##  [3,]    0    1    0    0
##  [4,]    1    1    0    0
##  [5,]    0    0    1    0
##  [6,]    1    0    1    0
##  [7,]    0    1    1    0
##  [8,]    1    1    1    0
##  [9,]    0    0    0    1
## [10,]    1    0    0    1
## [11,]    0    1    0    1
## [12,]    1    1    0    1
## [13,]    0    0    1    1
## [14,]    1    0    1    1
## [15,]    0    1    1    1
## [16,]    1    1    1    1
## Value of the criterion=265.38
## Nonspecificity=0.32
## Prototypes:
##              x1         x2
## [1,]  4.4337100  4.6367630
## [2,]  4.3274090 -0.2237856
## [3,] -0.5339534  4.1789103
## [4,] -0.4611023 -1.0800053
## Number of outliers=1.00
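
The focal-set matrix printed by summary lists the 16 subsets of the four clusters as binary rows, with row \(i\) encoding the integer \(i-1\) and column 1 acting as the least significant bit. The following base-R sketch reproduces that enumeration without calling evclust; the variable names `n_clusters` and `focal` are assumptions made for illustration.

```r
n_clusters <- 4

# All subsets of the clusters as 0/1 rows; expand.grid varies the first
# column fastest, so row i is the binary code of i - 1 (column 1 = lowest bit).
focal <- unname(as.matrix(expand.grid(rep(list(0:1), n_clusters))))

nrow(focal)      # number of focal sets, 2^n_clusters
rowSums(focal)   # cardinality of each focal set (0 for the empty set)
```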

We can restrict the focal sets to the empty set, the singletons, the pairs, and the whole set of clusters by changing the type option.

clus<-ecm(x,c=4,type='pairs',alpha=1,beta=2,delta=sqrt(20),disp=FALSE)
summary(clus)
## ------ Credal partition ------
## 4 classes,400 objects
## Generated by ecm
## Focal sets:
##       [,1] [,2] [,3] [,4]
##  [1,]    0    0    0    0
##  [2,]    1    0    0    0
##  [3,]    0    1    0    0
##  [4,]    0    0    1    0
##  [5,]    0    0    0    1
##  [6,]    1    1    0    0
##  [7,]    1    0    1    0
##  [8,]    1    0    0    1
##  [9,]    0    1    1    0
## [10,]    0    1    0    1
## [11,]    0    0    1    1
## [12,]    1    1    1    1
## Value of the criterion=308.06
## Nonspecificity=0.25
## Prototypes:
##              x1          x2
## [1,] -0.4754561  4.06107082
## [2,] -0.3460047 -0.89594474
## [3,]  4.1999174 -0.07700027
## [4,]  4.3160419  4.56227524
## Number of outliers=1.00
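
The 12 focal sets listed in this output are the empty set, the 4 singletons, the 6 pairs, and the set of all clusters. As a sketch only, the same matrix can be rebuilt in base R as shown below; the names `pairs_idx`, `pair_rows` and `focal_pairs` are hypothetical and the code does not rely on evclust.

```r
n_clusters <- 4

# One column per pair of clusters: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs_idx <- combn(n_clusters, 2)

# Turn each pair into a 0/1 indicator row
pair_rows <- t(apply(pairs_idx, 2, function(p) {
  row <- integer(n_clusters)
  row[p] <- 1L
  row
}))

focal_pairs <- rbind(integer(n_clusters),   # empty set
                     diag(1L, n_clusters),  # singletons
                     pair_rows,             # pairs
                     rep(1L, n_clusters))   # all clusters

nrow(focal_pairs)  # 1 + n_clusters + choose(n_clusters, 2) + 1
```

The number of focal sets grows quadratically with the number of clusters in this setting, instead of exponentially as with type='full'.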

A plot of the credal partition can be generated as follows:

clus<-ecm(x,c=4,type='pairs',alpha=1,beta=2,delta=sqrt(20),disp=FALSE)
plot(clus,x,mfrow=c(2,2),ytrue=y,approx=2)

In this plot, the lower and upper approximations of each cluster are plotted as solid and interrupted lines, respectively. Since we selected approx=2, the lower and upper approximations are defined as follows. Let \(A_i\) be the focal set of mass function \(m_i\) with the highest mass. The lower approximation of cluster \(\omega_k\) is the set of objects such that \(A_i=\{\omega_k\}\), while the upper approximation is the set of objects such that \(\omega_k \in A_i\). The outliers are the objects such that \(A_i=\emptyset\). They are displayed as circles in the figure above.
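
The definitions above can be illustrated on a toy example. The base-R sketch below uses a made-up mass matrix for four objects and \(c=2\) clusters; the names `focal`, `mass`, `best` and `A` are assumptions introduced here, and the code is not taken from the evclust implementation.

```r
# Focal sets for c = 2 clusters: empty set, {omega_1}, {omega_2}, {omega_1, omega_2}
focal <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))

# Hypothetical masses: one row per object, one column per focal set
mass <- rbind(c(0.05, 0.80, 0.10, 0.05),   # highest mass on {omega_1}
              c(0.05, 0.10, 0.75, 0.10),   # highest mass on {omega_2}
              c(0.05, 0.20, 0.15, 0.60),   # highest mass on {omega_1, omega_2}
              c(0.70, 0.10, 0.10, 0.10))   # highest mass on the empty set

best <- apply(mass, 1, which.max)   # index of the focal set A_i of each object
A <- focal[best, , drop = FALSE]    # A_i as a 0/1 row per object

k <- 1
lower_k  <- which(A[, k] == 1 & rowSums(A) == 1)  # objects with A_i = {omega_k}
upper_k  <- which(A[, k] == 1)                    # objects with omega_k in A_i
outliers <- which(rowSums(A) == 0)                # objects with A_i empty
```

Here object 1 belongs to the lower approximation of \(\omega_1\), objects 1 and 3 to its upper approximation, and object 4 is an outlier.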

RECM

The Relational Evidential c-means algorithm (RECM)