DSLib: An open source library for the dominant set clustering method
Abstract
DSLib is an open source implementation of the Dominant Set (DS) clustering algorithm written entirely in Matlab^{®} . The DS method is a graphbased clustering technique rooted in the evolutionary game theory that starts gaining lots of interests in the computer science community. It has been originally introduced in [4, 5] and, thanks to its duality with game theory and its strict relation to the notion of maximal clique, has been explored in several directions not only related to clustering problems. Applications in graph matching, segmentation, classification and medical imaging are common in literature.
This package provides an implementation of the original DS clustering algorithm, since no code has been officially released yet, together with a still growing collection of methods and variants related to it.
Our library is integrable into a Matlab^{®}pipeline without dependencies, it is simple to use and easily extendable for upcoming works.
The latest source code, the documentation and some examples can be downloaded from https://xwasco.github.io/DominantSetLibrary
Keywords: Dominant set, clustering, graph, maximal clique, game theory
/SetBgContents
Source code available: https://github.com/xwasco/DominantSetLibrary
1 Introduction
Clustering is a relevant task in the domain of pattern recognition.
It requires to group
elements into meaningful subsets (e.g., points which are close in a Euclidean space) exploiting the underlying hidden structure of the data.
It is in general an illposed problem
since different partitioning of the same dataset could end up into a meaningful subdivision of the data.
Two properties characterize, in general, every clustering algorithms: each subset (cluster) should have a high internal coherence and a high external incoherence this means that items belonging to the same cluster should have a high similarity while items of different clusters should be more dissimilar as possible. These properties are essential to achieving a good separation of the subsets and represent the foundation of many clustering algorithms
including the dominant set clustering method.
The dominant set (DS) algorithm has been formalized in the domain of graph theory, with a strict relationship to the notion of maximal clique
In the DS algorithm, a dataset is represented as an edgeweighted graph where nodes are data points, and edges encode the relationship between pairs of points in terms of a similarity weight.
The graph, encoded as an affinity matrix, is given in input to the algorithm.
The clusters (dominant set) are then iteratively extracted from the graph using a dynamical system that mimics a natural selection process. Three different dynamical systems are implemented into this library.
In particular, using this clustering technique is beneficial when i) the number of clusters to be found is totally unknown, ii) the pairwise similarity function is not symmetric, iii) a quantification of the quality of cluster is needed (i.e. to detect outliers) and iv) a notion of centrality of an element with respect to the cluster (centroid) is required. The aforementioned properties are hard to find in a traditional clustering algorithm (i.e. kmeans, spectral clustering and so on). Furthermore, the DS algorithm proved to be particularly suitable when the datasets are noisy and corrupted by outliers thanks to its nature of finding highly compact and coherent structures in the graph.
The remaining of the paper is organized as follow: in Sec 1.1 a brief introduction to the dominant set method and evolutionary game theory are provided, in Sec 2 the software package is explained with a practical example, in Sec 3 some usecases are given and, finally, in Sec 4 conclusions are drawn.
1.1 Dominant Sets
A dataset is represented as an edgeweighted graph with no selfloops, where is the set of nodes, is the set of edges connecting pairs of nodes and is a weighting function mapping edges to nonnegative weights. One of the possible way of defining is through the following kernel where is a generic distance function (typically the Euclidean distance) between elements and (with ) and is the decay of the negative exponential (or the scale of the kernel). Since the DS method requires a graph in terms of its pairwise affinity matrix , other similarity functions can be plugged in by the user without any problem.
Finding a dominant set (a cluster) is achieved optimizing a quadratic problem over the standard simplex:
(1)  
s.t. 
where is the affinity matrix of size , the number of nodes in the graph (number of elements in the dataset) and is the standard simplex consisting of dimensional probability vectors.
A local solution of problem (1) can be found by iterating the following dynamics until convergence:
(2) 
Eq.(2) is a dynamical system, known as Replicator Dynamics(RD) [11], which mimics a Darwinian selection process over the nodes, letting the ones that mostly support each other (the ones that are more similar) to survive (retain a nonzero probability) to the detriment of the others. Two additional dynamical systems are proposed in this package, the Infection Immunization [7] and the Exponential Replicator Dynamics [11] which can be used for large graphs and for faster convergence, respectively.
At convergence of the chosen dynamical system, the elements that exceed a given probability threshold represent the dominant set. Those elements are removed from the original graph and the process is reiterated on the remaining nodes until all the nodes are grouped. This iterative peelingoff procedure to extract DSs is implemented in the library (see Sec 2.3).
2 Description of the package
In this section, we provide a short description of the library and how to integrate it into a Matlab^{®}pipeline. The inline documentation of the package provides all the information for the parameters.
2.1 Requisites
No particular requirements are needed, the library runs on any Matlab environment (Windows, Linux, Mac) and it has no dependencies. It has been tested on Matlab 2014b, 2016b and 2018b in Windows/Linux and MacOSX. Some components are written in Cmex for optimization purpose but equivalent Matlab code is available without the need of compiling it.
2.2 What’s inside the package?
2.3 Practical usage
In this section we show how to setup a basic running example to cluster a synthetic dataset.
Step 1: Generating data and build the affinity matrix :
Step 2: Choosing the evolutionary dynamic and the DS parameters:
Step 3: Call the clustering method
alternatively, the clustering method can be called providing only the similarity matrix and using the default parameter.
Step 4: Show the cluster results
The package comes with more complex examples and a deep inline documentation.
3 Case studies
The core described in this manuscript has been used in several studies. Here we highlight three applications in radically different domains to prove the transversality of the method:

Protein Clustering: In [6] the DS has been used to study the interaction of proteins during the neuronal plasticity. The tool provides unprecedented insight into the problem. Number of nodes in the graph .

Brain connectomics: In [2, 3] the DS has been used to study the neuronal connections between the hemisphere with the aims of automatically group fibers having the same shape. Furthermore, a matching procedure allows recovering the common structures across different subjects. Number of nodes in the graph .
4 Conclusion
The DS clustering method has proven its powerfulness in the last decade, leading to a variety of applications. Here we present a library implementing the original paper and some followup methods. The library is written in pure Matlab^{®} with Cmex components, is crossplatform, open source and does not have any dependency. Full documentation, tutorials, download, and the updated source code can be found at https://xwasco.github.io/DominantSetLibrary/.
Acknowledgments
We would like to acknowledge Dr. Luca Dodero, prof. Diego Sona for testing the library and giving us their invaluable support. Furthermore, we want to thanks the entire Student Branch @ DAIS of Ca’ Foscari University of Venice for their suggestions.
Footnotes
 A clique is a set of mutually connected nodes
References

Samuel Rota Bulò and Marcello Pelillo.
Dominantset clustering: A review.
European Journal of Operational Research, 262(1):1 – 13, 2017. 
L. Dodero, S. Vascon, L. Giancardo, A Gozzi, D. Sona, and V. Murino.
Automatic white matter fiber clustering using dominant sets.
In Pattern Recognition in Neuroimaging (PRNI), 2013
International Workshop on, pages 216–219, June 2013. 
Luca Dodero, Sebastiano Vascon, Vittorio Murino, Angelo Bifone, Alessandro
Gozzi, and Diego Sona.
Automated multisubject fiber clustering of mouse brain using
dominant sets.
Frontiers in Neuroinformatics, 8:87, 2015. 
M. Pavan and M. Pelillo.
A new graphtheoretic approach to clustering and segmentation.
In Computer Vision and Pattern Recognition, 2003. Proceedings.
2003 IEEE Computer Society Conference on, volume 1, pages I–145–I–152
vol.1, June 2003. 
Massimiliano Pavan and Marcello Pelillo.
Dominant sets and pairwise clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 29(1):167–172,
January 2007. 
Francesca Pennacchietti, Sebastiano Vascon, Thierry Nieus, Christian Rosillo,
Sabyasachi Das, Shiva Tyagarajan, Alberto Diaspro, Alessio del Bue, Enrica
Maria Petrini, Andrea Barberis, and Francesca Cella Zanacchi.
Nanoscale molecular reorganization of the inhibitory postsynaptic
density is a determinant of gabaergic synaptic potentiation.
Journal of Neuroscience, 2017. 
Samuel Rota Bulo and Immanuel M. Bomze.
Infection and immunization: A new class of evolutionary game
dynamics.
Games and Economic Behavior, 71(1):193–211, January 2011. 
Sebastiano Vascon, Marco Cristani, Marcello Pelillo, and Vittorio Murino.
Using dominant sets for knn prototype selection.
In Alfredo Petrosino, editor, Image Analysis and Processing –
ICIAP 2013, volume 8157 of Lecture Notes in Computer Science, pages
131–140. Springer Berlin Heidelberg, 2013. 
Sebastiano Vascon, Eyasu Zemene Mequanint, Marco Cristani, Hayley Hung,
Marcello Pelillo, and Vittorio Murino.
Detecting conversational groups in images and sequences: A robust
gametheoretic approach.
Computer Vision and Image Understanding, 143:11–24, 2016. 
Sebastiano Vascon, Eyasu Zemene Mequanint, Marco Cristani, Hayley Hung,
Marcello Pelillo, and Vittorio Murino.
A gametheoretic probabilistic approach for detecting conversational
groups.
In Asian Conference in Computer Vision, Lecture Notes in
Computer Science, 2014. 
Jörgen W Weibull.
Evolutionary game theory.
The MIT press, 1995.