Title:A Computational Theory of Clustering

Name: Avrim Blum

               Carnegie Mellon University

Time:October 13th (Monday) 14:45-15:30
Location: Lecture Hall, FIT Building, Tsinghua University
Host Unit: ITCS, Tsinghua University




Abstract

There has been substantial work in many different fields on the problem of clustering data.  Theoretical work has generally been of two types: either on algorithms for (approximately) optimizing various distance-based objectives such as k-median, k-means, and min-sum, or on clustering under probabilistic "generative model" assumptions such as mixtures of Gaussian or related distributions. 

In this work we propose a new approach to analyzing the problem of clustering.  Building on models used in learning theory, we consider the goal of approximately recovering an unknown target clustering from a given similarity matrix (weighted graph), given only the assumption of certain natural properties that the similarity or weight function satisfies with respect to the desired clustering.  We find that if we are willing to relax our goals a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if it contains a pruning that is close to the correct answer) then this leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well.  We can also produce accurate clusterings under implicit assumptions made when considering approximation algorithms for distance-based objectives, even at values for which no efficient approximations exist.  This work can be viewed as defining a kind of PAC model for clustering.

This talk is based on work joint with Maria-Florina Balcan, Anupam Gupta, and Santosh Vempala.


Biography


Avrim Blum received his PhD from MIT in 1991 and is now Professor of Computer Science at Carnegie Mellon University.  His main research interests are in Machine Learning Theory, Approximation Algorithms, and Algorithmic Game Theory, and is also known for his work in AI Planning.  He has served as Program Chair for the IEEE Symposium on Foundations of Computer Science (FOCS) and the Conference on Learning Theory (COLT), and on the organizing committee for the National Academy of Sciences U.S. Frontiers of Science Symposium. He was recipient of the Sloan Fellowship and NSF National Young Investigator Awards, the ICML/COLT 10-year best paper award, and is a Fellow of the ACM.

 


Back