We consider the problem of finding a ranking of a set of elements that best represents a given set of input rankings of the elements; more precisely, we want to find a permutation that minimizes the Kendall-tau distance to the input rankings, where the Kendall-tua distance is defined as the sum over all input rankings of the number of pairs of elements that are in a different order in the input ranking than in the output ranking. This problem arises for example in building meta-search engines for Web search, aggregating viewers' rankings of movies, or giving recommendations to a user based on several different criteria, where we can think of having one ranking of the alternatives for each criterion. Many of the approximation algorithms and heuristITCS that have been proposed in the literature are either positional, comparison sort or local search algorithms. The rank aggregation problem is a special case of the feedback arc set problem, but in the feedback arc set problem we use only information about the preferred relative ordering of pairs of elements to find a ranking of the elements, whereas in the case of the rank aggregation problem, we have additional information in the form of the complete input rankings. The positional methods are the only algorithms that use this additional information. Since the rank aggregation problem is NP-hard, none of these algorithms is guaranteed to find the optimal solution, and different algorithms will provide different solutions. We give theoretical and practical evidence that a combination of these different approaches gives algorithms that are superior to the individual algorithms. Theoretically, we give lower bounds on the performance for many of the "pure" methods. Practically, we perform an extensive evaluation of the "pure" algorithms and combinations of different approaches. We give three recommendations for which (combination of) methods to use based on whether a user wants to have a very fast, fast or reasonably fast algorithm. (joint work with Anke van Zuylen)

Date:	15:50-16:30, Monday, March 23, 2009

Venue:	FIT Building 1-315, Tsinghua University

Title:	Rank Aggregation: Together We're Strong

Speaker:	Frans Schalekamp

Biography:	Frans Schalekamp (PhD 2007, Cornell University) is a Postdoctoral Researcher at the Institute for Theoretical Computer Science at Tsinghua University, Beijing. During the final stages of writing and after finishing his thesis under the supervision of David Shmoys, he worked at a Computational Biology startup company, Nature Source GenetITCS (NSG) in Ithaca, of which he was the first employee. At NSG his work focused on the design of algorithms to determine populations with good properties for commercial breeding programs. His research interests are in Combinatorial Optimization, Algorithms, Computational Biology, Applied Optimization and Simulation.

Abstract:	We consider the problem of finding a ranking of a set of elements that best represents a given set of input rankings of the elements; more precisely, we want to find a permutation that minimizes the Kendall-tau distance to the input rankings, where the Kendall-tua distance is defined as the sum over all input rankings of the number of pairs of elements that are in a different order in the input ranking than in the output ranking. This problem arises for example in building meta-search engines for Web search, aggregating viewers' rankings of movies, or giving recommendations to a user based on several different criteria, where we can think of having one ranking of the alternatives for each criterion. Many of the approximation algorithms and heuristITCS that have been proposed in the literature are either positional, comparison sort or local search algorithms. The rank aggregation problem is a special case of the feedback arc set problem, but in the feedback arc set problem we use only information about the preferred relative ordering of pairs of elements to find a ranking of the elements, whereas in the case of the rank aggregation problem, we have additional information in the form of the complete input rankings. The positional methods are the only algorithms that use this additional information. Since the rank aggregation problem is NP-hard, none of these algorithms is guaranteed to find the optimal solution, and different algorithms will provide different solutions. We give theoretical and practical evidence that a combination of these different approaches gives algorithms that are superior to the individual algorithms. Theoretically, we give lower bounds on the performance for many of the "pure" methods. Practically, we perform an extensive evaluation of the "pure" algorithms and combinations of different approaches. We give three recommendations for which (combination of) methods to use based on whether a user wants to have a very fast, fast or reasonably fast algorithm. (joint work with Anke van Zuylen)