Clustering in trees: Optimizing cluster sizes and number of subtrees

Summary

Summary: This paper considers partitioning the vertices of an n-vertex tree into p disjoint sets C1, C2, . . . , Cp, called clusters so that the number of vertices in a cluster and the number of subtrees in a cluster are minimized. For this NP-hard problem we present greedy heuristics which differ in (i) how subtrees are identified (using either a best-fit, good-fit, or first-fit selection criteria), (ii) whether clusters are filled one at a time or simultaneously, and (iii) how much cluster sizes can differ from the ideal size of c vertices per cluster, n = cp. The last criteria is controlled by a constant $\alpha , 0 \leq \alpha < 1$, such that cluster Ci satisfies ($1 - \alpha 2$)c $\leq$|Ci| $\leq c(1 + \alpha ), 1 \leq i \leq p$. For algorithms resulting from combinations of these criteria we develop worst-case bounds on the number of subtrees in a cluster in terms of c, $\alpha$, and the maximum degree of a vertex. We present experimental results which give insight into how parameters c, $\alpha$, and the maximum degree of a vertex impact the number of subtrees and the cluster sizes. Communicated by G. Liotta: submitted November 1999, revised August 2000.