May – 2016 – RICHARD R. Tromacek

Below is an R implementation of a k-means clustering algorithm written recently for recreational purposes. The algorithm accepts an arbitrary bivariate data set, x, and an integer k greater than 1 (the number of means) as arguments. It uses the classic optimizing mechanism:

1. Begin by randomly choosing k centers among the n points.
2. Assign each point to the nearest of these randomly chosen centers.
3. Find k new centers as the average of each of the partitions created in step 2.
4. Repeat steps 2 and 3 until the assignments no longer change.
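The steps above can be sketched in a few lines of R. This is only a minimal illustration and not the code from the post itself: the function name simple_kmeans, the max_iter safeguard, and the choice to leave an empty group's center where it is are all assumptions made for the example.

```r
# Hypothetical sketch of the four steps; simple_kmeans is an illustrative name.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(ncol(x) == 2, k > 1, k <= n)

  # Step 1: pick k of the n points at random as the starting centers.
  centers <- x[sample(n, k), , drop = FALSE]
  cluster <- integer(n)

  for (iter in seq_len(max_iter)) {
    # Step 2: squared distance from every point to every center (n x k matrix),
    # then assign each point to the column with the smallest distance.
    d2 <- sapply(seq_len(k), function(j) {
      (x[, 1] - centers[j, 1])^2 + (x[, 2] - centers[j, 2])^2
    })
    new_cluster <- max.col(-d2, ties.method = "first")

    # Step 4: stop once the assignment no longer changes.
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster

    # Step 3: move each center to the mean of its group.
    # Assumption: a center whose group is empty is simply left in place.
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage: two tight pairs of points, far apart from each other.
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
fit <- simple_kmeans(x, 2)
fit$cluster
fit$centers
```

The cluster labels themselves are arbitrary (which group is called 1 depends on the random start), so only the grouping is meaningful.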

This process converges to a partition of the original data set that locally minimizes the sum of squared distances between the n points and the k means they are assigned to. The result depends on the random starting centers and is not guaranteed to be the global minimum. The code includes a sample data set with 4 obvious clusters. This is only meant as an exercise in demonstrating how intuitive this algorithm actually is. Please defer to kmeans() for all your actual k-means clustering needs.
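For comparison, here is roughly what the built-in kmeans() call looks like on data with 4 obvious clusters. The data below is generated for this example and is not the sample data set from the post; the seed, group locations, and nstart value are arbitrary assumptions.

```r
set.seed(2016)

# Hypothetical sample data: 4 well-separated groups of 25 bivariate points each.
# This is NOT the data set that accompanies the original post.
corners <- rbind(c(0, 0), c(0, 10), c(10, 0), c(10, 10))
x <- corners[rep(1:4, each = 25), ] + matrix(rnorm(200, sd = 0.5), ncol = 2)

# nstart = 25 reruns the algorithm from 25 random starts and keeps the best fit,
# which guards against a poor local minimum from a single unlucky start.
fit <- kmeans(x, centers = 4, nstart = 25)

fit$centers         # the 4 estimated cluster means
fit$tot.withinss    # total within-cluster sum of squared distances
table(fit$cluster)  # cluster sizes
```

The tot.withinss component is the quantity the algorithm tries to make small, so it is a convenient number to compare across different starts or different values of k.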


Monthly Archives: May 2016

K-Means Clustering Algorithm