K-Means Clustering Algorithm

ClusterAnalysisBelow is an R implementation of a k-means clustering algorithm written recently for recreational purposes.  The algorithm will accept an arbitrary bivariate data set, x, and any integer greater than 1 (the k means) as arguments.  The algorithm uses the classic optimizing mechanism:

  1. Begin by randomly choosing k centers among the n points.
  2. Group all points to the nearest of these randomly chosen centers.
  3. Find k new centers as the average of each of the partitions created in step 2.
  4. Repeat this process until stable.

This process will result in a partition of the original data set that minimizes the sum of square distances between the original n points and the final k means.  The code includes a sample data set with 4 obvious clusters.  This is only meant as an exercise in demonstrating how intuitive this algorithm actually is.  Please defer to kmeans() for all your actual k-means clustering needs. Continue reading


Monte Carlo Markov Chain Solution to the 0/1 Knapsack Problem

MCMCBelow is an R program that will optimize a particular knapsack using the Metropolis-Hasting algorithm, a Monte Carlo Markov chain.  The beautiful MH algorithm has been a recent focus of mine and I am finding that it’s applications are basically limitless.  I’ve also posted this one to my hub. The knapsack problem (in this case, a 0/1 knapsack problem) is a classic optimization problem in computer science. Many approaches exist for solving problem, but I haven’t seen many that exceed MCMC in terms of sheer efficiency and elegance. Continue reading