weka.clusterers
Class CascadeSimpleKMeans

java.lang.Object
  extended by weka.clusterers.AbstractClusterer
      extended by weka.clusterers.RandomizableClusterer
          extended by weka.clusterers.CascadeSimpleKMeans
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class CascadeSimpleKMeans
extends RandomizableClusterer
implements Clusterer, TechnicalInformationHandler

Cascade simple k-means. Selects the best k according to the Calinski-Harabasz criterion, analogous to cascadeKM: http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/cascadeKM.html

See: Calinski, T. and J. Harabasz. 1974. A dendrite method for cluster analysis. Commun. Stat. 3: 1-27. Quoted (in German) in: http://books.google.com/books?id=-f9Ox0p1-D4C&lpg=PA394&ots=SV3JfRIkQn&dq=Calinski%20and%20Harabasz&hl=de&pg=PA394#v=onepage&q&f=false
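
The Calinski-Harabasz criterion named in the description scores a partition of n points into k clusters as (B / (k - 1)) / (W / (n - k)), where B is the between-cluster sum of squares and W is the within-cluster sum of squares; larger values indicate tighter, better separated clusters. The following is a minimal plain-Java sketch of that formula for a fixed assignment. The class name CalinskiHarabaszSketch and its method names are hypothetical and not part of Weka, and the sketch assumes squared Euclidean distance and no empty clusters.

```java
/** Illustrative only: not part of Weka. */
final class CalinskiHarabaszSketch {

    /**
     * Computes the Calinski-Harabasz index for a fixed assignment.
     * points[i] is a numeric vector; assignment[i] is its cluster index in [0, k).
     */
    static double index(double[][] points, int[] assignment, int k) {
        int n = points.length;
        if (k < 2 || k >= n) {
            throw new IllegalArgumentException("need 2 <= k < n");
        }
        int dim = points[0].length;
        double[] overall = new double[dim];
        double[][] centroids = new double[k][dim];
        int[] sizes = new int[k];

        // Accumulate coordinate sums for the overall mean and each cluster.
        for (int i = 0; i < n; i++) {
            sizes[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                overall[d] += points[i][d];
                centroids[assignment[i]][d] += points[i][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            overall[d] /= n;
        }
        for (int j = 0; j < k; j++) {
            if (sizes[j] == 0) {
                throw new IllegalArgumentException("empty cluster " + j);
            }
            for (int d = 0; d < dim; d++) {
                centroids[j][d] /= sizes[j];
            }
        }

        // B: size-weighted squared distance of each centroid to the overall mean.
        double between = 0;
        for (int j = 0; j < k; j++) {
            between += sizes[j] * squaredDistance(centroids[j], overall);
        }
        // W: squared distance of each point to its own centroid.
        double within = 0;
        for (int i = 0; i < n; i++) {
            within += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return (between / (k - 1)) / (within / (n - k));
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {2}, {10}, {12}};
        // Natural grouping {0, 2} and {10, 12}: B = 100, W = 4.
        System.out.println(index(points, new int[] {0, 0, 1, 1}, 2)); // prints 50.0
        // Poor grouping {0, 10} and {2, 12}: B = 4, W = 100.
        System.out.println(index(points, new int[] {0, 1, 0, 1}, 2)); // prints 0.08
    }
}
```

In the example, the natural grouping scores far higher than the interleaved one, which is the property a cascade over k relies on when comparing partitions.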

Author:
Martin Gütlein (martin.guetlein@gmail.com)
See Also:
Serialized Form

Constructor Summary
CascadeSimpleKMeans()
           
 
Method Summary
 void buildClusterer(Instances data)
           
 int clusterInstance(Instance instance)
           
 java.lang.String distanceFunctionTipText()
           
 double[] distributionForInstance(Instance instance)
           
 Capabilities getCapabilities()
           
 DistanceFunction getDistanceFunction()
           
 boolean getInitializeUsingKMeansPlusPlusMethod()
          Get whether to initialize using the probabilistic farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers).
 int getMaxIterations()
           
 int getMaxNumClusters()
           
 int getMinNumClusters()
           
 java.lang.String[] getOptions()
          Gets the current settings of CascadeSimpleKMeans.
 int getRestarts()
           
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
           
 java.lang.String globalInfo()
          Returns a string describing this clusterer.
 java.lang.String initializeUsingKMeansPlusPlusMethodTipText()
          Returns the tip text for this property.
 boolean isManuallySelectNumClusters()
           
 boolean isPrintDebug()
           
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for executing this class.
 java.lang.String manuallySelectNumClustersTipText()
           
 java.lang.String maxIterationsTipText()
           
 java.lang.String maxNumClustersTipText()
           
 java.lang.String minNumClustersTipText()
           
 int numberOfClusters()
           
 java.lang.String printDebugTipText()
           
 java.lang.String restartsTipText()
           
 void setDistanceFunction(DistanceFunction distanceFunction)
           
 void setInitializeUsingKMeansPlusPlusMethod(boolean k)
          Set whether to initialize using the probabilistic farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers).
 void setManuallySelectNumClusters(boolean manuallySelectNumClusters)
           
 void setMaxIterations(int maxIterations)
           
 void setMaxNumClusters(int maxNumClusters)
           
 void setMinNumClusters(int minNumClusters)
           
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPrintDebug(boolean printDebug)
           
 void setRestarts(int restarts)
           
 java.lang.String toString()
           
 
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.clusterers.AbstractClusterer
forName, makeCopies, makeCopy, runClusterer
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CascadeSimpleKMeans

public CascadeSimpleKMeans()
Method Detail

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Specified by:
getTechnicalInformation in interface TechnicalInformationHandler

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

globalInfo

public java.lang.String globalInfo()
Returns a string describing this clusterer.

Returns:
a description of the clusterer suitable for displaying in the explorer/experimenter gui

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Specified by:
buildClusterer in interface Clusterer
Specified by:
buildClusterer in class AbstractClusterer
Throws:
java.lang.Exception
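
How buildClusterer works internally is not documented on this page. Based only on the class description (the best k is selected by the Calinski-Harabasz criterion) and on the minNumClusters/maxNumClusters properties, the general idea can be sketched in plain Java as follows. Everything here is an assumption made for illustration: the class CascadeSketch and its methods are hypothetical, the data is one-dimensional, and the random restarts suggested by the restarts and seed properties are replaced by a deterministic, evenly spaced initialization so that the result is reproducible.

```java
import java.util.Arrays;

/** Illustrative only: a 1-D stand-in for the cascade idea, not Weka's code. */
final class CascadeSketch {

    /** Returns the k in [minK, maxK] whose k-means partition scores highest. */
    static int bestK(double[] values, int minK, int maxK) {
        if (minK < 2 || maxK < minK || maxK >= values.length) {
            throw new IllegalArgumentException("need 2 <= minK <= maxK < n");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = minK; k <= maxK; k++) {
            double score = calinskiHarabasz(sorted, kMeans(sorted, k), k);
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }

    /** Lloyd's algorithm, started from k evenly spaced order statistics. */
    static int[] kMeans(double[] sorted, int k) {
        int n = sorted.length;
        double[] centers = new double[k];
        for (int j = 0; j < k; j++) {
            centers[j] = sorted[(2 * j + 1) * n / (2 * k)];
        }
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        boolean changed = true;
        for (int iter = 0; changed && iter < 100; iter++) {
            changed = false;
            // Assignment step: each value goes to its nearest center.
            for (int i = 0; i < n; i++) {
                int nearest = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(sorted[i] - centers[j]) < Math.abs(sorted[i] - centers[nearest])) {
                        nearest = j;
                    }
                }
                if (assign[i] != nearest) {
                    assign[i] = nearest;
                    changed = true;
                }
            }
            // Update step: each center moves to the mean of its values.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                sum[assign[i]] += sorted[i];
                count[assign[i]]++;
            }
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) {
                    centers[j] = sum[j] / count[j]; // an empty cluster keeps its old center
                }
            }
        }
        return assign;
    }

    /** (B / (k - 1)) / (W / (n - k)), using B = total sum of squares - W. */
    static double calinskiHarabasz(double[] x, int[] assign, int k) {
        int n = x.length;
        double mean = Arrays.stream(x).average().orElse(0);
        double[] sum = new double[k];
        int[] count = new int[k];
        for (int i = 0; i < n; i++) {
            sum[assign[i]] += x[i];
            count[assign[i]]++;
        }
        double total = 0;
        double within = 0;
        for (int i = 0; i < n; i++) {
            double centroid = sum[assign[i]] / count[assign[i]];
            total += (x[i] - mean) * (x[i] - mean);
            within += (x[i] - centroid) * (x[i] - centroid);
        }
        return ((total - within) / (k - 1)) / (within / (n - k));
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 50, 51, 52, 100, 101, 102};
        System.out.println(bestK(values, 2, 5)); // prints 3
    }
}
```

For the three well separated groups in main, k = 3 yields the highest score among k = 2..5, so the sketch returns 3.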

clusterInstance

public int clusterInstance(Instance instance)
                    throws java.lang.Exception
Specified by:
clusterInstance in interface Clusterer
Overrides:
clusterInstance in class AbstractClusterer
Throws:
java.lang.Exception

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Specified by:
distributionForInstance in interface Clusterer
Overrides:
distributionForInstance in class AbstractClusterer
Throws:
java.lang.Exception

numberOfClusters

public int numberOfClusters()
                     throws java.lang.Exception
Specified by:
numberOfClusters in interface Clusterer
Specified by:
numberOfClusters in class AbstractClusterer
Throws:
java.lang.Exception

getCapabilities

public Capabilities getCapabilities()
Specified by:
getCapabilities in interface Clusterer
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class AbstractClusterer

minNumClustersTipText

public java.lang.String minNumClustersTipText()

getMinNumClusters

public int getMinNumClusters()

setMinNumClusters

public void setMinNumClusters(int minNumClusters)

maxNumClustersTipText

public java.lang.String maxNumClustersTipText()

getMaxNumClusters

public int getMaxNumClusters()

setMaxNumClusters

public void setMaxNumClusters(int maxNumClusters)

restartsTipText

public java.lang.String restartsTipText()

getRestarts

public int getRestarts()

setRestarts

public void setRestarts(int restarts)

printDebugTipText

public java.lang.String printDebugTipText()

isPrintDebug

public boolean isPrintDebug()

setPrintDebug

public void setPrintDebug(boolean printDebug)

distanceFunctionTipText

public java.lang.String distanceFunctionTipText()

getDistanceFunction

public DistanceFunction getDistanceFunction()

setDistanceFunction

public void setDistanceFunction(DistanceFunction distanceFunction)

maxIterationsTipText

public java.lang.String maxIterationsTipText()

getMaxIterations

public int getMaxIterations()

setMaxIterations

public void setMaxIterations(int maxIterations)

manuallySelectNumClustersTipText

public java.lang.String manuallySelectNumClustersTipText()

isManuallySelectNumClusters

public boolean isManuallySelectNumClusters()

setManuallySelectNumClusters

public void setManuallySelectNumClusters(boolean manuallySelectNumClusters)

initializeUsingKMeansPlusPlusMethodTipText

public java.lang.String initializeUsingKMeansPlusPlusMethodTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setInitializeUsingKMeansPlusPlusMethod

public void setInitializeUsingKMeansPlusPlusMethod(boolean k)
Set whether to initialize using the probabilistic farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers).

Parameters:
k - true if the k-means++ method is to be used to select initial cluster centers.

getInitializeUsingKMeansPlusPlusMethod

public boolean getInitializeUsingKMeansPlusPlusMethod()
Get whether to initialize using the probabilistic farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers).

Returns:
true if the k-means++ method is to be used to select initial cluster centers.
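
For orientation, k-means++ seeding picks the first center uniformly at random and each further center with probability proportional to its squared distance from the nearest center already chosen, which is what makes it a probabilistic variant of farthest-first selection. The following plain-Java sketch illustrates that sampling scheme only; the class KMeansPlusPlusSketch and its method names are hypothetical, squared Euclidean distance is assumed, and this is not Weka's implementation.

```java
import java.util.Random;

/** Illustrative only: not Weka's implementation of k-means++ seeding. */
final class KMeansPlusPlusSketch {

    /** Returns the indices of k points chosen as initial centers. */
    static int[] chooseCenters(double[][] points, int k, Random rnd) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= n");
        }
        int[] centers = new int[k];
        // First center: uniformly at random.
        centers[0] = rnd.nextInt(n);
        // minSq[i] = squared distance from point i to its nearest chosen center.
        double[] minSq = new double[n];
        for (int i = 0; i < n; i++) {
            minSq[i] = squaredDistance(points[i], points[centers[0]]);
        }
        for (int c = 1; c < k; c++) {
            double total = 0;
            for (double w : minSq) {
                total += w;
            }
            int chosen = -1;
            if (total > 0) {
                // Draw index i with probability minSq[i] / total.
                double threshold = rnd.nextDouble() * total;
                double cumulative = 0;
                for (int i = 0; i < n; i++) {
                    if (minSq[i] > 0) {
                        cumulative += minSq[i];
                        chosen = i; // last positive-weight index guards against rounding
                        if (cumulative > threshold) {
                            break;
                        }
                    }
                }
            } else {
                // Every point coincides with a chosen center: fall back to uniform.
                chosen = rnd.nextInt(n);
            }
            centers[c] = chosen;
            for (int i = 0; i < n; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points[i], points[chosen]));
            }
        }
        return centers;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {1}, {10}, {11}, {20}};
        int[] centers = chooseCenters(points, 3, new Random(42));
        for (int index : centers) {
            System.out.println("center index " + index + " -> " + points[index][0]);
        }
    }
}
```

Because a point that is already a center has weight zero, it cannot be drawn again while any other distinct point remains, so the chosen centers in the example are always three different points.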

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClusterer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -N <num>
  number of clusters.
  (default 2).
 -P
  Initialize using the k-means++ method.
 
 -V
  Display std. deviations for centroids.
 
 -M
  Replace missing values with mean/mode.
 
 -A <classname and options>
  Distance function to use.
  (default: weka.core.EuclideanDistance)
 -I <num>
  Maximum number of iterations.
 
 -O
  Preserve order of instances.
 
 -fast
  Enables faster distance calculations, using cut-off values.
  Disables the calculation/output of squared errors/distances.
 
 -S <num>
  Random number seed.
  (default 10)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClusterer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of CascadeSimpleKMeans.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClusterer
Returns:
an array of strings suitable for passing to setOptions()

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractClusterer
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method for executing this class.

Parameters:
args - use -h to list all parameters