SYNOPSIS

Public Member Functions

RefinedStart (const size_t samplings=100, const double percentage=0.02)

Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.

template<typename MatType > void Cluster (const MatType &data, const size_t clusters, arma::Col< size_t > &assignments) const

Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper.

double Percentage () const

Get the percentage of the data used by each subsampling.

double & Percentage ()

Modify the percentage of the data used by each subsampling.

size_t Samplings () const

Get the number of samplings that will be performed.

size_t & Samplings ()

Modify the number of samplings that will be performed.

Private Attributes

double percentage

The percentage of the data to use for each subsampling.

size_t samplings

The number of samplings to perform.

Detailed Description

A refined approach for choosing initial points for k-means clustering.

This approach runs k-means several times on random subsets of the data, and then clusters those solutions to select refined initial cluster assignments. It is an implementation of the following paper:

@inproceedings{bradley1998refining,
  title={Refining initial points for k-means clustering},
  author={Bradley, Paul S and Fayyad, Usama M},
  booktitle={Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998)},
  volume={66},
  year={1998}
}

Definition at line 47 of file refined_start.hpp.
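The scheme described above can be sketched in plain C++. This is an illustration only, not the code in refined_start.hpp: it works on 1-D points, uses a bare Lloyd iteration, and starts each run from the first points of the shuffled subset. The names Nearest, Lloyd, RefinedAssignments and the parameter fraction are hypothetical, and treating the percentage as a fraction of the dataset is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Index of the centroid closest to x (1-D, absolute distance).
std::size_t Nearest(const std::vector<double>& centroids, const double x)
{
  std::size_t best = 0;
  for (std::size_t c = 1; c < centroids.size(); ++c)
    if (std::abs(x - centroids[c]) < std::abs(x - centroids[best]))
      best = c;
  return best;
}

// Plain Lloyd iterations on 1-D points from the given starting centroids.
// A cluster that ends up empty keeps its previous centroid.
std::vector<double> Lloyd(const std::vector<double>& points,
                          std::vector<double> centroids,
                          const std::size_t maxIterations = 100)
{
  for (std::size_t it = 0; it < maxIterations; ++it)
  {
    std::vector<double> sums(centroids.size(), 0.0);
    std::vector<std::size_t> counts(centroids.size(), 0);
    for (const double x : points)
    {
      const std::size_t c = Nearest(centroids, x);
      sums[c] += x;
      ++counts[c];
    }
    bool changed = false;
    for (std::size_t c = 0; c < centroids.size(); ++c)
    {
      if (counts[c] == 0)
        continue;
      const double updated = sums[c] / static_cast<double>(counts[c]);
      changed = changed || (updated != centroids[c]);
      centroids[c] = updated;
    }
    if (!changed)
      break;
  }
  return centroids;
}

// Hypothetical sketch of the refinement: cluster several random subsets,
// pool the resulting centroids, cluster the pool, then label every point.
std::vector<std::size_t> RefinedAssignments(const std::vector<double>& data,
                                            const std::size_t clusters,
                                            const std::size_t samplings,
                                            const double fraction,
                                            std::mt19937& rng)
{
  assert(clusters > 0 && clusters <= data.size());
  assert(samplings > 0 && fraction > 0.0 && fraction <= 1.0);
  // Assumption: the subset size is fraction * n, truncated, but never
  // smaller than the number of clusters.
  const std::size_t subsetSize = std::max(clusters,
      static_cast<std::size_t>(fraction * static_cast<double>(data.size())));

  std::vector<double> shuffled = data;
  std::vector<double> pooled;
  for (std::size_t s = 0; s < samplings; ++s)
  {
    std::shuffle(shuffled.begin(), shuffled.end(), rng);
    const std::vector<double> subset(shuffled.begin(),
                                     shuffled.begin() + subsetSize);
    // Simplification: start from the first `clusters` points of the subset.
    const std::vector<double> start(subset.begin(), subset.begin() + clusters);
    const std::vector<double> solution = Lloyd(subset, start);
    pooled.insert(pooled.end(), solution.begin(), solution.end());
  }

  // Cluster the pooled centroids, starting from the first subset's solution.
  const std::vector<double> start(pooled.begin(), pooled.begin() + clusters);
  const std::vector<double> refined = Lloyd(pooled, start);

  std::vector<std::size_t> assignments(data.size());
  for (std::size_t i = 0; i < data.size(); ++i)
    assignments[i] = Nearest(refined, data[i]);
  return assignments;
}
```

The returned labels play the role of the assignments vector filled by Cluster(): one entry per point, each in the range 0 to (clusters - 1). Which group receives label 0 depends on the random shuffles, so callers should not rely on a particular label order.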

Constructor & Destructor Documentation

mlpack::kmeans::RefinedStart::RefinedStart (const size_t samplings = 100, const double percentage = 0.02) [inline]

Create the RefinedStart object, optionally specifying parameters for the number of samplings to perform and the percentage of the dataset to use in each sampling.

Definition at line 55 of file refined_start.hpp.
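To make the two constructor parameters concrete, the sketch below estimates how many points a single sampling would draw. It assumes that percentage is a fraction between 0 and 1 and that the product is truncated to an integer; neither detail is stated in this page, and SubsampleSize is a hypothetical helper, not an mlpack function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of points drawn in one sampling, assuming
// `percentage` is a fraction in [0, 1] and the result is truncated.
std::size_t SubsampleSize(const std::size_t numPoints, const double percentage)
{
  assert(percentage >= 0.0 && percentage <= 1.0);
  return static_cast<std::size_t>(percentage * static_cast<double>(numPoints));
}
```

Under that assumption, the default of 0.02 would use about 2% of the dataset per sampling, repeated samplings times.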

Member Function Documentation

template<typename MatType > void mlpack::kmeans::RefinedStart::Cluster (const MatType &data, const size_t clusters, arma::Col< size_t > &assignments) const

Partition the given dataset into the given number of clusters according to the random sampling scheme outlined in Bradley and Fayyad's paper.

Template Parameters:

MatType Type of data (arma::mat or arma::sp_mat).

Parameters:

data Dataset to partition.

clusters Number of clusters to split dataset into.

assignments Vector to store cluster assignments into. Values will be between 0 and (clusters - 1).
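The documented range of the assignments values makes the output easy to consume. As a minimal sketch, the following counts how many points landed in each cluster. It uses std::vector in place of arma::Col< size_t > so that it stands alone, and ClusterSizes is a hypothetical helper rather than part of mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many points were assigned to each cluster.
// assignments[i] is the cluster index of point i and must lie in
// [0, clusters - 1], the range documented for Cluster().
std::vector<std::size_t> ClusterSizes(
    const std::vector<std::size_t>& assignments,
    const std::size_t clusters)
{
  std::vector<std::size_t> sizes(clusters, 0);
  for (const std::size_t label : assignments)
  {
    assert(label < clusters);
    ++sizes[label];
  }
  return sizes;
}
```

A size of zero for some cluster indicates that the initial partition left that cluster empty.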

double mlpack::kmeans::RefinedStart::Percentage () const [inline]

Get the percentage of the data used by each subsampling.

Definition at line 80 of file refined_start.hpp.

References percentage.

double& mlpack::kmeans::RefinedStart::Percentage () [inline]

Modify the percentage of the data used by each subsampling.

Definition at line 82 of file refined_start.hpp.

References percentage.

size_t mlpack::kmeans::RefinedStart::Samplings () const [inline]

Get the number of samplings that will be performed.

Definition at line 75 of file refined_start.hpp.

References samplings.

size_t& mlpack::kmeans::RefinedStart::Samplings () [inline]

Modify the number of samplings that will be performed.

Definition at line 77 of file refined_start.hpp.

References samplings.
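Each parameter is exposed through a pair of overloads: a const member that returns the value and a non-const member that returns a reference, so modification is written as an assignment to the call. The stand-in class below mirrors only the signatures listed in this page; RefinedStartParams is a hypothetical name and the class is not the mlpack implementation.

```cpp
#include <cstddef>

// Hypothetical stand-in mirroring the documented accessor pairs.
class RefinedStartParams
{
 public:
  RefinedStartParams(const std::size_t samplings = 100,
                     const double percentage = 0.02) :
      samplings(samplings), percentage(percentage) { }

  // Get the number of samplings that will be performed.
  std::size_t Samplings() const { return samplings; }
  // Modify the number of samplings that will be performed.
  std::size_t& Samplings() { return samplings; }

  // Get the percentage of the data used by each subsampling.
  double Percentage() const { return percentage; }
  // Modify the percentage of the data used by each subsampling.
  double& Percentage() { return percentage; }

 private:
  std::size_t samplings;
  double percentage;
};
```

With this pattern, params.Samplings() = 50; changes the stored value on a non-const object, while a const reference to the same object can only read it.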

Member Data Documentation

double mlpack::kmeans::RefinedStart::percentage [private]

The percentage of the data to use for each subsampling.

Definition at line 88 of file refined_start.hpp.

Referenced by Percentage().

size_t mlpack::kmeans::RefinedStart::samplings [private]

The number of samplings to perform.

Definition at line 86 of file refined_start.hpp.

Referenced by Samplings().

Author

Generated automatically by Doxygen for MLPACK from the source code.