weka.filters.unsupervised.instance
Class Denormalize

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.unsupervised.instance.Denormalize
All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler, StreamableFilter, UnsupervisedFilter

public class Denormalize
extends Filter
implements UnsupervisedFilter, OptionHandler, StreamableFilter

An instance filter that collapses instances with a common grouping ID value into a single instance. This is useful for converting transactional data into a format that Weka's association rule learners can handle. IMPORTANT: the filter assumes that the incoming batch of instances has been sorted on the grouping attribute. The values of nominal attributes are converted to indicator attributes. These can be either binary (with f and t values) or unary, with missing values used to indicate absence. The latter is Weka's old market basket format, which is useful for Apriori. Numeric attributes can be aggregated within groups by computing the average, sum, minimum or maximum.
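
The grouping step can be illustrated with a small, self-contained Java sketch. It is not Weka's implementation: the class GroupCollapseSketch, its collapse method and the two-column (group ID, item) row layout are hypothetical, chosen only to show why sorted input matters. A group is closed as soon as the ID changes, so rows with the same ID must be adjacent.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Weka's code): collapses (groupId, item) rows that
 * are assumed to be sorted on groupId into one item set per group.
 */
class GroupCollapseSketch {

    /** Each row is {groupId, item}. Returns one (groupId, items) entry per group, in input order. */
    static List<Map.Entry<String, Set<String>>> collapse(List<String[]> sortedRows) {
        List<Map.Entry<String, Set<String>>> groups = new ArrayList<>();
        String currentId = null;
        Set<String> currentItems = null;
        for (String[] row : sortedRows) {
            if (!row[0].equals(currentId)) {
                // A change of ID closes the previous group; this is why the input must be sorted.
                currentId = row[0];
                currentItems = new LinkedHashSet<>();
                groups.add(new AbstractMap.SimpleEntry<>(currentId, currentItems));
            }
            currentItems.add(row[1]);
        }
        return groups;
    }
}
```

With the rows (1, milk), (1, bread), (2, eggs) the sketch yields two groups. If the same transactions arrive in the order 1, 2, 1 it yields three, splitting transaction 1, which is the failure mode the sorting requirement guards against.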

Valid options are:

 -G <index | name | first | last>
  Index or name of the attribute to group by, e.g. a transaction ID.
  (default: first)
 -B
  Output instances in Weka's old market basket format (i.e. unary attributes with absence indicated by missing values).
 -S
  Output sparse instances (can't be used in conjunction with -B)
 -A <Average | Sum | Maximum | Minimum>
  Aggregation function for numeric attributes.
  (default: sum).

Version:
$Revision: 8109 $
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
Serialized Form

Nested Class Summary
static class Denormalize.NumericAggregation
          Enumeration of the aggregation methods for numeric attributes
 
Field Summary
static Tag[] TAGS_SELECTION
          Tags for the numeric aggregation types.
 
Constructor Summary
Denormalize()
           
 
Method Summary
 java.lang.String aggregationTypeTipText()
          Returns a description of this option suitable for display as a tip text in the gui.
 boolean batchFinished()
          Signify that this batch of input to the filter is finished.
 SelectedTag getAggregationType()
          Get the type of aggregation to use on numeric values within a group.
 Capabilities getCapabilities()
          Returns the Capabilities of this filter.
 java.lang.String getGroupingAttribute()
          Get the name/index of the attribute to be used for grouping rows (transactions).
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 java.lang.String getRevision()
          Returns the revision string.
 boolean getUseOldMarketBasketFormat()
          Gets whether data is to be output in Weka's old market basket format.
 boolean getUseSparseFormat()
          Get whether sparse data is to be output.
 java.lang.String globalInfo()
          Returns a string describing this filter.
 java.lang.String groupingAttributeTipText()
          Returns a description of this option suitable for display as a tip text in the gui.
 boolean input(Instance instance)
          Input an instance for filtering.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
 void setAggregationType(SelectedTag d)
          Set the type of aggregation to use on numeric values within a group.
 void setGroupingAttribute(java.lang.String groupAtt)
          Set the name or index of the attribute to use for grouping rows (transactions).
 boolean setInputFormat(Instances instanceInfo)
          Sets the format of the input instances.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setUseOldMarketBasketFormat(boolean m)
          Set whether to output data in Weka's old market basket format.
 void setUseSparseFormat(boolean s)
          Set whether to output sparse data.
 java.lang.String useOldMarketBasketFormatTipText()
          Returns a description of this option suitable for display as a tip text in the gui.
 java.lang.String useSparseFormatTipText()
          Returns a description of this option suitable for display as a tip text in the gui.
 
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, runFilter, toString, useFilter, wekaStaticWrapper
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TAGS_SELECTION

public static final Tag[] TAGS_SELECTION
Tags for the numeric aggregation types.

Constructor Detail

Denormalize

public Denormalize()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter.

Returns:
a description of the filter suitable for displaying in the Explorer/Experimenter GUI

getCapabilities

public Capabilities getCapabilities()
Returns the Capabilities of this filter.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Filter
Returns:
the capabilities of this object
See Also:
Capabilities

setInputFormat

public boolean setInputFormat(Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.

Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately
Throws:
java.lang.Exception - if the inputFormat can't be set successfully

input

public boolean input(Instance instance)
              throws java.lang.Exception
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require that all instances be read before producing output.

Overrides:
input in class Filter
Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.IllegalStateException - if no input format has been defined.
java.lang.Exception

batchFinished

public boolean batchFinished()
                      throws java.lang.Exception
Signify that this batch of input to the filter is finished.

Overrides:
batchFinished in class Filter
Returns:
true if there are instances pending output
Throws:
java.lang.IllegalStateException - if no input structure has been defined
java.lang.Exception
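
Because the input is sorted on the grouping attribute and the class implements StreamableFilter, one plausible way to stream groups is to emit a group when the next ID differs and to flush the final group when the batch ends. The sketch below shows that idea under those assumptions; StreamingGroupSketch, offer and flush are hypothetical names, and the sketch does not reproduce the return-value contract of Weka's input and batchFinished.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Weka's implementation): with input sorted on the
 * grouping ID, a group is complete as soon as a different ID arrives, and the
 * last group is flushed when the batch ends.
 */
class StreamingGroupSketch {
    private String currentId;
    private final List<String> currentItems = new ArrayList<>();

    /** Adds one row; returns the previous group if this row closed it, else null. */
    List<String> offer(String groupId, String item) {
        List<String> completed = null;
        if (currentId != null && !currentId.equals(groupId)) {
            // ID changed: the previous group can never receive more rows.
            completed = new ArrayList<>(currentItems);
            currentItems.clear();
        }
        currentId = groupId;
        currentItems.add(item);
        return completed;
    }

    /** Ends the batch; returns the last open group, or null if nothing is pending. */
    List<String> flush() {
        if (currentId == null) {
            return null;
        }
        List<String> last = new ArrayList<>(currentItems);
        currentItems.clear();
        currentId = null;
        return last;
    }
}
```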

groupingAttributeTipText

public java.lang.String groupingAttributeTipText()
Returns a description of this option suitable for display as a tip text in the gui.

Returns:
description of this option

setGroupingAttribute

public void setGroupingAttribute(java.lang.String groupAtt)
Set the name or index of the attribute to use for grouping rows (transactions). "first" and "last" may also be used.

Parameters:
groupAtt - the name/index of the attribute to use for grouping
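
A sketch of how a grouping specification ("first", "last", an index or a name) might be resolved to an attribute position. GroupingSpecSketch and resolve are hypothetical, and treating numeric values as 1-based indices is an assumption based on the usual convention for Weka command-line options, not something this page states.

```java
import java.util.List;

/**
 * Hypothetical sketch: resolves a grouping specification to a 0-based position.
 * Assumes numeric specs are 1-based indices (an assumption, see lead-in).
 */
class GroupingSpecSketch {

    static int resolve(String spec, List<String> attributeNames) {
        if (spec.equalsIgnoreCase("first")) {
            return 0;
        }
        if (spec.equalsIgnoreCase("last")) {
            return attributeNames.size() - 1;
        }
        try {
            int oneBased = Integer.parseInt(spec.trim());
            if (oneBased < 1 || oneBased > attributeNames.size()) {
                throw new IllegalArgumentException("Index out of range: " + spec);
            }
            return oneBased - 1;
        } catch (NumberFormatException notANumber) {
            // Not an index: look the spec up as an attribute name instead.
            int pos = attributeNames.indexOf(spec);
            if (pos < 0) {
                throw new IllegalArgumentException("No such attribute: " + spec);
            }
            return pos;
        }
    }
}
```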

getGroupingAttribute

public java.lang.String getGroupingAttribute()
Get the name/index of the attribute to be used for grouping rows (transactions).

Returns:
the name/index of the attribute to use for grouping.

setUseOldMarketBasketFormat

public void setUseOldMarketBasketFormat(boolean m)
Set whether to output data in Weka's old market basket format. This format uses unary attributes and missing values to indicate absence. Apriori works best on market basket type data in this format.

Parameters:
m - true if data is to be output in Weka's old market basket format.
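
To make the two indicator encodings concrete, the sketch below renders one group's items against a fixed item vocabulary, either as binary f/t values or in the unary old market basket style with ? marking absence. IndicatorEncodingSketch and the comma-separated row layout are illustrative assumptions, not Weka output.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: encodes a group's items as indicator values, either
 * binary (t/f) or unary old-market-basket style (t/?).
 */
class IndicatorEncodingSketch {

    static String encode(List<String> vocabulary, Set<String> groupItems, boolean oldMarketBasket) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < vocabulary.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            if (groupItems.contains(vocabulary.get(i))) {
                row.append('t');
            } else {
                // Binary format records absence as "f"; unary format leaves it missing ("?").
                row.append(oldMarketBasket ? '?' : 'f');
            }
        }
        return row.toString();
    }
}
```

For the vocabulary bread, eggs, milk and a basket containing milk and bread, the binary form is t,f,t and the unary form is t,?,t.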

getUseOldMarketBasketFormat

public boolean getUseOldMarketBasketFormat()
Gets whether data is to be output in Weka's old market basket format.

Returns:
true if data is to be output in Weka's old market basket format.

useOldMarketBasketFormatTipText

public java.lang.String useOldMarketBasketFormatTipText()
Returns a description of this option suitable for display as a tip text in the gui.

Returns:
description of this option

setUseSparseFormat

public void setUseSparseFormat(boolean s)
Set whether to output sparse data. This option cannot be enabled at the same time as useOldMarketBasketFormat.

Parameters:
s - true if sparse data is to be output.
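
Sparse output stores only non-zero values. As an illustration, assuming each binary indicator is stored internally as 0 for its first label (f) and 1 for its second (t), the sketch below renders a dense row in ARFF's sparse {index value, ...} notation, where indices start at 0. SparseRowSketch is hypothetical and does not use Weka's SparseInstance.

```java
/**
 * Hypothetical sketch: renders a dense numeric row in ARFF-style sparse
 * notation, omitting zero entries; indices are 0-based.
 */
class SparseRowSketch {

    static String toSparse(double[] dense) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] == 0.0) {
                continue; // zero entries are implicit in sparse format
            }
            if (!first) {
                sb.append(", ");
            }
            sb.append(i).append(' ').append(formatValue(dense[i]));
            first = false;
        }
        return sb.append('}').toString();
    }

    private static String formatValue(double v) {
        // Print whole numbers without a trailing ".0" to keep the output compact.
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }
}
```

Under that assumption, the binary row t,f,t (internally 1, 0, 1) becomes {0 1, 2 1}.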

getUseSparseFormat

public boolean getUseSparseFormat()
Get whether sparse data is to be output.

Returns:
true if sparse data is to be output.

useSparseFormatTipText

public java.lang.String useSparseFormatTipText()
Returns a description of this option suitable for display as a tip text in the gui.

Returns:
description of this option

setAggregationType

public void setAggregationType(SelectedTag d)
Set the type of aggregation to use on numeric values within a group. Available choices are: Sum, Average, Min and Max.

Parameters:
d - the type of aggregation to use for numeric values.
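
The four aggregation choices can be sketched as a plain Java enum applied to the numeric values collected for one group. AggregationSketch is hypothetical and is not Weka's Denormalize.NumericAggregation; it also ignores missing values, whose handling this page does not describe.

```java
/**
 * Hypothetical sketch of the four numeric aggregation choices for one group.
 * Not Weka's NumericAggregation enum; missing values are not handled.
 */
enum AggregationSketch {
    AVERAGE, SUM, MINIMUM, MAXIMUM;

    double apply(double[] groupValues) {
        if (groupValues.length == 0) {
            throw new IllegalArgumentException("empty group");
        }
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : groupValues) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        switch (this) {
            case AVERAGE:
                return sum / groupValues.length;
            case SUM:
                return sum;
            case MINIMUM:
                return min;
            default:
                return max; // MAXIMUM
        }
    }
}
```

For the group values 2, 4 and 9, SUM gives 15, AVERAGE 5, MINIMUM 2 and MAXIMUM 9.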

getAggregationType

public SelectedTag getAggregationType()
Get the type of aggregation to use on numeric values within a group. Available choices are: Sum, Average, Min and Max.

Returns:
the type of aggregation to use for numeric values.

aggregationTypeTipText

public java.lang.String aggregationTypeTipText()
Returns a description of this option suitable for display as a tip text in the gui.

Returns:
description of this option

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -G <index | name | first | last>
  Index or name of the attribute to group by, e.g. a transaction ID.
  (default: first)
 -B
  Output instances in Weka's old market basket format (i.e. unary attributes with absence indicated by missing values).
 -S
  Output sparse instances (can't be used in conjunction with -B)
 -A <Average | Sum | Maximum | Minimum>
  Aggregation function for numeric attributes.
  (default: sum).

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
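
A minimal sketch of parsing the options listed above into a settings object, applying the documented defaults (-G first, -A sum) and rejecting -B together with -S. DenormalizeOptionsSketch is a hypothetical stand-in written in plain Java, not Weka's setOptions, which may treat unknown or malformed options differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: parses -G, -B, -S and -A with the documented defaults.
 * Not Weka's option parser.
 */
class DenormalizeOptionsSketch {
    private static final List<String> AGGREGATIONS =
            Arrays.asList("average", "sum", "maximum", "minimum");

    String grouping = "first";       // -G default
    boolean oldMarketBasket = false; // -B
    boolean sparse = false;          // -S
    String aggregation = "Sum";      // -A default

    static DenormalizeOptionsSketch parse(String[] args) {
        DenormalizeOptionsSketch o = new DenormalizeOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-G":
                    o.grouping = valueAfter(args, ++i, "-G");
                    break;
                case "-A":
                    o.aggregation = valueAfter(args, ++i, "-A");
                    if (!AGGREGATIONS.contains(o.aggregation.toLowerCase(Locale.ROOT))) {
                        throw new IllegalArgumentException("Unknown aggregation: " + o.aggregation);
                    }
                    break;
                case "-B":
                    o.oldMarketBasket = true;
                    break;
                case "-S":
                    o.sparse = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        // -S can't be used in conjunction with -B.
        if (o.oldMarketBasket && o.sparse) {
            throw new IllegalArgumentException("-S can't be used in conjunction with -B");
        }
        return o;
    }

    private static String valueAfter(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[i];
    }
}
```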

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Filter
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - should contain arguments to the filter: use -h for help