mulan.data
Class MultiLabelInstances

java.lang.Object
  extended by mulan.data.MultiLabelInstances
All Implemented Interfaces:
Serializable

public class MultiLabelInstances
extends Object
implements Serializable

Implements a multi-label instances data set. Multi-label data are stored in Weka's Instances, and this class is a convenient wrapper around them. The data are loaded from a data file and checked for valid format. If a label hierarchy is specified via an XML meta-data file, the data file is cross-checked against the XML for consistency.

Applied rules:

- label names must be unique

- all labels in XML meta-data must be defined also in ARFF data set

- each label attribute must be nominal with binary values

- if the labels have a hierarchy and a child label is true for some data instance, then all of its parent labels must also be true for that instance
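The hierarchy rule above can be illustrated with a small plain-Java sketch that does not use Mulan or Weka. The class name HierarchyRuleSketch and the map-based representation (label name to 0/1 value for one instance, child label to parent label) are hypothetical; this is not Mulan's actual validation code.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the hierarchy rule: if a label is true (1) for an
 * instance, every ancestor label must also be true. Plain Java, no Mulan/Weka.
 */
final class HierarchyRuleSketch {

    /**
     * @param labelValues label name -> 0/1 value for one instance
     * @param parentOf    child label name -> parent label name (root labels absent)
     * @return true if no true label has a false (or absent) ancestor
     */
    static boolean isConsistent(Map<String, Integer> labelValues, Map<String, String> parentOf) {
        for (Map.Entry<String, Integer> e : labelValues.entrySet()) {
            if (e.getValue() != 1) {
                continue; // only true labels constrain their ancestors
            }
            // Walk up from this label to the root, checking every ancestor.
            String parent = parentOf.get(e.getKey());
            while (parent != null) {
                if (labelValues.getOrDefault(parent, 0) != 1) {
                    return false;
                }
                parent = parentOf.get(parent);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of("cat", "animal", "siamese", "cat");
        System.out.println(isConsistent(Map.of("animal", 1, "cat", 1, "siamese", 0), parentOf)); // true
        System.out.println(isConsistent(Map.of("animal", 0, "cat", 1, "siamese", 0), parentOf)); // false
    }
}
```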

Author:
Jozef Vilcek
See Also:
Serialized Form

Constructor Summary
MultiLabelInstances(InputStream arffDataStream, InputStream xmlLabelsDefStream)
          Creates a new instance of MultiLabelInstances data from the supplied InputStream data source.
MultiLabelInstances(InputStream arffDataStream, int numLabelAttributes)
          Creates a new instance of MultiLabelInstances data from the supplied InputStream data source.
MultiLabelInstances(Instances dataSet, LabelsMetaData labelsMetaData)
          Creates a new instance of MultiLabelInstances data from existing Instances and LabelsMetaData.
MultiLabelInstances(Instances data, String xmlLabelsDefFilePath)
          Creates a new instance of MultiLabelInstances data.
MultiLabelInstances(String arffFilePath, int numLabelAttributes)
          Creates a new instance of MultiLabelInstances data.
MultiLabelInstances(String arffFilePath, String xmlLabelsDefFilePath)
          Creates a new instance of MultiLabelInstances data.
 
Method Summary
 MultiLabelInstances clone()
          Returns a deep copy of the MultiLabelInstances instance.
 double getCardinality()
          Gets the cardinality of the dataset
 Instances getDataSet()
          Gets underlying Instances, which contains all data.
 int getDepth(String labelName)
          Calculates the depth of a label in the hierarchical tree of labels.
 Set<Attribute> getFeatureAttributes()
          Gets the Set of feature Attribute instances of this MultiLabelInstances instance.
 int[] getFeatureIndices()
          Gets the array with indices of feature attributes stored in underlying Instances data set.
 Set<Attribute> getLabelAttributes()
          Gets the Set of label Attribute instances of this MultiLabelInstances instance.
 HashMap<String,Integer> getLabelDepth()
          Creates a HashMap that contains every label with its depth in the hierarchical tree.
 int[] getLabelDepthIndices()
          Returns the depth of the labels
 int[] getLabelIndices()
          Gets the array with indices of label attributes stored in the underlying Instances data set.
 LabelsMetaData getLabelsMetaData()
          Gets the LabelsMetaData instance, which contains descriptive meta-data about label attributes stored in underlying Instances data set.
 Map<String,Integer> getLabelsOrder()
          Gets a mapping of attribute names and their indices in the Instances object.
 int getNumInstances()
          Gets the number of instances
 int getNumLabels()
          Gets the number of labels (label attributes)
 boolean hasMissingLabels(Instance instance)
          Checks whether an instance has missing labels.
 MultiLabelInstances reintegrateModifiedDataSet(Instances modifiedDataSet)
          If the Instances data set is retrieved from MultiLabelInstances and then post-processed or modified by custom code, it can be reintegrated into MultiLabelInstances if needed.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiLabelInstances

public MultiLabelInstances(String arffFilePath,
                           int numLabelAttributes)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data. The label attributes are assumed to be the last attributes of the ARFF data file; their count is specified by the numLabelAttributes parameter. The LabelsMetaData are created from these attributes.

Parameters:
arffFilePath - the path to ARFF file containing the data
numLabelAttributes - the number of ARFF data set attributes which are labels.
Throws:
ArgumentNullException - if arffFilePath is null
IllegalArgumentException - if numLabelAttributes is less than 2
InvalidDataFormatException - if format of loaded multi-label data is invalid
DataLoadException - if ARFF data file cannot be loaded
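To illustrate the convention that the last numLabelAttributes attributes are labels, the following hypothetical plain-Java sketch computes the label attribute indices for a data set with numAttributes attributes. The class and method names are assumptions; only the lower bound of 2 label attributes comes from this documentation, and the upper-bound check is an added assumption.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: turns "the last numLabelAttributes attributes are labels"
 * into attribute indices. Plain Java; not Mulan's implementation.
 */
final class TrailingLabelIndices {

    static int[] labelIndices(int numAttributes, int numLabelAttributes) {
        // Mirrors the documented precondition: at least 2 label attributes.
        if (numLabelAttributes < 2) {
            throw new IllegalArgumentException("numLabelAttributes must be at least 2");
        }
        // Assumed sanity check (not from the documentation).
        if (numLabelAttributes > numAttributes) {
            throw new IllegalArgumentException("more label attributes than attributes");
        }
        int[] indices = new int[numLabelAttributes];
        int first = numAttributes - numLabelAttributes; // index of the first label attribute
        for (int i = 0; i < numLabelAttributes; i++) {
            indices[i] = first + i;
        }
        return indices;
    }

    public static void main(String[] args) {
        // 10 attributes, the last 3 are labels
        System.out.println(Arrays.toString(labelIndices(10, 3))); // [7, 8, 9]
    }
}
```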

MultiLabelInstances

public MultiLabelInstances(InputStream arffDataStream,
                           int numLabelAttributes)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data from the supplied InputStream data source. The data in the stream are assumed to be in ARFF format, with the label attributes as the last attributes. The LabelsMetaData are created from those attributes.

Parameters:
arffDataStream - the InputStream data source to load data in ARFF format
numLabelAttributes - the number of trailing ARFF data set attributes which are labels.
Throws:
ArgumentNullException - if InputStream data source is null
IllegalArgumentException - if the number of label attributes is less than 2
InvalidDataFormatException - if format of loaded multi-label data is invalid
DataLoadException - if ARFF data cannot be loaded

MultiLabelInstances

public MultiLabelInstances(Instances data,
                           String xmlLabelsDefFilePath)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data. The data are taken from the supplied Instances object, and the labels meta-data are loaded separately from the XML file. A load failure is indicated by DataLoadException. When the data are loaded, validations are applied to ensure consistency between the Instances data and the specified labels meta-data.

Parameters:
data - the Instances object containing the data
xmlLabelsDefFilePath - the path to XML file containing labels meta-data
Throws:
IllegalArgumentException - if the XML file path refers to a non-existing file
InvalidDataFormatException - if format of loaded multi-label data is invalid
DataLoadException - if the XML meta-data cannot be loaded

MultiLabelInstances

public MultiLabelInstances(String arffFilePath,
                           String xmlLabelsDefFilePath)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data. The ARFF data file and the labels meta-data are loaded separately. A load failure is indicated by DataLoadException. When the data are loaded, validations are applied to ensure consistency between the ARFF data and the specified labels meta-data.

Parameters:
arffFilePath - the path to ARFF file containing the data
xmlLabelsDefFilePath - the path to XML file containing labels meta-data
Throws:
ArgumentNullException - if input parameters are null
IllegalArgumentException - if input parameters refer to non-existing files
InvalidDataFormatException - if format of loaded multi-label data is invalid
DataLoadException - if the XML meta-data or ARFF data file cannot be loaded

MultiLabelInstances

public MultiLabelInstances(InputStream arffDataStream,
                           InputStream xmlLabelsDefStream)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data from the supplied InputStream data source. The data in the stream are assumed to be in ARFF format. The labels meta-data for the ARFF data are retrieved separately from a different InputStream data source; they are assumed to be in XML format and to conform to the valid schema. A data load failure is indicated by DataLoadException. When the data are loaded, validations are applied to ensure consistency between the ARFF data and the specified labels meta-data.

Parameters:
arffDataStream - the InputStream data source to load data in ARFF format
xmlLabelsDefStream - the InputStream data source to load XML labels meta data
Throws:
ArgumentNullException - if input parameters are null
IllegalArgumentException - if input parameters refer to non-existing files
InvalidDataFormatException - if format of loaded multi-label data is invalid
DataLoadException - if the XML meta-data or ARFF data cannot be loaded

MultiLabelInstances

public MultiLabelInstances(Instances dataSet,
                           LabelsMetaData labelsMetaData)
                    throws InvalidDataFormatException
Creates a new instance of MultiLabelInstances data from existing Instances and LabelsMetaData. The input parameters are not copied; only references to them are stored internally.

The data set and labels meta-data are validated against each other. Any violation of the validation criteria results in InvalidDataFormatException.

Parameters:
dataSet - the data set with data instances in multi-label format
labelsMetaData - the meta-data about label attributes of data set
Throws:
IllegalArgumentException - if input parameters are null
InvalidDataFormatException - if multi-label data format is not valid

Method Detail

getNumLabels

public int getNumLabels()
Gets the number of labels (label attributes)

Returns:
number of labels

getNumInstances

public int getNumInstances()
Gets the number of instances

Returns:
number of instances

getCardinality

public double getCardinality()
Gets the cardinality of the dataset

Returns:
dataset cardinality
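A minimal sketch of label cardinality, assuming the common multi-label definition (the average number of relevant labels per instance) and a plain 0/1 label matrix in place of Weka Instances. CardinalitySketch is a hypothetical name, and Mulan's own implementation may differ in details such as the handling of empty data sets.

```java
/**
 * Hypothetical sketch of label cardinality on a 0/1 label matrix
 * (rows = instances, columns = labels): total number of labels equal to 1,
 * divided by the number of instances.
 */
final class CardinalitySketch {

    static double cardinality(int[][] labels) {
        long relevant = 0;
        for (int[] row : labels) {
            for (int v : row) {
                if (v == 1) {
                    relevant++; // count every relevant (true) label
                }
            }
        }
        // An empty data set yields NaN here (0.0 / 0).
        return (double) relevant / labels.length;
    }

    public static void main(String[] args) {
        int[][] y = { {1, 0, 1}, {0, 0, 1}, {1, 1, 1}, {0, 1, 0} };
        System.out.println(cardinality(y)); // 7 relevant labels / 4 instances = 1.75
    }
}
```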

getLabelIndices

public int[] getLabelIndices()
Returns:
an array with the indices of the label attributes inside the Instances object

getLabelsOrder

public Map<String,Integer> getLabelsOrder()
Returns:
a mapping of attribute names and their indices in the Instances object

getLabelAttributes

public Set<Attribute> getLabelAttributes()
Gets the Set of label Attribute instances of this MultiLabelInstances instance.

Returns:
the Set of label Attribute instances

getFeatureIndices

public int[] getFeatureIndices()
Gets the array with indices of feature attributes stored in underlying Instances data set.

Returns:
an array with the indices of the feature attributes
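The relationship between label indices and feature indices can be sketched as a set complement, assuming every attribute is either a label or a feature and that label indices are unique (as implied by the unique-label-name rule). FeatureIndicesSketch is a hypothetical name, and Mulan may compute the array differently.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: feature indices are all attribute indices that are not
 * label indices, in ascending order. Assumes labelIndices has no duplicates.
 */
final class FeatureIndicesSketch {

    static int[] featureIndices(int numAttributes, int[] labelIndices) {
        boolean[] isLabel = new boolean[numAttributes];
        for (int idx : labelIndices) {
            isLabel[idx] = true; // mark label attributes
        }
        int[] features = new int[numAttributes - labelIndices.length];
        int k = 0;
        for (int i = 0; i < numAttributes; i++) {
            if (!isLabel[i]) {
                features[k++] = i; // every unmarked attribute is a feature
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Labels at positions 1 and 4 of a 6-attribute data set
        System.out.println(Arrays.toString(featureIndices(6, new int[] {1, 4}))); // [0, 2, 3, 5]
    }
}
```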

getFeatureAttributes

public Set<Attribute> getFeatureAttributes()
Gets the Set of feature Attribute instances of this MultiLabelInstances instance.

Returns:
the Set of feature Attribute instances

getLabelsMetaData

public LabelsMetaData getLabelsMetaData()
Gets the LabelsMetaData instance, which contains descriptive meta-data about label attributes stored in underlying Instances data set.

Returns:
descriptive meta-data about label attributes

getDataSet

public Instances getDataSet()
Gets underlying Instances, which contains all data.

Returns:
underlying Instances object which contains all data

reintegrateModifiedDataSet

public MultiLabelInstances reintegrateModifiedDataSet(Instances modifiedDataSet)
                                               throws InvalidDataFormatException
If the Instances data set is retrieved from MultiLabelInstances and then post-processed or modified by custom code, it can be reintegrated into MultiLabelInstances if needed. The underlying LabelsMetaData are modified to reflect the changes in the data set. The method creates a new MultiLabelInstances instance with the modified data set and new meta-data.

The supported changes are:

- removal of a label Attribute from the existing Instances

- addition/removal of an Instance to/from the existing Instances

- addition/removal of a feature/predictor Attribute to/from the existing Instances

Parameters:
modifiedDataSet - the modified data set
Returns:
a new MultiLabelInstances instance containing the modified data set
Throws:
IllegalArgumentException - if specified modified data set is null
InvalidDataFormatException - if multi-label data format with specified modifications is not valid
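The meta-data side of reintegration can be pictured with a hypothetical plain-Java sketch: keep the original labels whose attributes still appear in the modified data set, and treat every other attribute as a feature. The class and method names are illustrative assumptions; the documented method additionally rebuilds the LabelsMetaData and re-validates the result (possibly throwing InvalidDataFormatException), which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: determine which original labels survive a modification
 * of the data set. Attributes not in the original label list are features.
 */
final class ReintegrationSketch {

    static List<String> remainingLabels(List<String> originalLabels, List<String> modifiedAttributeNames) {
        Set<String> present = new HashSet<>(modifiedAttributeNames);
        List<String> kept = new ArrayList<>();
        for (String label : originalLabels) {
            if (present.contains(label)) {
                kept.add(label); // label attribute survived the modification
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("amazed", "happy", "sad");
        // "happy" was removed and a new feature "f3" was added by custom code
        List<String> modified = List.of("f1", "f2", "f3", "amazed", "sad");
        System.out.println(remainingLabels(labels, modified)); // [amazed, sad]
    }
}
```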

clone

public MultiLabelInstances clone()
Returns a deep copy of the MultiLabelInstances instance.

Overrides:
clone in class Object

getLabelDepth

public HashMap<String,Integer> getLabelDepth()
Creates a HashMap that contains every label with its depth in the hierarchical tree.

Returns:
a HashMap that contains every label with its depth in the hierarchical tree

getDepth

public int getDepth(String labelName)
Calculates the depth of a label in the hierarchical tree of labels by counting the levels from the root down to the label. A root-level label is defined to have depth 1.

Parameters:
labelName - the name of the label whose depth is calculated
Returns:
the depth of a label
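The depth convention can be sketched in plain Java, assuming the hierarchy is available as child-to-parent links. LabelDepthSketch, its methods, and the map representation are hypothetical; only the rule that a root-level label has depth 1 comes from this documentation. The depths method produces a label-to-depth map in the spirit of getLabelDepth().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of label depth in a hierarchy given as child -> parent
 * links. A root-level label (no parent) has depth 1.
 */
final class LabelDepthSketch {

    static int depth(String label, Map<String, String> parentOf) {
        int depth = 1; // root-level labels have depth 1
        String parent = parentOf.get(label);
        while (parent != null) {
            depth++; // one more level above this label
            parent = parentOf.get(parent);
        }
        return depth;
    }

    /** Depth of every given label, keyed by label name. */
    static Map<String, Integer> depths(Iterable<String> labels, Map<String, String> parentOf) {
        Map<String, Integer> result = new HashMap<>();
        for (String label : labels) {
            result.put(label, depth(label, parentOf));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of("cat", "animal", "siamese", "cat");
        System.out.println(depth("animal", parentOf));  // 1
        System.out.println(depth("siamese", parentOf)); // 3
        System.out.println(depths(List.of("cat"), parentOf)); // {cat=2}
    }
}
```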

getLabelDepthIndices

public int[] getLabelDepthIndices()
Returns the depth of the labels

Returns:
the depth of the labels

hasMissingLabels

public boolean hasMissingLabels(Instance instance)
Checks whether an instance has missing labels.

Parameters:
instance - one instance of this dataset
Returns:
true if the instance has missing labels
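A minimal sketch of the missing-label check on a raw attribute-value array, assuming missing values are encoded as Double.NaN (the encoding Weka uses internally for missing values) and that the label indices are known. MissingLabelsSketch is a hypothetical name; a missing value in a feature attribute does not count as a missing label.

```java
/**
 * Hypothetical sketch: an instance has missing labels if any label attribute
 * value is NaN. Only label positions are inspected.
 */
final class MissingLabelsSketch {

    static boolean hasMissingLabels(double[] values, int[] labelIndices) {
        for (int idx : labelIndices) {
            if (Double.isNaN(values[idx])) {
                return true; // at least one label value is missing
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] labels = {2, 3};
        System.out.println(hasMissingLabels(new double[] {0.5, 1.2, 1, 0}, labels));          // false
        System.out.println(hasMissingLabels(new double[] {0.5, 1.2, Double.NaN, 0}, labels)); // true
    }
}
```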