mulan.data
Class Statistics

java.lang.Object
  extended by mulan.data.Statistics
All Implemented Interfaces:
Serializable, TechnicalInformationHandler

public class Statistics
extends Object
implements Serializable, TechnicalInformationHandler

Class for calculating statistics of a multi-label dataset. For more information, see

Tsoumakas, Grigorios, Katakis, Ioannis, Vlahavas, Ioannis: Mining Multi-Label Data. In Maimon, Oded and Rokach, Lior, editors, Data Mining and Knowledge Discovery Handbook, 667-685, 2010.

BibTeX:

 @incollection{Tsoumakas2010,
    author = {Tsoumakas, Grigorios and Katakis, Ioannis and Vlahavas, Ioannis},
    booktitle = {Data Mining and Knowledge Discovery Handbook},
    edition = {2nd},
    editor = {Maimon, Oded and Rokach, Lior},
    pages = {667-685},
    publisher = {Springer},
    title = {Mining Multi-Label Data},
    year = {2010}
 }
 

Version:
2012.02.06
Author:
Grigorios Tsoumakas, Robert Friberg
See Also:
Serialized Form

Constructor Summary
Statistics()
           
 
Method Summary
 double[][] calculateCoocurrence(MultiLabelInstances mdata)
          Calculates and prints a matrix with the co-occurrences of pairs of labels.
 double[][] calculatePhi(MultiLabelInstances dataSet)
          Calculates the phi correlation between each pair of labels.
 void calculateStats(MultiLabelInstances mlData)
          Calculates various multi-label statistics, such as label cardinality, label density, and the set of distinct labelsets along with their frequencies.
 double cardinality()
          Returns the label cardinality of the dataset.
 double density()
          Returns the label density of the dataset.
 double[] getPhiHistogram()
          Calculates a histogram of phi correlations
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 String globalInfo()
          Returns a string describing this class.
 HashMap<LabelSet,Integer> labelCombCount()
          Returns the HashMap containing the distinct labelsets and their frequencies.
 int labelFrequency(LabelSet x)
          Returns the frequency of a labelset in the dataset.
 Set<LabelSet> labelSets()
          Returns a set with the distinct labelsets of the dataset.
 void printPhiCorrelations()
          Prints out phi correlations
 void printPhiDiagram(double step)
          This method prints data, useful for the visualization of Phi per dataset.
 double[] priors()
          Returns the prior probabilities of the labels.
 int[] topPhiCorrelatedLabels(int labelIndex, int k)
          Returns the indices of the labels that have the strongest phi correlation with the label which is given as a parameter.
 String toString()
          Returns various multi-label statistics in textual representation.
 int[] uncorrelatedLabels(int labelIndex, double bound)
          Returns the indices of the labels whose phi coefficient with the given label lies in the range -bound <= phi <= bound.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Statistics

public Statistics()
Method Detail

labelCombCount

public HashMap<LabelSet,Integer> labelCombCount()
Returns the HashMap containing the distinct labelsets and their frequencies.

Returns:
HashMap with distinct labelsets and their frequencies

calculateCoocurrence

public double[][] calculateCoocurrence(MultiLabelInstances mdata)
Calculates and prints a matrix with the co-occurrences of pairs of labels.

Parameters:
mdata - a multi-label data set
Returns:
a matrix of co-occurrences
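
As an illustration of what a label co-occurrence matrix contains, here is a minimal sketch that does not use Mulan. The class name CooccurrenceSketch and the boolean[][] input (one row per example, one column per label) are assumptions made for this example. The sketch fills the matrix symmetrically and leaves the diagonal at zero; the layout of the matrix returned by calculateCoocurrence may differ.

```java
final class CooccurrenceSketch {
    /**
     * Counts, for every pair of distinct labels (i, j), the number of
     * examples in which both labels are present.
     * labels[r][c] is true when example r carries label c.
     */
    static double[][] cooccurrence(boolean[][] labels) {
        int q = labels[0].length;
        double[][] counts = new double[q][q];
        for (boolean[] row : labels) {
            for (int i = 0; i < q; i++) {
                if (!row[i]) {
                    continue;
                }
                for (int j = i + 1; j < q; j++) {
                    if (row[j]) {
                        // Mirror the count so the matrix is symmetric.
                        counts[i][j]++;
                        counts[j][i]++;
                    }
                }
            }
        }
        return counts;
    }
}
```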

calculateStats

public void calculateStats(MultiLabelInstances mlData)
Calculates various multi-label statistics, such as label cardinality, label density, and the set of distinct labelsets along with their frequencies.

Parameters:
mlData - a multi-label dataset
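
For readers unfamiliar with the two headline statistics: label cardinality is the average number of labels per example, and label density is the cardinality divided by the total number of labels. The following sketch computes both from a plain boolean matrix. The class name LabelStatsSketch and the input representation are assumptions for illustration and are not part of Mulan.

```java
final class LabelStatsSketch {
    /** Average number of relevant labels per example. */
    static double cardinality(boolean[][] labels) {
        double total = 0;
        for (boolean[] row : labels) {
            for (boolean present : row) {
                if (present) {
                    total++;
                }
            }
        }
        return total / labels.length;
    }

    /** Cardinality normalized by the number of labels. */
    static double density(boolean[][] labels) {
        return cardinality(labels) / labels[0].length;
    }
}
```

With four examples carrying 2, 1, 3 and 0 of three possible labels, the cardinality is 6 / 4 = 1.5 and the density is 1.5 / 3 = 0.5.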

calculatePhi

public double[][] calculatePhi(MultiLabelInstances dataSet)
                        throws Exception
Calculates the phi correlation between each pair of labels.

Parameters:
dataSet - a multi-label dataset
Returns:
a matrix containing phi correlations
Throws:
Exception
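
The phi coefficient of two binary variables is (n11*n00 - n10*n01) / sqrt(n1. * n0. * n.1 * n.0), where nxy counts the examples with the first label equal to x and the second equal to y, and the dotted terms are the row and column totals. The sketch below applies this textbook formula to plain boolean columns. The class name PhiSketch, the input representation, and the choice to return NaN when a label is constant are assumptions of this example; how calculatePhi treats such cases is not stated here.

```java
final class PhiSketch {
    /**
     * Phi coefficient of two binary label columns of equal length.
     * Returns NaN when either column is constant (a choice of this sketch).
     */
    static double phi(boolean[] a, boolean[] b) {
        double n11 = 0, n10 = 0, n01 = 0, n00 = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] && b[i]) {
                n11++;
            } else if (a[i]) {
                n10++;
            } else if (b[i]) {
                n01++;
            } else {
                n00++;
            }
        }
        double denom = Math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00));
        return denom == 0 ? Double.NaN : (n11 * n00 - n10 * n01) / denom;
    }

    /** Pairwise phi matrix; labels[r][c] is true when example r carries label c. */
    static double[][] phiMatrix(boolean[][] labels) {
        int n = labels.length;
        int q = labels[0].length;
        double[][] result = new double[q][q];
        for (int i = 0; i < q; i++) {
            for (int j = 0; j < q; j++) {
                boolean[] a = new boolean[n];
                boolean[] b = new boolean[n];
                for (int r = 0; r < n; r++) {
                    a[r] = labels[r][i];
                    b[r] = labels[r][j];
                }
                result[i][j] = phi(a, b);
            }
        }
        return result;
    }
}
```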

printPhiCorrelations

public void printPhiCorrelations()
Prints out phi correlations


getPhiHistogram

public double[] getPhiHistogram()
Calculates a histogram of phi correlations

Returns:
an array with phi correlations

uncorrelatedLabels

public int[] uncorrelatedLabels(int labelIndex,
                                double bound)
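Returns the indices of the labels whose phi coefficient with the given label lies in the range -bound <= phi <= bound.

Parameters:
labelIndex - the index of the label to compare against
bound - the bound on the phi coefficient; labels with -bound <= phi <= bound are returned
Returns:
the indices of the labels whose phi coefficient lies in the range -bound <= phi <= bound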
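
The selection rule can be sketched against a precomputed phi matrix as follows. The class name UncorrelatedSketch and the explicit phi argument are hypothetical, and skipping the label itself is an assumption of this example rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class UncorrelatedSketch {
    /** Indices j with -bound <= phi[labelIndex][j] <= bound. */
    static int[] uncorrelated(double[][] phi, int labelIndex, double bound) {
        List<Integer> hits = new ArrayList<>();
        for (int j = 0; j < phi.length; j++) {
            if (j == labelIndex) {
                continue; // assumption: the label itself is never reported
            }
            double value = phi[labelIndex][j];
            // NaN fails both comparisons and is therefore excluded.
            if (value >= -bound && value <= bound) {
                hits.add(j);
            }
        }
        int[] result = new int[hits.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }
}
```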

topPhiCorrelatedLabels

public int[] topPhiCorrelatedLabels(int labelIndex,
                                    int k)
Returns the indices of the k labels that have the strongest phi correlation with the given label.

Parameters:
labelIndex - the index of the given label
k - the number of labels to return
Returns:
the indices of the k most correlated labels
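
A top-k selection of this kind might look like the sketch below, which ranks the other labels by the absolute value of their phi coefficient. Ranking by absolute value is an assumption; the documentation does not say whether "strongest" is signed. The class name TopPhiSketch and the explicit phi argument are also hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopPhiSketch {
    /** Indices of up to k labels with the largest |phi| relative to labelIndex. */
    static int[] topCorrelated(double[][] phi, int labelIndex, int k) {
        List<Integer> others = new ArrayList<>();
        for (int j = 0; j < phi.length; j++) {
            if (j != labelIndex) {
                others.add(j);
            }
        }
        // Sort descending by absolute correlation (assumption: sign is ignored).
        others.sort(Comparator.comparingDouble((Integer j) -> -Math.abs(phi[labelIndex][j])));
        int n = Math.min(k, others.size());
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = others.get(i);
        }
        return result;
    }
}
```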

printPhiDiagram

public void printPhiDiagram(double step)
Prints data useful for visualizing phi per dataset. It prints int(1/step) + 1 pairs of values. The first value of each pair is a phi value, and the second is the average number of labels that correlate with the rest of the labels with a correlation higher than that phi value.

Parameters:
step - the phi value increment step

toString

public String toString()
Returns various multi-label statistics in textual representation.

Overrides:
toString in class Object

priors

public double[] priors()
Returns the prior probabilities of the labels.

Returns:
array of prior probabilities of labels

cardinality

public double cardinality()
Returns the label cardinality of the dataset.

Returns:
label cardinality

density

public double density()
Returns the label density of the dataset.

Returns:
label density

labelSets

public Set<LabelSet> labelSets()
Returns a set with the distinct labelsets of the dataset.

Returns:
set of distinct labelsets

labelFrequency

public int labelFrequency(LabelSet x)
Returns the frequency of a labelset in the dataset.

Parameters:
x - a labelset
Returns:
the frequency of the given labelset
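
Counting distinct labelsets amounts to a frequency map keyed by the label combination. The sketch below uses a string of 0/1 characters as a stand-in key in place of Mulan's LabelSet class. The class name LabelsetCountSketch and the boolean[][] input are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class LabelsetCountSketch {
    /** Maps each distinct label combination (as a 0/1 string) to its frequency. */
    static Map<String, Integer> countLabelsets(boolean[][] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (boolean[] row : labels) {
            StringBuilder key = new StringBuilder();
            for (boolean present : row) {
                key.append(present ? '1' : '0');
            }
            // merge inserts 1 for a new key or adds 1 to the existing count.
            counts.merge(key.toString(), 1, Integer::sum);
        }
        return counts;
    }
}
```

The key set of the resulting map plays the role of the distinct labelsets, and a lookup by key plays the role of a frequency query.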

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

globalInfo

public String globalInfo()
Returns a string describing this class.

Returns:
a description suitable for displaying in a future GUI