Mulan: A Java Library for Multi-Label Learning

Data Format

Mulan requires two files to specify a multi-label dataset. The first is a text file in Weka's ARFF format. Labels should be declared as nominal attributes with the two values "0" and "1", indicating the absence or presence of the label, respectively.

Here is an example of an ARFF file with 3 features and 5 labels:

@relation MultiLabelExample
@attribute feature1 numeric
@attribute feature2 numeric
@attribute feature3 numeric
@attribute label1 {0, 1}
@attribute label2 {0, 1}
@attribute label3 {0, 1}
@attribute label4 {0, 1}
@attribute label5 {0, 1}
@data
2.3,5.6,1.4,0,1,1,0,0
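As a rough illustration of the ARFF layout described above, the following Java sketch builds such a header from a list of feature names and a list of label names. The class `ArffHeader` and its `build` method are hypothetical helpers, not part of Mulan or Weka. The sketch assumes numeric features and simple attribute names that need no quoting.

```java
import java.util.List;

// Hypothetical helper, not part of Mulan or Weka.
class ArffHeader {

    /**
     * Builds an ARFF header with numeric features followed by {0, 1} labels.
     * Assumes attribute names contain no spaces or characters that need quoting.
     */
    static String build(String relation, List<String> features, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (String feature : features) {
            sb.append("@attribute ").append(feature).append(" numeric\n");
        }
        for (String label : labels) {
            // Each label is a nominal attribute with the two values 0 and 1.
            sb.append("@attribute ").append(label).append(" {0, 1}\n");
        }
        sb.append("@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String header = build(
                "MultiLabelExample",
                List.of("feature1", "feature2", "feature3"),
                List.of("label1", "label2", "label3", "label4", "label5"));
        // The data rows would be appended after the header.
        System.out.print(header + "2.3,5.6,1.4,0,1,1,0,0\n");
    }
}
```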

The second file is an XML file that specifies the labels and any hierarchical relationships among them. The XML schema specifying the format can be found here. The following example is an XML file corresponding to the ARFF file above:

<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="label1"></label>
  <label name="label2"></label>
  <label name="label3"></label>
  <label name="label4"></label>
  <label name="label5"></label>
</labels>
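To make the structure of the labels file concrete, here is a minimal Java sketch that reads the label names out of such an XML document using only the JDK's DOM parser. It does not use Mulan's own loading code. The class `LabelNames` and its `parse` method are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Mulan.
class LabelNames {
    static final String NS = "http://mulan.sourceforge.net/labels";

    /** Returns the "name" attribute of every label element, in document order. */
    static List<String> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match the labels namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(NS, "label");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse labels XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<labels xmlns=\"http://mulan.sourceforge.net/labels\">",
                "  <label name=\"label1\"></label>",
                "  <label name=\"label2\"></label>",
                "  <label name=\"label3\"></label>",
                "  <label name=\"label4\"></label>",
                "  <label name=\"label5\"></label>",
                "</labels>");
        System.out.println(parse(xml));
    }
}
```

Every name returned this way should match an attribute declared in the ARFF file.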

Note that the label attributes need not be the last attributes in the ARFF file, and their order does not matter in either the ARFF or the XML file. The following two files are also a perfectly acceptable representation of the same data.

ARFF file:

@relation MultiLabelExample
@attribute feature1 numeric
@attribute label3 {0, 1}
@attribute feature2 numeric
@attribute label1 {0, 1}
@attribute feature3 numeric
@attribute label4 {0, 1}
@attribute label5 {0, 1}
@attribute label2 {0, 1}
@data
2.3,1,5.6,0,1.4,0,0,1

XML file:

<?xml version="1.0" encoding="utf-8"?>
<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="label5"></label>
  <label name="label1"></label>
  <label name="label3"></label>
  <label name="label2"></label>
  <label name="label4"></label>
</labels>
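Order can be irrelevant because labels are identified by name rather than by position. The Java sketch below illustrates that idea: it collects attribute names from an ARFF header and looks up the column index of each label. It is not Mulan's actual implementation. `LabelColumns` is a hypothetical class, and its header parsing is deliberately simplified: it assumes one declaration per line and unquoted attribute names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Mulan.
class LabelColumns {

    /** Returns attribute names from the @attribute lines, in declaration order. */
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\\R")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // header ends where the data section begins
            }
            if (lower.startsWith("@attribute")) {
                // Simplification: names are assumed to be unquoted, without spaces.
                names.add(trimmed.split("\\s+", 3)[1]);
            }
        }
        return names;
    }

    /** Maps each label name to its zero-based column index in the ARFF file. */
    static Map<String, Integer> labelIndices(List<String> attributes, Collection<String> labels) {
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (String label : labels) {
            int index = attributes.indexOf(label);
            if (index < 0) {
                throw new IllegalArgumentException("Label not declared in ARFF header: " + label);
            }
            indices.put(label, index);
        }
        return indices;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
                "@relation MultiLabelExample",
                "@attribute feature1 numeric",
                "@attribute label3 {0, 1}",
                "@attribute feature2 numeric",
                "@attribute label1 {0, 1}",
                "@attribute feature3 numeric",
                "@attribute label4 {0, 1}",
                "@attribute label5 {0, 1}",
                "@attribute label2 {0, 1}",
                "@data",
                "2.3,1,5.6,0,1.4,0,0,1");
        // Label names in the order used by the XML file above.
        List<String> labels = List.of("label5", "label1", "label3", "label2", "label4");
        System.out.println(labelIndices(attributeNames(arff), labels));
    }
}
```

For the reordered files above, label1 through label5 resolve to columns 3, 7, 1, 5 and 6. Reading those columns of the data row gives 0, 1, 1, 0, 0, the same label values as in the first example.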

Hierarchies of labels can be expressed in the XML file by nesting the label tag. A sample file corresponding to a hierarchy follows:

<?xml version="1.0" encoding="utf-8"?>
<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="sports">
    <label name="football"></label>
    <label name="basketball"></label>
  </label>
  <label name="arts">
    <label name="sculpture"></label>
    <label name="photography"></label>
  </label>
</labels>
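The nesting can be read back as a parent-child relation. The following Java sketch walks the XML with the JDK's DOM parser and records the parent of each label, using an empty string for top-level labels. It illustrates the file structure only and does not reflect how Mulan represents hierarchies internally. `LabelHierarchy` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Mulan.
class LabelHierarchy {

    /** Maps each label name to its parent label name ("" for top-level labels). */
    static Map<String, String> parents(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() returns "label"
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), "", result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse labels XML", e);
        }
    }

    /** Recursively records the direct label children of the given element. */
    private static void collect(Element parent, String parentName, Map<String, String> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && "label".equals(node.getLocalName())) {
                Element label = (Element) node;
                String name = label.getAttribute("name");
                out.put(name, parentName);
                collect(label, name, out); // nested labels are children of this one
            }
        }
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"utf-8\"?>",
                "<labels xmlns=\"http://mulan.sourceforge.net/labels\">",
                "  <label name=\"sports\">",
                "    <label name=\"football\"></label>",
                "    <label name=\"basketball\"></label>",
                "  </label>",
                "  <label name=\"arts\">",
                "    <label name=\"sculpture\"></label>",
                "    <label name=\"photography\"></label>",
                "  </label>",
                "</labels>");
        parents(xml).forEach((label, parent) ->
                System.out.println(label + " <- " + (parent.isEmpty() ? "(root)" : parent)));
    }
}
```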
