Mulan: A Java Library for Multi-Label Learning

Extending Mulan

Implementing a new learner

If you have an idea for a new multi-label learner and want to implement it in Mulan, this section will help you develop it by indicating which methods must be implemented and by giving general guidelines.

Determine the correct package

All learners are located in the mulan.classifier package. Details about this package and its sub-packages and classes can be found in the API Reference. After reading it, you should be able to decide which package best suits your learner. If none of the packages seems suitable, you can create a new one.

Implement the learner's class

Each multi-label learner must implement the general interface MultiLabelLearner, which identifies all learners and defines basic operations such as building a learner model from MultiLabelInstances and making a prediction for a given test example. It makes it possible to perform general tasks on learners, such as evaluation, without depending on a specific implementation or learner type. A basic implementation of this interface is provided by the MultiLabelLearnerBase class, so your new learner's class can simply extend this base class.
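To make the split between the interface and its base class concrete, here is a minimal, self-contained sketch under simplifying assumptions. It does not use Mulan's types: the hypothetical TrainingData stands in for MultiLabelInstances, a double[] for a Weka Instance, and a boolean[] bipartition for MultiLabelOutput; LearnerSketch, LearnerBaseSketch and MajorityLabelLearner are illustrative names only. The common behavior (a null check, recording the number of labels, the model-initialization guard) sits in final methods, while a concrete learner overrides only the two hooks.

```java
import java.util.Objects;

// Hypothetical stand-in for MultiLabelInstances: feature rows plus label rows.
final class TrainingData {
    final double[][] features;
    final boolean[][] labels; // labels[example][label]

    TrainingData(double[][] features, boolean[][] labels) {
        this.features = Objects.requireNonNull(features);
        this.labels = Objects.requireNonNull(labels);
    }

    int numLabels() {
        return labels.length == 0 ? 0 : labels[0].length;
    }
}

// Plays the role of MultiLabelLearner: the operations every learner exposes.
interface LearnerSketch {
    void build(TrainingData trainingSet);

    boolean[] makePrediction(double[] instance);
}

// Plays the role of MultiLabelLearnerBase: common behavior in final methods,
// learner-specific behavior in the abstract *Internal hooks.
abstract class LearnerBaseSketch implements LearnerSketch {
    protected int numLabels;
    private boolean isModelInitialized = false;

    @Override
    public final void build(TrainingData trainingSet) {
        Objects.requireNonNull(trainingSet, "training set must not be null");
        numLabels = trainingSet.numLabels();
        buildInternal(trainingSet);
        isModelInitialized = true;
    }

    @Override
    public final boolean[] makePrediction(double[] instance) {
        if (!isModelInitialized) {
            throw new IllegalStateException("build() must be called before makePrediction()");
        }
        return makePredictionInternal(instance);
    }

    protected abstract void buildInternal(TrainingData trainingSet);

    protected abstract boolean[] makePredictionInternal(double[] instance);
}

// Toy learner: a label is predicted relevant if it was relevant in more than
// half of the training examples (the instance's features are ignored).
class MajorityLabelLearner extends LearnerBaseSketch {
    private boolean[] majority;

    @Override
    protected void buildInternal(TrainingData trainingSet) {
        majority = new boolean[numLabels];
        for (int j = 0; j < numLabels; j++) {
            int count = 0;
            for (boolean[] row : trainingSet.labels) {
                if (row[j]) {
                    count++;
                }
            }
            majority[j] = 2 * count > trainingSet.labels.length;
        }
    }

    @Override
    protected boolean[] makePredictionInternal(double[] instance) {
        return majority.clone();
    }
}
```

With this structure, calling makePrediction before build fails in the base class, so individual learners do not have to repeat that check.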

As you will notice, the sub-packages of the mulan.classifier package contain classes that derive from MultiLabelLearnerBase and serve as bases for multi-label learners following the paradigm of the respective package (e.g. TransformationBasedMultiLabelLearner, MultiLabelMetaLearner, MultiLabelKNN). If your learner follows one of these paradigms, you can use the corresponding class as the base for your own class.
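The three-level hierarchy can be sketched as follows. This is an illustration under assumptions, not Mulan's code: TransformationBasedSketch plays the role of a package base class such as TransformationBasedMultiLabelLearner by holding a base classifier, and BinaryRelevanceSketch (one binary classifier per label) extends it. BinaryClassifier and NearestNeighbour are hypothetical helpers, and a cut-down learner base is included so that the block compiles on its own.

```java
import java.util.function.Supplier;

// Hypothetical single-label binary classifier used as the base learner.
interface BinaryClassifier {
    void train(double[][] x, boolean[] y);
    boolean predict(double[] x);
}

// Cut-down stand-in for MultiLabelLearnerBase.
abstract class LearnerBaseSketch {
    public final void build(double[][] x, boolean[][] y) {
        buildInternal(x, y);
    }

    public final boolean[] makePrediction(double[] x) {
        return makePredictionInternal(x);
    }

    protected abstract void buildInternal(double[][] x, boolean[][] y);
    protected abstract boolean[] makePredictionInternal(double[] x);
}

// Plays the role of a paradigm base such as TransformationBasedMultiLabelLearner:
// it holds what all learners of that paradigm share, here a base classifier factory.
abstract class TransformationBasedSketch extends LearnerBaseSketch {
    protected final Supplier<BinaryClassifier> baseClassifier;

    protected TransformationBasedSketch(Supplier<BinaryClassifier> baseClassifier) {
        this.baseClassifier = baseClassifier;
    }
}

// Binary relevance: trains one independent binary classifier per label.
class BinaryRelevanceSketch extends TransformationBasedSketch {
    private BinaryClassifier[] perLabel;

    BinaryRelevanceSketch(Supplier<BinaryClassifier> baseClassifier) {
        super(baseClassifier);
    }

    @Override
    protected void buildInternal(double[][] x, boolean[][] y) {
        int numLabels = y[0].length;
        perLabel = new BinaryClassifier[numLabels];
        for (int j = 0; j < numLabels; j++) {
            boolean[] column = new boolean[y.length];
            for (int i = 0; i < y.length; i++) {
                column[i] = y[i][j];
            }
            perLabel[j] = baseClassifier.get();
            perLabel[j].train(x, column);
        }
    }

    @Override
    protected boolean[] makePredictionInternal(double[] x) {
        boolean[] out = new boolean[perLabel.length];
        for (int j = 0; j < perLabel.length; j++) {
            out[j] = perLabel[j].predict(x);
        }
        return out;
    }
}

// Trivial base classifier for the demo: 1-nearest neighbour (squared Euclidean distance).
class NearestNeighbour implements BinaryClassifier {
    private double[][] x;
    private boolean[] y;

    @Override
    public void train(double[][] x, boolean[] y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean predict(double[] q) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < x.length; i++) {
            double d = 0;
            for (int k = 0; k < q.length; k++) {
                d += (x[i][k] - q[k]) * (x[i][k] - q[k]);
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return y[best];
    }
}
```

A concrete learner such as BinaryRelevanceSketch then only implements the two hooks, while anything shared by the paradigm (here, the base classifier) lives one level up.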

Basic methods: buildInternal, makePredictionInternal

Whichever base class you choose in the previous step, each learner's class must implement two basic methods: buildInternal and makePredictionInternal.

buildInternal is the learner-specific implementation of building the model from a MultiLabelInstances training set. It is called from the build(MultiLabelInstances) method of the MultiLabelLearnerBase class, which applies the behavior common to all learners.

makePredictionInternal is the learner-specific implementation of prediction with a trained model. It is called from makePrediction(Instance), which checks that the model has been initialized and applies common handling. The method returns the learner's prediction for a given input instance in the form of a MultiLabelOutput object. A MultiLabelOutput object can carry a bipartition of the labels into relevant and irrelevant ones, a ranking of the labels, and a confidence for the relevance of each label. Not all of these are mandatory for a valid MultiLabelOutput object; which ones are present depends on the capabilities of the learner. From the information present, we can deduce whether a learner can perform classification, ranking, or both, and calculate the appropriate evaluation measures.
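The optional parts of a prediction and the capability deduction can be sketched with a hypothetical OutputSketch class. It mirrors the description above but is not Mulan's MultiLabelOutput; the ranking convention (ranking[j] is the 1-based rank of label j) and the threshold rule in fromConfidences are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical stand-in for MultiLabelOutput: every part is optional (null if absent).
final class OutputSketch {
    private final boolean[] bipartition; // true = label predicted relevant
    private final int[] ranking;         // ranking[j] = 1-based rank of label j
    private final double[] confidences;  // confidence that label j is relevant

    OutputSketch(boolean[] bipartition, int[] ranking, double[] confidences) {
        this.bipartition = bipartition == null ? null : bipartition.clone();
        this.ranking = ranking == null ? null : ranking.clone();
        this.confidences = confidences == null ? null : confidences.clone();
    }

    // Derives all three parts from confidences: a label is relevant if its
    // confidence reaches the threshold, and labels are ranked by confidence.
    static OutputSketch fromConfidences(double[] confidences, double threshold) {
        boolean[] bipartition = new boolean[confidences.length];
        for (int j = 0; j < confidences.length; j++) {
            bipartition[j] = confidences[j] >= threshold;
        }
        return new OutputSketch(bipartition, ranksFromValues(confidences), confidences);
    }

    // Rank 1 goes to the highest value; ties keep label order (stable sort).
    static int[] ranksFromValues(double[] values) {
        Integer[] order = IntStream.range(0, values.length).boxed().toArray(Integer[]::new);
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> values[j]).reversed());
        int[] ranks = new int[values.length];
        for (int r = 0; r < order.length; r++) {
            ranks[order[r]] = r + 1;
        }
        return ranks;
    }

    boolean hasBipartition() { return bipartition != null; }
    boolean hasRanking() { return ranking != null; }
    boolean hasConfidences() { return confidences != null; }

    boolean[] getBipartition() { return bipartition == null ? null : bipartition.clone(); }
    int[] getRanking() { return ranking == null ? null : ranking.clone(); }

    // What an evaluator could deduce about the learner from this output.
    String capability() {
        if (hasBipartition() && hasRanking()) return "classification and ranking";
        if (hasBipartition()) return "classification";
        if (hasRanking()) return "ranking";
        return "none";
    }
}
```

For confidences {0.2, 0.9, 0.6} and threshold 0.5, fromConfidences yields the bipartition {false, true, true} and the ranking {3, 1, 2}, so both classification and ranking measures could be computed from it.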

Paper reference(s)

To make it easy to generate a bibliography of all the algorithms in Mulan, the paper references have been extracted and placed in the code.

Classes based on a technical paper should implement Weka's TechnicalInformationHandler interface and return a customized TechnicalInformation instance. The format is based on BibTeX: the TechnicalInformation class can return either a plain-text string via the toString() method or a proper BibTeX entry via the toBibTex() method. These two methods are then used to automatically update the class's Javadoc (see Javadoc below).

Relevant classes:
  • weka.core.TechnicalInformation
  • weka.core.TechnicalInformationHandler
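The pattern can be sketched without a Weka dependency using hypothetical stand-ins: TechInfoSketch imitates the role of TechnicalInformation (a BibTeX-style entry type plus field values, rendered by toString() and toBibTex()) and TechInfoHandlerSketch the role of TechnicalInformationHandler. The field set, the output format and the reference values are placeholders chosen for illustration; in Mulan itself you would use the real Weka classes.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for weka.core.TechnicalInformation (not Weka's class).
final class TechInfoSketch {
    enum Type { INPROCEEDINGS, ARTICLE }

    enum Field { AUTHOR, TITLE, BOOKTITLE, YEAR }

    private final Type type;
    private final String key;
    private final Map<Field, String> values = new EnumMap<>(Field.class);

    TechInfoSketch(Type type, String key) {
        this.type = type;
        this.key = key;
    }

    void setValue(Field field, String value) {
        values.put(field, value);
    }

    // Plain-text citation.
    @Override
    public String toString() {
        return values.getOrDefault(Field.AUTHOR, "") + ": "
                + values.getOrDefault(Field.TITLE, "") + ". In: "
                + values.getOrDefault(Field.BOOKTITLE, "") + ", "
                + values.getOrDefault(Field.YEAR, "") + ".";
    }

    // BibTeX entry with one "field = {value}," line per field that was set.
    String toBibTex() {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(type.name().toLowerCase(Locale.ROOT)).append('{').append(key).append(",\n");
        for (Map.Entry<Field, String> e : values.entrySet()) {
            sb.append("   ").append(e.getKey().name().toLowerCase(Locale.ROOT))
                    .append(" = {").append(e.getValue()).append("},\n");
        }
        return sb.append('}').toString();
    }
}

// Hypothetical stand-in for weka.core.TechnicalInformationHandler.
interface TechInfoHandlerSketch {
    TechInfoSketch getTechnicalInformation();
}

// A learner based on a paper returns its reference; all values are placeholders.
class PaperBasedLearner implements TechInfoHandlerSketch {
    @Override
    public TechInfoSketch getTechnicalInformation() {
        TechInfoSketch info = new TechInfoSketch(TechInfoSketch.Type.INPROCEEDINGS, "Author2000");
        info.setValue(TechInfoSketch.Field.AUTHOR, "A. Author and B. Author");
        info.setValue(TechInfoSketch.Field.TITLE, "An Example Multi-Label Method");
        info.setValue(TechInfoSketch.Field.BOOKTITLE, "Proceedings of an Example Conference");
        info.setValue(TechInfoSketch.Field.YEAR, "2000");
        return info;
    }
}
```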

Javadoc

Open-source software is only as good as its documentation, so correct and up-to-date documentation is vital. Following Weka's approach, we decided to automate the Javadoc generation process as much as possible. The following explains how to structure your Javadoc to reduce maintenance. For this purpose, special comment tags are used; the content between them is replaced automatically by the classes listed under Relevant classes.

The indentation of the generated Javadoc follows the indentation of the < character of the start comment tag.

This general layout order should be used for all classes:

  • class description Javadoc
  • globalinfo
  • bibtex - if available

General

The general description of each class is produced by the following method:

globalInfo()

The return value can be placed in the Javadoc between the following comment tags:

<!-- globalinfo-start -->
will be automatically replaced
<!-- globalinfo-end -->
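A minimal sketch of where the tags sit, assuming a hypothetical learner class named DocumentedLearner; the text between the tags is shown as it might look after generation and is simply the return value of globalInfo().

```java
/**
 * Hypothetical learner, used only to show where the globalinfo tags go.
 *
 * <!-- globalinfo-start -->
 * Predicts every label that is relevant in most training examples.
 * <!-- globalinfo-end -->
 */
class DocumentedLearner {

    /**
     * Returns the general description of this learner. The automated Javadoc
     * generation copies this text between the globalinfo tags of the class comment.
     */
    public String globalInfo() {
        return "Predicts every label that is relevant in most training examples.";
    }
}
```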

Paper reference(s)

If available, the paper reference should also be listed in the Javadoc. To include the full BibTeX entry, use the following comment tags:

<!-- technical-bibtex-start -->
will be automatically replaced
<!-- technical-bibtex-end -->
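Putting the layout order together (class description, globalinfo, bibtex), a class comment might look as follows before generation. The class name and the placeholder texts are hypothetical.

```java
/**
 * Hand-written class description: what the learner does and how to use it.
 *
 * <!-- globalinfo-start -->
 * (replaced with the return value of globalInfo())
 * <!-- globalinfo-end -->
 *
 * <!-- technical-bibtex-start -->
 * (replaced with the BibTeX entry produced by toBibTex())
 * <!-- technical-bibtex-end -->
 */
class LayoutExampleLearner {
    public String globalInfo() {
        return "One-paragraph description of the learner.";
    }
}
```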

Relevant classes
  • mulan.core.MulanJavadoc

This class uses Weka's Javadoc auto-generation classes to generate the Javadoc comments and replaces the content between the comment tags described above.
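The replacement step itself can be illustrated with a small, dependency-free sketch. This is not MulanJavadoc's implementation; it only assumes the tag syntax shown above and the indentation rule stated earlier (generated lines reuse whatever precedes the < of the start tag). TagReplacer is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified version of the tag-replacement step (not MulanJavadoc's code).
final class TagReplacer {
    private TagReplacer() {}

    /**
     * Replaces the lines between "<!-- name-start -->" and "<!-- name-end -->"
     * with the given content. Each new line is prefixed with the text that
     * precedes the '<' of the start tag on its line (e.g. " * "), so the
     * generated Javadoc follows the tag's indentation. If either tag is
     * missing, the source is returned unchanged.
     */
    static String replace(String source, String tagName, String content) {
        String startTag = "<!-- " + tagName + "-start -->";
        String endTag = "<!-- " + tagName + "-end -->";
        List<String> lines = Arrays.asList(source.split("\n", -1));
        int start = -1;
        int end = -1;
        for (int i = 0; i < lines.size(); i++) {
            if (start < 0 && lines.get(i).contains(startTag)) {
                start = i;
            } else if (start >= 0 && lines.get(i).contains(endTag)) {
                end = i;
                break;
            }
        }
        if (start < 0 || end < 0) {
            return source;
        }
        String startLine = lines.get(start);
        String prefix = startLine.substring(0, startLine.indexOf(startTag));
        List<String> out = new ArrayList<>(lines.subList(0, start + 1));
        for (String contentLine : content.split("\n")) {
            out.add(prefix + contentLine);
        }
        out.addAll(lines.subList(end, lines.size()));
        return String.join("\n", out);
    }
}
```

For a class comment whose start tag is written as " * <!-- globalinfo-start -->", every generated line therefore also starts with " * ", and the tags themselves are kept so the replacement can be repeated.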

Unit Tests

To make sure that your classifier meets Mulan's criteria, you should add it to the JUnit test framework by creating a test class. The test class should extend the MultiLabelLearnerTestBase class, which contains a set of common tests that all learners should pass. In addition to these tests, you can optionally add your own to test learner-specific behavior. Test classes follow the same package structure as the source folder, so you should place your test class in the test package that corresponds to the package containing the learner's source code.

Relevant classes
  • MultiLabelLearnerTestBase
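The structure of such a test class can be sketched without JUnit. In this hypothetical, dependency-free version, LearnerTestBaseSketch plays the role of MultiLabelLearnerTestBase: it holds common tests and asks each subclass for a fresh learner through an abstract factory method. The method name getLearner, the two common tests and the SimpleLearner interface are assumptions of the sketch, not the real test base.

```java
// Hypothetical minimal learner interface (not Mulan's API).
interface SimpleLearner {
    void build(boolean[][] trainingLabels);

    // Expected to throw IllegalStateException if called before build().
    boolean[] makePrediction(double[] instance);
}

// Toy learner: predicts the labels that are relevant in most training examples.
class MajorityLabelLearner implements SimpleLearner {
    private boolean[] majority;

    @Override
    public void build(boolean[][] trainingLabels) {
        majority = new boolean[trainingLabels[0].length];
        for (int j = 0; j < majority.length; j++) {
            int count = 0;
            for (boolean[] row : trainingLabels) {
                if (row[j]) {
                    count++;
                }
            }
            majority[j] = 2 * count > trainingLabels.length;
        }
    }

    @Override
    public boolean[] makePrediction(double[] instance) {
        if (majority == null) {
            throw new IllegalStateException("model not built");
        }
        return majority.clone();
    }
}

// Plays the role of MultiLabelLearnerTestBase: tests every learner should pass.
abstract class LearnerTestBaseSketch {
    protected static final boolean[][] LABELS = {{true, false, true}, {true, true, false}};

    // Each concrete test class supplies a fresh instance of its learner.
    protected abstract SimpleLearner getLearner();

    final void testPredictionBeforeBuildFails() {
        try {
            getLearner().makePrediction(new double[] {0.0});
        } catch (IllegalStateException expected) {
            return;
        }
        throw new AssertionError("prediction before build() should fail");
    }

    final void testOneOutputPerLabel() {
        SimpleLearner learner = getLearner();
        learner.build(LABELS);
        if (learner.makePrediction(new double[] {0.0}).length != LABELS[0].length) {
            throw new AssertionError("expected one output per label");
        }
    }

    final void runCommonTests() {
        testPredictionBeforeBuildFails();
        testOneOutputPerLabel();
    }
}

// Test class for the toy learner: inherits the common tests, adds its own.
class MajorityLabelLearnerTest extends LearnerTestBaseSketch {
    @Override
    protected SimpleLearner getLearner() {
        return new MajorityLabelLearner();
    }

    // Learner-specific test: only label 0 is relevant in both training examples.
    void testMajorityVote() {
        SimpleLearner learner = getLearner();
        learner.build(LABELS);
        boolean[] p = learner.makePrediction(new double[] {0.0});
        if (!p[0] || p[1] || p[2]) {
            throw new AssertionError("unexpected prediction");
        }
    }
}
```

The design point is that a new test class only supplies its learner; the shared checks come for free, and learner-specific tests sit next to them.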

Detailed instructions on how to run the tests for a specific learner in Mulan are provided on a separate page.

Implementing a new evaluation measure

Mulan already contains a large variety of measures for evaluating multi-label learners. If you want to evaluate your learner's performance with a measure that is not yet implemented, you can extend Mulan by implementing a new evaluation measure. This section will help you develop it by indicating which methods must be implemented and by giving general guidelines.

Determine the correct superclass

Each measure has its own class and resides in the mulan.evaluation.measure package. Details about the classes contained in this package can be found in the API Reference. The measures already in Mulan fall into one of the following categories:

  • Based on bipartitions (Example-based / Label-based)
  • Based on rankings

If your measure falls into one of these categories, you should choose the appropriate superclass, which already contains the functionality common to the measures of that category. In any case, follow the general guidelines in the next section.

Implement the measure's class

Each evaluation measure must implement the general interface Measure, which identifies all measures and defines basic operations such as updating a measure's value for a new prediction and resetting the measure's value. A basic implementation of this interface is provided by the abstract class MeasureBase, so your new measure's class can simply extend this base class.
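A minimal sketch of the interface/base-class split for measures, with hypothetical names and simplified signatures (the prediction is reduced to a boolean[] bipartition instead of a MultiLabelOutput). The common input checks live in the final update(), which then delegates to updateInternal(); SubsetAccuracySketch is an illustrative concrete measure.

```java
import java.util.Arrays;

// Hypothetical stand-in for Mulan's Measure interface (simplified signatures).
interface MeasureSketch {
    String getName();
    double getValue();
    double getIdealValue();
    void update(boolean[] predicted, boolean[] truth);
    void reset();
}

// Plays the role of MeasureBase: input checks common to all measures live in
// the final update(), the measure-specific work in updateInternal().
abstract class MeasureBaseSketch implements MeasureSketch {
    @Override
    public final void update(boolean[] predicted, boolean[] truth) {
        if (predicted == null || truth == null) {
            throw new IllegalArgumentException("prediction and truth must not be null");
        }
        if (predicted.length != truth.length) {
            throw new IllegalArgumentException("prediction and truth differ in number of labels");
        }
        updateInternal(predicted, truth);
    }

    protected abstract void updateInternal(boolean[] predicted, boolean[] truth);
}

// Example measure: fraction of examples whose predicted label set is exactly right.
class SubsetAccuracySketch extends MeasureBaseSketch {
    private int exactMatches;
    private int examples;

    @Override public String getName() { return "Subset Accuracy"; }
    @Override public double getValue() { return examples == 0 ? Double.NaN : (double) exactMatches / examples; }
    @Override public double getIdealValue() { return 1.0; }
    @Override public void reset() { exactMatches = 0; examples = 0; }

    @Override
    protected void updateInternal(boolean[] predicted, boolean[] truth) {
        if (Arrays.equals(predicted, truth)) {
            exactMatches++;
        }
        examples++;
    }
}
```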

As you will notice, the mulan.evaluation.measure package also contains other abstract classes that derive from MeasureBase and serve as bases for measures following a specific paradigm (e.g. RankingMeasureBase, BipartitionMeasureBase, ExampleBasedBipartitionMeasureBase, ExampleBasedRankingMeasureBase). If your measure follows one of these paradigms, you can use the corresponding class as the base for your own class.
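An intermediate base class can factor out what a family of measures shares. In this hypothetical sketch, ExampleBasedBipartitionSketch averages a per-example score over all examples seen so far, so HammingLossSketch only defines how a single example is scored. The class names and the averaging layout are assumptions, not Mulan's ExampleBasedBipartitionMeasureBase.

```java
// Hypothetical stand-in for an intermediate base such as
// ExampleBasedBipartitionMeasureBase: it averages a per-example score, so a
// concrete measure only defines how one example is scored.
abstract class ExampleBasedBipartitionSketch {
    private double sum;
    private int examples;

    public final void update(boolean[] predicted, boolean[] truth) {
        if (predicted.length != truth.length) {
            throw new IllegalArgumentException("prediction and truth differ in number of labels");
        }
        sum += scoreExample(predicted, truth);
        examples++;
    }

    public final double getValue() {
        return examples == 0 ? Double.NaN : sum / examples;
    }

    public final void reset() {
        sum = 0;
        examples = 0;
    }

    protected abstract double scoreExample(boolean[] predicted, boolean[] truth);

    public abstract double getIdealValue();
}

// Hamming loss: per example, the fraction of labels predicted wrongly.
class HammingLossSketch extends ExampleBasedBipartitionSketch {
    @Override
    protected double scoreExample(boolean[] predicted, boolean[] truth) {
        int wrong = 0;
        for (int j = 0; j < truth.length; j++) {
            if (predicted[j] != truth[j]) {
                wrong++;
            }
        }
        return (double) wrong / truth.length;
    }

    @Override
    public double getIdealValue() {
        return 0.0;
    }
}
```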

Basic methods: updateInternal, getIdealValue

Whichever base class you choose in the previous step, each measure's class must implement two basic methods: updateInternal and getIdealValue.

updateInternal is the measure-specific computation of the measure's value for the given predicted and true labels. It is called from the update(MultiLabelOutput, boolean[]) method of the MeasureBase class, which applies the behavior common to all measures.

getIdealValue returns the 'ideal' value of the measure, that is, the value representing the best performance a learner can achieve on this measure.
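One use of the ideal value is comparing results without knowing whether higher or lower is better. The hypothetical helper below treats the score closer to the ideal value as the better one; this assumes the ideal value is an end point of the measure's range (e.g. 0 for Hamming loss, 1 for an accuracy bounded by 1), and IdealValueRanking is an illustrative name.

```java
import java.util.Map;

// Hypothetical helper: uses a measure's ideal value to decide which result is
// better, so the same code works for losses (ideal 0) and gains (ideal 1).
// Assumes the ideal value is an end point of the measure's range.
final class IdealValueRanking {
    private IdealValueRanking() {}

    static boolean isBetter(double a, double b, double idealValue) {
        return Math.abs(a - idealValue) < Math.abs(b - idealValue);
    }

    // Returns the name of the learner whose score is closest to the ideal value.
    static String best(Map<String, Double> scores, double idealValue) {
        String bestName = null;
        double bestScore = Double.NaN;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (bestName == null || isBetter(e.getValue(), bestScore, idealValue)) {
                bestName = e.getKey();
                bestScore = e.getValue();
            }
        }
        return bestName;
    }
}
```

With scores A = 0.12 and B = 0.08, best returns B when the ideal value is 0 (a loss) and A when it is 1 (a gain), without the caller needing to know the measure's direction.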
