IBLStreams MOA Extension

 

Abstract

IBLStreams is an instance-based learning algorithm for classification and regression on data streams. The method is able to handle large streams with low requirements in terms of memory and computational power. Moreover, it is equipped with mechanisms for adapting to concept drift and concept shift. IBLStreams is implemented as an extension to the MOA (Massive Online Analysis) framework for data stream mining; thus, prior to using the IBLStreams software, you need to install MOA, which can be downloaded from the MOA website. Both MOA and its extensions are implemented in Java.

Documentation

 

Prediction Strategies


In instance-based learning, a prediction for the query instance is obtained by combining, in one way or another, the outputs of the neighbors of this instance in the training data. The type of aggregation depends on the type of problem to be solved. We offer four different prediction schemes, namely WeightedMode for classification, WeightedMedian for ordinal classification, and WeightedMean and LocalLinearRegression for regression problems.

Classification

In conventional classification, the target attribute has a nominal scale, i.e., the set of classes is simply a finite set. In ordinal (aka ordered) classification, the set of classes is finite, too, but equipped with a total order relation; that is, the class labels can be put in a natural order (e.g., hotel categories *, **, ***, ****, *****). Ordinal classification can be enabled with the option "-s WMedianOClass"; in this case, the WeightedMedian prediction is used, which is suitable for minimizing the absolute error loss function (predicting the i-th class although the j-th class is correct yields an error of abs(i-j)). Leaving this option empty is equivalent to using the default value "-s WModeClass", in which case the WeightedMode is returned; this prediction is a proper risk minimizer for the standard 0/1 loss (i.e., the loss is 0 if the predicted class is correct and 1 otherwise).
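
For example, the following learner specification (a sketch; all remaining options keep their defaults) enables ordinal classification and can be plugged into any of the complete command lines shown in the Examples section below:

moa.classifiers.IBLStreams -s WMedianOClass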

Regression

In regression, the target attribute is numerical, and loss is typically measured in terms of the absolute or squared difference between predicted and true output. Corresponding prediction problems can be solved in two ways. First, the target value can be estimated by the weighted mean of the target values of the k neighbor instances; this prediction is obtained by using the option "-s WMeanReg", which sets the PredictionStrategy parameter to WeightedMean(Regression). Second, a prediction can be derived by means of locally weighted linear regression. In this case, a (local) linear regression model is fitted to the k nearest neighbors, and this model is used to make a prediction for the query instance. For this approach, the PredictionStrategy parameter must be set to LocalLinearRegression by using "-s LocLinReg".
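
As a sketch, the two regression variants can be configured as follows (these learner specifications match the ones used in the Examples section below):

moa.classifiers.IBLStreams -s WMeanReg -a AdaptK -w linear
moa.classifiers.IBLStreams -s LocLinReg -a AdaptSigma -w GaussianKernel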
 

Adaptation Strategies


IBLStreams offers two possibilities to adapt the size of the neighborhood, which has an important influence on the performance of an instance-based learner. If you do not want the learner to adapt at all, the AdaptationStrategy parameter can be set to none using the option "-a none". The two adaptation strategies are:
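
For example, a learner with adaptation switched off (a sketch; the prediction and weighting options keep their defaults) is specified as:

moa.classifiers.IBLStreams -a none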

Adapting k

Here, the size of the neighborhood is controlled by the number of neighbors, k, and the algorithm adapts this value by continuously checking whether it appears beneficial to increase or decrease the current value by 1. To activate this adaptation method, use "-a AdaptK", which is equivalent to setting the AdaptationStrategy parameter in the GUI to AdaptK. Several further parameters relate to k (see the example after this list):

  • InitialKValue: The initial value for k. The default value is 16; we recommend setting k to 4 times the number of attributes.
  • MinKValue: The smallest allowed value for k. The default value is 4; we recommend 2 times the number of attributes.
  • MaxKValue: The largest allowed value for k. The default value is 40; we recommend 10 times the number of attributes.
    Please note that MinKValue and MaxKValue only take effect if AdaptK is used as the adaptation strategy.
  • UseDefaultK: Activating this flag with the "-g" option on the command line, or enabling the corresponding checkbox in the GUI, sets the initial value of k to 4*(num. dimensions); setting this flag overrides the InitialKValue parameter.
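
For a stream with 4 attributes, the recommended settings above translate into the following learner specification (a sketch using the flags documented in the help text below):

moa.classifiers.IBLStreams -a AdaptK -k 16 -j 8 -l 40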

Adapting the kernel width

The second strategy controls the size of the neighborhood indirectly via the weighting function or, more specifically, the corresponding kernel width. Using this adaptation strategy only makes sense in combination with the Gaussian or the exponential kernel. To activate it, use the option "-a AdaptSigma", i.e., set the AdaptationStrategy parameter to AdaptSigma. As with AdaptK, the algorithm will then continuously check whether increasing or decreasing the kernel width sigma (by a certain percentage) appears to be beneficial. The following parameters are related to the kernel width (see the example after this list):

  • InitialSigma: The initial value of sigma. The default value is 0.5. Usage: "-q 0.6".
  • UseDefaultSigma: By activating this flag, sigma is set to the recommended value sqrt(num. dimensions)/10. Setting this flag overrides InitialSigma even if it was set explicitly. Usage: "-h".
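
For example, a sketch of a learner that adapts the kernel width of a Gaussian kernel, starting from sigma = 0.6:

moa.classifiers.IBLStreams -a AdaptSigma -w GaussianKernel -q 0.6

With "-h" instead of "-q", the initial sigma is derived from the dimensionality; for a stream with 4 attributes, this gives sigma = sqrt(4)/10 = 0.2.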
     

Weighting Methods


We offer five different methods for weighting the influence of the k neighbors of the query instance:

  • equal: All neighbors are given equal weight, namely 1/k. Usage: "-w equal"
  • inverseDistance: The weight of an instance is proportional to 1/d, where d is its distance from the query. Usage: "-w inverseDistance"
  • linear: The weight is a linearly decreasing function of the distance d. The slope is determined such that the neighbor with the largest distance receives a weight of 0.001 (while an instance at distance 0 would receive a weight of 1). Usage: "-w linear"
  • GaussianKernel: Neighbors are weighted by centering a Gaussian kernel at the query instance. Usage: "-w GaussianKernel"
  • exponentialKernel: Neighbors are weighted by centering an exponential kernel at the query instance. Usage: "-w exponentialKernel"

Please note that, prior to the prediction, all weights are normalized so that the sum of weights equals 1.
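
To make these schemes concrete, the following Java sketch computes normalized weights for a vector of neighbor distances. The equal, inverseDistance, and linear schemes follow the descriptions above; the formulas for the Gaussian and exponential kernels are the usual textbook forms and are an assumption here, as the documentation does not spell them out.

import java.util.Arrays;

// Illustrative computation of neighbor weights; not the actual IBLStreams code.
public final class NeighborWeights {

    // equal: every one of the k neighbors gets the same weight 1/k.
    static double[] equal(int k) {
        double[] w = new double[k];
        Arrays.fill(w, 1.0 / k);
        return w;
    }

    // inverseDistance: weight proportional to 1/d, where d is the distance
    // from the query instance.
    static double[] inverseDistance(double[] d) {
        double[] w = new double[d.length];
        for (int i = 0; i < d.length; i++) w[i] = 1.0 / d[i];
        return normalize(w);
    }

    // linear: weight 1 at distance 0, decreasing linearly to 0.001 at the
    // largest neighbor distance.
    static double[] linear(double[] d) {
        double dMax = Arrays.stream(d).max().orElse(1.0);
        double[] w = new double[d.length];
        for (int i = 0; i < d.length; i++) w[i] = 1.0 - (1.0 - 0.001) * d[i] / dMax;
        return normalize(w);
    }

    // GaussianKernel (assumed form): w = exp(-d^2 / (2*sigma^2)).
    static double[] gaussian(double[] d, double sigma) {
        double[] w = new double[d.length];
        for (int i = 0; i < d.length; i++)
            w[i] = Math.exp(-(d[i] * d[i]) / (2.0 * sigma * sigma));
        return normalize(w);
    }

    // exponentialKernel (assumed form): w = exp(-d / sigma).
    static double[] exponential(double[] d, double sigma) {
        double[] w = new double[d.length];
        for (int i = 0; i < d.length; i++) w[i] = Math.exp(-d[i] / sigma);
        return normalize(w);
    }

    // As noted above, all weights are normalized so that they sum to 1.
    static double[] normalize(double[] w) {
        double s = 0.0;
        for (double wi : w) s += wi;
        for (int i = 0; i < w.length; i++) w[i] /= s;
        return w;
    }

    public static void main(String[] args) {
        double[] d = {0.1, 0.4, 0.9}; // distances of k = 3 neighbors
        System.out.println(Arrays.toString(gaussian(d, 0.5)));
    }
}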
 

Required Libraries


The required libraries are included in the same compressed file as our package:

  • MOA (Massive Online Analysis): the framework under which our package runs.
  • XXL (eXtensible and fleXible Library): offers a rich collection of index structures.
  • JAMA (Java Matrix Package): a basic linear algebra package for Java.
 

Additional Parameters


Two further parameters have not been explained so far:

  • InitialWidth defines the size of the initial set of instances on which IBLStreams produces an initial model (in a batch manner). After having seen this number of instances, the algorithm switches to its incremental mode of learning. The default value is 1000 instances.
  • MaxInstancebaseSize upper-bounds the size of the case base (training set); the default value is 5000 instances.
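
For example, to build the initial model on 2000 instances and bound the case base at 10000 instances (a sketch; all other options keep their defaults):

moa.classifiers.IBLStreams -i 2000 -m 10000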
     



The following block shows the command-line help:
 

-i InitialWidth (default: 1000)
Size of first Window for training learner.
-s PredictionStrategy (default: WeightedMode(Classification))
The way the target value is predicted.
    WModeClass ==> WeightedMode (Classification)
    WMedianOClass ==> WeightedMedian (OrdinalClassification)
    WMeanReg ==> WeightedMean (Regression)
    LocLinReg ==> Local Linear Regression
-a AdaptationStrategy (default: AdaptK)
The used adaptation Strategy.
    AdaptK ==> AdaptK
    AdaptSigma ==> AdaptSigma
    none ==> none
-w WeightingMethod (default: equal)
The used weighting method.
    equal ==> equal
    inverseDistance ==> inverseDistance
    linear ==> linear
    GaussianKernel ==> GaussianKernel
    exponentialKernel ==> exponentialKernel
-m MaxInstancebaseSize (default: 5000)
The maximum allowed size for the instancebase.
-j MinKValue (default: 4)
Minimum k value, recommended 2 times the number of attributes. This value is considered when the used adaptation strategy is AdaptK.
-k InitialKValue (default: 16)
Initial k value, recommended 4 times the number of attributes.
-l MaxKValue (default: 40)
Maximum k value, recommended 10 times the number of attributes. This value is considered when the used adaptation strategy is AdaptK.
-q InitialSigma (default: 0.5)
Initial sigma value. This value is considered when the used adaptation strategy is AdaptSigma.
-h UseDefaultSigma
In this case the default sigma is used, when a Gaussian or an exponential kernel is used: sigma = sqrt(num. dimensions)/10.
-g UseDefaultK
In this case the default k is used: k = 4*(num. dimensions).

 

Examples

  • Learning a binary classifier from the hyperplane generator, in a 4-dimensional space.
        -The initial learning is done on a block of 1000 instances at the beginning of the whole process.
        -The adaptation strategy is to adapt k, while the prediction is done using the weighted mode. Equal weighting is used to weight instances.
        -The incremental learning and the evaluation are done on a stream of 120000 examples.
        -The stream is divided into blocks, where the classifier is trained incrementally on a block of 5000 instances and then evaluated (but no longer adapted) on the next 1000 instances, then again trained on the next 5000 and tested on the subsequent 1000 instances, and so forth.
        -The final results of every evaluation are logged into the file C:\HyperplaneBinary.csv

java -cp IBLStreams.jar;Jama-1.0.2.jar;xxl-core-2.0.beta.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -e BasicClassificationPerformanceEvaluator -l (LearnModel -l (moa.classifiers.IBLStreams -s WModeClass -a AdaptK -w equal -i 1000) -s (generators.HyperplaneGeneratorReg -m Binary -i 1 -a 4) -m 1000) -s (generators.HyperplaneGeneratorReg -m Binary -i 1 -a 4) -n 1000 -i 100000 -f 5000 -d C:\\HyperplaneBinary.csv"

    -In order to add concept drift, we merge the data from two different hyperplane generators. The stream starts from one pure hyperplane; the drift sets in after 50000 examples, from which point on instances have a higher probability of coming from the second hyperplane.

java -cp IBLStreams.jar;Jama-1.0.2.jar;xxl-core-2.0.beta.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -e BasicClassificationPerformanceEvaluator -l (LearnModel -l (moa.classifiers.IBLStreams -s WModeClass -a AdaptK -w equal -i 1000) -s (generators.HyperplaneGeneratorReg -m Binary -i 1 -a 4) -m 1000) -s (ConceptDriftStream -s (generators.HyperplaneGeneratorReg -m Binary -i 1 -a 4) -d (generators.HyperplaneGeneratorReg -m Binary -i 13 -a 4) -p 50000 -w 1000) -n 1000 -i 100000 -f 5000 -d C:\\HyperplaneBinary.csv"

  • Learning a regression model from a data generator which generates the distance from a point to a hyperplane, again in a 4-dimensional space. The stream starts from one hyperplane and ends with a different one.
        -The initial learning is done on a block of 300 instances at the beginning of the whole process.
        -The adaptation strategy is to adapt k, while the prediction is done using the weighted mean. A linear weighting function is used.
        -The way the incremental learning is interleaved with the evaluation is similar to the previous case.
        -The final results of every evaluation are logged into the file C:\HyperplaneDistance.csv

java -cp IBLStreams.jar;Jama-1.0.2.jar;xxl-core-2.0.beta.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -e BasicRegressionPerformanceEvaluator -l (LearnModel -l (moa.classifiers.IBLStreams -s WMeanReg -a AdaptK -w linear -i 300) -s (generators.HyperplaneGeneratorReg -m Distance -i 1 -a 4) -m 1000) -s (ConceptDriftStream -s (generators.HyperplaneGeneratorReg -a 4 -m Distance) -d (generators.HyperplaneGeneratorReg -i 13 -a 4 -m Distance) -p 50000 -w 10000) -n 1000 -i 100000 -f 5000 -d C:\\HyperplaneDistance.csv"

  • This case is similar to the previous one, except that the data is generated as the squared distance from a point to a hyperplane.
        -The adaptation strategy is to adapt the width of the Gaussian kernel, and the prediction is done by fitting a local linear regression to the neighboring instances.
        -The final results of every evaluation are logged into the file C:\HyperplaneSquareDistance.csv

java -cp IBLStreams.jar;Jama-1.0.2.jar;xxl-core-2.0.beta.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -e BasicRegressionPerformanceEvaluator -l (LearnModel -l (moa.classifiers.IBLStreams -s LocLinReg -a AdaptSigma -w GaussianKernel -i 300) -s (generators.HyperplaneGeneratorReg -m SquareDistance -i 1 -a 4) -m 1000) -s (ConceptDriftStream -s (generators.HyperplaneGeneratorReg -a 4 -m SquareDistance) -d (generators.HyperplaneGeneratorReg -i 13 -a 4 -m SquareDistance) -p 50000 -w 10000) -n 1000 -i 100000 -f 5000 -d C:\\HyperplaneSquareDistance.csv"

Publications

J. Beringer and E. Hüllermeier.
Efficient Instance-Based Learning on Data Streams.
Intelligent Data Analysis 11(6):627-650, 2007.
[ Draft-PDF ]

A. Shaker and E. Hüllermeier.
Instance-Based Classification and Regression on Data Streams.
In Learning in Non-Stationary Environments, Springer New York, pp. 185-201, 2012.
[ Publisher ]

A. Shaker and E. Hüllermeier.
IBLStreams: A System for Classification and Regression on Data Streams.
Evolving Systems, 2012.
[ Draft-PDF ] [ Publisher ]
  

Download

Extension (jar file) (updated 3.6.2013)
Extension with all required libraries (zip file) (updated 3.6.2013)
