PG: On-The-Fly Machine Learning

Automated Machine Learning (AutoML) aims to create out-of-the-box software tools that enable non-expert users to build customized machine learning applications tailored to their problem at hand. Current state-of-the-art tools such as Auto-WEKA and auto-sklearn already optimize the choice of a feature preprocessor and a classifier, including their respective hyperparameters. Unfortunately, these approaches expect the data to be given in a feature representation, which the user cannot always provide. In particular, if the inputs are (structured) objects such as images, audio, or texts, the right preprocessing of the data and the feature extraction are crucial for the performance of the entire machine learning pipeline (ML pipeline).
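To illustrate what such an ML pipeline for structured objects might look like, the following minimal sketch chains a feature extraction step with a classifier for raw texts. It is a toy example using only the Python standard library, not the tooling developed in the project group; the names extract_features, NearestCentroidClassifier and TextPipeline are hypothetical and chosen for illustration only.

```python
from collections import Counter


def extract_features(text, vocabulary):
    """Feature extraction: map a raw text to a bag-of-words count vector."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


class NearestCentroidClassifier:
    """Toy classifier that works on fixed-length feature vectors."""

    def fit(self, X, y):
        sums, sizes = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        # One centroid (mean feature vector) per class label.
        self.centroids = {
            label: [v / sizes[label] for v in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids,
                key=lambda label: squared_distance(x, self.centroids[label]))
            for x in X
        ]


class TextPipeline:
    """Hypothetical ML pipeline: feature extraction followed by a classifier."""

    def __init__(self, vocabulary, classifier):
        self.vocabulary = vocabulary
        self.classifier = classifier

    def fit(self, texts, labels):
        X = [extract_features(t, self.vocabulary) for t in texts]
        self.classifier.fit(X, labels)
        return self

    def predict(self, texts):
        X = [extract_features(t, self.vocabulary) for t in texts]
        return self.classifier.predict(X)
```

In this sketch the vocabulary plays the role of a hyperparameter of the feature extraction step. An AutoML tool that also covers feature extraction would have to choose such parameters together with the classifier and its hyperparameters.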

By making algorithms from different libraries and programming languages available as REST services, which are then offered in a distributed On-The-Fly Market, we embed the AutoML problem into the domain of On-The-Fly Computing. In this scenario, which we refer to as On-The-Fly Machine Learning (OTF-ML), we want to enable users to request ML pipelines on the fly, i.e., on demand and tailored to their problem at hand. The ML pipeline can be provided to the user in three different ways:

  1. Prediction: The user only obtains predictions for some attached data.
  2. Predictor: The user obtains a fully configured, trained and ready-to-use ML pipeline, which is in turn provided as a REST service.
  3. Learner: The user is provided access to a REST service which can be (re-)trained and used to make predictions.
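
The difference between the three modes can be sketched in a few lines of Python. This is a simplified in-process illustration under stated assumptions: the REST layer is omitted, a trivial majority-class model stands in for a configured ML pipeline, and all names (MajorityLearner, Predictor, request_prediction, request_predictor, request_learner) are hypothetical rather than part of an existing OTF-ML API.

```python
from collections import Counter


class MajorityLearner:
    """Stand-in for an ML pipeline: predicts the most frequent training label."""

    def __init__(self):
        self.label = None

    def train(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        if self.label is None:
            raise RuntimeError("learner has not been trained yet")
        return [self.label for _ in X]


class Predictor:
    """Wrapper around a trained pipeline that exposes prediction only."""

    def __init__(self, trained):
        self._trained = trained

    def predict(self, X):
        return self._trained.predict(X)


def request_prediction(train_X, train_y, query_X):
    """Mode 1 (Prediction): the user only gets predictions for attached data."""
    return MajorityLearner().train(train_X, train_y).predict(query_X)


def request_predictor(train_X, train_y):
    """Mode 2 (Predictor): the user gets a trained, ready-to-use pipeline."""
    return Predictor(MajorityLearner().train(train_X, train_y))


def request_learner():
    """Mode 3 (Learner): the user gets a pipeline that can be (re-)trained."""
    return MajorityLearner()
```

In an actual OTF-ML setting, the objects returned in modes 2 and 3 would presumably be handles to REST services rather than local Python objects. The sketch only shows which operations each mode exposes to the user.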

OTF-ML combines various techniques and research areas, e.g.:

  • machine learning
  • heuristic search and planning
  • REST services, microservices
  • image processing


Desired Prerequisites:

  • Good programming skills in Java and/or Python
  • Prior knowledge of machine learning, e.g., from attending data science/machine learning courses (Data Mining, Machine Learning I, Machine Learning II) or from a Bachelor's thesis in the area of machine learning