SFB 901 "On-The-Fly Computing"

This is a research project funded by the DFG (Deutsche Forschungsgemeinschaft) within the collaborative research center SFB 901 "On-The-Fly Computing".

The research center investigates individually and automatically configured IT services that are composed from, and executed on, a market of globally traded, combinable services and execution platforms. Within this research center, our research group investigates the communication network as one important ingredient of such a market platform. Our research covers the realization and optimization of overlays over real networks.

The long term goals of the research project are:

  • The creation of overlays with specific topological properties, for instance overlays that are topological spanners of the underlying network.
  • The adaptation of overlays to assure topological properties over the dynamic underlying network.
  • The provision of mechanisms to estimate the effect of introduced or removed overlay edges on the current overlay.
  • The orchestration of co-existing or competing overlay structures with fair resource scheduling.
  • The adaptation of the underlying network to the demands of the overlay.
  • Solving pragmatic challenges to realize the required architectures and interfaces.
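
To make the spanner goal in the list above concrete: an overlay is a t-spanner of the underlying network if, for every pair of nodes, the shortest overlay path is at most t times as long as the shortest underlay path. The following is a minimal sketch of how that stretch factor could be measured on small graphs. The adjacency-dict representation and the names `shortest_paths` and `stretch` are illustrative assumptions, not part of the project's software.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def stretch(underlay, overlay):
    """Largest ratio of overlay distance to underlay distance over all node pairs."""
    worst = 1.0
    for u in underlay:
        base = shortest_paths(underlay, u)
        over = shortest_paths(overlay, u)
        for v, base_dist in base.items():
            if v == u:
                continue
            # A pair that is unreachable in the overlay yields an infinite stretch.
            worst = max(worst, over.get(v, float("inf")) / base_dist)
    return worst


# Hypothetical example: a unit-weight 4-cycle whose overlay drops the edge a-d.
underlay = {
    "a": {"b": 1, "d": 1},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "d": 1},
    "d": {"c": 1, "a": 1},
}
overlay = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "d": 1},
    "d": {"c": 1},
}
print(stretch(underlay, overlay))
```

In this toy example the overlay forces traffic between a and d onto the three-hop detour a-b-c-d, so the overlay is a 3-spanner of the cycle but not a 2-spanner.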

At the current stage of the project, we are working on the following challenges:

  • Construction of overlays over underlying network graphs
    • Realization of Internet coordinate systems for more than just latency cost functions.
    • Constructing network spanners over node embeddings resulting from Internet coordinate systems.
    • Data structures that efficiently encode a neighbor relation in Internet coordinate systems.
  • Overlay routing in Internet coordinate systems based on the geographic routing paradigm
  • Realization of models and interfaces for implementing and evaluating topology control and routing over Internet coordinate systems
  • Empirical evaluation of the developed Internet coordinate systems, network spanners and routing mechanisms

We are currently seeking two student assistants (SHKs) to set up a real-world testbed, implement the developed coordinate systems, spanner constructions, and routing mechanisms, and conduct empirical studies.

Refereed Publications



On Greedy Routing in Degree-bounded Graphs over d-Dimensional Internet Coordinate Embeddings

M. Autenrieth, H. Frey, in: Proceedings of the Conference on Networked Systems (NetSys), 2013, pp. 126-131

In this paper we will introduce a new d-dimensional graph for constructing geometric application layer overlay networks. Our approach will use internet coordinates, embedded using the L∞-metric. After describing the graph structure, we will show how it limits maintenance overhead by bounding each node’s out-degree and how it supports greedy routing using one-hop neighbourhood information in each routing step. We will further show that greedy routing can always compute a path in our graph and we will also prove that in each forwarding step the next hop is closer to the destination than the current node.
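
The greedy forwarding rule described in this abstract can be sketched as follows: each node knows only the coordinates of its one-hop neighbours and forwards to the neighbour that is closest to the destination under the L∞ metric, provided that neighbour is strictly closer than the node itself. This sketch does not reproduce the paper's degree-bounded graph construction; the toy coordinates, the neighbour lists, and the function names are assumptions for illustration only.

```python
def linf_distance(p, q):
    """L-infinity (Chebyshev) distance between two coordinate tuples."""
    return max(abs(a - b) for a, b in zip(p, q))


def greedy_route(coords, neighbors, source, target):
    """Forward greedily until the target is reached or no neighbor is closer.

    Returns (path, delivered).
    """
    path = [source]
    current = source
    while current != target:
        here = linf_distance(coords[current], coords[target])
        closer = [n for n in neighbors[current]
                  if linf_distance(coords[n], coords[target]) < here]
        if not closer:
            return path, False  # local minimum: greedy routing is stuck
        current = min(closer, key=lambda n: linf_distance(coords[n], coords[target]))
        path.append(current)
    return path, True


# Hypothetical 2-dimensional embedding and overlay neighbor lists.
coords = {"s": (0, 0), "a": (3, 2), "b": (1, 3), "t": (4, 4)}
neighbors = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
print(greedy_route(coords, neighbors, "s", "t"))
```

On an arbitrary graph this procedure can get stuck in a local minimum, which is why the function reports whether delivery succeeded. The abstract's claim is that the proposed graph structure rules out such dead ends.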


    Reactive Planar Spanner Construction in Wireless Ad Hoc and Sensor Networks

    M. Benter, F. Neumann, H. Frey, in: Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM), 2013, pp. 2193-2201

    Within reactive topology control, a node determines its adjacent edges of a network subgraph without prior knowledge of its neighborhood. The goal is to construct a local view on a topology which provides certain desired properties such as planarity. During algorithm execution, a node, in general, is not allowed to determine all its neighbors of the network graph. There are well-known reactive algorithms for computing planar subgraphs. However, the subgraphs obtained do not have constant Euclidean spanning ratio. This means that routing along these subgraphs may result in potentially long detours. So far, it has been unknown if planar spanners can be constructed reactively. In this work, we show that at least under the unit disk network model, this is indeed possible, by proposing an algorithm for reactive construction of the partial Delaunay triangulation, which recently turned out to be a spanner. Furthermore, we show that our algorithm is message-optimal as a node will only exchange messages with nodes that are also neighbors in the spanner. The algorithm’s presentation is complemented by a rigorous proof of correctness.


      A Local Heuristic for Latency-Optimized Distributed Cloud Deployment

      M. Keller, S. Pawlik, P. Pietrzyk, H. Karl, in: Proceedings of the 6th International Conference on Utility and Cloud Computing (UCC) Workshop on Distributed Cloud Computing, 2013, pp. 429-434

      In Distributed Cloud Computing, applications are deployed across many data centres at topologically diverse locations to improve network-related quality of service (QoS). As we focus on interactive applications, we minimize the latency between users and an application by allocating Cloud resources near the customers. Allocating resources at all locations will result in the best latency but also in the highest expenses. So we need to find an optimal subset of locations which reduces the latency but also the expenses: the facility location problem (FLP). In addition, we consider resource capacity restrictions, as a resource can only serve a limited number of users. An FLP can be globally solved. Additionally, we propose a local, distributed heuristic. This heuristic is running within the network and does not depend on a global component. No distributed, local approximations for the capacitated FLP have been proposed so far due to the complexity of the problem. We compared the heuristic with an optimal solution obtained from a mixed integer program for different network topologies. We investigated the influence of different parameters like overall resource utilization or different latency weights.


        Incorporating feedback from application layer into routing and wavelength assignment algorithms

        P. Wette, H. Karl, in: Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM), 2013, pp. 51-52

        Preemptive Routing and Wavelength Assignment (RWA) algorithms preempt established lightpaths in case not enough resources are available to set up a new lightpath in a Wavelength Division Multiplexing (WDM) network. The selection of lightpaths to be preempted relies on internal decisions of the RWA algorithm. Thus, if dedicated properties of the network topology are required by the applications running on the network, these requirements have to be known to the RWA algorithm. Otherwise it might happen that by preempting a particular lightpath these requirements are violated. If, however, these requirements include parameters known only at the nodes running the application, the RWA algorithm cannot evaluate the requirements. For this reason an RWA algorithm is needed which incorporates feedback from the application layer in the preemption decisions. This work proposes a simple interface along with an algorithm for computing and selecting preemption candidates in case a lightpath cannot be established. We reason about the necessity of using information from the application layer in the RWA and present two example applications which benefit from this idea.


          Which Flows Are Hiding Behind My Wildcard Rule? Adding Packet Sampling to OpenFlow

          P. Wette, H. Karl, in: Proceedings of the ACM SIGCOMM '13, 2013, pp. 541-542

          In OpenFlow [1], multiple switches share the same control plane which is centralized at what is called the OpenFlow controller. A switch only consists of a forwarding plane. Rules for forwarding individual packets (called flow entries in OpenFlow) are pushed from the controller to the switches. In a network with a high arrival rate of new flows, such as in a data center, the control traffic between the switch and controller can become very high. As a consequence, routing of new flows will be slow. One way to reduce control traffic is to use wildcarded flow entries. Wildcard flow entries can be used to create default routes in the network. However, since switches do not keep track of flows covered by a wildcard flow entry, the controller no longer has knowledge about individual flows. To find out about these individual flows we propose an extension to the current OpenFlow standard to enable packet sampling of wildcard flow entries.
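
The idea of sampling packets that hit a wildcard flow entry can be illustrated with a small toy model. This is not OpenFlow code and does not reflect the message format of the proposed extension, which the abstract does not describe. The classes `WildcardEntry` and `Controller`, the per-packet probabilistic sampling, and the packet dictionaries are all assumptions made for illustration.

```python
import ipaddress
import random


class Controller:
    """Toy controller that records the individual flows it learns about."""

    def __init__(self):
        self.seen_flows = set()

    def report(self, packet):
        self.seen_flows.add((packet["src"], packet["dst"], packet["dst_port"]))


class WildcardEntry:
    """Toy flow entry matching every packet whose destination lies in a prefix."""

    def __init__(self, dst_prefix, sampling_rate, seed=0):
        self.network = ipaddress.ip_network(dst_prefix)
        self.sampling_rate = sampling_rate  # assumed: probability per packet
        self.rng = random.Random(seed)      # seeded for reproducible runs
        self.packet_count = 0

    def matches(self, packet):
        return ipaddress.ip_address(packet["dst"]) in self.network

    def process(self, packet, controller):
        """Count the packet and, with the sampling probability, report it."""
        self.packet_count += 1
        if self.rng.random() < self.sampling_rate:
            controller.report(packet)


# Five distinct flows, 200 packets each, all covered by one wildcard entry.
packets = [{"src": "10.0.0.%d" % i, "dst": "10.1.0.5", "dst_port": 80}
           for i in range(1, 6)] * 200
entry = WildcardEntry("10.1.0.0/16", sampling_rate=0.05)
controller = Controller()
for packet in packets:
    if entry.matches(packet):
        entry.process(packet, controller)
print(entry.packet_count, "packets matched;",
      len(controller.seen_flows), "flows seen by the controller")
```

The design point the sketch tries to show is the trade-off: without sampling the controller sees none of the flows behind the wildcard entry, while a small sampling rate reveals long-lived flows at a fraction of the control traffic that per-flow entries would cause.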


            On the Quality of Selfish Virtual Topology Reconfiguration in IP-over-WDM Networks

            P. Wette, H. Karl, in: Proceedings of the 19th IEEE International Workshop on Local and Metropolitan Area Networks (IEEE LANMAN), 2013, pp. 1-6

            The process of planning a virtual topology for a Wavelength Division Multiplexing (WDM) network is called Virtual Topology Design (VTD). The goal of VTD is to find a virtual topology that supports forwarding the expected traffic without congestion. In networks with fluctuating, high traffic demands, it can happen that no single topology fits all changing traffic demands occurring over a longer time. Thus, during operation, the virtual topology has to be reconfigured. Since modern networks tend to be large, VTD algorithms have to scale well with increasing network size, requiring distributed algorithms. Existing distributed VTD algorithms, however, react too slowly to congestion for the real-time reconfiguration of large networks. We propose Selfish Virtual Topology Reconfiguration (SVTR) as a new algorithm for distributed VTD. It combines reconfiguring the virtual topology and routing through a Software Defined Network (SDN). SVTR is used for online, on-the-fly network reconfiguration. Its integrated routing and WDM reconfiguration keeps connection disruption due to network reconfiguration to a minimum and is able to react very quickly to traffic pattern changes. SVTR works by iteratively adapting the virtual topology to the observed traffic patterns without global traffic information and without future traffic estimations. We evaluated SVTR by simulation and found that it significantly lowers congestion in realistic networks and high load scenarios.


              Specifying and Placing Chains of Virtual Network Functions

              S. Dräxler, M. Keller, H. Karl, in: Proceedings of the 3rd International Conference on Cloud Networking (CloudNet), 2014, pp. 7-13

              Network appliances perform different functions on network flows and constitute an important part of an operator’s network. Normally, a set of chained network functions process network flows. Following the trend of virtualization of networks, virtualization of the network functions has also become a topic of interest. We define a model for formalizing the chaining of network functions using a context-free language. We process deployment requests and construct virtual network function graphs that can be mapped to the network. We describe the mapping as a Mixed Integer Quadratically Constrained Program (MIQCP) for finding the placement of the network functions and chaining them together considering the limited network resources and requirements of the functions. We have performed a Pareto set analysis to investigate the possible trade-offs between different optimization objectives.


              Using Application Layer Knowledge in Routing and Wavelength Assignment Algorithms

              P. Wette, H. Karl, in: Proceedings of the IEEE International Conference on Communications 2014, 2014, pp. 3270-3276

              Preemptive Routing and Wavelength Assignment (RWA) algorithms preempt established lightpaths in case not enough resources are available to set up a new lightpath in a Wavelength Division Multiplexing (WDM) network. The selection of lightpaths to be preempted relies on internal decisions of the RWA algorithm. Thus, if dedicated properties of the network topology are required by the applications running on the network, these requirements have to be known to the RWA algorithm. We present a family of preemptive RWA algorithms for WDM networks. These algorithms have two distinguishing features: a) they can handle dynamic traffic by on-the-fly reconfiguration, and b) users can give feedback for reconfiguration decisions and thus influence the preemption decision of the RWA algorithm, leading to networks which adapt directly to application needs. This is different from traffic engineering where the network is (slowly) adapted to observed traffic patterns. Our algorithms handle various WDM network configurations including networks consisting of heterogeneous WDM hardware. To this end, we are using the layered graph approach together with a newly developed graph model that is used to determine conflicting lightpaths.


                Response Time-Optimized Distributed Cloud Resource Allocation

                M. Keller, H. Karl, in: Proceedings of the SIGCOMM Workshop on Distributed Cloud Computing, 2014, pp. 47-52

                In the near future many more compute resources will be available at different geographical locations. To minimize the response time of requests, application servers closer to the user can hence be used to shorten network round trip times. However, this advantage is neutralized if the used data centre is highly loaded as the processing time of requests is important as well. We model the request response time as the network round trip time plus the processing time at a data centre. We present a capacitated facility location problem formalization where the processing time is modelled as the sojourn time of a queueing model. We discuss the Pareto trade-off between the number of used data centres and the resulting response time. For example, using fewer data centres could cut expenses but results in high utilization, high response time, and smaller revenues. Previous work presented a non-linear cost function. We prove its convexity and exploit this property in two ways: First, we transform the convex model into a linear model while controlling the maximum approximation error. Second, we used a convex solver instead of a slower non-linear solver. Numerical results on network topologies exemplify our work.
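
The response-time model in this abstract (round trip time plus sojourn time) and the idea of replacing a convex cost by a linear model can be sketched in a few lines. The abstract does not name the queueing model, so the sketch assumes an M/M/1 queue, whose mean sojourn time is 1/(μ − λ) for service rate μ and arrival rate λ < μ. The tangent-based linearization and all function names are likewise illustrative assumptions and may differ from the paper's formulation.

```python
def sojourn_time(arrival_rate, service_rate):
    """Mean sojourn time of an M/M/1 queue (assumed model): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def response_time(rtt, arrival_rate, service_rate):
    """Response time = network round trip time + processing (sojourn) time."""
    return rtt + sojourn_time(arrival_rate, service_rate)


def tangent_lines(service_rate, points):
    """Tangents (slope, intercept) of the sojourn time at the given arrival rates."""
    lines = []
    for p in points:
        slope = 1.0 / (service_rate - p) ** 2  # derivative of 1 / (mu - lambda)
        intercept = sojourn_time(p, service_rate) - slope * p
        lines.append((slope, intercept))
    return lines


def linearized_sojourn_time(arrival_rate, lines):
    """Because the function is convex, the maximum over its tangents bounds it from below."""
    return max(slope * arrival_rate + intercept for slope, intercept in lines)


def max_error(service_rate, lines, upper, samples=1000):
    """Largest gap between exact and linearized value on a grid over [0, upper]."""
    worst = 0.0
    for i in range(samples + 1):
        rate = upper * i / samples
        gap = sojourn_time(rate, service_rate) - linearized_sojourn_time(rate, lines)
        worst = max(worst, gap)
    return worst


mu = 100.0  # hypothetical service rate in requests per second
coarse = tangent_lines(mu, [0.0, 90.0])
fine = tangent_lines(mu, [90.0 * i / 7 for i in range(8)])
print(response_time(0.02, 50.0, mu))
print(max_error(mu, coarse, 90.0), max_error(mu, fine, 90.0))
```

Convexity is what makes this work: every tangent lies below the curve, so the maximum over a set of tangents can be expressed with linear constraints, and adding tangent points shrinks the approximation error.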


                  Template Embedding: Using Application Architecture to Allocate Resources in Distributed Clouds

                  M. Keller, C. Robbert, H. Karl, in: Proceedings of the 7th International Conference on Utility and Cloud Computing (UCC), 2014, pp. 387-395

                  In distributed cloud computing, application deployment across multiple sites can improve quality of service. Recent research developed algorithms to find optimal locations for virtual machines. However, those algorithms assume either single-tier applications or a fixed number of virtual machines, which is a strong simplification of reality. This paper investigates the placement and scaling of complex application architectures. An application is dynamically scaled to fit both the current demand situation and the currently available infrastructure resources. We compare two approaches: The first one is based on virtual network embedding. The second approach is a novel method called Template Embedding. It is based on a hierarchical 1-allocation hub flow problem and combines application scaling and embedding in one step. Extensive experiments on 43200 network configurations showed that Template Embedding outperforms virtual network embedding in all cases in three metrics: success rate, solution quality, and runtime. This positive result shows that template embedding is a promising approach for distributed cloud resource allocation.


                    A Game-Theoretic Approach to the Financial Benefits of Infrastructure-as-a-Service

                    J. Künsemöller, H. Karl, Future Generation Computer Systems (2014), pp. 44-52

                    Financial benefits are an important factor when cloud infrastructure is considered to meet processing demand. The dynamics of on-demand pricing and service usage are investigated in a two-stage game model for a monopoly Infrastructure-as-a-Service (IaaS) market. The possibility of hybrid clouds (public clouds plus own infrastructure) turns out to be essential in order that not only the provider but also the clients have significant benefits from on-demand services. Even if the client meets all demand in the public cloud, the threat of building a hybrid cloud keeps the instance price low. This is not the case when reserved instances are offered as well. Parameters like load profiles and economies of scale have a huge effect on likely future pricing and on a cost-optimal split-up of client demand between either a client’s own data center and a public cloud service or between reserved and on-demand cloud instances.


                      MaxiNet: Distributed Emulation of Software-Defined Networks

                      P. Wette, M. Dräxler, A. Schwabe, F. Wallaschek, M.H. Zahraee, H. Karl, in: Proceedings of the 2014 IFIP Networking Conference (Networking 2014), 2014, pp. 1-9

                      Network emulations are widely used for testing novel network protocols and routing algorithms in realistic scenarios. Up to now, there is no emulation tool that is able to emulate large software-defined data center networks that consist of several thousand nodes. Mininet is the most common tool to emulate Software-Defined Networks of several hundred nodes. We extend Mininet to span an emulated network over several physical machines, making it possible to emulate networks of several thousand nodes on just a handful of physical machines. This enables us to emulate, e.g., large data center networks. To test this approach, we additionally introduce a traffic generator for data center traffic. Since there are no data center traffic traces publicly available we use the results of two recent traffic studies to create synthetic traffic. We show the design and discuss some challenges we had in building our traffic generator. As a showcase for our work we emulated a data center consisting of 3200 hosts on a cluster of only 12 physical machines. We show the resulting workloads and the trade-offs involved.


                        Provider Competition in Infrastructure-as-a-Service

                        J. Künsemöller, S. Brangewitz, H. Karl, C. Haake, in: Proceedings of the 2014 IEEE International Conference on Services Computing (SCC), 2014, pp. 203-210

                        This paper explores how cloud provider competition influences instance pricing in an IaaS (Infrastructure-as-a-Service) market. When reserved instance pricing includes an on-demand price component in addition to a reservation fee (two-part tariffs), different providers might offer different price combinations, where the client’s choice depends on its load profile. We investigate a duopoly of providers and analyze stable market prices in two-part tariffs. Further, we study offers that allow a specified amount of included usage (three-part tariffs). Neither two-part nor three-part tariffs produce an equilibrium market outcome other than a service pricing that equals production cost, i.e., complex price structures do not significantly affect the results from ordinary Bertrand competition.


                          Using MAC addresses as efficient routing labels in data centers

                          A. Schwabe, H. Karl, in: Proceedings of the Third Workshop on Hot Topics in Software Defined Networking, HotSDN '14, Chicago, Illinois, USA, August 22, 2014, pp. 115-120


                          HybridTE: Traffic Engineering for Very Low-Cost Software-Defined Data-Center Networks

                          P. Wette, H. Karl, in: Proceedings of the 4th European Workshop on Software Defined Networks (EWSDN 2015), 2015, pp. 1-7

                          The size of modern data centers is constantly increasing. As it is not economical to interconnect all machines in the data center using a full-bisection-bandwidth network, techniques have to be developed to increase the efficiency of data-center networks. The Software-Defined Network paradigm opened the door for centralized traffic engineering (TE) in such environments. A number of TE proposals for SDN-controlled data centers already exist, and they all work very well. However, these techniques either use a high amount of flow table entries or a high flow installation rate that overwhelms available switching hardware, or they require custom or very expensive end-of-line equipment to be usable in practice. We present HybridTE, a TE technique that uses (uncertain) information about large flows. Using this extra information, our technique has very low hardware requirements while maintaining better performance than existing TE techniques. This enables us to build very low-cost, high performance data-center networks.


                            An Architecture for Energy-aware On-demand Mobile Network Management

                            M. Peuster, H. Karl, in: Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, 2015

                            The increasing amount of mobile traffic leads to a significantly higher energy consumption of mobile networks that is mainly caused by the high number of required base stations. One recent solution for this is based on a two-layered network that uses long-range macro cells to provide a full coverage signaling overlay and short-range small cells for fast data transmissions. These small cells can be switched off when they are not needed and allow network-wide energy optimizations. This paper presents an architecture that extends existing mobile networks to integrate a small cell layer that supports on-demand cell activation. We discuss how additional small cells can be interconnected with existing core components and how they can be controlled by a resource management component. Finally, a Wi-Fi based proof of concept testbed implementation is presented that demonstrates the feasibility of the approach.


                              SynRace: Decentralized Load-Adaptive Multi-path Routing without Collecting Statistics

                              A. Schwabe, H. Karl, in: Proceedings of the 4th European Workshop on Software Defined Networks (EWSDN 2015), 2015, pp. 37-42

                              Multi-rooted trees are becoming the norm for modern data-center networks. In these networks, scalable flow routing is challenging owing to the vast number of flows. Current approaches either employ a central controller that can have scalability issues or a scalable decentralized algorithm only considering local information. In this paper we present a new decentralized approach to least-congested path routing in software-defined data center networks that has neither of these issues: By duplicating the initial (or SYN) packet of a flow and estimating the data rate of multiple flows in parallel, we exploit TCP’s habit to fill buffers to find the least congested path. We show that our algorithm significantly improves flow completion time without the need for a central controller or specialized hardware.


                                Topology model to generate realistic latency for simulations

                                A. Schwabe, H. Karl, in: 2015 IEEE International Conference on Communications, ICC 2015, London, United Kingdom, June 8-12, 2015, pp. 6122-6127


                                DCT²Gen: A traffic generator for data centers

                                P. Wette, H. Karl, Computer Communications (2016), pp. 45-58


                                MeDICINE: Rapid Prototyping of Production-Ready Network Services in Multi-PoP Environments

                                M. Peuster, H. Karl, S. van Rossem, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), 2016

                                Virtualized network services consisting of multiple individual network functions are already today deployed across multiple sites, so-called multi-PoP (points of presence) environments. This makes it possible to improve service performance by optimizing service placement in the network. But prototyping and testing of these complex distributed software systems becomes extremely challenging. The reason is that not only the network service as such has to be tested but also its integration with management and orchestration systems. Existing solutions, like simulators, basic network emulators, or local cloud testbeds, do not support all aspects of these tasks. To this end, we introduce MeDICINE, a novel NFV prototyping platform that is able to execute production-ready network functions, provided as software containers, in an emulated multi-PoP environment. These network functions can be controlled by any third-party management and orchestration system that connects to our platform through standard interfaces. Based on this, a developer can use our platform to prototype and test complex network services in a realistic environment running on his laptop.


                                Understand Your Chains: Towards Performance Profile-Based Network Service Management

                                M. Peuster, H. Karl, in: Fifth European Workshop on Software-Defined Networks, EWSDN 2016, Den Haag, The Netherlands, October 10-11, 2016, pp. 7-12

                                Allocating resources to virtualized network functions and services to meet service level agreements is a challenging task for NFV management and orchestration systems. This becomes even more challenging when agile development methodologies, like DevOps, are applied. In such scenarios, management and orchestration systems are continuously facing new versions of functions and services, which makes it hard to decide how many resources have to be allocated to them to provide the expected service performance. One solution for this problem is to support resource allocation decisions with performance behavior information obtained by profiling techniques applied to such network functions and services. In this position paper, we analyze and discuss the components needed to generate such performance behavior information within the NFV DevOps workflow. We also outline research questions that identify open issues and missing pieces for a fully integrated NFV profiling solution. Further, we introduce a novel profiling mechanism that is able to profile virtualized network functions and entire network service chains under different resource constraints before they are deployed on production infrastructure.


                                Placement of Services with Flexible Structures Specified by a YANG Data Model

                                S. Dräxler, H. Karl, in: Proceedings of the 2nd International IEEE Conference on Network Softwarization (NetSoft), 2016, pp. 184-192

                                Network function virtualization and software-defined networking allow services consisting of virtual network functions to be designed and implemented with great flexibility by facilitating automatic deployments, migrations, and reconfigurations for services and their components. For extended flexibility, we go beyond seeing services as a fixed chain of functions. We present a YANG model for describing the service structure in deployment requests in a flexible way that enables changing the order of functions in case the order of traversing them does not affect the functionality of the service. Upon receiving such requests, the network orchestration system can choose the optimal composition of service components that gives the best results for placement of services in the network. This introduces new complexities to the placement problem by greatly increasing the number of possible ways a service can be composed. In this paper, we describe a heuristic solution that selects a Pareto set of the possible compositions of a service as well as possible combinations of different services, with respect to different resource requirements of the services. Our evaluations show that the selected combinations consist of representative samples of possible structures and requirements and therefore, can result in optimal or close-to-optimal placement results.
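
Why a flexible function order enlarges the placement problem, and what a Pareto set of compositions looks like, can be shown with a toy model. The sketch below enumerates all orders exhaustively, which is only feasible for a handful of functions and is not the heuristic proposed in the paper. The resource model (a per-unit CPU cost and an output-to-input traffic ratio per function), the function names, and the numbers are assumptions for illustration.

```python
from itertools import permutations


def requirements(order, functions, input_rate):
    """Total CPU and link data rate needed when functions run in the given order."""
    rate = input_rate
    cpu = 0.0
    bandwidth = 0.0
    for name in order:
        cpu_per_unit, ratio = functions[name]
        bandwidth += rate            # traffic entering this function
        cpu += cpu_per_unit * rate   # processing cost grows with the incoming rate
        rate *= ratio                # the function may shrink or keep the traffic
    return cpu, bandwidth


def dominates(a, b):
    """True if a is at least as good as b in every dimension and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_compositions(functions, input_rate):
    """Keep every order whose requirements are not dominated by another order."""
    candidates = {order: requirements(order, functions, input_rate)
                  for order in permutations(functions)}
    return {order: cost for order, cost in candidates.items()
            if not any(dominates(other, cost) for other in candidates.values())}


# Hypothetical functions: name -> (CPU per unit of traffic, output/input ratio).
functions = {
    "firewall": (1.0, 0.5),
    "cache": (4.0, 0.25),
    "ids": (2.0, 1.0),
}
for order, (cpu, bandwidth) in pareto_compositions(functions, 8).items():
    print(" -> ".join(order), "cpu:", cpu, "bandwidth:", bandwidth)
```

With three reorderable functions there are already six candidate compositions. In this example only two of them are Pareto-optimal: one minimizes CPU demand and the other minimizes link data rate, which is the kind of trade-off a placement algorithm would then choose between.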


                                Demonstrating on-demand cell switching with a two-layer mobile network testbed

                                M. Peuster, H. Karl, A. Enrico Redondi, A. Capone, in: IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2016, San Francisco, CA, USA, April 10-14, 2016, pp. 1015-1016

                                Traditional cellular networks are forced to remain active regardless of the actual amount of traffic that is currently produced/requested, with a clear waste of energy. Two-layer mobile networks with separated signalling and data layers have been recently proposed for energy savings in future implementations. These networks are able to switch off unneeded data cells completely while maintaining full coverage with their signalling cells, thus saving energy. In this demonstration, we showcase a testbed that uses Wi-Fi access points to emulate small cells of the data layer and a publicly available cellular connection as the signalling layer. We use off-the-shelf Android smartphones with an ad-hoc networking management module and a MultiPath TCP-enabled kernel to manage the Wi-Fi and cellular interfaces simultaneously. The testbed is used to demonstrate the general feasibility of this layered architecture and to facilitate experiments with network-wide resource optimization.


                                  Joint Optimization of Scaling and Placement of Virtual Network Services

                                  S. Dräxler, H. Karl, Z.A. Mann, in: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), 2017

                                  Management of complex network services requires flexible and efficient service provisioning as well as optimized handling of continuous changes in the workload of the service. To adapt to changes in the demand, service components need to be replicated (scaling) and allocated to physical resources (placement) dynamically. In this paper, we propose a fully automated approach to the joint optimization problem of scaling and placement, enabling quick reaction to changes. We formalize the problem, analyze its complexity, and develop two algorithms to solve it. Extensive empirical results show the applicability and effectiveness of the proposed approach.


                                  Profile Your Chains, Not Functions. Automated Network Service Profiling in DevOps Environments

                                  M. Peuster, H. Karl, in: IEEE Conference on Network Function Virtualisation and Software Defined Networks (NFV-SDN), 2017

                                  Benchmarking and profiling virtual network functions (VNFs) generates input knowledge for resource management decisions taken by management and orchestration systems. Such VNFs are usually not executed in isolation but are often deployed as part of a service function chain (SFC) that connects single functions into complex structures. To manage such chains, isolated performance profiles of single functions have to be combined to get insights into the overall behavior of an SFC. This becomes particularly challenging in highly agile DevOps environments in which profiling processes need to be fully automated and detailed insights about a chain's internal structures are not always available. In this paper, we introduce a fully automatable, flexible, and platform-agnostic profiling system that allows entire SFCs to be profiled at once. This obviates manual modeling procedures to combine profiling results from single VNFs to reflect SFC performance. We use a case study with different SFC configurations to show that it is hard to model the resulting SFC performance based on single-VNF measurements and that performance interactions exist between real, non-trivial functions that are deployed in a chain.


                                  Response-Time-Optimised Service Deployment: MILP Formulations of Piece-wise Linear Functions Approximating Non-linear Bivariate Mixed-integer Functions

                                  M. Keller, H. Karl, IEEE Transactions on Network and Service Management (2017)(1), pp. 121--135

                                  A current trend in networking and cloud computing is to provide compute resources at widely distributed sites; this is exemplified by developments such as Network Function Virtualisation. This paves the way for wide-area service deployments with improved service quality: e.g. user-perceived response times can be reduced by offering services at nearby sites. But always assigning users to the nearest site can be a bad decision if this site is already highly utilised. This paper formalises two related decisions of allocating compute resources at different sites and assigning users to them with the goal of minimising the response times while the total number of resources to be allocated is limited – a non-linear capacitated Facility Location Problem with integrated queuing systems. To efficiently handle its non-linearity, we introduce five linear problem linearisations and adapt the currently best heuristic for a similar scenario to our scenario. All six approaches are compared in experiments for solution quality and solving time. Surprisingly, our best optimisation formulation outperforms the heuristic in both time and quality. Additionally, we evaluate the influence of distributions of available compute resources in the network on the response time: The time was halved for some configurations. The presented formulation techniques for our problem linearisations are applicable to a broader optimisation domain.


                                    Specification, Composition, and Placement of Network Services with Flexible Structures

                                    S. Dräxler, H. Karl, International Journal of Network Management (2017)(2), pp. 1--16

                                    Network function virtualization and software-defined networking allow services consisting of virtual network functions to be designed and implemented with great flexibility by facilitating automatic deployments, migrations, and reconfigurations for services and their components. For extended flexibility, we go beyond seeing services as a fixed chain of functions. We define the service structure in a flexible way that enables changing the order of functions in case the functionality of the service is not influenced by this, and propose a YANG data model for expressing this flexibility. Flexible structures allow the network orchestration system to choose the optimal composition of service components that, for example, gives the best results for placement of services in the network. When the number of flexible services and the number of components in each service increase, combinatorial explosion limits the practical use of this flexibility. In this paper, we describe a selection heuristic that gives a Pareto set of the possible compositions of a service as well as possible combinations of different services, with respect to different optimization objectives. Moreover, we present a heuristic algorithm for placement of a combination of services, which aims at placing service components along shortest paths that have enough capacity for accommodating the services. By applying these solutions, we show that allowing flexibility in the service structure is feasible.


                                    SONATA: Service programming and orchestration for virtualized software networks

                                    S. Dräxler, H. Karl, M. Peuster, H. Razzaghi Kouchaksaraei, M. Bredel, J. Lessmann, T. Soenen, W. Tavernier, S. Mendel-Brin, G. Xilouris, in: 2017 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2017

                                    In conventional large-scale networks, creation and management of network services are costly and complex tasks that often consume a lot of resources, including time and manpower. Network softwarization and network function virtualization have been introduced to tackle these problems, aiming at decreasing costs and complexity of implementing new services, maintaining the implemented services, and managing available resources in service provisioning platforms and underlying infrastructures. To experience the full potential of these approaches, innovative development support tools and service provisioning environments are needed. To answer these needs, we introduce the architecture of the open-source SONATA system, a service programming, orchestration, and management framework. We present a development toolchain for virtualized network services, fully integrated with a service platform and orchestration system. We introduce the modular and flexible architecture of our system and discuss its main components and features, such as function- and service-specific managers that allow fine-grained service management, slicing support to facilitate multi-tenancy, recursiveness for improved scalability, and full-featured DevOps support.


                                    Scaling and Placing Bidirectional Services with Stateful Virtual and Physical Network Functions

                                    S. Dräxler, S.B. Schneider, H. Karl, in: 4th IEEE International Conference on Network Softwarization (NetSoft 2018), IEEE, 2018, pp. 123--131

                                    Network function virtualization requires scaling and placement, deciding the number and the location of function instances. Current approaches are limited in flexibility and practical applicability. Specifically, we study dynamic, single-step, joint scaling and placement of network services with bidirectional flows traversing Physical or Virtual Network Functions (VNFs) and returning to their sources. We develop models to support stateful components and legacy network functions with fixed locations in these network services as well as the possibility of reusing VNFs across network services. We formalize the problem of jointly scaling and placing such network services as a mixed-integer linear program (MILP). We show that this problem is NP-complete and also present a heuristic algorithm to find good solutions in short time. In an extensive evaluation with realistic scenarios, we investigate the capabilities of the two approaches.


                                    Let the state follow its flows: An SDN-based flow handover protocol to support state migration

                                    M. Peuster, H. Küttner, H. Karl, in: 4th IEEE International Conference on Network Softwarization (NetSoft 2018), 2018

                                    Dynamically steering flows through virtualized network function instances is a key enabler for elastic, on-demand deployments of virtualized network functions. This becomes particularly challenging when stateful functions are involved, necessitating state management. The problem with existing solutions is that they typically embrace state migration and flow rerouting jointly, imposing a huge set of requirements on the on-boarded VNFs, e.g., solution-specific state management interfaces. In this paper, we introduce the seamless handover protocol (SHarP). It provides an easy-to-use, loss-less, and order-preserving flow rerouting mechanism that is not fixed to a single state management approach. This allows VNF vendors to implement or use the state management solution of their choice. SHarP supports these solutions with additional information when flows are migrated. Further, we show how SHarP significantly reduces the buffer usage at a central (SDN) controller, which is a typical bottleneck in existing solutions. Our experiments show that SHarP uses a constant amount of controller buffer, irrespective of the time taken to migrate the VNF state.


                                    A Prototyping Platform to Validate and Verify Network Service Header-based Service Chains

                                    M. Peuster, S.B. Schneider, F. Christ, H. Karl, in: IEEE Conference on Network Function Virtualisation and Software Defined Networks (NFV-SDN) 5GNetApp, IEEE, 2018


                                    Containernet 2.0: A Rapid Prototyping Platform for Hybrid Service Function Chains

                                    M. Peuster, J. Kampmeyer, H. Karl, in: 4th IEEE International Conference on Network Softwarization (NetSoft 2018), 2018

                                    Developing a virtualized network service does not only involve the implementation and configuration of the network functions it is composed of but also its integration and test with management solutions that will control the service in its production environment. These integration tasks require testbeds that offer the needed network function virtualization infrastructure (NFVI), like OpenStack, introducing a lot of management and maintenance overheads. Such testbed setups become even more complicated when the multi point-of-presence (PoP) case, with multiple infrastructure installations, is considered. In this demo, we showcase an emulation platform that executes containerized network services in user-defined multi-PoP topologies. The platform does not only allow network service developers to locally test their services but also to connect real-world management and orchestration solutions to the emulated PoPs. During our interactive demonstration we focus on the integration between the emulated infrastructure and state-of-the-art orchestration solutions like SONATA or OSM.


                                    A Generic Emulation Framework for Reusing and Evaluating VNF Placement Algorithms

                                    S.B. Schneider, M. Peuster, H. Karl, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN 2018), IEEE, 2018

                                    In recent years, a variety of different approaches have been proposed to tackle the problem of scaling and placing network services, consisting of interconnected virtual network functions (VNFs). This paper presents a placement abstraction layer (PAL) that provides a clear and simple northbound interface for using such algorithms while hiding their internal functionality and implementation. Through its southbound interface, PAL can connect to different back ends that evaluate the calculated placements, e.g., using simulations, emulations, or testbed approaches. As an example for such evaluation back ends, we introduce a novel placement emulation framework (PEF) that allows executing calculated placements using real, container-based VNFs on real-world network topologies. In a case study, we show how PAL and PEF facilitate reusing and evaluating placement algorithms as well as validating their underlying models and performance claims.


                                    JASPER: Joint Optimization of Scaling, Placement, and Routing of Virtual Network Services

                                    S. Dräxler, H. Karl, Z.A. Mann, IEEE Transactions on Network and Service Management (2018)

                                    To adapt to continuously changing workloads in networks, components of the running network services may need to be replicated (scaling the network service) and allocated to physical resources (placement) dynamically, also necessitating dynamic re-routing of flows between service components. In this paper, we propose JASPER, a fully automated approach to jointly optimizing scaling, placement, and routing for complex network services, consisting of multiple (virtualized) components. JASPER handles multiple network services that share the same substrate network; services can be dynamically added or removed and dynamic workload changes are handled. Our approach lets service designers specify their services on a high level of abstraction using service templates. JASPER automatically makes scaling, placement and routing decisions, enabling quick reaction to changes. We formalize the problem, analyze its complexity, and develop two algorithms to solve it. Extensive empirical results show the applicability and effectiveness of the proposed approach.


                                    Emulation-based Smoke Testing of NFV Orchestrators in Large Multi-PoP Environments

                                    M. Peuster, M. Marchetti, G. Garcia de Blas, H. Karl, in: European Conference on Networks and Communications (EuCNC), 2018

                                    Management and orchestration (MANO) systems are the key components of future large-scale NFV environments. They will manage resources of hundreds or even thousands of NFV infrastructure installations, so-called points of presence (PoP). Such scenarios need to be automatically tested during the development phase of a MANO system. This task becomes very challenging because large-scale NFV testbeds are hard to maintain, too expensive, or simply not available. In this paper, we present a multi-PoP NFV infrastructure emulation platform that enables automated, large-scale testing of MANO stacks. We show that our platform can easily emulate hundreds of PoPs on a single physical machine and reduces the setup time of a test PoP by a factor of 232x compared to a DevStack-based test PoP installation. Further, we present a case study in which we test ETSI's Open Source MANO (OSM) against our proposed system to gain insights about OSM's behaviour in large-scale NFV deployments.


                                    Distributed Placement of Virtualized Control Applications in Mobile Backhaul Networks

                                    S. Auroux, H. Karl, Proc. of IEEE Wireless Communications and Networking Conference (WCNC), 2018


                                    A Fully Integrated Multi-Platform NFV SDK

                                    S.B. Schneider, M. Peuster, W. Tavernier, H. Karl, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN 2018), IEEE, 2018

                                    A key challenge of network function virtualization (NFV) is the complexity of developing and deploying new network services. Currently, development requires many manual steps that are time-consuming and error-prone (e.g., for creating service descriptors). Furthermore, existing management and orchestration (MANO) platforms only offer limited support of standardized descriptor models or package formats, limiting the re-usability of network services. To this end, we introduce a fully integrated, open-source NFV service development kit (SDK) with multi-MANO platform support. Our SDK simplifies many NFV service development steps by offering initial generation of descriptors, advanced project management, as well as fully automated packaging and submission for on-boarding. To achieve multi-platform support, we present a package format that extends ETSI’s VNF package format. In this demonstration, we present the end-to-end workflow to develop an NFV service that is then packaged for multiple platforms, i.e., 5GTANGO and OSM.


                                    Generating Resource and Performance Models for Service Function Chains: The Video Streaming Case

                                    S. Dräxler, M. Peuster, M. Illian, H. Karl, in: 4th IEEE International Conference on Network Softwarization (NetSoft 2018), IEEE, 2018, pp. 318--322

                                    Understanding the behavior of the components of service function chains (SFCs) in different load situations is important for efficient and automatic management and orchestration of services. For this purpose and for practical research in network function virtualization in general, there is a great need for benchmarks and experimental data. In this paper, we describe our experiments for characterizing the relationship between resource demands of virtual network functions (VNFs) and the expected performance of the SFC, considering the individual performance of the VNFs as well as the interdependencies among VNFs within the SFC. We have designed our experiments focusing on video streaming, an important application in this context. We present examples of models for predicting the interdependence between resource demands and performance characteristics of SFCs using support vector regression and polynomial regression models. We also show practical evidence from our experiments that VNFs need to be benchmarked in their final chain setup, rather than individually, to capture important interdependencies that affect their performance. The data gathered from our experiments is publicly available.


                                    Trade-offs in Dynamic Resource Allocation in Network Function Virtualization

                                    S.B. Schneider, S. Dräxler, H. Karl, in: IEEE Global Communications Conference (GLOBECOM 2018), IEEE, 2018

                                    Dynamic allocation of resources is a key feature in network function virtualization (NFV), enabling flexible adjustment of slices and contained network services to ever-changing service demands. Considering resource allocation across the entire network, many authors have proposed approaches to optimize the placement and chaining of virtual network function (VNF) instances and the allocation of resources to these VNF instances. In doing so, various optimization objectives are conceivable, e.g., minimizing certain required resources or the end-to-end delay of the placed services. In this paper, we investigate the relationship between four typical optimization objectives when coordinating the placement and resource allocation of chained VNF instances. We observe an interesting trade-off between minimizing the overhead of starting/stopping VNF instances and all other objectives when adapting to changed service demands.


                                    Demonstrating FOP4: A Flexible Platform to Prototype NFV Offloading Scenarios

                                    D. Moro, M. Peuster, H. Karl, A. Capone, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), IEEE, 2019

                                    Emulation platforms supporting Virtual Network Functions (VNFs) allow developers to rapidly prototype network services. None of the available platforms, however, supports experimenting with programmable data planes to enable VNF offloading. In this demonstration, we show FOP4, a flexible platform that provides support for Docker-based VNFs, and VNF offloading, by means of P4-enabled switches. The platform provides interfaces to program the P4 devices and to deploy network functions. We demonstrate FOP4 with two complex example scenarios, demonstrating how developers can exploit data plane programmability to implement network functions.


                                    Specifying and Analyzing Virtual Network Services Using Queuing Petri Nets

                                    S.B. Schneider, A. Sharma, H. Karl, H. Wehrheim, in: 2019 IFIP/IEEE International Symposium on Integrated Network Management (IM), IFIP, 2019, pp. 116--124

                                    For optimal placement and orchestration of network services, it is crucial that their structure and semantics are specified clearly and comprehensively and are available to an orchestrator. Existing specification approaches are either ambiguous or miss important aspects regarding the behavior of virtual network functions (VNFs) forming a service. We propose to formally and unambiguously specify the behavior of these functions and services using Queuing Petri Nets (QPNs). QPNs are an established method that allows to express queuing, synchronization, stochastically distributed processing delays, and changing traffic volume and characteristics at each VNF. With QPNs, multiple VNFs can be connected to complete network services in any structure, even specifying bidirectional network services containing loops. We discuss how management and orchestration systems can benefit from our clear and comprehensive specification approach, leading to better placement of VNFs and improved Quality of Service. Another benefit of formally specifying network services with QPNs are diverse analysis options, which allow valuable insights such as the distribution of end-to-end delay. We propose a tool-based workflow that supports the specification of network services and the automatic generation of corresponding simulation code to enable an in-depth analysis of their behavior and performance.


                                    Introducing Automated Verification and Validation for Virtualized Network Functions and Services

                                    M. Peuster, S.B. Schneider, M. Zhao, G. Xilouris, P. Trakadas, F. Vicens, W. Tavernier, T. Soenen, R. Vilalta, G. Andreou, D. Kyriazis, H. Karl, IEEE Communications Magazine (2019), pp. 96-102


                                    Automated testing of NFV orchestrators against carrier-grade multi-PoP scenarios using emulation-based smoke testing

                                    M. Peuster, M. Marchetti, G. García de Blas, H. Karl, EURASIP Journal on Wireless Communications and Networking (2019)


                                    "Producing Cloud-Native": Smart Manufacturing Use Cases on Kubernetes

                                    S.B. Schneider, M. Peuster, K. Hannemann, D. Behnke, M. Müller, P. Bök, H. Karl, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN) Demo Track, IEEE, 2019

                                    Building on 5G and network function virtualization (NFV), smart manufacturing has the potential to drastically increase productivity, reduce cost, and introduce novel, flexible manufacturing services. Current work mostly focuses on high-level scenarios or emulation-based prototype deployments. Extending our previous work, we showcase one of the first cloud-native 5G verticals focusing on the deployment of smart manufacturing use cases on production infrastructure. In particular, we use the 5GTANGO service platform to deploy our developed network services on Kubernetes. For this demo, we implemented a series of cloud-native virtualized network functions (VNFs) and created suitable service descriptors. Their light-weight, stateless deployment on Kubernetes enables quick instantiation, scalability, and robustness.


                                    Joint testing and profiling of microservice-based network services using TTCN-3

                                    M. Peuster, C. Dröge, C. Boos, H. Karl, ICT Express (2019)

                                    The ongoing softwarization of networks creates a big need for automated testing solutions to ensure service quality. This becomes even more important if agile environments with short time to market and high demands, in terms of service performance and availability, are considered. In this paper, we introduce a novel testing solution for virtualized, microservice-based network functions and services, which we base on TTCN-3, a well-known testing language defined by the European Telecommunications Standards Institute (ETSI). We use TTCN-3 not only for functional testing but also answer the question whether TTCN-3 can be used for performance profiling tasks as well. Finally, we demonstrate the proposed concepts and solutions in a case study using our open-source prototype to test and profile a chained network service.


                                    SPRING: Scaling, Placement, and Routing of Heterogeneous Services with Flexible Structures

                                    S. Dräxler, H. Karl, in: 5th IEEE International Conference on Network Softwarization (NetSoft) 2019, 2019


                                    A flow handover protocol to support state migration in softwarized networks

                                    M. Peuster, H. Küttner, H. Karl, International Journal of Network Management (2019)

                                    Softwarized networks are the key enabler for elastic, on-demand service deployments of virtualized network functions. They allow traffic to be steered dynamically through the network when new network functions are instantiated or old ones are terminated. These scenarios become particularly challenging when stateful functions are involved, necessitating state management solutions to migrate state between the functions. The problem with existing solutions is that they typically embrace state migration and flow rerouting jointly, imposing a huge set of requirements on the on-boarded virtualized network functions (VNFs), e.g., solution-specific state management interfaces. To change this, we introduce the seamless handover protocol (SHarP), an easy-to-use, loss-less, and order-preserving flow rerouting mechanism that is not fixed to a single state management approach. Using SHarP, VNF vendors are empowered to implement or use the state management solution of their choice. SHarP supports these solutions with additional information when flows are migrated. In this paper, we present SHarP's design and its open source prototype implementation, and show how SHarP significantly reduces the buffer usage at a central (SDN) controller, which is a typical bottleneck in state-of-the-art solutions. Our experiments show that SHarP uses a constant amount of controller buffer, irrespective of the time taken to migrate the VNF state.


                                    5G as Key Technology for Networked Factories: Application of Vertical-specific Network Services for Enabling Flexible Smart Manufacturing

                                    M. Müller, D. Behnke, P. Bök, M. Peuster, S.B. Schneider, H. Karl, in: IEEE 17th International Conference on Industrial Informatics (IEEE-INDIN), IEEE, 2019


                                    The Softwarised Network Data Zoo

                                    M. Peuster, S.B. Schneider, H. Karl, in: IEEE/IFIP 15th International Conference on Network and Service Management (CNSM), IEEE/IFIP, 2019

                                    More and more management and orchestration approaches for (software) networks are based on machine learning paradigms and solutions. These approaches depend not only on their program code to operate properly, but also require enough input data to train their internal models. However, such training data is barely available for the software networking domain and most presented solutions rely on their own, sometimes not even published, data sets. This makes it hard, or even infeasible, to reproduce and compare many of the existing solutions. As a result, it ultimately slows down the adoption of machine learning approaches in softwarised networks. To this end, we introduce the "softwarised network data zoo" (SNDZoo), an open collection of software networking data sets aiming to streamline and ease machine learning research in the software networking domain. We present a general methodology to collect, archive, and publish those data sets for use by other researchers and, as an example, eight initial data sets, focusing on the performance of virtualised network functions.


                                    Putting 5G into Production: Realizing a Smart Manufacturing Vertical Scenario

                                    S.B. Schneider, M. Peuster, D. Behnke, M. Marcel, P. Bök, H. Karl, in: European Conference on Networks and Communications (EuCNC), IEEE, 2019

                                    As 5G and network function virtualization (NFV) are maturing, it becomes crucial to demonstrate their feasibility and benefits by means of vertical scenarios. While 5GPPP has identified smart manufacturing as one of the most important vertical industries, there is still a lack of specific, practical use cases. Using the experience from a large-scale manufacturing company, Weidmüller Group, we present a detailed use case that reflects the needs of real-world manufacturers. We also propose an architecture with specific network services and virtual network functions (VNFs) that realize the use case in practice. As a proof of concept, we implement the required services and deploy them on an emulation-based prototyping platform. Our experimental results indicate that a fully virtualized smart manufacturing use case is not only feasible but also reduces machine interconnection and configuration time and thus improves productivity by orders of magnitude.


                                    Prototyping and Demonstrating 5G Verticals: The Smart Manufacturing Case

                                    M. Peuster, S.B. Schneider, D. Behnke, M. Müller, P. Bök, H. Karl, in: 5th IEEE International Conference on Network Softwarization (NetSoft 2019), 2019

                                    5G together with software defined networking (SDN) and network function virtualisation (NFV) will enable a wide variety of vertical use cases. One of them is the smart manufacturing case which utilises 5G networks to interconnect production machines, machine parks, and factory sites to enable new possibilities in terms of flexibility, automation, and novel applications (industry 4.0). However, the availability of realistic and practical proof-of-concepts for those smart manufacturing scenarios is still limited. This demo fills this gap by not only showing a real-world smart manufacturing application entirely implemented using NFV concepts, but also a lightweight prototyping framework that simplifies the realisation of vertical NFV proof-of-concepts. During the demo, we show how an NFV-based smart manufacturing scenario can be specified, on-boarded, and instantiated before we demonstrate how the presented NFV services simplify machine data collection, aggregation, and analysis.


                                    FOP4: Function Offloading Prototyping in Heterogeneous and Programmable Network Scenarios

                                    D. Moro, M. Peuster, H. Karl, A. Capone, in: IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), IEEE, 2019

                                    Offloading packet processing tasks to programmable switches and/or to programmable network interfaces, so-called “SmartNICs”, is one of the key concepts to prepare softwarized networks for the high traffic demands of the future. However, implementing network functions that make use of those offloading technologies is still challenging and usually requires the availability of specialized hardware. It becomes even harder if heterogeneous services, making use of different offloading and network virtualization technologies, should be developed. In this paper, we introduce FOP4 (Function Offloading Prototyping with P4), a novel prototyping platform that allows prototyping heterogeneous software network scenarios, including container-based, P4-switch-based, and SmartNIC-based network functions. The presented work substantially extends our existing Containernet platform with the means to prototype offloading scenarios. Besides presenting the platform’s system design, we evaluate its scalability and show that it can run scenarios with more than 64 P4 switch or SmartNIC nodes on a single laptop. Finally, we present a case study in which we use the presented platform to prototype an extended in-band network telemetry use case.


                                    Machine Learning for Dynamic Resource Allocation in Network Function Virtualization

                                    S.B. Schneider, N.P. Satheeschandran, M. Peuster, H. Karl, in: IEEE Conference on Network Softwarization (NetSoft), IEEE, 2020

                                    Network function virtualization (NFV) proposes to replace physical middleboxes with more flexible virtual network functions (VNFs). To dynamically adjust to ever-changing traffic demands, VNFs have to be instantiated and their allocated resources have to be adjusted on demand. Deciding the amount of allocated resources is non-trivial. Existing optimization approaches often assume fixed resource requirements for each VNF instance. However, this can easily lead to either waste of resources or bad service quality if too many or too few resources are allocated. To solve this problem, we train machine learning models on real VNF data, containing measurements of performance and resource requirements. For each VNF, the trained models can then accurately predict the required resources to handle a certain traffic load. We integrate these machine learning models into an algorithm for joint VNF scaling and placement and evaluate their impact on resulting VNF placements. Our evaluation based on real-world data shows that using suitable machine learning models effectively avoids over- and underallocation of resources, leading to up to 12 times lower resource consumption and better service quality with up to 4.5 times lower total delay than using standard fixed resource allocation.


                                    Cloud-Native Threat Detection and Containment for Smart Manufacturing

                                    M. Müller, D. Behnke, P. Bök, S.B. Schneider, M. Peuster, H. Karl, in: IEEE Conference on Network Softwarization (NetSoft) Demo Track, IEEE, 2020

                                    Softwarization facilitates the introduction of smart manufacturing applications in the industry. Manifold devices such as machine computers, Industrial IoT devices, tablets, smartphones and smart glasses are integrated into factory networks to enable shop floor digitalization and big data analysis. To handle the increasing number of devices and the resulting traffic, a flexible and scalable factory network is necessary, which can be realized using softwarization technologies like Network Function Virtualization (NFV). However, the security risks increase with the increasing number of new devices, so that cyber security must also be considered in NFV-based networks. Therefore, extending our previous work, we showcase threat detection using a cloud-native NFV-driven intrusion detection system (IDS) that is integrated in our industry-specific network services. As a result of the threat detection, the affected network service is put into quarantine via automatic network reconfiguration. We use the 5GTANGO service platform to deploy our developed network services on Kubernetes and to initiate the network reconfiguration.


                                      Every Node for Itself: Fully Distributed Service Coordination

                                      S.B. Schneider, L.D. Klenner, H. Karl, in: IEEE International Conference on Network and Service Management (CNSM), IEEE, 2020

                                      Modern services consist of modular, interconnected components, e.g., microservices forming a service mesh. To dynamically adjust to ever-changing service demands, service components have to be instantiated on nodes across the network. Incoming flows requesting a service then need to be routed through the deployed instances while considering node and link capacities. Ultimately, the goal is to maximize the number of successfully served flows and their Quality of Service (QoS) through online service coordination. Current approaches for service coordination are usually centralized, assuming up-to-date global knowledge and making global decisions for all nodes in the network. Such global knowledge and centralized decisions are not realistic in practical large-scale networks. To solve this problem, we propose two algorithms for fully distributed service coordination. The proposed algorithms can be executed individually at each node in parallel and require only very limited global knowledge. We compare and evaluate both algorithms with a state-of-the-art centralized approach in extensive simulations on a large-scale, real-world network topology. Our results indicate that the two algorithms can compete with centralized approaches in terms of solution quality but require less global knowledge and are orders of magnitude faster (more than 100x).


                                      A Case for a New IT Ecosystem: On-The-Fly Computing

                                      H. Karl, D. Kundisch, F. Meyer auf der Heide, H. Wehrheim, Business & Information Systems Engineering (2020), 62(6), pp. 467-481


                                      Coflow Scheduling with Performance Guarantees for Data Center Applications

                                      A. Hasnain, H. Karl, in: 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), IEEE Computer Society, 2020

                                      Data-parallel applications run on clusters of servers in a datacenter, and their communication triggers correlated resource demand on multiple links that can be abstracted as a coflow. They often desire predictable network performance, which can be passed to the network via the coflow abstraction for application-aware network scheduling. In this paper, we propose a heuristic and an optimization algorithm for predictable network performance such that they guarantee coflow completion within their deadlines. The algorithms also ensure high network utilization, i.e., they are work-conserving, and avoid starvation of coflows. We evaluate both algorithms via trace-driven simulation and show that they admit 1.1x more coflows than the Varys scheme while meeting their deadlines.
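To illustrate what deadline-guaranteed coflow admission involves, the sketch below admits a coflow only if every link it uses has enough unreserved capacity to carry the coflow's data volume within its deadline, and then reserves exactly that minimum rate. This is a simplified, generic admission rule and not the heuristic or optimization algorithm proposed in the paper; the link names, capacities, coflow layout, and the `try_admit` helper are all assumptions made for illustration.

```python
def try_admit(coflow, residual):
    """Admit a coflow if every link can sustain the minimum rate it needs.

    coflow:   {"deadline": seconds, "demand": {link: bytes to send on link}}
    residual: {link: unreserved capacity in bytes/s}, updated on admission
    """
    # Minimum rate per link so that the whole volume finishes by the deadline.
    needed = {link: volume / coflow["deadline"]
              for link, volume in coflow["demand"].items()}
    if any(rate > residual.get(link, 0.0) for link, rate in needed.items()):
        return False  # rejecting keeps guarantees of admitted coflows intact
    for link, rate in needed.items():
        residual[link] -= rate
    return True


# Hypothetical two-link network with 100 bytes/s of capacity per link.
residual = {"l1": 100.0, "l2": 100.0}
c1 = {"deadline": 10.0, "demand": {"l1": 600.0, "l2": 200.0}}  # needs 60, 20
c2 = {"deadline": 5.0, "demand": {"l1": 250.0}}                # needs 50
c3 = {"deadline": 5.0, "demand": {"l2": 300.0}}                # needs 60
results = [try_admit(c, residual) for c in (c1, c2, c3)]
```

Here the second coflow is rejected because link `l1` has only 40 bytes/s left after the first admission. A work-conserving scheduler would additionally hand the leftover capacity to admitted coflows instead of leaving it idle, which this sketch omits.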


                                        Self-Driving Network and Service Coordination Using Deep Reinforcement Learning

                                        S.B. Schneider, A. Manzoor, H. Qarawlus, R. Schellenberg, H. Karl, R. Khalili, A. Hecker, in: IEEE International Conference on Network and Service Management (CNSM), IEEE, 2020

                                        Modern services comprise interconnected components, e.g., microservices in a service mesh, that can scale and run on multiple nodes across the network on demand. To process incoming traffic, service components have to be instantiated and traffic assigned to these instances, taking capacities and changing demands into account. This challenge is usually solved with custom approaches designed by experts. While this typically works well for the considered scenario, the models often rely on unrealistic assumptions or on knowledge that is not available in practice (e.g., a priori knowledge). We propose a novel deep reinforcement learning approach that learns how to best coordinate services and is geared towards realistic assumptions. It interacts with the network and relies on available, possibly delayed monitoring information. Rather than defining a complex model or an algorithm for how to achieve an objective, our model-free approach adapts to various objectives and traffic patterns. An agent is trained offline without expert knowledge and then applied online with minimal overhead. Compared to a state-of-the-art heuristic, it significantly improves flow throughput and overall network utility on real-world network topologies and traffic traces. It also learns to optimize different objectives, generalizes to scenarios with unseen, stochastic traffic patterns, and scales to large real-world networks.


                                        Learning Coflow Admissions

                                        A. Hasnain, H. Karl, in: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE Communications Society, 2021

                                        Data-parallel applications are developed using different data programming models, e.g., MapReduce, partition/aggregate. These models represent diverse resource requirements of applications in a datacenter network, which can be represented by the coflow abstraction. The conventional method of creating hand-crafted coflow heuristics for admission or scheduling for different workloads is practically infeasible. In this paper, we propose a deep reinforcement learning (DRL)-based coflow admission scheme (LCS) that can learn an admission policy for a higher-level performance objective, i.e., maximize successful coflow admissions, without manual feature engineering. LCS is trained on a production trace, which has online coflow arrivals. The evaluation results show that LCS is able to learn a reasonable admission policy that admits more coflows than the state-of-the-art Varys heuristic while meeting their deadlines.


                                        Distributed Online Service Coordination Using Deep Reinforcement Learning

                                        S.B. Schneider, H. Qarawlus, H. Karl, in: IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021

                                        Services often consist of multiple chained components such as microservices in a service mesh, or machine learning functions in a pipeline. Providing these services requires online coordination including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations as they only address simplified versions of the problem and are typically centralized and thus do not scale to practical large-scale networks. To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently of the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).


                                        Self-Learning Multi-Objective Service Coordination Using Deep Reinforcement Learning

                                        S.B. Schneider, R. Khalili, A. Manzoor, H. Qarawlus, R. Schellenberg, H. Karl, A. Hecker, Transactions on Network and Service Management (2021)

                                        Modern services consist of interconnected components, e.g., microservices in a service mesh or machine learning functions in a pipeline. These services can scale and run across multiple network nodes on demand. To process incoming traffic, service components have to be instantiated and traffic assigned to these instances, taking capacities, changing demands, and Quality of Service (QoS) requirements into account. This challenge is usually solved with custom approaches designed by experts. While this typically works well for the considered scenario, the models often rely on unrealistic assumptions or on knowledge that is not available in practice (e.g., a priori knowledge). We propose DeepCoord, a novel deep reinforcement learning approach that learns how to best coordinate services and is geared towards realistic assumptions. It interacts with the network and relies on available, possibly delayed monitoring information. Rather than defining a complex model or an algorithm on how to achieve an objective, our model-free approach adapts to various objectives and traffic patterns. An agent is trained offline without expert knowledge and then applied online with minimal overhead. Compared to a state-of-the-art heuristic, DeepCoord significantly improves flow throughput (up to 76%) and overall network utility (more than 2x) on real-world network topologies and traffic traces. It also supports optimizing multiple, possibly competing objectives, learns to respect QoS requirements, generalizes to scenarios with unseen, stochastic traffic, and scales to large real-world networks. For reproducibility and reuse, our code is publicly available.


                                        Divide and Conquer: Hierarchical Network and Service Coordination

                                        S.B. Schneider, M. Jürgens, H. Karl, in: IFIP/IEEE International Symposium on Integrated Network Management (IM), IFIP/IEEE, 2021

                                        In practical, large-scale networks, services are requested by users across the globe, e.g., for video streaming. Services consist of multiple interconnected components such as microservices in a service mesh. Coordinating these services requires scaling them according to continuously changing user demand, deploying instances at the edge close to their users, and routing traffic efficiently between users and connected instances. Network and service coordination is commonly addressed through centralized approaches, where a single coordinator knows everything and coordinates the entire network globally. While such centralized approaches can reach global optima, they do not scale to large, realistic networks. In contrast, distributed approaches scale well, but sacrifice solution quality due to their limited scope of knowledge and coordination decisions. To address this trade-off, we propose a hierarchical coordination approach that combines the good solution quality of centralized approaches with the scalability of distributed approaches. In doing so, we divide the network into multiple hierarchical domains and optimize coordination in a top-down manner. We compare our hierarchical approach with a centralized approach in an extensive evaluation on a real-world network topology. Our results indicate that hierarchical coordination can find close-to-optimal solutions in a fraction of the runtime of centralized approaches.


                                        Learning Flow Scheduling

                                        A. Hasnain, H. Karl, in: 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), IEEE Computer Society, 2021

                                        Datacenter applications have different resource requirements from the network, and developing flow scheduling heuristics for every workload is practically infeasible. In this paper, we show that deep reinforcement learning (RL) can be used to efficiently learn flow scheduling policies for different workloads without manual feature engineering. Specifically, we present LFS, which learns to optimize a high-level performance objective, e.g., maximize the number of flow admissions while meeting the deadlines. The LFS scheduler is trained through deep RL to learn a scheduling policy on continuous online flow arrivals. The evaluation results show that the trained LFS scheduler admits 1.05x more flows than the greedy flow scheduling heuristics under varying network load.



                                          Further information:

                                          Subproject C4

                                          Information about the project:
                                          Project members: Holger Karl, Asif Hasnain
                                          Project website: http://sfb901.uni-paderborn.de/
                                          Type: DFG Project
                                          Started: July 2015
                                          Finished: Active
                                          Contact: Holger Karl
