For Master's theses, Bachelor's theses or for Software Engineering projects in the Master's program
(Most topics can be adapted in scale to fit any of the above categories)
Time Series-based Event Prediction with Metadata in a Multi-System Environment
Machine learning models that can predict performance-relevant events are of high
interest when dealing with large software systems. However, due to the size
and diversity of such systems, creating appropriate machine learning models might require
finding common subparts of these systems to learn from similar parts rather than
from entire but diverse systems.
The goal of this thesis is to use monitoring time series data, performance event
data and metadata of a multi-system environment to train and test machine learning
models to predict these performance events. The metadata should be used to find
similarities among the systems and their components, and then to utilize these
similarities to create machine learning models.
Resource Exhaustion Prediction in a Multi-System Environment
Resource exhaustion problems, such as high CPU load, low memory or low disk space, are common
in software systems. One way of dealing with such problems is to create
predictive models to detect early signs of exhaustion, which enables administrators
to take actions before the actual exhaustion occurs.
The goal of this thesis is to use monitoring time series data of a multi-system
environment to (1) create exhaustion events, which indicate resource exhaustion,
with a custom heuristic, and (2) train and test machine learning models to predict
these exhaustion events.
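As an illustration of step (1), one possible heuristic flags an exhaustion event whenever a metric stays at or above a threshold for a minimum number of consecutive samples. The following is only a minimal sketch under that assumption; the function name `exhaustion_events`, the threshold of 90 and the minimum duration of 3 samples are hypothetical and not part of the topic description.

```python
from typing import List, Tuple


def exhaustion_events(samples: List[float], threshold: float = 90.0,
                      min_duration: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end inclusive) of runs in which the
    metric stays at or above `threshold` for at least `min_duration` samples."""
    events = []
    run_start = None  # index where the current above-threshold run began
    for i, value in enumerate(samples):
        if value >= threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at index i - 1; keep it if long enough.
            if run_start is not None and i - run_start >= min_duration:
                events.append((run_start, i - 1))
            run_start = None
    # Close a run that lasts until the end of the series.
    if run_start is not None and len(samples) - run_start >= min_duration:
        events.append((run_start, len(samples) - 1))
    return events


# Hypothetical CPU load samples in percent.
cpu_load = [40, 95, 97, 50, 91, 92, 93, 99, 60, 95, 96, 98]
print(exhaustion_events(cpu_load))  # [(4, 7), (9, 11)]
```

The short spike at indices 1 to 2 is ignored because it lasts only two samples; the resulting index pairs could then serve as labels for the prediction models in step (2).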
Time Series-based Predictive Maintenance with RapidMiner
Prediction of failures and events plays an important role in today's software systems. An open challenge
is to find similarities in failures across different systems. The sheer amount of available monitoring
and event data in this field requires tool support, such as RapidMiner,
a powerful data science environment that enables fast prototyping and validation of predictive models.
The goal of this thesis is to use RapidMiner to explore monitoring time series data and event data of a
multi-system environment, and to investigate and compare various predictive maintenance approaches,
thereby utilizing the different parts of RapidMiner's workflow pipeline, including data preparation and cleansing.
Expression Evaluation for Debugging with Sulong (Java, C)
Sulong is an interpreter for LLVM IR,
an intermediate representation of source code that can be produced by the Clang
compiler for the C family of programming languages.
It is based on the Truffle framework
for implementing interpreters for programming languages and is part of the
GraalVM project. In addition to executing programs that were
compiled to LLVM IR, Sulong also allows users to debug these programs at the source level.
At the moment, this debugging support is limited to basic actions such as symbol inspection and
stepping and does not yet include expression evaluation.
The goal of this project is to enable Sulong to execute C/C++ expressions in the context of functions
that are currently under inspection in the debugger. More precisely, the project will require
determining a reasonable subset of the C and C++ programming languages that is essential when debugging
C/C++ programs, implementing a parser for this language subset using the
parser generator, and reusing Sulong's existing infrastructure to evaluate the parsed expression.
Partial Support for PThreads in Sulong (Java, C)
Sulong is an interpreter for LLVM IR,
an intermediate representation of source code that can be produced by the Clang
compiler for the C family of programming languages. It is implemented on top of the
Truffle framework. In principle, Sulong
supports calling library functions that are only available as native code. This enables users to call,
e.g., functions from the C standard library. However, for certain libraries this is known not to work
and Sulong either explicitly prohibits their use or provides a compatible implementation of their interface.
At the moment, the PThreads library for creating and managing threads is among the explicitly unsupported ones.
The goal of this project is to implement a core subset of the PThreads API in Sulong, adapt Sulong's
implementation where required to be used in a multi-threaded environment, and to demonstrate the
functionality of the implementation using a suitable real-world or self-implemented application.
Truffle is a framework for implementing interpreters for programming languages. It is the key
component of the GraalVM
project which combines several such Truffle
language implementations into a single multi-language runtime. The framework also
provides an API for efficient instrumentation and debugging of programs executed
by a Truffle language implementation. Multiple debuggers already utilize this API
to implement their back-end. The goal of this project is to develop a plugin for
Visual Studio Code
that uses Truffle's instrumentation framework as a back-end
for the editor's integrated debugger.
Reference implementation of SOMns record and replay in GraalJS (Java)
SOMns is a research language similar to Smalltalk. It provides specialised
debugging support for its concurrency models, e.g. record & replay. Record
and replay debugging is based on the idea of recording a program trace that
allows one to deterministically reproduce an execution (including bugs). SOMns and
GraalJS are both implemented in Java with the Truffle framework and use similar
concurrency models. The goal of this thesis is to reimplement
the record and replay strategy from SOMns in GraalJS. In addition, recording
performance of the GraalJS implementation should be evaluated with benchmarks.
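The general idea of record and replay can be sketched in a few lines: during recording, every nondeterministic result is appended to a trace; during replay, the trace is read back instead of consulting the nondeterministic source. The sketch below only illustrates this principle and is not the SOMns strategy; the classes `Recorder` and `Replayer` and the toy `program` are hypothetical, and a random number stands in for nondeterminism such as message ordering.

```python
import random
from typing import Callable, List


class Recorder:
    """Runs nondeterministic operations and appends each result to a trace."""

    def __init__(self) -> None:
        self.trace: List[int] = []

    def next_value(self, source: Callable[[], int]) -> int:
        value = source()
        self.trace.append(value)
        return value


class Replayer:
    """Returns previously recorded results instead of calling the source."""

    def __init__(self, trace: List[int]) -> None:
        self._trace = list(trace)
        self._pos = 0

    def next_value(self, source: Callable[[], int]) -> int:
        value = self._trace[self._pos]  # the source is intentionally not called
        self._pos += 1
        return value


def program(runtime) -> List[int]:
    """A toy program whose behaviour depends on nondeterministic input."""
    return [runtime.next_value(lambda: random.randint(0, 1000)) for _ in range(5)]


recorder = Recorder()
original = program(recorder)
replayed = program(Replayer(recorder.trace))
print(original == replayed)  # True
```

The cost of appending to the trace in `Recorder.next_value` corresponds to the recording overhead that the benchmarks are meant to quantify.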
Enhancing the AcmeAir benchmark application
AcmeAir is a simple web application that represents the booking system of an airline.
It is used to evaluate the run-time performance of debugging tools in our SOMns language
implementation. Currently, AcmeAir supports a limited set of operations that are heavily
database dependent. The goal of this project is to enhance AcmeAir with features one
could find in a real booking system, for example multiple options for finding flight
connections. Additionally, the JMeter configuration used to drive the benchmark needs
to be updated to use the new features.
Automatic Evaluation of C Bug-Finding Tools
Buffer overflow vulnerabilities, use-after-free errors, NULL dereferences, and other errors are omnipresent in C.
Bug-finding tools such as LLVM's AddressSanitizer, Valgrind, and Safe Sulong enable
programmers to tackle this issue by executing the system under test with the respective tool.
The goal of this project is to automatically determine the distribution of error categories
and evaluate state-of-the-art bug-finding tools on GitHub projects. The outcome should be a
tool that downloads projects from GitHub, builds the projects, and runs their test suites
with different bug-finding tools. The bug-finding tools must be capable of classifying
execution errors by their error messages.
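Classifying execution errors by their error messages could, for example, be done by matching tool output against a list of patterns. The following is a minimal sketch under that assumption; the category names and the `classify` function are hypothetical, and the patterns cover only a few well-known AddressSanitizer and Valgrind message shapes.

```python
import re

# Hypothetical categories; each pattern matches one family of tool messages.
RULES = [
    ("buffer-overflow",
     re.compile(r"AddressSanitizer: (heap|stack|global)-buffer-overflow")),
    ("use-after-free",
     re.compile(r"AddressSanitizer: heap-use-after-free")),
    # Valgrind reports the access, not the cause, so the category stays generic.
    ("invalid-access",
     re.compile(r"Invalid (read|write) of size \d+")),
]


def classify(message: str) -> str:
    """Return the first category whose pattern occurs in `message`."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "unknown"


print(classify("==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020"))
# use-after-free
```

Counting the returned categories over all test runs would yield the distribution of error categories mentioned above.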
A Feasibility Study on Executing Binary Code on Sulong
Sulong is an execution environment for C and
other low-level languages on the JVM. C programs often rely on third-party libraries that are
only available as binary code, which cannot be executed by Sulong. However, Sulong executes a
low-level intermediate representation, called LLVM IR, that can be produced by tools from
binary code. Such tools include QEMU, MC-Semantics, and LLBT; using one could enable Sulong
to execute binary code. The goal of this project is to evaluate whether these tools are complete
enough to translate common code and determine if the produced LLVM IR is suitable for execution
on Sulong. As a subgoal, unimplemented features that are exercised by such code should be implemented in Sulong.
The Truffle Framework allows you to write code that interoperates between
different languages. In this project, you should prepare test cases to see how well-behaving
complete set of unit tests and an analysis of the failing behaviour.
NUMA support for the G1 Garbage Collector
On multi-socket systems, memory access time depends on the memory location
relative to the processor (locality group): access latency to "closer" memory is significantly
lower than to memory that is attached to a different processor.
Currently, G1 does not exploit this: it neither improves access locality nor tries to preserve it.
Goals for this task could include implementing the common heuristics used in the literature that
keep objects in the same locality group as long as possible, like a) let G1 keep data in the
same locality group in the young generation and try to evenly spread data across locality
groups in old generation; or b) try to keep locality in both young and old generation.
Measure the impact of these strategies across a set of industry benchmarks and analyze
other areas in the garbage collector that might benefit from NUMA awareness and potentially