Open Projects
For Master's theses, Bachelor's theses or for Software Engineering projects in the Master's program
(Most topics can be adapted in scale to fit any of the above categories)
-
Project MegaSlop: Use LLMs to Generate Large Codebases for Performance Tuning for Modern Runtime Systems (Python, JavaScript, LLMs, Prompting)
In the future, lots of code may be LLM-generated and follow different engineering principles than human-written code.
For example, it is likely that codebases of modern applications will be significantly larger, with more boilerplate and code duplication.
Unfortunately, modern runtime systems that use just-in-time compilation are not yet ready for such codebases.
The goal of this project is to construct an approach that allows us to generate large codebases reliably using LLMs.
These codebases should resemble real-world application code, but do not have to be functionally correct.
Instead, it is sufficient that we can construct them so that much of the code is executable, which lets us trigger the runtime system's optimizations
and identify performance bottlenecks.
-
ReBenchDB: Continuous Performance Testing with GitHub/GitLab CI (Benchmarking, Performance, TypeScript, PostgreSQL, Git, GitHub/GitLab APIs)
Performance often takes a backseat when developing applications. One reason is
probably that measuring and tracking performance is rather inconvenient and
there are no standardized tools. Big companies such as Google, Oracle, Mozilla,
and Microsoft have their own in-house tools. However, small companies and
researchers rarely have access to such tools.
ReBenchDB is an open source tool
that can be used to track the performance of an application, for instance to
see the benefit of optimizations of a language implementation over time.
The goal of this project is to expand ReBenchDB and make it more flexible, so
that performance testing can become a natural part of modern development processes.
-
Tools for Visual Teaching and Visual Learning (HTML, CSS, JavaScript, TypeScript)
This topic covers interactive visualizations that help lecturers teach content in a more engaging way and help students learn more easily.
For examples, have a look at finished theses supervised by me in the past.
I do not have a predefined list of visualizations that need to be implemented, but I am open to suggestions from students who have ideas on how interactive visualizations could have improved their own study experience.
Feel free to contact me if you have an idea for such a thesis.
-
Trace Authoring Features for the Graphical Debugger JavaWiz (HTML, CSS, JavaScript, TypeScript)
JavaWiz is a graphical debugger used in software development education.
It records execution traces—capturing all steps that occur during program execution—to build insightful visualizations and provide "time-travel" debugging capabilities.
Currently, lecturers primarily use JavaWiz for live in-class demonstrations, often resorting to taking screenshots and annotating them in PowerPoint for study materials.
The goal of this project is to streamline this workflow by adding authoring capabilities directly to JavaWiz.
You will implement features to add persistent annotations (such as speech bubbles and highlighting rectangles) that are stored as part of the execution trace.
These annotated traces can then be distributed to students as an interactive alternative to static slides.
As an optional extension, the project may explore integrating these visualizations directly into PowerPoint presentations using web views.
-
Hunting for Python Concurrency Bugs (Python, Concurrency, Threading, Parallelism)
With Python 3.14, Python's approach to parallel execution became officially part of the language.
What Python calls free threading allows multiple threads to execute Python code in parallel.
This is a major shift in how Python works, and there are concerns that
much of the existing Python code is not ready for this change.
In the project, we want to systematically explore existing open-source Python codebases
to identify potential concurrency bugs that may arise when free threading is enabled.
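As a minimal sketch of the kind of bug in question, the following example shows an unsynchronized read-modify-write on shared state next to a lock-protected variant. The names `UnsafeCounter`, `LockedCounter`, and `hammer` are hypothetical and chosen for illustration only; they are not taken from any codebase that the project would study.

```python
import threading


class UnsafeCounter:
    """Hypothetical shared counter with an unsynchronized read-modify-write."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value       # read the shared value
        self.value = current + 1   # write back; updates made in between are lost


class LockedCounter:
    """Same counter, but the read-modify-write is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def hammer(counter, n_threads=4, n_increments=10_000):
    """Increment the counter from several threads and return the final value."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the default arguments, `hammer(LockedCounter())` returns 40000, whereas `hammer(UnsafeCounter())` may return less. How often updates are lost depends on the interpreter build and on timing, which is part of what makes such bugs hard to find.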
-
Splitting Of Large Wasm Functions to Enable JIT Compilation (Wasm, Instrumentation, Compilers, C++)
Wasm is gaining popularity as a platform-independent program format.
It can be used to run code written in languages such as C, C++, and Rust
in the browser, on the JVM, or on runtime systems designed to run in the cloud.
While most dedicated Wasm runtimes support compilation of rather large functions,
JVMs such as HotSpot were never designed for it. To enable the JIT compilation
of Wasm on platforms that support only smaller functions, we need to develop
techniques to split Wasm functions.
The goal of this project is to enable programs such as Python and SQLite to run
well on top of the HotSpot JVM. To this end, the project needs to investigate techniques
for splitting, and possibly duplicating, parts of functions that are executed together
to create smaller functions from them. The implementation of a solution in a system such as binaryen would be desirable.
To realize this goal, we will first instrument Wasm examples to better understand
which splitting strategies may be suitable, and afterwards prototype a compiler optimization
to perform such splitting.
For this project, you may work with collaborators from academia and industry, including IBM.
-
New JavaScript Language Features - ECMAScript proposals (Java, JavaScript)
JavaScript is specified in the ECMAScript language specification. It is an evolving language, and is extended by a "proposal" process.
Each new or improved feature is specified by one proposal. Current open proposals include Realms, Pipeline operator, In-Place
Resizable and Growable ArrayBuffers, Array find-from-last, Array grouping, and several more. As the proposals differ vastly
in implementation effort, we have topics for projects (project in software engineering), bachelor's theses, and master's theses.
The task is to fully implement the current state of the proposal in the GraalVM/Graal.js JavaScript engine.
-
Are We Fast Yet? Tracking the Performance of Your Favorite Language (Benchmarking, Performance)
The Are We Fast Yet project aims to enable compiler developers and language implementers to optimize the core elements of their languages.
The main idea is that a strict set of rules allows us to implement the same 14 benchmarks in different languages and then compare their performance.
This in turn allows us to find opportunities for optimizations in the language implementations.
This project can take many forms depending on your interests.
You could port the benchmarks to new languages. This requires devising language-specific rules and implementing the benchmarks. Interesting languages would include, e.g., C#, Rust, PHP, Go, Scala, Kotlin, Swift, and anything else still missing from our list.
Alternatively, we do not yet have a setup to automatically run the benchmarks on new versions of language implementations.
Ideally, we would like to be able to track performance over time.
With ReBenchDB and the ReBench tool, we already have a solid foundation as well as the setup to track the data.
However, the execution is not yet fully automated, and ReBenchDB could use a few more features to make such use cases easier, too.
-
The Simple Object Machine (Compilers, Interpreters, Performance, Benchmarking, C++, Java, JavaScript, Python)
In Compiler Construction, we learned how to implement a minimal Java-like language.
However, it lacked many features that modern languages have.
The Simple Object Machine (SOM) is a different style of teaching and research language that aims to be a somewhat complete object-oriented, dynamically typed language
that has the same implementation challenges as other modern languages such as Python, JavaScript, and Ruby, but is still manageable to learn.
A project on SOM could look into different aspects based on your interests.
Here are a few general ideas that can be refined into a suitable project:
- SOM in your Language of Choice: One could implement SOM from scratch in a language such as Zig, Swift, Ruby, OCaml, or any other language we do not already have an implementation for.
- The fastest Interpreter: Interpreters are an important area of research since their performance determines the perceived performance for a wide range of languages. SOM++, the C++-based implementation of SOM, currently has the fastest interpreter, but the results of colleagues indicate that it could be 4-5x faster. There are a number of optimizations that the C++ implementation could use to reach this goal.
- MicroSOM: MicroJava has the right size to fit into a single course on compilers. What would a MicroSOM look like if we wanted to teach interpreters instead of compilers? The goal here would be to reduce the feature set and investigate how to build up SOM step by step to enable a modern and captivating course on language implementation.
- A New Interpreter: Currently, SOM++ uses a classic stack-based bytecode interpreter. Would SOM++ be faster if it used a register-based bytecode or an abstract-syntax-tree-based interpreter?
- Is there some language implementation technique you are interested in? Let's define a project together!
-
A Comparative Analysis of Static Site Generators (HTML, CSS, JavaScript, TypeScript, Markdown)
Static Site Generators (SSGs) transform raw content and templates into deployable HTML files.
While the core concept is simple, the ecosystem varies wildly—from classic tools like Jekyll (Ruby) to modern, JavaScript-heavy frameworks like Gatsby.
The goal of this project is to implement a reference website (e.g., a documentation site or blog) across all selected frameworks to perform a structural and performance comparison.
The student will evaluate the frameworks based on the following criteria:
- Developer Experience: Setup complexity, templating languages (Liquid, Go Templates, JSX), and configuration.
- Content Management: How the SSG parses Markdown and Frontmatter metadata (e.g., accessing post dates, tags, and categories).
- Build Performance: Measuring the time required to generate the site as the number of pages increases.
- Output Structure Analysis: A structural comparison of the generated HTML. Does the SSG produce clean, semantic markup, or does it inject unnecessary wrapper `div`s and scripts?
- Client-Side Footprint: Analyzing the "weight" of the deployed site. Does the SSG ship a JavaScript runtime (like React) to the client, or does it output zero-JS HTML?
- Asset Processing: Capabilities for build-time image optimization (automatic resizing/format conversion) and CSS minification.
Possible investigation targets involve Jekyll, Hugo, Eleventy, Gatsby, Next.js, and Astro.
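The build-performance criterion above could be approached with a small harness along the following lines. This is only a sketch under several assumptions: the function names, the `content` directory, and the front matter fields are hypothetical, a real site would also need the configuration and templates of the framework under test, and `NOOP_COMMAND` is a stand-in that must be replaced by the actual build command of each SSG.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Stand-in build command that does nothing; replace it with the real
# build command of the static site generator under test.
NOOP_COMMAND = [sys.executable, "-c", "pass"]


def write_pages(content_dir, count):
    """Create `count` small Markdown pages with front matter metadata."""
    content_dir.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        page = content_dir / f"post-{i:05d}.md"
        page.write_text(
            f"---\ntitle: Post {i}\ndate: 2024-01-01\ntags: [bench]\n---\n\n"
            f"# Post {i}\n\nSome placeholder text.\n",
            encoding="utf-8",
        )


def time_build(command, cwd):
    """Run the build command in `cwd` and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, cwd=cwd, check=True)
    return time.perf_counter() - start


def benchmark(command, page_counts, content_subdir="content"):
    """Measure the build time for each page count in a fresh temporary site."""
    results = {}
    for count in page_counts:
        with tempfile.TemporaryDirectory() as site:
            write_pages(Path(site) / content_subdir, count)
            results[count] = time_build(command, site)
    return results
```

Calling `benchmark(NOOP_COMMAND, [100, 1000, 10000])` returns a dictionary from page count to build time. For meaningful numbers, each measurement would likely need to be repeated several times.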
-
An H5P Markdown Viewer (HTML, CSS, JavaScript, TypeScript, PHP)
H5P is a tool for creating interactive learning content that can be embedded into various learning management systems (LMS) such as Moodle.
While H5P supports a wide range of content types, it currently lacks a dedicated viewer for Markdown files, which is a widely used format to develop text-based learning materials.
Thus, lecturers often have to spend extra effort to convert their Markdown content into formats compatible with H5P.
The goal of this project is to develop an H5P content type that can render Markdown files directly within the H5P framework.
This will involve implementing a viewer that can parse and display Markdown content, including support for common Markdown features such as headings, lists, links, images, and code blocks.
Users should be able to provide the Markdown via (a) file upload, (b) copy-paste into a text area, or (c) a URL pointing to a raw Markdown file.
The last option in particular would allow lecturers to maintain their materials in version-controlled repositories (e.g., on GitHub) and have them automatically reflected in their H5P content.
Additional information on H5P content type development can be found here.
-
A Markdown-To-Interactive-H5P-Book Converter (HTML, CSS, JavaScript, TypeScript, PHP)
H5P is a tool for creating interactive learning content that can be embedded into various learning management systems (LMS) such as Moodle.
While H5P supports a wide range of content types, it is often cumbersome to create structured learning materials such as interactive books.
Often, teaching material already exists in Markdown format, which is a widely used format to develop text annotated with simple markup syntax.
The goal of this project is to develop a tool to convert Markdown files into interactive H5P books.
Assuming that one Markdown file corresponds to one chapter of the book, the tool should be able to combine multiple Markdown files into a single H5P book content type.
For this, the Markdown files have to be parsed and converted into H5P-compatible HTML, while preserving the structure of the original Markdown content.
Additional information on H5P content type development can be found here.
-
Windows Support for GraalPy (Java, Python, C, Windows APIs)
Python on Windows provides some additional modules to access Windows-specific APIs including the MSVCRT, the registry,
and the win32 API. GraalPy currently does not offer these. These modules could be implemented in pure Python using the
built-in cffi or ctypes bindings, ported from CPython as a C extension library, or implemented in Java to call into the
Windows APIs via Truffle NFI. The task is to weigh the implementation strategies and choose one to implement enough features
to pass the standard library tests.
-
Automatic Dynamic Optimization of Remembered Sets (HotSpot JVM, C++, Garbage Collection)
Let the G1 collector automatically determine remembered set container options for either reduced memory usage
or improved performance.
The G1 garbage collector is the current default garbage collector in the OpenJDK HotSpot VM. It uses remembered
sets to store locations of incoming references to a particular region of the heap. This data structure is basically an
implementation of a sparse set of integers: the entire range of possible values is split into evenly sized areas. A
top-level concurrent hash table stores the values of each area that is "in" the set in a so-called remembered set container.
Depending on the number of values to be stored in the area it covers, such a container is represented by different kinds of
data structures, e.g. arrays, bitmaps, or even single special integers.
The remembered set implementation switches between containers on the fly depending on the current occupancy
of an area.
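The idea of switching containers based on occupancy can be sketched as follows. This is a simplified, single-threaded illustration and not G1's implementation: a plain dictionary stands in for the concurrent hash table, only two container kinds are modeled, and the names `SparseIntSet`, `AREA_SIZE`, and `ARRAY_CAPACITY` as well as their values are assumptions made for the example.

```python
AREA_SIZE = 1024      # values covered by one area (illustrative, not G1's setting)
ARRAY_CAPACITY = 4    # entries an array container holds before it becomes a bitmap


class SparseIntSet:
    """Sparse set of non-negative integers with per-area containers."""

    def __init__(self):
        # area index -> container; a list models an array container,
        # an int used as a bit field models a bitmap container
        self.areas = {}

    def add(self, value):
        area, offset = divmod(value, AREA_SIZE)
        container = self.areas.get(area)
        if container is None:
            self.areas[area] = [offset]
        elif isinstance(container, list):
            if offset in container:
                return
            if len(container) < ARRAY_CAPACITY:
                container.append(offset)
            else:
                # The array is full: switch to a bitmap on the fly.
                bitmap = 1 << offset
                for existing in container:
                    bitmap |= 1 << existing
                self.areas[area] = bitmap
        else:
            self.areas[area] = container | (1 << offset)

    def __contains__(self, value):
        area, offset = divmod(value, AREA_SIZE)
        container = self.areas.get(area)
        if container is None:
            return False
        if isinstance(container, list):
            return offset in container
        return (container >> offset) & 1 == 1

    def container_kind(self, value):
        """Return which container kind currently covers the area of `value`."""
        container = self.areas.get(value // AREA_SIZE)
        if container is None:
            return "none"
        return "array" if isinstance(container, list) else "bitmap"
```

In this sketch, `ARRAY_CAPACITY` is a fixed constant, which mirrors the static sizing that the project aims to replace with sizes derived from statistics gathered at run time.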
G1 currently sizes these containers statically, i.e. independent of the actual distribution of values in a given remembered set.
So a particular container has a fixed size and can hold a fixed number of values, e.g. an "array" remembered set
container always has 128 entries, regardless of what the typical occupancy of such an array container is. This wastes memory,
because different types of applications (and remembered sets for different areas of the heap) exhibit different occupancy
characteristics.
The task is to change G1 to let it reconfigure the remembered set containers based on statistics that need to be gathered
while an application is running to optimize for the particular goal, and evaluate the effectiveness of these optimizations on
several benchmarks.
-
Memory Aware Liveness Analysis for G1 (HotSpot JVM, C++, Garbage Collection)
Implement a marking algorithm that attempts to be more memory aware.
The G1 garbage collector
is the current default garbage collector in the OpenJDK HotSpot VM. Its algorithm to determine reachable objects is a
straightforward implementation of the Tri-Color abstraction.
One drawback of this algorithm is that its memory accesses are very random, which is not optimal for current microarchitectures.
There is an old algorithm that tends to linearize memory accesses for large object graphs, potentially reducing
the amount of random memory accesses, and so improving performance. Recently, a similar approach has been implemented in Go's
Green Tea garbage collector with interesting, but maybe mixed, results.
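For reference, the Tri-Color abstraction can be sketched as a generic worklist algorithm. This is a textbook-style, single-threaded illustration and not G1's concurrent marking code; the function name `mark_reachable` and the dictionary-based object graph are assumptions made for the example.

```python
WHITE, GREY, BLACK = 0, 1, 2  # not yet seen, on the worklist, fully scanned


def mark_reachable(roots, references):
    """Return the set of objects reachable from `roots`.

    `references` maps each object to the objects it points to.
    """
    color = {}
    worklist = []

    def shade(obj):
        # Turn a white object grey and remember it for later scanning.
        if color.get(obj, WHITE) == WHITE:
            color[obj] = GREY
            worklist.append(obj)

    for root in roots:
        shade(root)

    while worklist:
        obj = worklist.pop()
        # The referenced objects may live anywhere in the heap, so this
        # loop visits memory in an order unrelated to object addresses.
        for ref in references.get(obj, ()):
            shade(ref)
        color[obj] = BLACK

    return {obj for obj, c in color.items() if c == BLACK}
```

The order in which objects leave the worklist is determined by the shape of the object graph rather than by where the objects are located, which is the source of the random memory accesses described above.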
The task for this work comprises:
- Implement a more memory-access-conscious algorithm (based on these precedents) in the G1 garbage collector.
- Compare its performance, memory access behavior, and memory consumption to the existing algorithm on benchmarks.
-
Improve the JVMTI Support in GraalVM Native Image (GraalVM Native Image)
The Java Virtual Machine Tool Interface (JVMTI) [1] enables tools that are written in languages like C or C++ to inspect and control the execution of Java applications. Currently, GraalVM [2] Native Image only supports basic JVMTI functions [3], while more complex functions are not implemented and throw runtime errors instead.
The main prerequisite for implementing more advanced JVMTI functionality is to fundamentally change how JNI [4] object handles [5] are implemented in GraalVM Native Image. The current implementation allocates Java heap memory, which is a fundamental problem for non-trivial JVMTI functions.
The scope of this project includes:
- Modifying the existing JNI object handle implementation so that native memory is used instead of Java heap memory.
- Implementing several non-trivial JVMTI functions.
- Testing and benchmarking the changes.
- Contributing the approach to the open-source repository [6] (requires signing the OCA [7]).
GraalVM Native Image is primarily implemented in Java, so Java is also the main programming language that will be used in this project. However, basic C or C++ knowledge is recommended, as there is an existing C/C++ JVMTI implementation in the OpenJDK [8] that can be used as a reference.
- [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html
- [2] https://www.graalvm.org/
- [3] https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jvmti/JvmtiFunctions.java
- [4] https://docs.oracle.com/en/java/javase/23/docs/specs/jni/index.html
- [5] https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/handles/ObjectHandlesImpl.java
- [6] https://github.com/oracle/graal
- [7] https://oca.opensource.oracle.com
- [8] https://github.com/openjdk/jdk
-
Add a Validation Mechanism to the GraalVM Native Image JNI implementation (GraalVM Native Image)
The Java Native Interface (JNI) [1] enables interaction between Java code and applications/libraries written in other languages such as C and C++. Due to the lack of comprehensive argument validation, JNI code is prone to errors, often resulting in hard-to-diagnose crashes and undefined behavior. The goal of this project is to improve the JNI implementation [2] in GraalVM [3] Native Image by incorporating basic validation mechanisms (similar to the -Xcheck:jni option [4] in OpenJDK [5]), thus improving the robustness and debuggability of JNI interactions.
The scope of this project includes:
- Adding a validation mechanism to the existing JNI implementation.
- Testing and benchmarking the changes.
- Contributing the approach to the open-source repository [6] (requires signing the OCA [7]).
GraalVM Native Image is primarily implemented in Java, so Java is also the main programming language that will be used in this project. However, basic C or C++ knowledge is recommended, as there is an existing C/C++ JNI implementation in the OpenJDK that can be used as a reference.
- [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jni/index.html
- [2] https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jni/functions/JNIFunctions.java
- [3] https://www.graalvm.org/
- [4] https://docs.oracle.com/en/java/javase/23/troubleshoot/command-line-options1.html#GUID-DE9FAAAF-DCD4-4974-A86F-C6B8907CCE9A__CHDDEGBI
- [5] https://github.com/openjdk/jdk
- [6] https://github.com/oracle/graal
- [7] https://oca.opensource.oracle.com
-
Designing a music-based programming language (Compilers, Music)
Contact: DI Christoph Pichler
While most programming languages are text-based (i.e. the source code is written in textual form), some so-called esoteric programming languages [1] use different representations such as images [2] or theatre plays [3].
This offers the possibility to "hide" programs in media.
The goal of this master's thesis is to develop a programming language based on music:
Programs in this language are not represented as text but as some kind of music (e.g. audio stored in the MIDI file format [4]) and can thus be listened to as well.
Writing programs could be possible e.g. in a piano roll editor [5] or any notation software (e.g. MuseScore [6]).
Valid programs should be "nice to listen to".
Besides basic knowledge in compiler construction, this thesis therefore also requires a background in music theory: "chords" and "harmonies" should be familiar terms.
The tasks are (a) to design a simple programming language which is based on music/audio information, and (b) to implement a simple (e.g. stack-based) interpreter for such a language.
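To give a rough impression of task (b), the following sketch interprets a melody, given as a list of MIDI note numbers, as a stack program. The encoding is purely hypothetical and only serves as an illustration: designing the actual language, including rules that make valid programs pleasant to listen to, is the subject of the thesis. Here, the pitch class of a note selects the instruction: C pushes the following note as a number relative to middle C (note 60), E adds, G multiplies, and A outputs the top of the stack.

```python
def run_melody(notes):
    """Interpret a list of MIDI note numbers (0-127) as a stack program.

    Hypothetical encoding, chosen only for this example:
      C (pitch class 0): push the next note's number minus 60
      E (pitch class 4): pop two values and push their sum
      G (pitch class 7): pop two values and push their product
      A (pitch class 9): append the top of the stack to the output
    """
    stack, output = [], []
    i = 0
    while i < len(notes):
        pitch_class = notes[i] % 12
        if pitch_class == 0:
            if i + 1 >= len(notes):
                raise ValueError("push at the end of the melody has no operand note")
            stack.append(notes[i + 1] - 60)
            i += 2  # skip the operand note
            continue
        if pitch_class == 4:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif pitch_class == 7:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif pitch_class == 9:
            output.append(stack[-1])
        else:
            raise ValueError(f"note {notes[i]} does not encode an instruction")
        i += 1
    return output
```

For example, the melody `[60, 62, 60, 63, 64, 60, 64, 67, 69]` pushes 2 and 3, adds them, pushes 4, multiplies, and outputs the result, so `run_melody` returns `[20]`.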