Blind Non-linear Equalization Using Variational Autoencoders

Background In digital communication the goal is to send information, usually represented by bits, from A (transmitter, Tx) to B (receiver, Rx). At some point in this process, the bits “meet” the physical world in the form of a channel. In optical communication, light from a laser diode is used to carry the information, which travels through an optical fiber and is then detected at the receiver using a photodiode. However, the optical fiber channel does not pass the light on perfectly: the signal is attenuated and distorted the farther it travels....
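
To make the distortion concrete, here is a minimal, self-contained sketch of a toy fiber-like channel. All parameter values, the FIR taps, and the square-law non-linearity are illustrative assumptions, not the project's actual channel model:

```python
# Toy illustration of why equalization is needed: a hypothetical channel
# that linearly disperses, non-linearly distorts, and attenuates the
# transmitted symbols. All parameter values are made up.
import numpy as np

rng = np.random.default_rng(0)

# Transmit random PAM-4 symbols (2 bits per symbol).
symbols = rng.choice([-3.0, -1.0, 1.0, 3.0], size=1000)

# Linear distortion: inter-symbol interference from a short FIR filter.
isi_taps = np.array([0.1, 0.8, 0.25, 0.05])
distorted = np.convolve(symbols, isi_taps, mode="same")

# Attenuation plus a simplified square-law non-linearity (reminiscent of
# photodiode detection in intensity-modulated links).
received = 0.5 * distorted + 0.05 * distorted**2

# Additive receiver noise.
received += rng.normal(scale=0.1, size=received.shape)

# A blind equalizer must recover `symbols` from `received` alone,
# without access to the transmitted sequence.
```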

November 27, 2023 · Søren Føns Nielsen

Bayesian VAEs - Linearized Laplace Approximation - Can B-VAEs generate meaningful examples from their latent space?

Background Variational Auto-Encoders (VAEs) are useful models for learning non-linear latent representations of data. Usually, VAEs are trained by approximating the maximum likelihood estimate of the parameters through the evidence lower bound. In Bayesian Variational Auto-Encoders (B-VAEs), we instead obtain a posterior distribution over the parameters, leading to a more robust model, e.g., for out-of-distribution detection (Daxberger and Hernández-Lobato, 2019). Bayesian VAEs are typically trained by learning a variational posterior approximation of both the latent variable and the model parameters....
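
In standard ELBO notation, the two objectives the paragraph contrasts can be sketched as follows. The B-VAE form assumes the usual mean-field factorization over latents and parameters; it is an illustration, not necessarily the project's exact objective:

```latex
% Standard VAE: maximize the evidence lower bound (ELBO) w.r.t. (theta, phi)
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x)\,\middle\|\,p(z)\right)

% Bayesian VAE: theta is also random; a variational posterior q(theta) is
% learned jointly, adding a KL penalty on the parameters themselves
\mathcal{L} \;=\;
  \mathbb{E}_{q(\theta)\, q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathbb{E}_{q(\theta)}\!\left[\mathrm{KL}\!\left(q_\phi(z \mid x)\,\middle\|\,p(z)\right)\right]
  - \mathrm{KL}\!\left(q(\theta)\,\middle\|\,p(\theta)\right)
```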

November 15, 2023

Explainability of Multimodal Models

Background A multimodal model is any model that takes in two or more modalities of data. This could be text and images, or images and audio, etc. An example is the CLIP model [1], which learns embeddings of text and images simultaneously. Common explainability methods focus on a single modality, e.g., which part of an image was important for a given prediction. The purpose of this project is to investigate how single-modality explainability methods can be extended to multimodal data....
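
As one possible starting point, a single-modality technique such as gradient saliency can be conditioned on the other modality through CLIP's image-text score. The model name and preprocessing follow Hugging Face transformers; the input file and the saliency recipe are illustrative assumptions, not the project's prescribed method:

```python
# Sketch: gradient saliency on CLIP's image-text similarity, asking which
# pixels drive the match for a particular text prompt.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image = Image.open("cat.jpg")  # hypothetical input image
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt")

# Make the preprocessed image a leaf tensor so we can take pixel gradients.
pixel_values = inputs["pixel_values"].requires_grad_(True)
outputs = model(input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                pixel_values=pixel_values)

# Image-text similarity score; backprop to the pixels.
score = outputs.logits_per_image[0, 0]
score.backward()

# Per-pixel saliency: max absolute gradient over the channel dimension.
saliency = pixel_values.grad.abs().max(dim=1).values  # shape (1, H, W)
```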

November 15, 2023

Guided representation learning of medical signals using textual (meta) data

Background When designing and training machine learning architectures for medical data, we often neglect prior knowledge about the data domain. In this project, we want to investigate whether we can find shared information between medical signals and their corresponding textual descriptions. Recent advances in NLP have made it possible to learn rich contextual information from textual data. Given cross-modal observations, Relative Representations make it possible to compare and align the latent spaces of different modalities or models....
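
A minimal sketch of the Relative Representations idea (Moschella et al., 2022) as it could apply here; the encoders, shapes, and anchor choice below are placeholder assumptions:

```python
# Re-express each embedding by its cosine similarity to a shared set of
# anchor samples; latent spaces from different encoders then live in a
# common k-dimensional "relative" space and can be compared directly.
import numpy as np

def relative_representation(embeddings: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Map (n, d) embeddings to (n, k) cosine similarities against k anchors."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return e @ a.T

# Stand-ins: a signal encoder and a text encoder embedding the SAME 8
# anchor cases (e.g., 8 recordings and their 8 reports).
signal_emb = np.random.randn(32, 128)   # e.g., biosignal encoder outputs
text_emb = np.random.randn(32, 768)     # e.g., report encoder outputs
rel_signal = relative_representation(signal_emb, signal_emb[:8])  # (32, 8)
rel_text = relative_representation(text_emb, text_emb[:8])        # (32, 8)
# Despite different original dimensions, both now share a comparable space.
```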

November 15, 2023

Learning Data Augmentation

Background Data augmentation is used to increase the diversity of a dataset by applying various transformations to the existing data, which helps improve model generalization, performance, and robustness while reducing the risk of overfitting. For this project, you will begin by surveying existing methods and metrics. Afterwards, you will focus on creating new methods for learning a data augmentation scheme in order to optimize one or more selected metrics. You will evaluate your methods using well-established benchmark datasets in a selected data domain and task, which could be images, time series, or molecular graph data, e....
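
As a rough illustration of what a learnable augmentation scheme's search space might look like: the operations, ranges, and single magnitude knob below are assumptions in the spirit of RandAugment, not a method prescribed by the project:

```python
# Sketch of a parameterized image-augmentation policy whose strength a
# learning procedure could tune against a chosen metric.
import torchvision.transforms as T

def make_policy(magnitude: float) -> T.RandomChoice:
    """Build a policy; `magnitude` in [0, 1] scales all op strengths."""
    ops = [
        T.ColorJitter(brightness=0.8 * magnitude, contrast=0.8 * magnitude),
        T.RandomRotation(degrees=30 * magnitude),
        T.RandomResizedCrop(size=224, scale=(1.0 - 0.5 * magnitude, 1.0)),
    ]
    # Pick one op at random each time the policy is applied to a sample.
    return T.RandomChoice(ops)

# A naive "learning" loop would train under several magnitudes and keep
# the one that optimizes the selected metric on held-out data.
weak, strong = make_policy(0.2), make_policy(0.9)
```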

November 15, 2023

Unsupervised Speech Enhancement

Background Speech enhancement is the task of recovering clean speech from noisy speech that has been degraded by, e.g., background noise, interfering speakers, or reverberation in poor acoustic conditions. Deep learning-based speech enhancement is typically trained in a supervised setup on datasets of separate clean speech and background noise samples, which are mixed at training time; the network is then trained to recover the clean speech directly from the artificially corrupted mixture....
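
The supervised setup described above can be sketched in a few lines; the array shapes, the SNR range, and the random stand-in signals are illustrative:

```python
# On-the-fly mixing of clean speech and noise at a random target SNR;
# the supervised training pair is (noisy input, clean target).
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it."""
    clean_power = np.mean(clean**2)
    noise_power = np.mean(noise**2) + 1e-12  # guard against silent noise
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for 1 s of clean speech at 16 kHz
noise = rng.standard_normal(16000)   # stand-in for background noise
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0, 15))
# Unsupervised enhancement must do without the separate `clean` reference.
```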

November 15, 2023
