I am a PhD student in the Autonomous Vision Group of Andreas Geiger at the University of Tübingen and the MPI for Intelligent Systems in Tübingen as well as at the Robert Bosch GmbH in Leonberg, where I am working under the supervision of Alexandru Condurache in the Driver Assistance Systems Department.
The topic of my PhD is "Efficient Invariant Deep Models for Computer Vision". Convolutional neural networks (CNNs) have revolutionized the field of computer vision and currently offer state-of-the-art performance for many computer vision tasks, including image classification, object detection and semantic segmentation. However, their success comes at the cost of large annotated datasets, which take a considerable effort to create. Therefore, I am investigating how the data efficiency of deep neural networks can be improved by learning invariance/equivariance or directly encoding it into deep neural network architectures.
2014 - 2016 MSc Computer Science, KTH Royal Institute of Technology
2010 - 2013 BSc Media Computer Science, Stuttgart Media University
2016 - 2016 Master Thesis Intern, Bosch
2015 - 2015 Data Mining Intern, Bosch North America
Palo Alto, CA, USA
2013 - 2014 HMI Research Assistant, Bosch North America
In 2019 International Conference on 3D Vision (3DV), 2019 International Conference on 3D Vision (3DV), 2019 (inproceedings)
Domain adaptation techniques enable the re-use and transfer of existing labeled datasets from a source to a target domain in which little or no labeled data exists. Recently, image-level domain adaptation approaches have demonstrated impressive results in adapting from synthetic to real-world environments by translating source images to the style of a target domain. However, the domain gap between source and target may not only be caused by a different style but also by a change in viewpoint. This case necessitates a semantically consistent translation of source images and labels to the style and viewpoint of the target domain. In this work, we propose the Novel Viewpoint Adaptation (NoVA) model, which enables unsupervised adaptation to a novel viewpoint in a target domain for which no labeled data is available. NoVA utilizes an explicit representation of the 3D scene geometry to translate source view images and labels to the target view. Experiments on adaptation to synthetic and real-world datasets show the benefit of NoVA compared to state-of-the-art domain adaptation approaches on the task of semantic segmentation.
European Conference on Computer Vision (ECCV), September 2018 (conference)
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.
In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems