
Visual Computing Seminar (Fall 2018)

Wednesday
Food @ 11:50am,
Talk @ 12:15pm
Delio Vicini
Organizer

General information

The Visual Computing Seminar is a weekly seminar series on topics in visual computing.

Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on subtly related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.

Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20min talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive email from me soon about scheduling a presentation.

Where and when: every Wednesday in BC01 (note the changed location!). Food is served at 11:50, and the actual talk starts at 12:15.

How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner), or download the iCal file.

Schedule

Date Lecturer Contents
19.09.2018 Delio Vicini
03.10.2018 Cyrille Favreau

Blue Brain Brayns, a platform for high-fidelity, large-scale and interactive visualization of scientific data and brain structures

The Blue Brain Project has made major efforts to create morphologically accurate neurons to simulate sub-cellular and electrical activities, for example, molecular simulations of neuron biochemistry or multi-scale simulations of neuronal function.

One of the keys to understanding how the brain works as a whole is visualizing how the individual cells function. In particular, the more morphologically accurate the visualization can be, the easier it is for experts in the biological field to validate cell structures; photo-realistic rendering is therefore important. Brayns is a visualization platform that can interactively perform high-quality and high-fidelity rendering of large neuroscience data sets. Thanks to its client/server architecture, Brayns can be run in the cloud as well as on a supercomputer, and stream the rendering to any browser, either in a web UI or a Jupyter notebook.

At the Blue Brain Project, the Visualization team makes intensive use of Blue Brain Brayns to produce ultra-high-resolution movies (8K) and high-fidelity images for scientific publications. Brayns is also used to serve immersive visualization on large displays, as well as on unique devices such as the curved OpenDeck located at the Blue Brain office.

Brayns is also designed to accelerate scientific visualization and to adapt to a large number of environments. Thanks to its modular architecture, Brayns makes it easy to use various rendering back-ends such as Intel's OSPRay (CPU) or NVIDIA's OptiX (GPU). Every scientific use case, such as DICOM, DTI or Blue Brain research, is a standalone plug-in that runs on top of Brayns, allowing scientists and researchers to benefit from a high-performance, high-fidelity, high-quality rendering system without having to deal with its technical complexity.

Brayns currently implements a number of basic primitives such as meshes, volumes, point clouds and parametric geometries, and pioneers new rendering modalities for scientific visualization, such as signed distance fields.
During this talk, I will explain the motivations behind the creation of the Brayns platform, give some technical insight into the architecture of the system and the various techniques that we already use to render datasets. I will also describe how new datasets, as well as rendering components (engines, shaders, materials, etc.), can be added to the platform.
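As a point of reference for the signed-distance-field modality mentioned above, here is a minimal Python sketch of sphere tracing, the standard way of ray marching an SDF. It only illustrates the general technique, not Brayns' implementation; the sdf callable and constants are placeholders.

import numpy as np

def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    # March along the ray, stepping by the distance to the nearest surface,
    # which is always a safe step size for a signed distance field.
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:        # close enough to the surface: report a hit
            return p
        t += d
        if t > max_dist:   # ray escaped the scene
            break
    return None

# Example distance field: a unit sphere centered at the origin.
unit_sphere = lambda p: np.linalg.norm(p) - 1.0
hit = sphere_trace(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]), unit_sphere)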

Links: https://github.com/BlueBrain/Brayns

10.10.2018 Kaicheng Yu

Overcoming neural brainwashing

We identify a phenomenon, which we dub neural brainwashing, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters. To overcome this, we introduce a statistically-justified weight plasticity loss that regularizes the learning of a model's shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and for neural architecture search. Adding weight plasticity in neural architecture search preserves the best models to the end of the search process, leading to improved results in both natural language processing and computer vision tasks.
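The abstract does not give the exact form of the loss; the sketch below only shows the generic importance-weighted quadratic penalty that such plasticity/consolidation regularizers typically take, with all names chosen for illustration.

import numpy as np

def weight_plasticity_penalty(shared_params, anchor_params, importance, strength=1.0):
    # Penalize moving shared parameters away from the values they had after
    # training the previous model, proportionally to each parameter's importance.
    return strength * np.sum(importance * (shared_params - anchor_params) ** 2)

# Typical use: total_loss = task_loss + weight_plasticity_penalty(theta, theta_prev, omega)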

17.10.2018 Erhan Gündogdu

Good Features to Correlate for Visual Tracking

In this talk, I will mainly discuss the visual object tracking problem, which I worked on during my Ph.D. studies. As a secondary topic, I will mention my recent research activities on garment virtualization using deep learning.

Visual Tracking
Estimating object motion is one of the key components of video processing and the first step in applications which require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have become popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., a template) which produces high correlation outputs around the actual object location and low correlation outputs at locations that are far from the object. Nevertheless, CFB visual tracking methods suffer from many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main reasons for these shortcomings are forgetting the past poses of the object due to the simple update stages of CFB methods, a non-optimal model update rate, and features that are not invariant to appearance changes of the target object.
To address the aforementioned disadvantages of CFB visual tracking methods, this work makes three major contributions. First, a spatial window learning method is proposed to improve the correlation quality. For this purpose, a window that is element-wise multiplied by the object observation (or the correlation filter) is learned by a novel gradient descent procedure. The learned window is capable of suppressing or highlighting the necessary regions of the object, and can improve the tracking performance in the case of occlusions and object deformation. As the second contribution, an ensemble-of-trackers algorithm is proposed to handle the issues of a non-optimal learning rate and forgetting the past poses of the object. The trackers in the ensemble are organized in a binary tree, which stores individual expert trackers at its nodes. During the course of tracking, the expert trackers relevant to the most recent object appearance are activated and utilized in the localization and update stages. The proposed ensemble method significantly improves the tracking accuracy, especially when the expert trackers are selected as the CFB trackers utilizing the proposed window learning method. The final contribution of this work addresses the feature learning problem specifically focused on the CFB visual tracking loss function. For this loss function, a novel backpropagation algorithm is developed to train any deep, fully convolutional neural network. The proposed gradient calculation, which is required for backpropagation, is performed efficiently in both the frequency and image domains, and has linear complexity in the number of feature maps. The training of the network model is performed on carefully curated datasets including well-known difficulties of visual tracking, e.g., occlusion, object deformation and fast motion. When the learned features are integrated into state-of-the-art CFB visual trackers, favourable tracking performance is obtained on benchmark datasets against CFB methods that employ hand-crafted features or deep features extracted from pre-trained classification models.
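To make the "filter with a peaked correlation response" objective concrete, here is a minimal single-channel correlation filter learned in closed form in the Fourier domain (a MOSSE/ridge-regression-style baseline, not the specific method presented in the talk).

import numpy as np

def learn_correlation_filter(x, y, lam=1e-2):
    # x: training patch, y: desired response (e.g., a Gaussian peaked at the target),
    # lam: ridge regularizer. Returns the filter in the Fourier domain.
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def correlation_response(H, z):
    # Response on a new search patch z; the peak gives the estimated object location.
    return np.real(np.fft.ifft2(H * np.fft.fft2(z)))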

Garment simulation by deep learning
Garment simulation is a useful tool for virtual try-on, online shopping, the gaming industry, virtual reality and so forth. Realistic simulation of garments on different body shapes and poses with the help of a physically-based simulation (PBS) is a computationally heavy task which requires special parameter tuning for different body shapes and motion types. Hence, data-driven methods that model PBS approaches for the fitted garments on target bodies are preferable for both computational concerns and generalization purposes. Concretely, a PBS approach with a non-optimal parametrization can output simulation results with undesirable cloth-body interpenetration. However, a data-driven model such as a deep neural network can be trained with additional loss terms which prevent interpenetration. Our method presents a solution for 3D garment fitting on different target body shapes and poses without any post-processing steps such as resolving cloth-body interpenetration, tightness adjustment or smoothing, which are required for PBS tools such as NvCloth. To guard against the foreseeable mistakes of the learned model, these constraints are included in the training loss function of the proposed network model. Hence, the network model seamlessly predicts the fitted garment given the input template garment and the target body in a certain pose.
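As an illustration of the kind of extra loss term mentioned above, the sketch below combines a fitting error with a penalty on cloth-body interpenetration; the body_sdf query and weights are hypothetical, and the actual network and loss used in the talk may differ.

import numpy as np

def garment_loss(pred_verts, target_verts, body_sdf, weight=10.0):
    # Fitting term: distance between predicted and physically simulated garment vertices.
    fit = np.mean(np.sum((pred_verts - target_verts) ** 2, axis=1))
    # Interpenetration term: penalize vertices whose signed distance to the body is negative.
    d = body_sdf(pred_verts)                    # hypothetical query, > 0 outside the body
    penetration = np.mean(np.maximum(-d, 0.0) ** 2)
    return fit + weight * penetration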

24.10.2018 Krishna Kanth Nakka

Deep Attentional Structured Representation Learning for Visual Recognition

Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective to tackle complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, while effective, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this work, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to predict jointly the image class label and an attention map in an end-to-end fashion and without any other supervision than the target label. As evidenced by our experiments, this consistently outperforms attention-less structured representation learning and yields state-of-the-art results on standard scene recognition and fine-grained categorization benchmarks.
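The core idea of attention within the aggregation step can be sketched as a weighted pooling of local features; this is a generic illustration rather than the exact VLAD/Fisher-Vector encoding used in the work.

import numpy as np

def attentional_aggregation(local_feats, attention_logits):
    # local_feats: (N, D) descriptors over image regions,
    # attention_logits: (N,) unnormalized attention scores predicted per region.
    a = np.exp(attention_logits - attention_logits.max())
    a /= a.sum()                                   # softmax over regions
    return (a[:, None] * local_feats).sum(axis=0)  # attention-weighted pooling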

31.10.2018 Kevin Gonyop Kim

Expanding the experience of learners in vocational education

Vocational education and training (VET), which takes place in the dual contexts of school and workplace, is a well-established secondary education system in Switzerland. Although it is known as an effective system for developing vocational competence, there exists a gap between what learners are supposed to learn and what they practice in the workplace. Workplace experiences are usually limited to concrete situations in particular environments, and their connections to the general knowledge learned at school are often weak.
The central hypothesis of the Dual-T project is that digital technologies can serve as “bridges” over this school-workplace gap and enhance the learning experiences of the learners. The goal of my research in this project is to design a way to expand the workplace experience so that the learner can explore a broader space of practice. It is exploratory research on how learners in VET accept socially and synthetically expanded experiences and how they explore them. In this presentation, I will present some of the ongoing applications for florist and gardener apprentices as well as our previous work on logistics and carpentry.

07.11.2018 Peng Song

DESIA: A General Framework for Designing Interlocking Assemblies

Interlocking assemblies have a long history in the design of puzzles, furniture, architecture, and other complex geometric structures. The key defining property of interlocking assemblies is that all component parts are immobilized by their geometric arrangement, preventing the assembly from falling apart. Computer graphics research has recently contributed design tools that allow creating new interlocking assemblies. However, these tools focus on specific kinds of assemblies and explore only a limited space of interlocking configurations, which restricts their applicability for design.

In this talk, we present a new general framework for designing interlocking assemblies. The core idea is to represent part relationships with a family of base Directional Blocking Graphs and leverage efficient graph analysis tools to compute an interlocking arrangement of parts. This avoids the exponential complexity of brute-force search. Our algorithm iteratively constructs the geometry of assembly components, taking advantage of all existing blocking relations for constructing successive parts. As a result, our approach supports a wider range of assembly forms compared to previous methods and provides significantly more design flexibility. We show that our framework facilitates efficient design of complex interlocking assemblies, including new solutions that cannot be achieved by state-of-the-art approaches.
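A toy reading of the graph-analysis step: for a given movement direction, build a directed graph whose edge (u, v) means part u is blocked by part v, and test whether any subset of parts could slide away. The sketch below uses the simplification that immobilization along one direction amounts to strong connectivity of that graph; the actual DESIA analysis over the family of base Directional Blocking Graphs is more involved.

import networkx as nx

def immobilized_along(parts, blocking_edges):
    # parts: iterable of part ids; blocking_edges: (u, v) pairs meaning
    # "u is blocked by v" when translating along the chosen direction.
    g = nx.DiGraph()
    g.add_nodes_from(parts)
    g.add_edges_from(blocking_edges)
    # If the graph is strongly connected, every non-empty proper subset of parts
    # has an outgoing blocking edge, so no subset can translate away freely.
    return nx.is_strongly_connected(g)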

14.11.2018 Edoardo Remelli

Deep Shape Optimisation

Aerodynamic shape optimization has many industrial applications. Existing methods, however, are so computationally demanding that typical engineering practice is to either simply try a limited number of hand-designed shapes or restrict oneself to shapes that can be parameterized using only a few degrees of freedom. In this work, we introduce a new way to optimize complex shapes quickly and accurately. To this end, we train Geodesic Convolutional Neural Networks to emulate a fluid dynamics simulator. The key to making this approach practical is remeshing the original shape using a polycube map, which makes it possible to perform the computations on GPUs instead of CPUs. The neural net is then used to formulate an objective function that is differentiable with respect to the shape parameters, which can then be optimised using a gradient-based technique. This outperforms state-of-the-art methods by 5 to 20% on standard problems and, even more importantly, our approach applies to cases that previous methods cannot handle.
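The optimization loop itself can be sketched as plain gradient descent against the learned surrogate. Here surrogate_drag is a hypothetical stand-in for the trained network, and the finite-difference gradient would in practice be replaced by automatic differentiation through the net.

import numpy as np

def optimize_shape(params, surrogate_drag, steps=200, lr=1e-2, eps=1e-4):
    # params: 1-D array of shape parameters; surrogate_drag: params -> predicted drag.
    p = np.asarray(params, dtype=float).copy()
    for _ in range(steps):
        grad = np.array([
            (surrogate_drag(p + eps * e) - surrogate_drag(p - eps * e)) / (2 * eps)
            for e in np.eye(len(p))
        ])
        p -= lr * grad                 # descend the differentiable objective
    return p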

21.11.2018 Wenzel Jakob

Capturing and rendering the world of materials

One of the key ingredients of any realistic rendering system is a description of the way in which light interacts with objects, typically modeled via the Bidirectional Reflectance Distribution Function (BRDF). Unfortunately, real-world BRDF data remains extremely scarce due to the difficulty of acquiring it: a BRDF measurement requires scanning a four-dimensional domain at high resolution—an infeasibly time-consuming process.

In this talk, I'll showcase our ongoing work on assembling a large library of materials including metals, fabrics, organic substances like wood or plant leaves, etc. The key idea to work around the curse of dimensionality is an adaptive parameterization, which automatically warps the 4D space so that most of the volume maps to “interesting” regions. Starting with a review of BRDF models and microfacet theory, I'll explain the new model, as well as the optical measurement apparatus that we used to conduct the measurements.
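For reference, the microfacet models reviewed in the talk follow the standard Cook-Torrance form, with D the microfacet normal distribution, F the Fresnel term, G the shadowing-masking term, and \omega_h the half-vector between incident and outgoing directions:

f_r(\omega_i, \omega_o) = \frac{D(\omega_h)\, F(\omega_i, \omega_h)\, G(\omega_i, \omega_o)}{4\,(n \cdot \omega_i)\,(n \cdot \omega_o)}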

28.11.2018 Jan Bednarík

Deformable surface reconstruction from a single view

Recent years have seen the development of mature solutions for reconstructing deformable surfaces from a single image, provided that they are relatively well-textured. By contrast, recovering the 3D shape of texture-less surfaces remains an open problem, and essentially relates to Shape-from-Shading. In this paper, we introduce a data-driven approach to this problem. We introduce a general framework that can predict diverse 3D representations, such as meshes, normals, and depth maps. Our experiments show that meshes are ill-suited to handle texture-less 3D reconstruction in our context. Furthermore, we demonstrate that our approach generalizes well to unseen objects, and that it yields higher-quality reconstructions than a state-of-the-art SfS technique, particularly in terms of normal estimates. Our reconstructions accurately model the fine details of the surfaces, such as the creases of a T-shirt worn by a person. Since 3D shape reconstruction from a single view is known to be subject to ambiguities, because various combinations of shape, lighting and material result in the same 2D observation, we further explore the possibility of predicting not just a single but multiple likely shapes, and we look into learning a better shape representation.

05.12.2018 Zheng Dang

Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses

Motion and pose estimation from 3D-to-2D correspondences can be solved by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. Unfortunately, while theoretically doable, this introduces numerical instability in the optimization process in practice.
In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate on several tasks, including keypoint matching and 3D pose estimation, that our approach is much more robust than explicit differentiation of the eigendecomposition. It has better convergence properties and yields state-of-the-art results on both tasks.
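The central trick, as far as the abstract describes it, can be sketched as follows: instead of differentiating an eigendecomposition, penalize the residual of the ground-truth eigenvector under the predicted linear system, which vanishes exactly when that vector spans the null space. The published loss adds further terms (e.g., to avoid trivial solutions); this only shows the eigendecomposition-free residual.

import numpy as np

def eigfree_loss(M, e_gt):
    # M: (m, n) matrix assembled from network predictions (stacked linear constraints),
    # e_gt: (n,) ground-truth unit vector that should receive a zero eigenvalue of M^T M.
    A = M.T @ M                     # positive semi-definite Gram matrix
    return float(e_gt @ A @ e_gt)   # zero exactly when M @ e_gt = 0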

12.12.2018 Sena Kiciroglu

Active Drone-Based Human Pose Estimation

Reconstruction of 3D human pose has become a widely studied direction of research, with recent interest in outdoor and drone-based capture. The rising popularity of commercially available drones with on-board cameras makes this form of motion capture (MoCap) accessible to the consumer market. However, little work has been done on controlling the drone to maximize reconstruction accuracy. Existing drone-based MoCap solutions use pre-defined controllers, such as following the person at a constant angle or with a constant rotation. On the other hand, the robotics literature mostly covers the active reconstruction of static scenes. Our goal is to actively reposition a drone during MoCap so that it moves to the position where it will achieve the highest accuracy. Key to our method is estimating the expected reconstruction uncertainty in the presence of dynamic motion. Our goal in the end is to show that our active motion planning improves the pose estimation results by comparing it against several baseline policies.
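One simple way to picture the active repositioning step is greedy next-view selection over a set of candidate drone positions, scored by a model of the expected pose uncertainty; the function names below are hypothetical and the talk's actual planner may differ.

import numpy as np

def choose_next_position(candidates, predict_pose_covariance):
    # candidates: list of candidate 3-D drone positions,
    # predict_pose_covariance: position -> expected 3-D pose covariance matrix.
    scores = [np.trace(predict_pose_covariance(c)) for c in candidates]
    return candidates[int(np.argmin(scores))]   # fly where expected uncertainty is lowest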

19.12.2018 Helge Rhodin

Neural Scene Decomposition for Human Motion Capture

Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet. However, when it comes to 3D reconstruction, those features learned on ImageNet are only of limited use.

We therefore propose an approach to learning representations that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: a bounding box; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. Our NSD model can be trained end-to-end without any 2D or 3D supervision by exploiting self-supervision coming from multi-view data. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. NSD is also well suited for CG applications, such as the seamless transition between two video perspectives and novel view synthesis.