Visual Computing Seminar (Spring 2020)
The Visual Computing Seminar is a weekly seminar series on a variety of topics in the broader area of visual computing.
Why: The motivation for creating this seminar is that EPFL has a critical mass of people working on closely related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision, and signal processing. A weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities, such as the use of similar computational tools. Think of this as the visual computing edition of the “Know thy neighbor” seminar series.
Who: The target audience is faculty, students, and postdocs in the visual computing disciplines, but the seminar is open to anyone, and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter (~20 min) talks before presenting them at a conference. If you are a student or postdoc in one of the visual computing disciplines, you will probably receive an email from me soon about scheduling a presentation.
Where and when: every Friday in BC 410. Food is served at 11:50, and the talk itself starts at 12:15.
How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner), or download the iCal file.
Title: Computational Design of Metamaterials and Deployable Structures [practice job talk]
Title: Shape Reconstruction by Learning Differentiable Surface Representations
Abstract: Generative models that produce point clouds have emerged as a powerful tool to represent 3D surfaces, and the best current ones rely on learning an ensemble of parametric representations. Unfortunately, they offer no control over the deformations of the surface patches that form the ensemble and thus fail to prevent them from either overlapping or collapsing into single points or lines. As a consequence, computing shape properties such as surface normals and curvatures becomes difficult and unreliable. In this paper, we show that we can exploit the inherent differentiability of deep networks to leverage differential surface properties during training so as to prevent patch collapse and strongly reduce patch overlap. Furthermore, this lets us reliably compute quantities such as surface normals and curvatures. We demonstrate on several tasks that this yields more accurate surface reconstructions than state-of-the-art methods in terms of normal estimation and the number of collapsed and overlapping patches.
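The abstract's key observation is that a parametric patch f(u, v) → R³ exposes its differential geometry: the surface normal is the normalized cross product of the two tangent vectors ∂f/∂u and ∂f/∂v. Below is a minimal sketch of that computation; it uses finite differences as a stand-in for the network autodiff described in the talk, and the sphere patch is a toy example, not the paper's method.

```python
import numpy as np

def patch_normal(f, u, v, eps=1e-6):
    """Approximate the unit surface normal of a parametric patch f(u, v) -> R^3.
    Finite differences stand in for the autodiff a deep network would provide."""
    du = (f(u + eps, v) - f(u - eps, v)) / (2 * eps)  # tangent along u
    dv = (f(u, v + eps) - f(u, v - eps)) / (2 * eps)  # tangent along v
    n = np.cross(du, dv)                              # normal direction: du x dv
    return n / np.linalg.norm(n)                      # degenerate (collapsed) patches fail here

# Toy patch: a unit sphere parameterized by spherical angles.
def sphere(u, v):
    return np.array([np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])

n = patch_normal(sphere, 0.7, 0.3)  # equals the surface point itself (up to sign)
```

Note that the normalization step is exactly where a collapsed patch breaks down: if the patch degenerates to a point or line, the tangents become parallel or zero and the cross product vanishes, which is why the paper penalizes such configurations during training.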
Title: Perception in the Action Loop
Abstract: Artificial Intelligence seeks agents that can perceive the world and act accordingly. Despite remarkable progress toward this goal, a fundamental shortcoming persists on the perception front: difficulty scaling to the complexity of the real world, which in turn restricts operation to perceptually simplified domains (e.g., video games, controlled spaces, tabletop manipulation scenarios). I’ll talk about efforts toward a visual perception that could ultimately scale to real-world complexity and support the goals of active agents by going beyond isolated pattern-recognition problems.
I’ll present a method for tractably learning a large set of perception tasks using transfer learning (Taskonomy), toward forming a multi-task compositional perception dictionary. I’ll show that this dictionary can be turned into an intermediate perception module for active robotic agents (Mid-Level Vision), enabling them to improve their sample efficiency and generalization. This is accomplished using both real robots and a virtual environment rooted in real spaces (Gibson Environment). I will conclude by discussing cross-task consistency and quantifying uncertainty in perceptual estimations (X-TaC).
Title: 3D surface reconstruction from image(s)
Abstract: Using convolutional neural networks for single-view reconstruction has recently become a promising and trending topic in geometric deep learning. Presented with an image of a shape, the task is to accurately reconstruct the visible parts of the object and to plausibly hallucinate its unseen portions based on a learned prior. Typical architectures comprise a CNN encoder and a decoder. Image encoders map a 2D view to a vectorized latent space, while decoders map a latent vector to a 3D shape (in the form of a point cloud, a mesh deformation, the zero-crossing of an implicit function, etc.). To perform well under different viewpoints, the whole architecture has to implicitly learn non-trivial 3D geometric manipulations. This has been seen as a major limiting factor for generalization to unseen shapes and poses. We propose to construct a registered 3D latent space using reverse camera projections. A latent vector consists of a 3D grid aligned with the output object. 2D feature maps and depth maps are pushed into 3D space, relieving the network of the burden of localizing spatial features in 3D. This also allows us to geometrically fuse codes in the latent space and more accurately reconstruct a surface from multiple views. Our second contribution is a hybrid 3D decoder, relying on both voxels and point clouds. A relatively coarse grid of occupancy voxels first predicts a low-resolution approximation of the desired surface. Then, within each activated voxel, a 2D patch is differentiably folded to capture higher-frequency details and smoother curvatures. This hybrid solution exploits both the good spatialization of 3D convolutions and the sparsity of point clouds.
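The "reverse camera projection" idea can be sketched very simply: project each voxel center through the camera and copy the 2D feature found at that pixel into the voxel. The snippet below is a minimal illustration under assumed conventions (a pinhole intrinsics matrix K, voxel centers already in camera coordinates, nearest-neighbor sampling); the hypothetical `lift_features` helper is not from the talk.

```python
import numpy as np

def lift_features(feat2d, K, voxel_centers):
    """Lift a 2D feature map into a 3D grid by reverse camera projection.
    feat2d:        (H, W, C) feature map from the image encoder
    K:             (3, 3) pinhole intrinsics (assumed convention)
    voxel_centers: (N, 3) voxel centers in camera-space coordinates
    Returns (N, C): one feature vector per voxel."""
    H, W, _ = feat2d.shape
    uvw = voxel_centers @ K.T             # project voxel centers: (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective divide -> pixel coordinates
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat2d[v, u]                   # nearest-neighbor feature lookup
```

Note that every voxel along the same camera ray receives the same feature, so the network only has to resolve depth along the ray rather than localize features in 3D from scratch, which is the relief of burden the abstract describes.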
Title: Generative models for solving inverse design problems
Abstract: We present a framework that trains a deep generative model to provide multifarious solutions to a given inverse problem. Inverse problems appear in many engineering tasks, where one tries to answer the question: “How can I design a structure (usually by choosing its parameters) to achieve a certain target performance?” Currently, we study this on the example of deployable beam networks (X-Shells), which are fabricated in a flat configuration but can unfold into a three-dimensional shape. We propose a variant of the Generative Adversarial Nets (GAN) framework wherein the generator is trained to output high-quality creations (e.g. X-Shells) with respect to a given performance measure (e.g. deployability, flatness in the fabrication state, “beauty” of the deployed shape, ...). Since we have limited insight into how to create examples of good X-Shells, our training data covers the space of such structures only sparsely. We therefore adapt our training framework to rely solely on forward simulation of generated structures and drop the use of a dataset. The first version of this framework is highly prone to severe mode collapse, which is why we introduce a novel diversity regularization term into the loss. In addition to encouraging diversity among the creations, this term appears to stabilize GAN training. We demonstrate these results on a toy example, as the application of this framework to X-Shells is still a work in progress.
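A diversity regularizer of the kind mentioned in the abstract can take many forms; one simple family penalizes small pairwise distances among the samples in a generated batch. The sketch below is a hypothetical stand-in for illustration only, not the authors' actual loss term.

```python
import numpy as np

def diversity_penalty(samples, eps=1e-8):
    """Illustrative diversity regularizer for generator outputs.
    Penalizes a batch whose samples sit close together (mode collapse).
    samples: (B, D) batch of generated designs, flattened to vectors."""
    diff = samples[:, None, :] - samples[None, :, :]   # (B, B, D) pairwise differences
    dist = np.sqrt((diff ** 2).sum(-1) + eps)          # (B, B) pairwise distances
    B = len(samples)
    mean_dist = dist.sum() / (B * (B - 1))             # average over distinct pairs
    return 1.0 / (mean_dist + eps)                     # large when the batch collapses
```

Added to the generator loss with a small weight, a term like this pushes each batch to spread out over the design space, which is consistent with the stabilizing effect the abstract reports for their (different) formulation.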