Visual Computing Seminar (Spring 2019)
The Visual Computing Seminar is a weekly seminar series on topics in visual computing.
Why: The motivation for creating this seminar is that EPFL has a critical mass of people working on closely related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. A weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities, such as the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.
Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone, and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive an email from me soon about scheduling a presentation.
Where and when: Every Wednesday in BC03 (note the changed location!). Food is served at 11:50, and the actual talk starts at 12:15.
How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner) or download the iCal file.
Title: Light-Transport Simulation with Machine Learning
Abstract: Machine-learning-based techniques have taken many fields by storm but, until recently, have seen relatively little use in physically-based rendering. This has begun to change. In my talk, I will present techniques for accelerating the simulation of light transport with the help of machine learning. I will briefly introduce two projects in which we learn the radiance field permeating volumetric media (grains and atmospheric clouds), and I will go into more detail on another project, in which we learn how to optimally sample a Monte Carlo estimator of the reflection integral. The latter approach connects path-tracing algorithms with the field of reinforcement learning and provides a general technique for efficient Monte Carlo estimation using deep neural networks.
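The benefit of optimally placed samples can already be seen in a one-dimensional Monte Carlo estimator. The sketch below uses a made-up integrand (not one from the talk) to contrast uniform sampling with sampling proportional to the integrand, the ideal that a learned sampler tries to approach:

```python
import random

def mc_estimate(f, sample, pdf, n=100_000, seed=0):
    """Importance-sampled Monte Carlo estimate of the integral of f over [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample(rng)
        total += f(x) / pdf(x)  # each sample is weighted by 1/pdf(x)
    return total / n

# Integrand with a sharp lobe near 0, loosely analogous to a glossy highlight.
f = lambda x: 3.0 * (1.0 - x) ** 2  # integrates to exactly 1 on [0, 1]

# (a) Uniform sampling: correct on average, but noisy.
uniform = mc_estimate(f, lambda rng: rng.random(), lambda x: 1.0)

# (b) Sampling proportional to the integrand (the ideal a learned sampler
# approximates): pdf(x) = 3(1-x)^2, inverted CDF x = 1 - (1-u)^(1/3).
perfect = mc_estimate(f,
                      lambda rng: 1.0 - (1.0 - rng.random()) ** (1.0 / 3.0),
                      lambda x: 3.0 * (1.0 - x) ** 2)

print(uniform, perfect)  # both near 1.0; (b) has zero variance
```

Because the sample weight f(x)/pdf(x) is constant when pdf matches the integrand, estimator (b) returns the exact answer from any number of samples, which is why learning a good sampling density pays off.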
Bio: Thomas Müller is a soon-to-be-graduating doctoral student at ETH Zürich & Disney Research, where he also received his Bachelor's degree (2014) and Master's degree (2016). Thomas' research focuses on the intersection of light-transport simulation and machine learning. His work was featured on the cover of Computer Graphics Forum, won a Best Paper award, led to two patents, and is implemented in production renderers at the Walt Disney and Pixar Animation Studios.
Title: Generative design of multi-stable surfaces
Abstract: A flat surface that can be reconfigured into given 3D target shapes is of great importance in numerous fields and at different length scales, e.g. aeronautical systems, architectural installations, and targeted medicine delivery. Such a mechanical system can drastically reduce demands on fabrication and transportation and enables precisely controlled deployment. Given an arbitrary target shape, this inverse problem is typically tackled by discretizing the target shape and then mapping each element to the flat surface. During the mapping process, either the periodicity or the internal properties of the elements are changed. Because this is a purely geometric construction, the system is not necessarily mechanically stable when reconfigured into the target shape, i.e. when the means of reconfiguration is removed, the system reverts to the flat shape. We propose a method for generating flat surfaces that can be reconfigured into a number of target shapes, each of which is mechanically stable. First, the target shapes are discretized using a Chebyshev net. The resulting quadrilateral elements are mapped to a flat surface by accounting for the defects, or excesses, in their internal angles. These are then accommodated by lengthening or shortening the added diagonal members. By embedding bistable elements into the diagonal members, the necessary length change is achieved while ensuring mechanical stability after the lengths are changed. Using a multi-material 3D printer, we demonstrate this method by fabricating one flat surface that reconfigures into two distinct and stable target shapes. The proposed method opens a new direction for the design of reconfigurable systems, and combining such systems with autonomous activation may enable complex self-reconfiguration of surfaces.
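The link between a quad's internal angle and the required diagonal length change follows from the law of cosines. The sketch below uses a rhombus-shaped quad with unit side length and illustrative angles (not numbers from the talk):

```python
import math

def diagonal_length(side, angle_deg):
    """Diagonal of a rhombus-shaped Chebyshev-net quad, spanning the given
    internal angle, computed via the law of cosines:
    d^2 = 2 * side^2 * (1 - cos(angle))."""
    t = math.radians(angle_deg)
    return side * math.sqrt(2.0 * (1.0 - math.cos(t)))

flat = diagonal_length(1.0, 90.0)    # diagonal in the flat state (sqrt(2))
target = diagonal_length(1.0, 60.0)  # a 30-degree angle deficit in the target shape
print(target - flat)                 # negative: the diagonal member must shorten
```

A bistable element embedded in the diagonal can snap between these two lengths, which is what keeps each reconfigured state mechanically stable.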
Title: Rendering of specular microstructure using hierarchical sampling
Abstract: Today's state-of-the-art material models used in photorealistic rendering are of very high quality, but they often look too perfect in practice! They are extremely smooth and lack small surface details such as scratches, dents, or other imperfections that we can observe almost everywhere in the real world. Rendering these details is challenging, however, because current Monte Carlo based methods require a prohibitively large number of samples to fully resolve the tiny, highly directional specular highlights that these materials produce.
Title: X-Shells: A New Class of Deployable Beam Structures
Abstract: I will present our work on X-shells, a new class of deployable structures formed by an ensemble of elastically deforming beams coupled through rotational joints. An X-shell can be assembled conveniently in a flat configuration from standard elastic beam elements and then be deployed through expansive force actuation into the desired 3D target state. During deployment, the coupling imposed by the joints will force the beams to twist and buckle out of plane to maintain a static equilibrium state. This complex interaction of discrete joints and continuously deforming beams allows interesting 3D forms to emerge.
Simulating X-shells is challenging due to unstable equilibria occurring at the onset of beam buckling. I will present my simulation framework based on a discrete elastic rods model that robustly handles such difficult scenarios by analyzing and appropriately modifying the elastic energy Hessian. This real-time simulation forms the basis of a computational design tool for X-shells that enables interactive design-space exploration by varying and optimizing design parameters to achieve a specific design intent. We jointly optimize the assembly state and the deployed configuration to ensure the geometric and structural integrity of the deployable X-shell. Once a design is finalized, we also optimize for a sparse distribution of actuation forces to efficiently deploy a specific X-shell from its flat assembly state to its 3D target state.
I will demonstrate the effectiveness of our design approach with a number of design studies and physical prototypes that highlight the richness of the X-shell design space, enabling new forms not possible with existing approaches.
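Near unstable equilibria, the elastic energy Hessian becomes indefinite and Newton steps can point uphill. One common remedy, shown below as a minimal sketch, is to clamp negative eigenvalues; the talk's exact modification strategy may differ:

```python
import numpy as np

def project_to_pd(H, eps=1e-8):
    """Clamp the eigenvalues of a symmetric Hessian to at least eps so that
    Newton steps remain descent directions near unstable (buckling) equilibria.
    Eigenvalue clamping is one standard fix, not necessarily the talk's."""
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, eps)) @ V.T  # reassemble V diag(clamped) V^T

# An indefinite 2x2 Hessian, as occurs at the onset of buckling.
H = np.array([[2.0, 0.0],
              [0.0, -1.0]])
H_pd = project_to_pd(H)
print(np.linalg.eigvalsh(H_pd))  # all eigenvalues are now >= eps
```

The full eigendecomposition is affordable for small systems; for large meshes, cheaper local or per-element modifications are typically used instead.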
Title: Recurrent U-Net for Resource-Constrained Segmentation
Abstract: Real-time segmentation has a wide range of applications. For instance, real-time biomedical image segmentation is a helpful diagnostic tool, and real-time egocentric hand segmentation is critical for mixed reality. Traditional segmentation techniques typically follow a one-shot approach, where the image is passed through the model only once to produce a segmentation mask. This strategy, however, usually requires a very deep model, which is time-consuming and demands a large GPU memory budget. U-Net, by contrast, is a compact and efficient network.
Title: Detecting the Unexpected via Image Resynthesis
Abstract: Classical semantic segmentation methods, including the recent deep learning ones, assume that all classes observed at test time have been seen during training.
Title: Crowd Counting: From Image Plane to Head Plane
Abstract: State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density in the image plane. While useful for this purpose, this image-plane density has no immediate physical meaning because it is subject to perspective distortion. This is a concern in sequences acquired by drones because the viewpoint changes often. This distortion is usually handled implicitly by either learning scale-invariant features or estimating density in patches of different sizes, neither of which accounts for the fact that scale changes must be consistent over the whole scene.
In this paper, we explicitly model the scale changes and reason in terms of people per square meter. We show that feeding the perspective model to the network allows us to enforce global scale consistency and that this model can be obtained on the fly from the drone sensors. It also enables us to enforce physically-inspired temporal consistency constraints that do not have to be learned. This yields an algorithm that outperforms state-of-the-art methods in inferring crowd density from a moving drone camera, especially when perspective effects are strong.
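The core unit conversion can be sketched for the simplest case of a nadir-looking pinhole camera; this is a simplification with made-up numbers, not the paper's full perspective model:

```python
def ground_density(image_density, altitude_m, focal_px):
    """Convert an image-plane crowd density (people per pixel) into a
    head-plane density (people per square meter), assuming a pinhole
    camera pointing straight down: one pixel then covers
    (altitude / focal_length_in_pixels) meters on the ground."""
    meters_per_px = altitude_m / focal_px
    return image_density / (meters_per_px ** 2)

# Hypothetical drone at 50 m altitude with a 1000-pixel focal length:
# each pixel covers 5 cm, so 0.01 people/pixel corresponds to 4 people/m^2.
print(ground_density(0.01, 50.0, 1000.0))
```

For an oblique viewpoint, the meters-per-pixel factor varies across the image, which is exactly the perspective distortion the paper models explicitly.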
Title: Segmentation-driven 6D Object Pose Estimation
Abstract: The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions.
Title: Evaluating the Search Phase of Neural Architecture Search
Abstract: Neural Architecture Search (NAS) aims to facilitate the design of deep networks for new tasks. Existing techniques rely on two stages: searching over the architecture space and validating the best architecture. Evaluating NAS algorithms is currently solely done by comparing their results on the downstream task. While intuitive, this fails to explicitly evaluate the effectiveness of their search strategies.
In this paper, we extend the NAS evaluation procedure to include the search phase. To this end, we compare the quality of the solutions obtained by NAS search policies with that of random architecture selection. We find that: (i) On average, the random policy outperforms state-of-the-art NAS algorithms; and (ii) The results and candidate rankings of NAS algorithms do not reflect the true performance of the candidate architectures. While our former finding illustrates the fact that the NAS search space has been sufficiently constrained so that random solutions yield good results, we trace the latter back to the weight sharing strategy used by state-of-the-art NAS methods. In contrast with common belief, weight sharing negatively impacts the training of good architectures, thus reducing the effectiveness of the search process. We believe that following our evaluation framework will be key to designing NAS strategies that truly discover superior architectures.
Title: Denoising Deep Monte Carlo Renderings
Abstract: We present a novel algorithm to denoise deep Monte Carlo renderings, in which pixels contain multiple color values, each for a different range of depths. Deep images are a more expressive representation of the scene than conventional flat images. However, since each depth bin receives only a fraction of the flat pixel's samples, denoising the bins is harder due to the less accurate mean and variance estimates. Furthermore, deep images lack a regular structure in depth—the number of depth bins and their depth ranges vary across pixels. This prevents a straightforward application of patch-based distance metrics frequently used to improve the robustness of existing denoising filters. We address these constraints by combining a flat image-space Non-Local Means filter operating on pixel colors with a deep cross-bilateral filter operating on auxiliary features (albedo, normal, etc.). Our approach significantly reduces noise in deep images while preserving their structure. To the best of our knowledge, our algorithm is the first to enable efficient deep-compositing workflows with denoised Monte Carlo renderings. We demonstrate the performance of our filter on a range of scenes highlighting the challenges and advantages of denoising deep images.
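The Non-Local Means component can be illustrated in one dimension: each sample is replaced by a weighted average of samples whose surrounding patches look similar. The sketch below is a generic minimal NLM filter with hypothetical parameter values, not the paper's implementation:

```python
import math

def nlm_1d(signal, patch=1, search=5, h=0.3):
    """Minimal 1D non-local means filter. The weight of neighbor j for
    sample i decays with the squared distance between their patches."""
    n = len(signal)
    out = []
    for i in range(n):
        wsum, acc = 0.0, 0.0
        for j in range(max(0, i - search), min(n, i + search + 1)):
            d = 0.0
            for k in range(-patch, patch + 1):  # patch distance (clamped borders)
                a = signal[min(max(i + k, 0), n - 1)]
                b = signal[min(max(j + k, 0), n - 1)]
                d += (a - b) ** 2
            w = math.exp(-d / (h * h))
            wsum += w
            acc += w * signal[j]
        out.append(acc / wsum)
    return out

noisy = [0.0, 0.2, -0.2, 0.1, -0.1, 0.15, -0.15, 0.05]
smoothed = nlm_1d(noisy)
print(smoothed)  # spread shrinks toward the signal mean
```

The deep variant in the paper must cope with irregular depth bins, which is why the NLM distances are computed on flat pixel colors while a cross-bilateral filter handles the per-bin features.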
Title: Hyper-Reduced Projective Dynamics
Abstract: Hyper-Reduced Projective Dynamics is a framework for the real-time simulation of elastic deformable bodies. It combines the efficient Projective Dynamics method [Bouaziz et al. 2014] with a model reduction approach that allows for the simulation of meshes of arbitrary resolution. To achieve this, we restrict the unknowns to a subspace and estimate the non-linear terms through a novel approximation approach. I will provide a short introduction to physical simulations in computer graphics, motivate the Projective Dynamics method, detail our model reduction layers and, of course, show some results.
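The subspace restriction at the heart of such model reduction can be sketched on a toy linear system: replace the full unknown q with q = U z for a low-dimensional basis U and solve the much smaller projected system. This is an illustrative example with a 1D spring chain and a modal basis, not the talk's framework:

```python
import numpy as np

n, r = 200, 10
# Stiffness matrix of a fixed-fixed 1D spring chain (discrete Laplacian),
# a stand-in for a full elastic stiffness matrix.
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)  # uniform load, e.g. gravity

# Modal reduction basis: the r lowest-frequency eigenvectors of K.
w, V = np.linalg.eigh(K)
U = V[:, :r]

# Full solve K q = f: O(n^3) in the mesh resolution.
q_full = np.linalg.solve(K, f)

# Reduced solve (U^T K U) z = U^T f: an r x r system, then lift back.
z = np.linalg.solve(U.T @ K @ U, U.T @ f)
q_red = U @ z

err = np.linalg.norm(q_red - q_full) / np.linalg.norm(q_full)
print(round(err, 6))  # small: the smooth solution lives in the low modes
```

The cost of the reduced solve depends only on the subspace size r, not the mesh resolution n, which is what makes arbitrary-resolution meshes tractable; the "hyper-reduction" part of the talk additionally approximates the non-linear terms so that assembling the reduced system is cheap too.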
Title: Generative models for Point Sets & Point Set Differentiable Rendering
Abstract: We present a method for learning to generate the surface of 3D shapes via point sets.
Title: Repurposing supervised models for new visual domains
Abstract: Training supervised machine learning models for inference on new visual domains requires annotations, which in most cases are difficult to obtain. Domain Adaptation techniques attempt to relax this need by either leveraging annotated data from different domains or repurposing models trained on data that differs from the new domain. In this talk, we explore two ideas for achieving this: exploiting local similarities between visual domains, and learning to selectively share layers from pre-trained networks that best relate to the new visual domain. We offer experimental evidence that both strategies are effective unsupervised domain adaptation techniques for both natural images and biomedical visual data.
Title: Self-supervised Training of Proposal-based Segmentation via Background Prediction
Abstract: While supervised object detection methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to object detection and segmentation, able to work with monocular images captured with a moving camera. At the heart of our approach lies the observation that segmentation and background reconstruction are linked tasks, and the idea that, because we observe a structured scene, background regions can be re-synthesized from their surroundings, whereas regions depicting the object cannot.
We therefore encode this intuition as a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of object proposals, we develop a Monte Carlo-based training strategy that allows us to explore the large space of object proposals. Our experiments demonstrate that our approach yields accurate detections and segmentations in images that visually depart from those of standard benchmarks, outperforming existing self-supervised methods and approaching weakly supervised ones that exploit large annotated datasets.
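A score-function (REINFORCE-style) estimator is one standard way to optimize through a discrete sampling step like proposal selection; whether it matches the authors' exact strategy is not stated here, so the sketch below is a generic illustration with made-up losses:

```python
import math
import random

K = 5
losses = [0.9, 0.7, 0.1, 0.8, 0.6]  # hypothetical loss of each discrete proposal
logits = [0.0] * K                  # parameters of the proposal distribution
rng = random.Random(0)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

lr = 0.5
for _ in range(2000):
    p = softmax(logits)
    k = rng.choices(range(K), weights=p)[0]  # sample one discrete proposal
    # Score-function gradient of E[loss]: loss(k) * d(log p(k))/d(logit_i),
    # where d(log p(k))/d(logit_i) = 1{i == k} - p_i for a softmax.
    for i in range(K):
        grad = losses[k] * ((1.0 if i == k else 0.0) - p[i])
        logits[i] -= lr * grad

p = softmax(logits)
print(p)  # probability mass concentrates on the lowest-loss proposal
```

The same trick lets a loss defined on sampled proposals, here a toy lookup table, drive gradient updates even though the sampling itself is not differentiable.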