Visual Computing Seminar (Fall 2016)
The Visual Computing Seminar is a weekly seminar series on topics in visual computing.
Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on closely related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.
Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone, and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive an email from me soon about scheduling a presentation.
Where and when: every Wednesday in BC02 (next to the ground floor atrium). Food is served at 11:50, and the actual talk starts at 12:15.
How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
Title: Writing Efficient Numerical Code
Abstract: Visual computing disciplines are characterized by an insatiable hunger for fast floating point computations. In the last decade, a series of fundamental physical limitations has led to major changes in the microarchitecture of today's processors, which have made it increasingly difficult to fully harness their available numerical computing power. In this informal lecture, I'll discuss some of the implications and ways of writing numerical software that runs efficiently on current and upcoming processor architectures.
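To make the theme concrete, here is a minimal sketch of my own (not material from the talk): the same floating point reduction written as an interpreted scalar loop and as a vectorized NumPy call that delegates to optimized, SIMD- and cache-aware kernels — the kind of restructuring needed to harness modern processors.

```python
import numpy as np

def dot_scalar(a, b):
    """Scalar loop: one multiply-add per interpreter iteration."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

def dot_vectorized(a, b):
    """Vectorized: delegates to an optimized, cache- and SIMD-friendly kernel."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
a = rng.standard_normal(10_000)
b = rng.standard_normal(10_000)

# Both compute the same value; the vectorized form is typically orders of
# magnitude faster in Python, since it avoids per-element interpreter work.
assert abs(dot_scalar(a, b) - dot_vectorized(a, b)) < 1e-6
```

The same principle (expressing computation as bulk operations that a compiler or library can map onto vector units) applies in C++ with intrinsics or auto-vectorization.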
Title: Multi-Modal Mean-Fields and Clamping
Abstract: Mean-Field inference is a popular technique, which has recently regained interest in the computer vision community. However, it makes a strong independence assumption on the variables by using a fully-factorised approximation to the posterior distribution of the graphical model; this keeps the method efficient, but at the cost of limiting the expressiveness of the approximation.
When correlations in the true posterior are strong, the Mean-Field approximation converges to a local minimum and therefore tends to model only one mode of the distribution. We design an extension of the clamping method proposed in previous works, which allows us to obtain a Multi-Modal approximation to the posterior distribution that is richer than the naive Mean-Field one. We also show that our generalisation of the clamping idea unleashes the power of this method for practical applications.
We illustrate, through two practical examples, how this Multi-Modal structured output can be used to improve pedestrian tracking and to propose diverse outputs in semantic segmentation.
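The clamping idea can be illustrated on a toy model (my own sketch, not the speaker's code): a two-variable Ising model p(x1, x2) ∝ exp(J·x1·x2) with x_i ∈ {-1, +1}. With strong coupling J, a fully-factorised approximation collapses onto one of the two modes; clamping x1 to each of its values and solving each clamped sub-problem yields a weighted mixture that covers both modes.

```python
import numpy as np

def q2_given_clamped_x1(J, x1):
    """With x1 clamped, the mean-field fixed point for q(x2 = +1) is exact:
    a sigmoid of the local field contributed by the clamped neighbour."""
    return 1.0 / (1.0 + np.exp(-2.0 * J * x1))

def multimodal_approx(J):
    """Clamp x1 to each state, then weight each branch by its (unnormalised)
    evidence, obtained here by summing out x2 exactly."""
    branches = {}
    for x1 in (+1, -1):
        q2 = q2_given_clamped_x1(J, x1)
        Z = np.exp(J * x1) + np.exp(-J * x1)  # sum over x2 in {-1, +1}
        branches[x1] = (q2, Z)
    total = sum(Z for _, Z in branches.values())
    # Mixture over clamped branches: a multi-modal posterior approximation.
    return {x1: (q2, Z / total) for x1, (q2, Z) in branches.items()}

approx = multimodal_approx(2.0)
(q_pos, w_pos), (q_neg, w_neg) = approx[+1], approx[-1]
assert abs(w_pos - 0.5) < 1e-12        # symmetric model: equal mode weights
assert q_pos > 0.9 and q_neg < 0.1     # the two branches capture both modes
```

In realistic graphical models the clamped sub-problems are themselves solved by iterative mean-field updates rather than in closed form; the mixture structure is the point of the illustration.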
Title: Color and Spectral Information in Computer Vision and Multimedia
Abstract: I will present the tutorial I gave at the joint ECCV/ACM Multimedia tutorial day in Amsterdam a few days ago. I will start off with a brief (and biased) history of colors, talking specifically about trichromatic vision and opponent color, how they are modeled, and how they should (and should not) be used. I then introduce a couple of examples in computer vision, namely saliency and superpixels, followed by some multimedia examples, namely video aesthetics and semantic image enhancement. At the end, I show how we can extend our color models into the near-infrared, and why this is interesting. The research examples shown are from former PhD students and Postdocs of IVRL.
Title: Ways of Machine Seeing
Abstract: This talk will focus on my project with Sabine Suesstrunk and Franco Moretti to analyse Aby Warburg’s Bilderatlas (image atlas), a kind of early ‘big data’ project of 1920s art history. The project started in the opposite way to most in computer science: with a clear set of data, but no explicit problem to be solved. Through the ‘operationalisation’ of Aby Warburg’s concepts - their translation into a series of formal operations - I’ll present a new type of digital art history that seeks to be at once morphological and historical, reductive and interpretative. The talk focuses on the computational analysis of poses in paintings, how they are used to display emotion and movement, and how certain archetypal representations of emotion persist or re-appear through history.
Title: Image descriptors: from hand-crafting to learning from raw data
Abstract: Image descriptors, i.e. small, invariant representations of image patches, are a key component in many Computer Vision applications. In this talk I will present some of my work on this subject. Firstly, I will talk about a hand-crafted technique to build descriptors invariant by design to scale, rotation and background changes. Secondly, I will present a technique to extract invariant representations with convolutional neural networks from raw image patches.
Title: Learned Invariant Feature Transform
Abstract: Local features are one of the core building blocks of Computer Vision, used in various tasks such as Image Retrieval, Visual Tracking, Image Registration, and Image Matching. There have been numerous works on the local feature pipeline since the seminal work of Lowe in 2004, including traditional hand-crafted methods and more recent ones based on Machine Learning.
In this talk, I will introduce learning-based approaches to the local feature pipeline, and how to integrate them into a fully learned pipeline through Deep Learning. I will first introduce TILDE, a learned local feature detector based on piece-wise linear regressors that provides highly repeatable keypoints. I will then show how to learn orientations of feature points through Deep Siamese Networks. Finally, I will discuss how to combine them with Eduard's Deep Descriptor, presented in a previous week. By leveraging Machine Learning techniques, we achieve performance that significantly outperforms the state of the art.
Title: Reconstructing Personalized Anatomical Models for Physics-based Body Animation
Abstract: We present a method to create personalized anatomical models ready for physics-based animation, using only a set of surface 3D scans. We start by building a template anatomical model of an average male which supports deformations due to both 1) subject-specific variations: shapes and sizes of bones, muscles, and adipose tissues and 2) skeletal poses. Next, we capture a set of 3D scans of an actor in various poses. Our key contribution is formulating and solving a large-scale optimization problem where we solve for both subject-specific and pose-dependent parameters such that our resulting anatomical model explains the captured 3D scans as closely as possible. Compared to data-driven body modeling techniques that focus only on the surface, our approach has the advantage of creating physics-based models, which provide realistic 3D geometry of the bones and muscles, and naturally supports effects such as inertia, gravity, and collisions according to Newtonian dynamics.
Title: Sphere Meshes for Real-Time Hand Modeling and Tracking
Abstract: Modern systems for real-time hand tracking rely on a combination of discriminative and generative approaches to robustly recover hand poses. Generative approaches require the specification of a geometric model. In this work, we propose the use of sphere-meshes as a novel geometric representation for real-time generative hand tracking. How tightly this model fits a specific user heavily affects tracking precision. We derive an optimisation to non-rigidly deform a template model to fit the user data in a number of poses. At the same time, the limited number of primitives in the tracking template allows us to retain excellent computational performance. We confirm this by embedding our models in an open source real-time registration algorithm to obtain a tracker steadily running at 60Hz. We show that the improved tracking accuracy at high frame-rate enables stable tracking of extended and complex motion sequences without the need for per-frame re-initialisation.
Title: Transforming Rule-based Procedural Models
Abstract: This is a presentation about ongoing work on how to transform designs defined by rule-based procedural models, e.g., buildings or plants.
Given several procedural designs, each specified by a grammar, we combine and merge elements of the existing designs to generate new designs. We introduce two novel technical components to enable such transformations. 1) We extend the concept of discrete rule substitution to rule merging, leading to a huge space for combining procedural designs. 2) We present an algorithm to jointly derive two or more grammars. We demonstrate two applications of our work: we show that our framework leads to more variations of procedural designs than previous work, and we show smooth animation sequences between two procedural models.
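The notion of discrete rule substitution can be illustrated with a toy example (my own sketch, not the speaker's system): two rule-based grammars represented as dictionaries mapping symbols to expansion sequences. Substituting a rule from one grammar into the other produces a hybrid design; the rule merging described in the talk generalises this by combining rules rather than only swapping them whole.

```python
# Two toy facade grammars; terminals are symbols without a rule.
grammar_a = {
    "Building": ["Floor", "Roof"],
    "Floor":    ["Window", "Door", "Window"],
    "Roof":     ["GableRoof"],
}
grammar_b = {
    "Building": ["Floor", "Roof"],
    "Floor":    ["Window", "Window", "Window"],
    "Roof":     ["FlatRoof"],
}

def derive(grammar, symbol):
    """Fully expand a symbol into its sequence of terminals."""
    if symbol not in grammar:
        return [symbol]
    out = []
    for child in grammar[symbol]:
        out.extend(derive(grammar, child))
    return out

# Discrete rule substitution: A's facade combined with B's roof rule.
hybrid = dict(grammar_a)
hybrid["Roof"] = grammar_b["Roof"]
print(derive(hybrid, "Building"))
# → ['Window', 'Door', 'Window', 'FlatRoof']
```

Real procedural modeling systems (e.g. shape grammars for buildings) attach geometric operations to each rule; the combinatorics of which rules can be exchanged or merged is what creates the large design space mentioned above.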
Title: Revealing Information by Averaging
Abstract: We present a method for hiding images in synthetic videos and revealing them by temporal averaging. The main challenge is to develop a visual masking method that hides the input image both spatially and temporally. Our masking approach consists of temporal and spatial pixel-by-pixel variations of the frequency band coefficients representing the image to be hidden. These variations ensure that the target image remains invisible both in the spatial and the temporal domains. In addition, by applying a temporal masking function derived from a dither matrix, we allow the video to carry a visible message that is different from the hidden image. The image hidden in the video can be revealed by software averaging, or with a camera, by long exposure photography. The presented work may find applications in the secure transmission of digital information.
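The averaging principle behind this can be sketched in a few lines (an illustration of the recovery mechanism only, not the authors' masking method): per-pixel temporal variations are constructed with zero temporal mean, so each individual frame obscures the target while the mean over all frames recovers it exactly.

```python
import numpy as np

def make_frames(hidden, n_frames, rng):
    """Generate frames whose temporal average equals `hidden`."""
    noise = rng.uniform(-0.5, 0.5, size=(n_frames,) + hidden.shape)
    # Subtract the temporal mean so the perturbations cancel exactly
    # when averaged over all frames.
    noise -= noise.mean(axis=0, keepdims=True)
    return hidden[None, ...] + noise

rng = np.random.default_rng(42)
hidden = rng.random((4, 4))          # stand-in for the image to hide
frames = make_frames(hidden, 16, rng)

recovered = frames.mean(axis=0)      # software averaging / "long exposure"
assert np.allclose(recovered, hidden)
```

The actual method shapes these variations in a frequency band and adds a dither-based visible message, but the recovery step is the same averaging shown here.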
Title: Digital Lippmann Photography
Abstract: Lippmann photography is one of the earliest techniques that reproduces color in photographs. This method, based on the phenomenon of interference, was invented by Gabriel Lippmann and earned him the Nobel Prize in Physics in 1908. It essentially works by capturing the Fourier transform of the spectrum of the incoming light in the depth of a photosensitive material. What is remarkable is that it enables a much richer color reproduction than traditional RGB film or sensor approaches, in the sense that it captures the entire spectrum of visible light.
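The relation between depth and spectrum can be sketched as follows (a standard wave-optics argument, not material from the talk): light of wavenumber $k$ reflecting off the mercury mirror behind the emulsion forms a standing wave, so the exposure recorded at depth $z$ in a medium of refractive index $n$ is approximately

```latex
I(z) \;\propto\; \int S(k)\,\bigl(1 - \cos(2nkz)\bigr)\,\mathrm{d}k ,
```

i.e., up to a constant term, the cosine (Fourier) transform of the spectrum $S(k)$. This is why the depth profile of the developed plate encodes the full spectrum, and why one can hope to invert it computationally.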
In this talk, I will first briefly introduce a few key wave-optics concepts that are needed to understand the Lippmann procedure. Then, I will describe the recording and replay stages of the Lippmann method. Finally, I will discuss our ongoing work on the digital capture and reproduction of these fascinating artworks. I will, for example, show what happens when the artworks are observed under varying viewing angles, and propose a way to recover the complete spectrum of Lippmann plates using only an RGB camera.