Visual Computing Seminar (Spring 2017)
The Visual Computing Seminar is a weekly seminar series on topics in visual computing.
Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on closely related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities, such as the use of similar computational tools. Think of this as the visual computing edition of the “Know thy neighbor” seminar series.
Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone, and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive an email from me soon about scheduling a presentation.
Where and when: every Wednesday in BC04 (next to the ground floor atrium). Food is served at 11:50, and the actual talk starts at 12:15.
How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
Title: Human motion capture for interactive applications
Abstract: Motion capture is a long-standing problem in computer vision. Markerless solutions promise to enable new forms of human-computer interaction. In practice, however, commercial solutions rely heavily on suit-based capture (movie industry) and partial reconstruction with handheld devices (Wii remote and HTC Vive).
I’ll talk about my research on overcoming the remaining limitations: how to 1) reduce the necessary number of cameras, 2) automatically create actor models, and 3) enable arbitrarily large capture volumes from egocentric perspectives. We would like to enable applications such as a virtual (ski) trainer: we imagine a drone that follows the user’s (skiing) motion and gives expert feedback and corrections in real time, like a real trainer would. Towards this goal, we look into 4) active reconstruction and 5) weakly supervised training.
Exceptional room change: This seminar takes place in INR 113.
Title: Physics-based Human Reconstruction and Animation
Abstract: Creating digital representations of humans is of utmost importance for applications ranging from entertainment (video games, movies) to human-computer interaction and even psychiatric treatment. Building credible digital doubles is difficult because the human visual system is very sensitive to the expressivity of, and potential anomalies in, body structure and motion.
During this talk I will present several projects completed during my PhD that tackle these problems. I will begin by describing a complete pipeline that allows users to reconstruct fully rigged 3D facial avatars from video data captured with a handheld device (e.g., a smartphone). This is done through an optimization that integrates feature tracking, optical flow, and shape from shading. Continuing along the lines of accessible acquisition systems, we discuss a framework for simultaneous tracking and modeling of articulated human bodies from RGB-D data. In the second half of the talk, we will deviate from standard linear reconstruction and animation models and instead focus on physics-based techniques that can incorporate complex phenomena such as dynamics, collision response and the incompressibility of flesh. I will present a project that assumes each 3D scan of an actor records the body in a physical steady state and uses inverse physics to extract a volumetric, physics-ready anatomical model of the actor. This is then extended to a novel physics-based approach for facial reconstruction and animation, which opens new avenues for dynamic artistic control, simulation of corrective facial surgery, and interaction with external forces and objects.
Title: Weakly-supervised semantic segmentation from web images
Abstract: We propose a weakly supervised semantic segmentation algorithm that uses image tags for supervision. We apply the tags in queries to collect three sets of web images, which encode the clean foregrounds, the common backgrounds, and realistic scenes of the classes. We introduce a novel three-stage training pipeline to progressively learn semantic segmentation models. We first train and refine a class-specific shallow neural network to obtain segmentation masks for each class. The shallow neural networks of all classes are then assembled into one deep convolutional neural network for end-to-end training and testing. Experiments show that our method notably outperforms previous state-of-the-art weakly supervised semantic segmentation approaches on the PASCAL VOC 2012 segmentation benchmark. We further apply the class-specific shallow neural networks to object segmentation and obtain excellent results.
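To give a flavor of the assembly step described above, here is a hedged sketch (my own toy illustration, not the authors' code): each class-specific "shallow network" is stood in for by a single linear filter that produces a per-pixel foreground score map, and the maps are stacked into one multi-class labeling of the kind a deep network could then refine end-to-end.

```python
import numpy as np

# Illustrative sketch only: per-class shallow "networks" are stand-ins
# (a single linear filter + sigmoid each); the real method trains and
# refines actual shallow neural networks per class.

def shallow_score_map(image, class_weights):
    """Stand-in for a class-specific shallow network: one linear filter
    over the image channels, followed by a sigmoid."""
    score = np.tensordot(image, class_weights, axes=([2], [0]))
    return 1.0 / (1.0 + np.exp(-score))

def assemble(image, per_class_weights, bg_threshold=0.5):
    """Stack the per-class score maps; label each pixel with the most
    confident class, or background (0) if no class is confident."""
    maps = np.stack([shallow_score_map(image, w) for w in per_class_weights])
    best = np.argmax(maps, axis=0)
    confident = np.max(maps, axis=0) > bg_threshold
    return np.where(confident, best + 1, 0)  # 0 = background

# Toy example: a 4x4 "image" with 3 channels and two classes.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
weights = [np.array([5.0, -5.0, 0.0]), np.array([-5.0, 5.0, 0.0])]
labels = assemble(image, weights)
print(labels.shape)  # (4, 4)
```

The point of the sketch is only the structural idea: independent per-class predictors can be fused into a single multi-class output, which is what makes subsequent end-to-end training possible.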
Title: Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
Abstract: We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.
Title: Active Learning for Delineation of Curvilinear Structures
Abstract: Supervised machine learning algorithms intrinsically require extensive amounts of annotated ground-truth data, which is difficult and tedious to obtain, especially in biomedical applications. To make such methods truly practical, we propose an Active Learning approach especially suited to annotating elongated structures such as neurons, blood vessels and roads. It speeds up the annotation process by up to 80%, greatly decreasing the effort. It does so by taking into account local and global topological properties of the delineation, which hint at which parts of the reconstruction are particularly ambiguous. Moreover, we show that a similar approach can be used to detect the errors made by the reconstruction algorithm. These can then be presented to the expert for validation, without requiring them to visually inspect the whole scan. This way, we make the most of both expert knowledge and the efficiency of automatic tools.
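As a generic illustration of the uncertainty-driven querying at the heart of Active Learning (a minimal sketch of my own; the talk's actual selection criterion additionally exploits the topology of the delineation), the learner repeatedly asks the annotator about the sample the current classifier is least sure about:

```python
import numpy as np

# Minimal uncertainty-sampling loop (illustrative only): a trivial
# 1-D threshold classifier queries the unlabeled sample closest to
# its current decision boundary, i.e. the most ambiguous one.
rng = np.random.default_rng(1)

pool = rng.normal(size=100)          # toy pool of 1-D "samples"
labels = (pool > 0).astype(int)      # hidden ground truth: sign of x

# Start from two annotated samples, one of each class.
annotated_idx = [int(np.argmin(pool)), int(np.argmax(pool))]
for _ in range(5):
    x = pool[annotated_idx]
    y = np.array(labels[annotated_idx])
    # Trivial classifier: threshold halfway between the class means.
    thr = 0.5 * (x[y == 0].mean() + x[y == 1].mean())
    # Query the unlabeled sample nearest the decision boundary.
    unlabeled = [i for i in range(len(pool)) if i not in annotated_idx]
    query = min(unlabeled, key=lambda i: abs(pool[i] - thr))
    annotated_idx.append(query)      # "ask the expert" for its label

print(len(annotated_idx))  # 7 samples annotated in total
```

The savings quoted in the abstract come from exactly this effect: annotation effort is spent only where the automatic reconstruction is genuinely ambiguous.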
Title: Vision-based detection of aircraft and UAVs
Abstract: Unmanned Aerial Vehicles are becoming increasingly popular for a broad variety of tasks ranging from aerial imagery to object delivery. As the areas where drones can be used efficiently expand, the risk of collision with other flying objects increases. Avoiding such collisions would be a simpler problem if all aircraft could communicate with each other and share their location information. However, it is often the case that location information is unavailable or communication is not possible. To ensure flight safety in such situations, drones need a way to autonomously detect other objects intruding into the neighboring airspace. Vision-based collision avoidance is of particular interest, as cameras generally consume less power and are more lightweight than active sensor alternatives. We have therefore developed a set of increasingly sophisticated algorithms to provide drones with a visual collision avoidance capability.
First, we present a novel method that combines motion and appearance information to detect flying objects such as drones and planes that occupy a small part of the camera’s field of view, possibly move in front of complex backgrounds, and are filmed by a moving camera. Second, in order to reduce the need to collect and manually annotate a large training dataset, we introduce a way to generate realistic synthetic images based on a small set of real examples and a coarse 3D model of the object. Finally, motivated by the recent success of deep learning methods, we present a deep domain adaptation approach that effectively leverages synthetic data to improve the detector’s quality.
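A toy version of the motion-plus-appearance idea can be sketched as follows (my own simplified illustration, not the actual detector, which uses learned classifiers and handles camera motion): each image location is scored by the product of a frame-difference motion cue and an appearance template response.

```python
import numpy as np

# Toy sketch of fusing motion and appearance cues for small-object
# detection (illustrative only; the real method is far more robust).

def detect(prev_frame, frame, template):
    """Score every position by motion (frame difference) times
    appearance (correlation with a small template); return the
    top-left corner of the best-scoring window."""
    h, w = template.shape
    H, W = frame.shape
    best, best_pos = -np.inf, None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = frame[r:r + h, c:c + w]
            motion = np.abs(patch - prev_frame[r:r + h, c:c + w]).mean()
            appearance = (patch * template).sum()
            score = motion * appearance
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# A bright 2x2 "drone" moves from (1, 1) to (5, 5) between two frames.
prev_frame = np.zeros((8, 8)); prev_frame[1:3, 1:3] = 1.0
frame = np.zeros((8, 8)); frame[5:7, 5:7] = 1.0
template = np.ones((2, 2))
print(detect(prev_frame, frame, template))  # (5, 5)
```

Neither cue alone suffices here: appearance is ambiguous for tiny objects, and motion alone fires on background changes; multiplying the two suppresses locations where either cue is weak.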
No seminar (Easter)
Title: Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair
Abstract: We present a method that allows for simultaneous geometric and radiometric calibration of a projector-camera pair. It is simple and does not require specialized hardware. We prewarp and align a specially designed projection pattern onto a printed pattern of different colorimetric properties. After capturing the patterns in several orientations, we perform geometric calibration by estimating the corner locations of the two patterns in different color channels. We perform radiometric calibration of the projector by using the information contained inside the projected squares. We show that our method performs on par with current approaches that all require separate geometric and radiometric calibration, while being efficient and user friendly.
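To give a flavor of the radiometric half of such a calibration (a hedged sketch under simplifying assumptions, not the paper's method, which extracts this information from the projected squares), a projector's nonlinear intensity response is often modeled as a gamma curve, which can be recovered by least squares in log space from corresponding projected and captured intensities:

```python
import numpy as np

# Illustrative sketch only: fit a gamma response c = p**gamma from
# corresponding projected (p) and captured (c) intensities, one
# simple model of a projector's radiometric behavior.
projected = np.linspace(0.1, 1.0, 10)     # intensities sent to projector
gamma_true = 2.2
captured = projected ** gamma_true        # synthetic camera measurements

# log(c) = gamma * log(p)  ->  least-squares slope in log space.
gamma_est = (np.sum(np.log(projected) * np.log(captured))
             / np.sum(np.log(projected) ** 2))
print(round(gamma_est, 3))  # 2.2
```

In a real projector-camera pair the camera's own response and per-channel color mixing would also have to be modeled, which is why the joint calibration the abstract describes is valuable.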