Visual Computing Seminar (Fall 2018)
The Visual Computing Seminar is a weekly seminar series on topics in visual computing.
Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on closely related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.
Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone, and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20min talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive an email from me soon about scheduling a presentation.
Where and when: every Wednesday in BC01 (note the changed location!). Food is served at 11:50, and the actual talk starts at 12:15.
How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner), or download the iCal file.
Blue Brain Brayns: a platform for high-fidelity, large-scale and interactive visualization of scientific data and brain structures
The Blue Brain Project has made major efforts to create morphologically accurate neurons to simulate sub-cellular and electrical activities, for example, molecular simulations of neuron biochemistry or multi-scale simulations of neuronal function.
Overcoming neural brainwashing
We identify a phenomenon, which we dub neural brainwashing, that occurs when sequentially training multiple deep networks with partially shared parameters; the performance of previously trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters. To overcome this, we introduce a statistically justified weight plasticity loss that regularizes the learning of a model's shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and for neural architecture search. Adding weight plasticity in neural architecture search preserves the best models to the end of the search process, leading to improved results in both natural language processing and computer vision tasks.
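The shape of such a regularizer can be illustrated with a minimal sketch. Below is a quadratic, importance-weighted penalty in the spirit of elastic weight consolidation; the talk's statistically justified formulation may differ (e.g. in how importance is estimated), and all names here are illustrative:

```python
import numpy as np

def weight_plasticity_loss(shared_params, anchor_params, importance):
    """Quadratic penalty keeping shared parameters close to the values
    they had after training the previous model, weighted per-parameter
    by their importance for that model's performance."""
    loss = 0.0
    for name in shared_params:
        diff = shared_params[name] - anchor_params[name]
        loss += np.sum(importance[name] * diff ** 2)
    return loss
```

Adding this term to the task loss of the new model discourages moves along directions that matter for the previously trained models, while leaving unimportant directions free.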
Good Features to Correlate for Visual Tracking
In this talk, I will mainly discuss the visual object tracking problem, which I worked on during my Ph.D. studies. As a secondary topic, I will mention my recent research on garment virtualization using deep learning.
24.10.2018: Krishna Kanth Nakka
Deep Attentional Structured Representation Learning for Visual Recognition
Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective for tackling complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, while effective, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this work, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to jointly predict the image class label and an attention map in an end-to-end fashion and without any other supervision than the target label. As evidenced by our experiments, this consistently outperforms attention-less structured representation learning and yields state-of-the-art results on standard scene recognition and fine-grained categorization benchmarks.
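The attentional aggregation step can be sketched in a few lines: a softmax over predicted per-location attention scores reweights local features before pooling. This is a simplification — the actual framework couples attention with structured encodings such as VLAD or Fisher Vectors — and the function names are illustrative:

```python
import numpy as np

def attended_aggregate(features, attention_logits):
    """features: (N, D) local descriptors from a conv feature map;
    attention_logits: (N,) image-specific attention scores.
    Softmax attention reweights locations before global aggregation."""
    a = np.exp(attention_logits - attention_logits.max())  # stable softmax
    a /= a.sum()
    return (a[:, None] * features).sum(axis=0)  # (D,) pooled descriptor
```

With uniform logits this reduces to plain average pooling; learned logits let the discriminative regions dominate the aggregate.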
31.10.2018: Kevin Gonyop Kim
Expanding experience of the learners in vocational education
Vocational education and training (VET), which takes place in the dual contexts of school and workplace, is a well-established secondary education system in Switzerland. Although it is known as an effective system for developing vocational competence, there is a gap between what learners are supposed to learn and what they actually practice at the workplace. Workplace experiences are usually limited to concrete situations in particular environments, and their connections to the general knowledge learned at school are often weak.
DESIA: A General Framework for Designing Interlocking Assemblies
Interlocking assemblies have a long history in the design of puzzles, furniture, architecture, and other complex geometric structures. The key defining property of interlocking assemblies is that all component parts are immobilized by their geometric arrangement, preventing the assembly from falling apart. Computer graphics research has recently contributed design tools that allow creating new interlocking assemblies. However, these tools focus on specific kinds of assemblies and explore only a limited space of interlocking configurations, which restricts their applicability for design.
In this talk, we present a new general framework for designing interlocking assemblies. The core idea is to represent part relationships with a family of base Directional Blocking Graphs and leverage efficient graph analysis tools to compute an interlocking arrangement of parts. This avoids the exponential complexity of brute-force search. Our algorithm iteratively constructs the geometry of assembly components, taking advantage of all existing blocking relations for constructing successive parts. As a result, our approach supports a wider range of assembly forms than previous methods and provides significantly more design flexibility. We show that our framework facilitates the efficient design of complex interlocking assemblies, including new solutions that cannot be achieved by state-of-the-art approaches.
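To make the notion of interlocking concrete, here is a brute-force check over blocking relations, assuming each candidate direction comes with a map from a part to the set of parts blocking its translation along that direction. This is a toy formalization of the property, not the talk's method — the base Directional Blocking Graphs and graph analysis exist precisely to avoid this exponential enumeration:

```python
from itertools import combinations

def is_interlocked(parts, blocking_per_dir):
    """Exhaustive check: an assembly is interlocked if no proper,
    nonempty subset of parts can translate along any candidate direction.
    blocking_per_dir: {direction: {part: set of parts blocking it}}."""
    parts = list(parts)
    for r in range(1, len(parts)):
        for subset in combinations(parts, r):
            s = set(subset)
            for blocking in blocking_per_dir.values():
                # subset s is free along this direction if every member
                # is blocked only by other members of s
                if all(blocking.get(p, set()) <= s for p in s):
                    return False  # this subset slides out: not interlocked
    return True
```

The loop over all subsets is exponential in the number of parts, which is why representing blocking relations as graphs and reusing them while constructing successive parts matters.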
Deep Shape Optimisation
Aerodynamic shape optimization has many industrial applications. Existing methods, however, are so computationally demanding that typical engineering practice is to either simply try a limited number of hand-designed shapes or restrict oneself to shapes that can be parameterized using only a few degrees of freedom. In this work, we introduce a new way to optimize complex shapes quickly and accurately. To this end, we train Geodesic Convolutional Neural Networks to emulate a fluid dynamics simulator. The key to making this approach practical is remeshing the original shape using a polycube map, which makes it possible to perform the computations on GPUs instead of CPUs. The neural net is then used to formulate an objective function that is differentiable with respect to the shape parameters, which can then be optimised using a gradient-based technique. Our approach outperforms state-of-the-art methods by 5 to 20% on standard problems and, even more importantly, applies to cases that previous methods cannot handle.
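The optimisation loop itself is simple once the simulator is replaced by a differentiable surrogate. Here is a toy sketch, with an analytic quadratic standing in for the Geodesic CNN emulator — the surrogate, its gradient, and all names are purely illustrative:

```python
import numpy as np

# Hypothetical differentiable surrogate of a CFD solver: predicts a
# drag-like quantity from shape parameters x. A toy quadratic stands
# in for the trained Geodesic CNN.
def surrogate_drag(x):
    return np.sum((x - 0.3) ** 2) + 1.0

def surrogate_grad(x):
    # analytic gradient; for a neural surrogate this comes from backprop
    return 2.0 * (x - 0.3)

def optimise_shape(x0, lr=0.1, steps=200):
    """Plain gradient descent on the shape parameters through the
    surrogate objective."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * surrogate_grad(x)
    return x
```

With a neural surrogate, the same loop applies: backpropagation provides the gradient of the predicted objective with respect to the shape parameters.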
Capturing and rendering the world of materials
One of the key ingredients of any realistic rendering system is a description of the way in which light interacts with objects, typically modeled via the Bidirectional Reflectance Distribution Function (BRDF). Unfortunately, real-world BRDF data remains extremely scarce due to the difficulty of acquiring it: a BRDF measurement requires scanning a four-dimensional domain at high resolution—an infeasibly time-consuming process.
In this talk, I'll showcase our ongoing work on assembling a large library of materials including metals, fabrics, organic substances like wood or plant leaves, etc. The key idea to work around the curse of dimensionality is an adaptive parameterization, which automatically warps the 4D space so that most of the volume maps to “interesting” regions. Starting with a review of BRDF models and microfacet theory, I'll explain the new model, as well as the optical measurement apparatus that we used to conduct the measurements.
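As background for the microfacet theory reviewed in the talk — the new model itself is not reproduced here — the snippet below evaluates the standard GGX (Trowbridge–Reitz) normal distribution function, a common ingredient of microfacet BRDFs:

```python
import math

def ggx_ndf(cos_theta_h, alpha):
    """GGX/Trowbridge-Reitz normal distribution function D(h).
    cos_theta_h: cosine of the angle between surface normal and half-vector.
    alpha: roughness parameter (small = mirror-like, 1 = very rough)."""
    a2 = alpha * alpha
    denom = cos_theta_h * cos_theta_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)
```

A full BRDF additionally needs a Fresnel term and a shadowing-masking term; the distribution above is only the D factor.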
Deformable surface reconstruction from a single view
Recent years have seen the development of mature solutions for reconstructing deformable surfaces from a single image, provided that they are relatively well textured. By contrast, recovering the 3D shape of texture-less surfaces remains an open problem, and essentially relates to Shape-from-Shading. In this work, we introduce a data-driven approach to this problem. We propose a general framework that can predict diverse 3D representations, such as meshes, normals, and depth maps. Our experiments show that meshes are ill-suited to handle texture-less 3D reconstruction in our context. Furthermore, we demonstrate that our approach generalizes well to unseen objects, and that it yields higher-quality reconstructions than a state-of-the-art SfS technique, particularly in terms of normal estimates. Our reconstructions accurately model the fine details of the surfaces, such as the creases of a T-shirt worn by a person. Since 3D shape reconstruction from a single view is known to be ambiguous — various combinations of shape, lighting and material result in the same 2D observation — we further explore predicting not just a single shape but multiple likely ones, and we look into learning a better shape representation.
Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses
Motion and pose estimation from 3D-to-2D correspondences can be solved by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. Unfortunately, while theoretically doable, this introduces numerical instability in the optimization process in practice.
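The two routes can be contrasted in a short sketch, assuming a positive semi-definite matrix M built from the correspondences. The loss below is a simplified illustration of the eigendecomposition-free idea, not necessarily the exact formulation from the talk:

```python
import numpy as np

def smallest_eigvec(M):
    """Classic route: explicit eigendecomposition. Differentiating this
    inside a network is numerically unstable when eigenvalues are close
    or (near-)zero."""
    _, V = np.linalg.eigh(M)  # eigenvalues in ascending order
    return V[:, 0]

def eigendecomposition_free_loss(M, e_gt):
    """Alternative: penalise the ground-truth eigenvector e_gt for not
    lying in the null space of M. This is a simple quadratic form in M,
    differentiable without any eigensolver in the computation graph."""
    return float(e_gt @ M @ e_gt)
```

When the network predicts M correctly, e_gt is a zero-eigenvalue eigenvector of M and the loss vanishes; gradients with respect to M are just the outer product of e_gt with itself.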
Active Drone Based Human Pose Estimation
Reconstruction of 3D human pose has become a widely studied direction of research, with recent interest in outdoor and drone-based capture. The rising popularity of commercially available drones with on-board cameras makes this form of motion capture (MoCap) accessible to the consumer market. However, little work has been done on controlling the drone to maximize reconstruction accuracy. Existing drone-based MoCap solutions use pre-defined controllers, such as following the person at a constant angle or at a constant rotation. On the other hand, the robotics literature mostly covers the active reconstruction of static scenes. Our goal is to actively re-position a drone during MoCap so that it moves to the position where it will achieve the highest accuracy. Key to our method is estimating the expected reconstruction uncertainty in the presence of dynamic motion. Ultimately, we aim to show that our active motion planning improves pose estimation by comparing it against several baseline policies.
Neural Scene Decomposition for Human Motion Capture
Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet. However, when it comes to 3D reconstruction, those features learned on ImageNet are only of limited use.
We therefore propose an approach to learning representations that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: a bounding box; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. Our NSD model can be trained end-to-end without any 2D or 3D supervision by exploiting self-supervision coming from multiview data. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. NSD is also well suited for CG applications, such as seamless transitions between two video perspectives and novel view synthesis.