Visual Computing Seminar (Fall 2016)

Food @ 11:50am,
Talk @ 12:15pm
Wenzel Jakob

General information

The Visual Computing Seminar is a weekly seminar series on topics in visual computing.

Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on subtly related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.

Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive email from me soon on scheduling a presentation.

Where and when: every Wednesday in BC02 (next to the ground floor atrium). Food is served at 11:50, and the actual talk starts at 12:15.

How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.


Schedule
05.10.2016 Wenzel Jakob

Title: Writing Efficient Numerical Code

Abstract: Visual computing disciplines are characterized by an insatiable hunger for fast floating point computations. In the last decade, a series of fundamental physical limitations has led to major changes in the microarchitecture of today's processors that have made it increasingly difficult to fully harness their available numerical computing power. In this informal lecture, I'll discuss some of the implications and ways of writing numerical software that runs efficiently on current and upcoming processor architectures.
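The gap the talk alludes to can be felt even from Python: the same reduction runs far faster when it is dispatched to vectorised native code than as an interpreted loop. A minimal sketch (function names are mine, not from the talk):

```python
import time
import numpy as np

def sum_sq_scalar(x):
    """Interpreted loop: one multiply-add per iteration, plus Python overhead."""
    total = 0.0
    for v in x:
        total += v * v
    return total

def sum_sq_vector(x):
    """Vectorised: NumPy dispatches to optimised (SIMD-capable) native code."""
    return float(np.dot(x, x))

x = np.random.rand(1_000_000)
t0 = time.perf_counter(); s1 = sum_sq_scalar(x); t1 = time.perf_counter()
s2 = sum_sq_vector(x);    t2 = time.perf_counter()
print(f"scalar: {t1 - t0:.4f}s, vectorised: {t2 - t1:.4f}s, "
      f"same result: {bool(np.isclose(s1, s2))}")
```

The ratio between the two timings is typically two to three orders of magnitude, which is exactly the kind of headroom the talk is about.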



19.10.2016 Pierre Baqué

Title: Multi-Modal Mean-Fields and Clamping

Abstract: Mean-Field inference is a popular technique, which has recently regained interest in the computer vision community. However, it makes a strong assumption of independence between the variables by using a fully-factorised approximation to the posterior distribution of the graphical model, at the cost of limiting the expressiveness of the method.

When correlations in the true posterior are strong, the Mean-Field approximation converges to a local minimum, and therefore tends to model only one mode of the distribution. We design an extension of the clamping method proposed in previous works, which allows us to obtain a Multi-Modal approximation to the posterior distribution that is richer than the naive Mean-Field one. We also show that our generalisation of the clamping idea unleashes the power of this method for practical applications.

We illustrate, through two practical examples, how this Multi-Modal structured output can be used for improving pedestrian tracking and proposing diverse outputs in semantic segmentation.
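To make the mode collapse concrete, here is a toy sketch (mine, not the speaker's formulation) of fully-factorised mean-field on a two-variable binary MRF whose posterior puts equal mass on two modes: plain mean-field locks onto one of them, while clamping one variable and mixing the two branches recovers the correct marginals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(W, b, clamp=None, iters=200):
    """Fully-factorised mean-field for a binary pairwise MRF with
    p(s) proportional to exp(b.s + 0.5 s^T W s), s in {0,1}^n.
    `clamp` maps a variable index to a fixed value in {0, 1}."""
    clamp = clamp or {}
    q = np.full(len(b), 0.45)   # start slightly off the symmetric saddle
    for k, v in clamp.items():
        q[k] = float(v)
    for _ in range(iters):
        for i in range(len(b)):
            if i not in clamp:
                q[i] = sigmoid(b[i] + W[i] @ q)
    return q

# Two strongly coupled variables with symmetric evidence: the true
# posterior puts equal mass on the modes (0,0) and (1,1).
W = np.array([[0.0, 6.0], [6.0, 0.0]])
b = np.array([-3.0, -3.0])

q_plain = mean_field(W, b)            # collapses onto a single mode
q0 = mean_field(W, b, clamp={0: 0})   # branch with variable 0 clamped to 0
q1 = mean_field(W, b, clamp={0: 1})   # branch with variable 0 clamped to 1
# By symmetry both branches carry equal weight, so the multi-modal
# approximation is their uniform mixture; its marginals recover 0.5.
q_multi = 0.5 * (q0 + q1)
print("plain:", q_plain, " multi-modal:", q_multi)
```

The plain approximation reports marginals near 0 (or near 1), while the clamped mixture reports the correct value of 0.5 for both variables.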

26.10.2016 Sabine Süsstrunk

Title: Color and Spectral Information in Computer Vision and Multimedia

Abstract: I will present the tutorial I gave at the joint ECCV/ACM Multimedia tutorial day in Amsterdam a few days ago. I will start off with a brief (and biased) history of colors, talking specifically about trichromatic vision and opponent color, how they are modeled, and how they should (and should not) be used. I then introduce a couple of examples in computer vision, namely saliency and superpixels, followed by some multimedia examples, namely video aesthetics and semantic image enhancements. At the end, I show how we can extend our color models into the near-infrared, and why this is interesting. The research examples shown are from former PhD students and postdocs of IVRL.
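For readers unfamiliar with opponent color, one commonly used linear opponent transform is sketched below; this is a generic textbook construction, not necessarily the model used in the tutorial:

```python
import numpy as np

# A common linear RGB-to-opponent transform: a red-green channel, a
# yellow-blue channel, and an intensity channel. (Illustrative only.)
OPP = np.array([
    [1 / np.sqrt(2), -1 / np.sqrt(2),  0.0           ],  # red-green
    [1 / np.sqrt(6),  1 / np.sqrt(6), -2 / np.sqrt(6)],  # yellow-blue
    [1 / np.sqrt(3),  1 / np.sqrt(3),  1 / np.sqrt(3)],  # intensity
])

def rgb_to_opponent(rgb):
    """Map an (..., 3) array of RGB values to opponent coordinates."""
    return rgb @ OPP.T

gray = np.array([0.5, 0.5, 0.5])
# For achromatic input, both chromatic opponent channels vanish.
print(rgb_to_opponent(gray))
```

The useful property is that chromatic content and intensity are decorrelated, which is why such transforms show up in saliency and color-feature pipelines.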

02.11.2016 Leonardo Impett

Title: Ways of Machine Seeing

Abstract: This talk will focus on my project with Sabine Suesstrunk and Franco Moretti to analyse Aby Warburg’s Bilderatlas (image atlas), a kind of early ‘big data’ project of 1920s art history. The project started in the opposite way to most in computer science: a clear set of data, but no explicit problem to be solved. Through the ‘operationalisation’ of Aby Warburg’s concepts (their translation into a series of formal operations), I’ll present a new type of digital art history that seeks to be at once morphological and historical, reductive and interpretative. This work focuses on the computational analysis of pose in paintings: how poses are used to display emotion and movement, and how certain archetypal representations of emotion persist or re-appear through history.

Unlike literary history or musicology, art history currently has almost no computational study that analyses the work itself (the image, not the metadata). I’ll therefore talk about the prospects for scaling this kind of art-historical analysis beyond the Bilderatlas, including some of my planned future work on automatic human recognition and pose detection in paintings, the difference between doing computer vision on photos and on paintings, and some problems with incurably tiny datasets.

09.11.2016 Eduard Trulls

Title: Image descriptors: from hand-crafting to learning from raw data

Abstract: Image descriptors, i.e. small, invariant representations of image patches, are a key component in many computer vision applications. In this talk I will present some of my work on this subject. First, I will talk about a hand-crafted technique to build descriptors invariant by design to scale, rotation and background changes. Second, I will present a technique to extract invariant representations from raw image patches with convolutional neural networks.
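As a minimal illustration of the hand-crafted end of this spectrum (a toy example, not the speaker's method), a patch descriptor that normalises away mean and contrast is invariant to affine brightness changes by construction:

```python
import numpy as np

def describe(patch):
    """Tiny hand-crafted descriptor: flatten the patch, subtract the mean
    and normalise to unit length, removing affine brightness variation."""
    v = patch.astype(float).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(1)
patch = rng.random((8, 8))
brighter = 2.0 * patch + 0.3          # same content, different lighting
d1, d2 = describe(patch), describe(brighter)
print(np.abs(d1 - d2).max())          # essentially zero: invariant
```

Invariances to scale, rotation or background, as discussed in the talk, require far more machinery; learned descriptors absorb such nuisances from training data instead.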

16.11.2016 Kwang Yi

Title: Learned Invariant Feature Transform

Abstract: Local features are one of the core building blocks of computer vision, used in various tasks such as image retrieval, visual tracking, image registration, and image matching. There have been numerous works on the local feature pipeline since the seminal work of Lowe in 2004, including both the traditional hand-crafted methods and more recent ones based on machine learning.

In this talk, I will introduce learning-based approaches to the local feature pipeline, and how to integrate them together into a fully learned pipeline through deep learning. I will first introduce TILDE, a learned local feature detector based on a piecewise-linear regressor that can provide highly repeatable keypoints. I will then show how to learn orientations of feature points through deep Siamese networks, and discuss how to put them together with Eduard's deep descriptor presented the previous week. By leveraging machine learning techniques we achieve performance that significantly outperforms the state of the art.

23.11.2016 Alexandru Ichim

Title: Reconstructing Personalized Anatomical Models for Physics-based Body Animation

Abstract: We present a method to create personalized anatomical models ready for physics-based animation, using only a set of 3D surface scans. We start by building a template anatomical model of an average male which supports deformations due to both 1) subject-specific variations: shapes and sizes of bones, muscles, and adipose tissues, and 2) skeletal poses. Next, we capture a set of 3D scans of an actor in various poses. Our key contribution is formulating and solving a large-scale optimization problem where we solve for both subject-specific and pose-dependent parameters such that our resulting anatomical model explains the captured 3D scans as closely as possible. Compared to data-driven body modeling techniques that focus only on the surface, our approach has the advantage of creating physics-based models, which provide realistic 3D geometry of the bones and muscles and naturally support effects such as inertia, gravity, and collisions according to Newtonian dynamics.

30.11.2016 Anastasia Tkach

Title: Sphere-Meshes for Real-Time Hand Modeling and Tracking

Abstract: Modern systems for real-time hand tracking rely on a combination of discriminative and generative approaches to robustly recover hand poses. Generative approaches require the specification of a geometric model. In this work, we propose the use of sphere-meshes as a novel geometric representation for real-time generative hand tracking. How tightly this model fits a specific user heavily affects tracking precision. We derive an optimisation to non-rigidly deform a template model to fit the user data in a number of poses. At the same time, the limited number of primitives in the tracking template allows us to retain excellent computational performance. We confirm this by embedding our models in an open-source real-time registration algorithm to obtain a tracker steadily running at 60 Hz. We show that the improved tracking accuracy at high frame rate enables stable tracking of extended and complex motion sequences without the need for per-frame re-initialisation.
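The geometric idea can be sketched numerically: a sphere-mesh edge is the surface swept by spheres interpolated between two endpoint spheres, and the signed distance from a data point to that surface is what a registration energy would penalise. A brute-force sampling version (parameters invented purely for illustration; real trackers use a closed-form distance):

```python
import numpy as np

def sphere_mesh_distance(p, c0, r0, c1, r1, samples=256):
    """Approximate signed distance from point p to the surface swept by
    spheres interpolated between (center c0, radius r0) and (c1, r1).
    Dense sampling of the interpolation parameter keeps the sketch simple."""
    t = np.linspace(0.0, 1.0, samples)[:, None]
    centers = (1 - t) * c0 + t * c1            # (samples, 3) sphere centers
    radii = ((1 - t) * r0 + t * r1)[:, 0]      # (samples,) sphere radii
    d = np.linalg.norm(p - centers, axis=1) - radii
    return d.min()                              # negative inside the surface

# A finger-like segment: thick sphere at the base, thin one at the tip.
c0, r0 = np.array([0.0, 0.0, 0.0]), 1.0
c1, r1 = np.array([4.0, 0.0, 0.0]), 0.5
print(sphere_mesh_distance(np.array([2.0, 2.0, 0.0]), c0, r0, c1, r1))
```

A handful of such primitives suffices for a whole hand, which is why the representation stays cheap enough for 60 Hz tracking.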

07.12.2016 Stefan Lienhard

Title: Transforming Rule-based Procedural Models

Abstract: This is a presentation about ongoing work on how to transform designs defined by rule-based procedural models, e.g., buildings or plants.

Given several procedural designs, each specified by a grammar, we combine and merge elements of the existing designs to generate new designs. We introduce two novel technical components to enable such transformations: 1) we extend the concept of discrete rule substitution to rule merging, leading to a huge space of combined procedural designs; 2) we present an algorithm to jointly derive two or more grammars. We demonstrate two applications of our work: we show that our framework leads to more variations of procedural designs than previous work, and we show smooth animation sequences between two procedural models.
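A toy analogue of discrete rule substitution, using plain string-rewriting grammars as a crude stand-in for the procedural grammars in the talk (symbols and rules are invented for illustration):

```python
def derive(axiom, rules, depth=4):
    """Rewrite every symbol with a known rule, in parallel, `depth` times."""
    out = axiom
    for _ in range(depth):
        out = "".join(rules.get(ch, ch) for ch in out)
    return out

# Two hypothetical facade "designs": B expands to floors F, and each
# design fills its floors differently (windows vs. balconies).
rules_a = {"B": "FFF", "F": "[www]"}   # three floors of windows
rules_b = {"B": "FF",  "F": "(b.b)"}   # two floors of balconies

# Discrete rule substitution: graft design B's floor rule into design A,
# keeping A's overall building structure.
rules_mixed = {**rules_a, "F": rules_b["F"]}

print(derive("B", rules_a))      # design A's facade
print(derive("B", rules_mixed))  # A's structure with B's floor style
```

Rule merging, as proposed in the talk, generalises this discrete swap to blends of the rules themselves, which is what opens up the much larger design space.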

14.12.2016 Sami Arpa

Title: Revealing Information by Averaging

Abstract: We present a method for hiding images in synthetic videos and revealing them by temporal averaging. The main challenge is to develop a visual masking method that hides the input image both spatially and temporally. Our masking approach consists of temporal and spatial pixel-by-pixel variations of the frequency band coefficients representing the image to be hidden. These variations ensure that the target image remains invisible both in the spatial and the temporal domains. In addition, by applying a temporal masking function derived from a dither matrix, we allow the video to carry a visible message that is different from the hidden image. The image hidden in the video can be revealed by software averaging, or, with a camera, by long-exposure photography. The presented work may find applications in the secure transmission of digital information.
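The reveal mechanism, stripped of the actual masking scheme, can be simulated in a few lines: add strong zero-mean per-frame variations to a hidden pattern, and the temporal average converges back to it while any single frame shows essentially noise. (The Gaussian perturbations here are a simplistic stand-in for the paper's frequency-band masking.)

```python
import numpy as np

rng = np.random.default_rng(7)

hidden = (rng.random((32, 32)) > 0.5).astype(float)    # secret pattern
frames = [hidden + rng.normal(0.0, 4.0, hidden.shape)  # each frame alone
          for _ in range(2000)]                        # is dominated by noise

single_err = np.abs(frames[0] - hidden).mean()   # one frame: useless
recovered = np.mean(frames, axis=0)              # temporal average
avg_err = np.abs(recovered - hidden).mean()
print(f"per-frame error {single_err:.2f}  vs  averaged error {avg_err:.3f}")
```

Because the perturbations are zero-mean, the averaging error shrinks roughly as one over the square root of the number of frames, which is also why a long camera exposure performs the same reveal optically.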

21.12.2016 Gilles Baechler

Title: Digital Lippmann Photography

Abstract: Lippmann photography is one of the earliest techniques for reproducing color in photographs. This method, based on the phenomenon of interference, was invented by Gabriel Lippmann and earned him the Nobel Prize in Physics in 1908. It essentially works by capturing the Fourier transform of the spectrum of the incoming light in the depth of a photosensitive material. What is remarkable is that it enables a much richer color reproduction than traditional RGB film or sensor approaches, in the sense that it captures the entire spectrum of visible light.

In this talk, I will first briefly introduce a few key wave-optics concepts that are needed to understand the Lippmann procedure. Then, I will describe the recording and replay stages of the Lippmann method. Finally, I will discuss our ongoing work to allow digital capture and reproduction of these fascinating artworks. I will for example show what happens when the artworks are observed under varying viewing angles and propose a way to recover the complete spectrum of Lippmann plates using only an RGB camera.
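The recording principle can be sketched for an idealised monochromatic case (unit refractive index, perfect mirror, parameters chosen purely for illustration): the standing wave writes a sinusoid into the emulsion depth whose spatial frequency encodes the wavelength, and a Fourier transform of the depth profile recovers it.

```python
import numpy as np

# Idealised Lippmann recording: monochromatic light reflected off a
# mirror forms a standing wave I(z) = 1 + cos(4*pi*z/lam) in the
# emulsion, i.e. a Fourier component at spatial frequency 2/lam.
lam = 550e-9                         # green light, 550 nm
depth = 20e-6                        # 20 um thick emulsion
z = np.linspace(0.0, depth, 4096)
intensity = 1.0 + np.cos(4.0 * np.pi * z / lam)

# Replay the spectrum: locate the dominant spatial frequency in depth.
spectrum = np.abs(np.fft.rfft(intensity - intensity.mean()))
freqs = np.fft.rfftfreq(len(z), d=z[1] - z[0])
lam_rec = 2.0 / freqs[np.argmax(spectrum)]
print(f"recovered wavelength: {lam_rec * 1e9:.1f} nm")
```

For broadband light the depth profile becomes a superposition of such sinusoids, which is the sense in which the plate stores the Fourier transform of the full spectrum.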