Visual Computing Seminar (Spring 2017)

Food @ 11:50am,
Talk @ 12:15pm
Wenzel Jakob

General information

The Visual Computing Seminar is a weekly seminar series on topics in visual computing.

Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on subtly related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.

Who: The target audience is faculty, students, and postdocs in the visual computing disciplines, but the seminar is open to anyone and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive email from me soon about scheduling a presentation.

Where and when: every Wednesday in BC04 (next to the ground-floor atrium). Food is served at 11:50am, and the actual talk starts at 12:15pm.

How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.


Date Lecturer Contents
08.03.2017 Helge Rhodin

Title: Human motion capture for interactive applications

Abstract: Motion capture is a long-standing problem in computer vision. Markerless solutions promised to enable new forms of human-computer interaction. In practice, however, commercial solutions rely heavily on suit-based capture (movie industry) and partial reconstruction with handheld devices (Wii remote and HTC Vive).

I’ll talk about my research on overcoming the remaining limitations: how to 1) reduce the necessary number of cameras, 2) automatically create actor models, and 3) enable arbitrarily large capture volumes from egocentric perspectives. We would like to enable applications such as a virtual (ski) trainer. We imagine a drone that follows the user’s (skiing) motion and gives expert feedback and corrections in real time, like a real trainer. Towards this goal, we look into 4) active reconstruction and 5) weakly supervised training.

15.03.2017 Alexandru Ichim

Exceptional room change: This seminar takes place in INR 113.

Title: Physics-based Human Reconstruction and Animation

Abstract: Creating digital representations of humans is of utmost importance for applications ranging from entertainment (video games, movies) to human-computer interaction and even psychiatric treatment. Building credible digital doubles is difficult, because the human visual system is very sensitive to perceiving the expressivity and potential anomalies in body structures and motion.

During this talk I will present several projects completed during my PhD that tackle these problems. I will begin by describing a complete pipeline that allows users to reconstruct fully rigged 3D facial avatars using video data coming from a handheld device (e.g., a smartphone). This is done through an optimization that integrates feature tracking, optical flow, and shape from shading. Continuing along the lines of accessible acquisition systems, we discuss a framework for simultaneous tracking and modeling of articulated human bodies from RGB-D data. In the second half of the talk, we will deviate from standard linear reconstruction and animation models, and instead focus on exploiting physics-based techniques that are able to incorporate complex phenomena such as dynamics, collision response, and the incompressibility of flesh. I will present a project which assumes that each 3D scan of an actor records the body in a physical steady state, and uses inverse physics to extract a volumetric, physics-ready anatomical model of the actor. This is then extended to a novel physics-based approach for facial reconstruction and animation, which opens new avenues for dynamic artistic control, simulation of corrective facial surgery, and interaction with external forces and objects.

22.03.2017 Bin Jin

Title: Weakly-supervised semantic segmentation from web images

Abstract: We propose a weakly supervised semantic segmentation algorithm that uses image tags for supervision. We apply the tags in queries to collect three sets of web images, which encode the clean foregrounds, the common backgrounds, and realistic scenes of the classes. We introduce a novel three-stage training pipeline to progressively learn semantic segmentation models. We first train and refine a class-specific shallow neural network to obtain segmentation masks for each class. The shallow neural networks of all classes are then assembled into one deep convolutional neural network for end-to-end training and testing. Experiments show that our method notably outperforms previous state-of-the-art weakly supervised semantic segmentation approaches on the PASCAL VOC 2012 segmentation benchmark. We further apply the class-specific shallow neural networks to object segmentation and obtain excellent results.

29.03.2017 Timur Bagautdinov

Title: Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

Abstract: We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. Temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.

05.04.2017 Agata Mosinska

Title: Active Learning for Delineation of Curvilinear Structures

Abstract: Supervised machine learning algorithms intrinsically require extensive amounts of annotated ground-truth data, which is difficult and tedious to obtain, especially in biomedical applications. To make such methods truly practical, we propose an Active Learning approach suited especially for annotating elongated structures such as neurons, blood vessels, and roads. It speeds up the annotation process by up to 80%, thus greatly decreasing the effort. It does so by taking into consideration local and global topological specificities of the delineation, which hint at which parts of the reconstruction are particularly ambiguous. Moreover, we show that a similar approach can be used to detect the errors made by the reconstruction algorithm. It is then possible to present these errors to the expert for validation, without requiring them to visually inspect the whole scan. This way, we make the most of expert knowledge and the efficiency of automatic tools.

12.04.2017 Artem Rozantsev

Title: Vision-based detection of aircraft and UAVs

Abstract: Unmanned Aerial Vehicles are becoming increasingly popular for a broad variety of tasks ranging from aerial imagery to object delivery. With the expansion of the areas where drones can be efficiently used, the risk of collision with other flying objects increases. Avoiding such collisions would be a simpler problem if all aircraft could communicate with each other and share their location information. However, it is often the case that either location information is unavailable or communication is not possible. To ensure flight safety in such situations, drones need a way to autonomously detect other objects that are intruding on the neighboring airspace. Vision-based collision avoidance is of particular interest, as cameras generally consume less power and are more lightweight than active sensor alternatives. We have therefore developed a set of increasingly sophisticated algorithms to provide drones with a visual collision avoidance capability.

First, we present a novel method that combines motion and appearance information for detecting flying objects such as drones and planes that occupy a small part of the camera field of view, possibly move in front of complex backgrounds, and are filmed by a moving camera. Second, in order to reduce the need to collect a large training dataset and to manually annotate it, we introduce a way to generate realistic synthetic images based on a small set of real examples and a coarse 3D model of the object. Finally, motivated by the recent success of deep learning methods, we present a deep domain adaptation approach that effectively leverages synthetic data to improve the detector quality.


19.04.2017 No seminar (Easter)

26.04.2017 Marjan Shahpaski

Title: Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair

Abstract: We present a method that allows for simultaneous geometric and radiometric calibration of a projector-camera pair. It is simple and does not require specialized hardware. We prewarp and align a specially designed projection pattern onto a printed pattern with different colorimetric properties. After capturing the patterns in several orientations, we perform geometric calibration by estimating the corner locations of the two patterns in different color channels. We perform radiometric calibration of the projector by using the information contained inside the projected squares. We show that our method performs on par with current approaches, which all require separate geometric and radiometric calibration, while being efficient and user friendly.

03.05.2017 Nikolaos Arvanitopoulos

Title: Single Image Reflection Suppression

Abstract: Reflections are a common artifact in images taken through glass windows. Removing the reflection artifacts automatically after the picture is taken is an ill-posed problem. Attempts to solve the problem using optimization schemes therefore rely on various prior assumptions from the physical world. Instead of removing reflections from a single image, which has met with limited success so far, we propose a novel approach to suppress reflections. It is based on a Laplacian data fidelity term and an l0 gradient sparsity term imposed on the output. With experiments on artificial and real-world images, we show that our reflection suppression method performs better than state-of-the-art reflection removal techniques.
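The two terms mentioned in the abstract suggest an objective of roughly the following shape; this is a sketch based only on the abstract, with T the output (reflection-suppressed) image, I the input image, and λ a sparsity weight — all assumed notation, not necessarily the authors' exact formulation:

```latex
\min_{T} \;
\underbrace{\bigl\| \Delta T - \Delta I \bigr\|_2^2}_{\text{Laplacian data fidelity}}
\;+\; \lambda \,
\underbrace{\bigl\| \nabla T \bigr\|_0}_{\ell_0 \text{ gradient sparsity}}
```

The data term asks the output to match the input's second-order structure, while the ℓ0 term counts nonzero gradients, favoring a piecewise-smooth result in which low-amplitude reflection gradients are suppressed.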

10.05.2017 Leonardo Impett


17.05.2017 Robin Scheibler

Title: Parametric sound source localization using microphone arrays

Abstract: In this talk, I will present FRIDA, an algorithm for estimating the directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that, for any array layout, the entries of the spatial covariance matrix can be linearly transformed into a uniformly sampled sum of sinusoids.
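To make the final insight concrete, here is the standard narrowband far-field covariance model (a textbook sketch, not FRIDA's derivation): for microphones at positions p_m, K sources with powers σ_k² arriving from unit directions u(θ_k), wavelength λ, and noise power σ_w²,

```latex
R_{mn} \;=\; \mathbb{E}\!\left[ x_m x_n^{*} \right]
\;=\; \sum_{k=1}^{K} \sigma_k^2 \,
e^{\, j \frac{2\pi}{\lambda} \langle p_m - p_n,\; u(\theta_k) \rangle}
\;+\; \sigma_w^2 \, \delta_{mn}
```

Each off-diagonal entry is a sum of complex exponentials in the unknown angles θ_k; a suitable linear map of these entries produces uniformly spaced samples of such a sum, which finite-rate-of-innovation techniques can then invert without searching over a grid of candidate directions.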

24.05.2017 Miranda Krekovic

Title: Omnidirectional bats and the price of uniqueness

Abstract: We study the problem of simultaneously reconstructing a polygonal room and the trajectory of a device equipped with a (nearly) collocated omnidirectional source and receiver. The device measures the arrival times of echoes of pulses emitted by the source and picked up by the receiver; it behaves like a bat with no capacity for directional hearing or vocalizing.
In this talk, I will present an algorithm for reconstructing the 2D geometry of a convex polygonal room from a few first-order echoes. No prior knowledge about the device's trajectory is required. We also study the uniqueness of the reconstruction and show that, in addition to the usual invariance to rigid motions, new ambiguities arise for important classes of rooms and trajectories. We support our theoretical developments with a number of numerical experiments, while experiments with real measured room impulse responses are work in progress.
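For intuition, the timing model behind first-order echoes is simple: with a collocated source and receiver at position x, the echo off a wall at distance d returns after traveling 2d, i.e. t = 2d/c. A minimal sketch of this measurement model (my own illustration, not the authors' code; the wall representation and function names are assumptions):

```python
C = 343.0  # approximate speed of sound in air (m/s)

def first_order_echo_times(x, walls):
    """First-order echo times for a collocated omnidirectional
    source/receiver at 2D position x.

    Each wall is given as (n, q): a unit normal n and an offset q,
    so points p on the wall satisfy n[0]*p[0] + n[1]*p[1] == q.
    A pulse reflecting once off a wall at distance d travels 2*d
    before returning to the device.
    """
    times = []
    for n, q in walls:
        d = abs(q - (n[0] * x[0] + n[1] * x[1]))  # point-to-wall distance
        times.append(2.0 * d / C)
    return sorted(times)

# Unit-square room: walls x = 0, x = 1, y = 0, y = 1.
walls = [((1.0, 0.0), 0.0), ((1.0, 0.0), 1.0),
         ((0.0, 1.0), 0.0), ((0.0, 1.0), 1.0)]

# From the room's center, every wall is 0.5 m away.
echoes = first_order_echo_times((0.5, 0.5), walls)
```

The inverse problem the talk addresses is the hard direction: given only such unlabeled arrival times at several unknown trajectory points, recover both the wall parameters and the positions.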

31.05.2017 Christophe Hery

The seminar will exceptionally be in BC 420. Christophe is a Global Tech and Research technical director at Pixar, Inc.

Title: History and taxonomy of subsurface scattering approaches in film production

Abstract: In this talk, we will present the various dedicated solutions that computer graphics practitioners in the visual effects and animation industries have come up with since the early 2000s for simulating skin and other translucent media. In these types of materials, light penetrates the surface and tends to exit at a different point, creating an appearance of glow and softness, and allowing strong back illumination in thin areas. Digital actors and creatures, among them Harry Potter’s Dobby, The Lord of the Rings’ Gollum, Pirates of the Caribbean’s Davy Jones, and the Na’vi in Avatar, have greatly benefited from this effect. We will show some of these results and we will also outline the specific advantages and shortcomings of each method.