Visual Computing Seminar (Spring 2019)

Food @ 11:50am,
Talk @ 12:15pm

General information

The Visual Computing Seminar is a weekly seminar series on topics in Visual Computing.

Why: The motivation for creating this seminar is that EPFL has a critical mass of people who are working on subtly related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. Having a weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities like the use of similar computational tools — think of this as the visual computing edition of the “Know thy neighbor” seminar series.

Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter ~20-minute talks preceding a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive email from me soon about scheduling a presentation.

Where and when: every Wednesday in BC03 (note the changed location!). Food is served at 11:50, and the actual talk starts at 12:15.

How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner), or download the iCal file.


Date Lecturer Contents
20.02.2019 Thomas Müller

Title: Light-Transport Simulation with Machine Learning

Abstract: Machine-learning-based techniques have taken many fields by storm but, until recently, have seen relatively little usage in physically-based rendering. This has begun to change. In my talk, I will present techniques for accelerating the simulation of light transport with the help of machine learning. I will briefly introduce two projects in which we learn the radiance field permeating volumetric media (grains and atmospheric clouds), and I will go into more detail on another project, in which we learn how to optimally sample a Monte Carlo estimator of the reflection integral. The latter approach connects path tracing algorithms with the field of reinforcement learning and provides a general technique for efficient Monte Carlo estimation using deep neural networks.
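The sampling idea in that last project can be illustrated with a toy Monte Carlo estimator. The integrand, the proposal density, and every number below are invented for illustration; the talk's method learns the proposal with a neural network rather than fixing it analytically.

```python
import math
import random

# Toy illustration (not the talk's method): estimate I = ∫ 3x² dx over
# [0, 1] (exact value 1) with uniform sampling versus an importance
# distribution p(x) = 2x that better matches the integrand, mirroring how
# a learned sampler concentrates samples where the integrand is large.

def estimate(n, importance, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.random()
        if importance:
            x = math.sqrt(u)     # inverse-CDF sample of p(x) = 2x
            total += 1.5 * x     # f(x) / p(x) = 3x² / (2x)
        else:
            x = u                # uniform sample, p(x) = 1
            total += 3 * x * x   # f(x) / p(x) = 3x²
    return total / n

print(estimate(100_000, False))  # close to 1, higher variance
print(estimate(100_000, True))   # close to 1, lower variance
```

Both estimators are unbiased; the importance-sampled one simply wastes fewer samples where the integrand is small, which is the effect a learned sampler aims for.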

Bio: Thomas Müller is a soon-to-graduate doctoral student at ETH Zürich & Disney Research, where he also received his Bachelor's degree (2014) and Master's degree (2016). Thomas' research focuses on the intersection of light-transport simulation and machine learning. His work was featured on the cover of Computer Graphics Forum, won a Best Paper award, led to two patents, and is implemented in production renderers at Walt Disney and Pixar Animation Studios.

06.03.2019 Tian Chen

Title: Generative design of multi-stable surfaces

Abstract: A flat surface that can be reconfigured into given 3D target shapes is of great importance in numerous fields and at different length scales, e.g. aeronautical systems, architectural installations, and targeted medicine delivery. Such a mechanical system can drastically reduce demands on fabrication and transportation, and enables precise, controlled deployment. Given an arbitrary target shape, this inverse problem is typically tackled by discretizing the target shape and then mapping each element to the flat surface. During the mapping process, either the periodicity or the internal properties of the elements are changed. As this is a geometric problem, the system is not necessarily mechanically stable when reconfigured into the target shape, i.e. when the means of reconfiguration is removed, the system reverts to the flat shape. A method is proposed for generating flat surfaces that can be reconfigured into a number of target shapes, each of which is mechanically stable. First, the target shapes are discretized using a Chebyshev net. The resulting quadrilateral elements are mapped to a flat surface by accounting for their defects, or excesses, in the internal angles. These are then accommodated by lengthening or shortening the added diagonal members. By embedding bistable elements into the diagonal members, the necessary length change is achieved while ensuring mechanical stability after the lengths are changed. Using a multi-material 3D printer, this method is demonstrated by fabricating one flat surface that reconfigures into two distinct and stable target shapes. The proposed method serves as a new direction for the design of reconfigurable systems.
The combination of such systems with autonomous activation may enable complex self-reconfiguration of surfaces.

13.03.2019 Tizian Zeltner

Title: Rendering of specular microstructure using hierarchical sampling

Abstract: Today's state-of-the-art material models used in photorealistic rendering are of very high quality, but often look too perfect in practice! They are extremely smooth and lack small surface details such as scratches, dents, or other imperfections that we can observe almost everywhere in the real world. Rendering these is a challenging task, however, because current Monte Carlo based methods require a prohibitively large number of samples to fully resolve the tiny and highly directional specular highlights that occur in these materials.
While there has been significant progress on this in recent years, current approaches are either tailored to a specific class of surface statistics, computationally expensive, or have large memory requirements, which prevents their use in typical production environments.
In this talk we will cover the background and some of the existing work in this area before discussing our related work-in-progress project.
We make the observation that the problem can be simplified by relying on a level-of-detail representation of the microstructure, where tiny, sharp highlights are traded for larger details with wider angular distributions. This motivates a stochastic sampling algorithm that guides the search for highlights and concentrates computation on the relevant regions only.
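A rough sketch of how a hierarchical search can prune regions that cannot contain a highlight. The flat data layout, block size, and threshold test are assumptions for illustration, not the project's actual algorithm:

```python
# Hedged sketch (assumed): keep a coarse upper bound per block of the
# microstructure and only descend into blocks that can still contain a
# highlight, concentrating computation on the relevant regions.

def find_highlights(values, threshold, block=4):
    hits = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        if max(chunk) < threshold:      # coarse level: prune whole block
            continue
        for i, v in enumerate(chunk):   # fine level: inspect survivors
            if v >= threshold:
                hits.append(start + i)
    return hits

glints = [0.0] * 16
glints[5], glints[12] = 0.9, 0.8        # two sharp specular highlights
print(find_highlights(glints, 0.5))     # [5, 12]
```

With sparse highlights, most blocks are rejected at the coarse level, which is the effect the hierarchical representation is after.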

20.03.2019 Julian Panetta

Title: X-Shells: A New Class of Deployable Beam Structures

Abstract: I will present our work on X-shells, a new class of deployable structures formed by an ensemble of elastically deforming beams coupled through rotational joints. An X-shell can be assembled conveniently in a flat configuration from standard elastic beam elements and then be deployed through expansive force actuation into the desired 3D target state. During deployment, the coupling imposed by the joints will force the beams to twist and buckle out of plane to maintain a static equilibrium state. This complex interaction of discrete joints and continuously deforming beams allows interesting 3D forms to emerge.

Simulating X-shells is challenging due to unstable equilibria occurring at the onset of beam buckling. I will present my simulation framework based on a discrete elastic rods model that robustly handles such difficult scenarios by analyzing and appropriately modifying the elastic energy Hessian. This real-time simulation forms the basis of a computational design tool for X-shells that enables interactive design space exploration by varying and optimizing design parameters to achieve a specific design intent. We jointly optimize the assembly state and the deployed configuration to ensure the geometric and structural integrity of the deployable X-shell. Once a design is finalized, we also optimize for a sparse distribution of actuation forces to efficiently deploy a specific X-shell from its flat assembly state to its 3D target state.

I will demonstrate the effectiveness of our design approach with a number of design studies and physical prototypes that highlight the richness of the X-shell design space, enabling new forms not possible with existing approaches.

27.03.2019 Wei Wang

Title: Recurrent U-Net for Resource-Constrained Segmentation

Abstract: Real-time segmentation has a wide range of applications. For instance, real-time biomedical image segmentation is a helpful diagnostic tool, and real-time egocentric hand segmentation is critical for mixed reality. Traditional segmentation techniques typically follow a one-shot approach, where the image is passed forward only once through a model that produces a segmentation mask. This strategy, however, usually requires a very deep model, which is time-consuming and demands a large GPU memory budget. U-Net, as a compact network, is very efficient.

We therefore introduce a recurrent U-Net model that not only runs in real time thanks to its compact U-Net backbone, but also performs very well thanks to its novel gated recurrent architecture. The predictions are refined gradually after each recurrence. As evidenced by our results on standard hand segmentation benchmarks and on our own dataset, our approach outperforms other, simpler recurrent segmentation techniques, and it achieves competitive results compared with state-of-the-art methods.
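The refinement loop can be caricatured numerically. Here the "network" is a stand-in thresholding function and the scalar gate is an assumption; the real model is a learned, gated recurrent U-Net operating on feature maps:

```python
# Toy numeric sketch (an assumption, not the authors' network): one
# compact model is applied repeatedly, and a gate blends each new
# prediction with the previous one, gradually refining the mask.

def gated_refinement(initial, update_fn, gate=0.7, steps=3):
    pred = list(initial)
    for _ in range(steps):
        candidate = update_fn(pred)                # one "U-Net pass"
        pred = [gate * c + (1 - gate) * p          # gated update
                for c, p in zip(candidate, pred)]
    return pred

def sharpen(mask):
    # stand-in "network": pushes soft scores toward a crisp 0/1 mask
    return [1.0 if v > 0.5 else 0.0 for v in mask]

# soft per-pixel scores converge toward the crisp mask 1, 0, 1
print(gated_refinement([0.6, 0.4, 0.8], sharpen))
```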

03.04.2019 Krzysztof Lis

Title: Detecting the Unexpected via Image Resynthesis

Abstract: Classical semantic segmentation methods, including the recent deep learning ones, assume that all classes observed at test time have been seen during training.
In reality that is usually not the case; for example, in an autonomous driving scenario one can occasionally encounter the unexpected: animals, rocks, fallen branches, snow heaps, or lost cargo on the road.
There are no labels for these objects in the popular datasets.
A self-driving vehicle should at least be able to detect that some image regions cannot be labeled properly and warrant further attention.

I will present our work on detecting such anomalous objects in the context of semantic segmentation.
Our approach relies on the intuition that the network will produce spurious labels in regions depicting unexpected objects.
We resynthesize the image from the resulting semantic map and detect discrepancies between the two images.
In other words, we translate the problem of detecting unknown classes into one of identifying poorly-resynthesized image regions.
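In caricature, the discrepancy test compares the input against its resynthesis pixel by pixel. The 1-D "images" and fixed threshold below are invented for illustration; the actual method learns the discrepancy detector:

```python
# Minimal numeric sketch (assumed, not the paper's pipeline): flag
# pixels where the resynthesized image disagrees with the input, since
# resynthesis from spurious labels misses unknown objects.

def anomaly_mask(image, resynthesized, threshold=0.2):
    return [abs(a - b) > threshold for a, b in zip(image, resynthesized)]

image         = [0.1, 0.1, 0.9, 0.1]   # bright blob = unexpected object
resynthesized = [0.1, 0.1, 0.1, 0.1]   # resynthesis misses the unknown class
print(anomaly_mask(image, resynthesized))  # [False, False, True, False]
```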

I will also show our new dataset of road anomalies and the semi-automatic method used to label them.

10.04.2019 Weizhe Liu

Title: Crowd Counting: From Image Plane to Head Plane

Abstract: State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density in the image plane. While useful for this purpose, this image-plane density has no immediate physical meaning because it is subject to perspective distortion. This is a concern in sequences acquired by drones because the viewpoint changes often. This distortion is usually handled implicitly by either learning scale-invariant features or estimating density in patches of different sizes, neither of which accounts for the fact that scale changes must be consistent over the whole scene.

In this work, we explicitly model the scale changes and reason in terms of people per square meter. We show that feeding the perspective model to the network allows us to enforce global scale consistency and that this model can be obtained on the fly from the drone sensors. In addition, it enables us to enforce physically-inspired temporal consistency constraints that do not have to be learned. This yields an algorithm that outperforms state-of-the-art methods in inferring crowd density from a moving drone camera, especially when perspective effects are strong.
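The core unit conversion can be sketched in a few lines. The density and per-pixel ground-area values are made up; the talk derives the latter from the drone's perspective model:

```python
# Hedged sketch (assumed, not the paper's model): convert an image-plane
# density map to people per square meter by dividing each pixel's count
# by the ground-plane area that pixel covers, which grows with distance.

def to_head_plane(image_density, ground_area_per_pixel):
    return [d / a for d, a in zip(image_density, ground_area_per_pixel)]

density = [0.5, 0.5, 0.5]            # people per pixel, uniform in the image
areas   = [0.25, 1.0, 2.0]           # m² per pixel, larger far from camera
print(to_head_plane(density, areas))  # [2.0, 0.5, 0.25]
```

A uniform image-plane density thus corresponds to a physically non-uniform crowd, which is exactly the ambiguity the perspective model resolves.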

17.04.2019 Yinlin Hu

Title: Segmentation-driven 6D Object Pose Estimation

Abstract: The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions.
In this talk, we introduce a segmentation-driven 6D pose estimation framework in which each visible part of an object contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state of the art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.
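One simple way to picture the confidence-weighted combination. This is a deliberate simplification: the actual framework builds a robust set of 3D-to-2D correspondences for a PnP solver rather than taking a plain weighted mean:

```python
# Simplified sketch (an assumption, not the paper's method): fuse
# per-part 2D keypoint predictions by confidence, so occluded parts
# with low confidence barely influence the result.

def fuse_keypoint(candidates):
    """candidates: list of (x, y, confidence) votes from visible parts."""
    total = sum(c for _, _, c in candidates)
    x = sum(px * c for px, _, c in candidates) / total
    y = sum(py * c for _, py, c in candidates) / total
    return x, y

# two confident parts agree; an occluded part votes with low confidence
votes = [(100.0, 50.0, 0.9), (102.0, 52.0, 0.9), (140.0, 90.0, 0.1)]
print(fuse_keypoint(votes))  # near (101, 51), outlier vote suppressed
```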

24.04.2019 Kaicheng Yu

Title: Evaluating the Search Phase of Neural Architecture Search

Abstract: Neural Architecture Search (NAS) aims to facilitate the design of deep networks for new tasks. Existing techniques rely on two stages: searching over the architecture space and validating the best architecture. Evaluating NAS algorithms is currently done solely by comparing their results on the downstream task. While intuitive, this fails to explicitly evaluate the effectiveness of their search strategies.

In this work, we extend the NAS evaluation procedure to include the search phase. To this end, we compare the quality of the solutions obtained by NAS search policies with that of random architecture selection. We find that: (i) on average, the random policy outperforms state-of-the-art NAS algorithms; and (ii) the results and candidate rankings of NAS algorithms do not reflect the true performance of the candidate architectures. While our former finding illustrates that the NAS search space has been sufficiently constrained for random solutions to yield good results, we trace the latter back to the weight sharing strategy used by state-of-the-art NAS methods. In contrast with common belief, weight sharing negatively impacts the training of good architectures, thus reducing the effectiveness of the search process. We believe that following our evaluation framework will be key to designing NAS strategies that truly discover superior architectures.
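The baseline in finding (i) is easy to state precisely. The operation set and accuracy proxy below are fabricated for illustration; the study evaluates real NAS search spaces and trained networks:

```python
import itertools
import random

# Hedged sketch of the evaluation idea: under the same budget, compare a
# search policy against plain random architecture selection.

OPS = ["conv3x3", "conv5x5", "skip", "pool"]
SCORE = {"conv3x3": 0.3, "skip": 0.25, "conv5x5": 0.2, "pool": 0.1}

def proxy_accuracy(arch):
    # made-up stand-in for validation accuracy of a 3-operation cell
    return sum(SCORE[op] for op in arch)

def random_policy(budget, depth=3, seed=0):
    # sample `budget` random architectures, keep the best under the proxy
    rng = random.Random(seed)
    best = max((tuple(rng.choice(OPS) for _ in range(depth))
                for _ in range(budget)), key=proxy_accuracy)
    return best, proxy_accuracy(best)

best_possible = max(itertools.product(OPS, repeat=3), key=proxy_accuracy)
arch, acc = random_policy(budget=100)
print(arch, acc, proxy_accuracy(best_possible))
```

If a NAS policy under the same evaluation budget cannot beat this baseline, its search phase adds little, which is the comparison the paper advocates.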

01.05.2019 Delio Vicini

Title: Denoising Deep Monte Carlo Renderings

Abstract: We present a novel algorithm to denoise deep Monte Carlo renderings, in which pixels contain multiple color values, each for a different range of depths. Deep images are a more expressive representation of the scene than conventional flat images. However, since each depth bin receives only a fraction of the flat pixel's samples, denoising the bins is harder due to the less accurate mean and variance estimates. Furthermore, deep images lack a regular structure in depth—the number of depth bins and their depth ranges vary across pixels. This prevents a straightforward application of the patch-based distance metrics frequently used to improve the robustness of existing denoising filters. We address these constraints by combining a flat image-space Non-Local Means filter operating on pixel colors with a deep cross-bilateral filter operating on auxiliary features (albedo, normal, etc.). Our approach significantly reduces noise in deep images while preserving their structure. To the best of our knowledge, our algorithm is the first to enable efficient deep-compositing workflows with denoised Monte Carlo renderings. We demonstrate the performance of our filter on a range of scenes, highlighting the challenges and advantages of denoising deep images.

08.05.2019 Christopher Brandt

Title: Hyper-Reduced Projective Dynamics

Abstract: Hyper-Reduced Projective Dynamics is a framework for the real-time simulation of elastic deformable bodies. It combines the efficient Projective Dynamics method [Bouaziz et al. 2014] with a model reduction approach that allows for the simulation of meshes of arbitrary resolution. To achieve this, we restrict the unknowns to a subspace and estimate the non-linear terms through a novel approximation approach. I will provide a short introduction to physical simulations in computer graphics, motivate the Projective Dynamics method, detail our model reduction layers and, of course, show some results.
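The subspace restriction can be shown in miniature. The constant-plus-linear basis below is a made-up example, not the paper's reduction basis:

```python
# Toy sketch of the model-reduction idea (assumed, not the paper's code):
# restrict the n simulation unknowns q to a low-dimensional subspace
# q = U z, so only the few reduced coordinates z need to be solved for,
# independent of mesh resolution.

def reconstruct(U, z):
    # matrix-vector product q = U z
    return [sum(U[i][j] * z[j] for j in range(len(z))) for i in range(len(U))]

# full state: 6 vertex positions; subspace: constant + linear basis vectors
U = [[1.0, i / 5.0] for i in range(6)]
z = [2.0, 3.0]                 # just 2 reduced coordinates
q = reconstruct(U, z)          # full 6-entry configuration
print(len(q), q[0], q[-1])     # 6 2.0 5.0
```

The simulation cost then scales with the subspace dimension (here 2) rather than the mesh size, which is what makes arbitrary resolutions tractable.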

15.05.2019 Fabien Pesquerel

Title: Generative models for Point Sets & Point Set Differentiable Rendering

Abstract: We present a method for learning to generate the surface of 3D shapes via point sets.
In our framework, 3D point clouds are parameterized on an underlying Euclidean grid, allowing us to use standard convolutions and exploit global structure when generating point sets while reducing the overall complexity of our network.
We demonstrate the benefits of our approach on the ShapeNet benchmark for three applications: (i) auto-encoding shapes, (ii) point cloud upsampling and (iii) single-view reconstruction from a still image.
As large-scale single-view reconstruction datasets are not available, we propose to self-supervise training by making use of a simple point cloud differentiable renderer.
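The grid parameterization can be illustrated with an analytic surface. The cylinder patch below is an invented stand-in for the learned mapping:

```python
import math

# Minimal sketch (assumed): parameterize a point set on a regular 2-D
# Euclidean grid so that cell (i, j) stores one 3-D surface point; this
# image-like layout is what lets standard 2-D convolutions generate and
# process the point set.

def grid_point_cloud(n=8):
    cloud = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            u, v = i / (n - 1), j / (n - 1)  # grid coordinates in [0, 1]
            theta = u * math.pi              # wrap u around a half-cylinder
            cloud[i][j] = (math.cos(theta), math.sin(theta), v)
    return cloud

pts = grid_point_cloud()
print(len(pts), len(pts[0]))   # 8 8
print(pts[0][0])               # (1.0, 0.0, 0.0)
```

Because neighboring grid cells map to neighboring surface points, a convolution over the grid sees coherent local geometry, unlike an unordered point list.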

22.05.2019 Róger Bermúdez

Title: Repurposing supervised models for new visual domains

Abstract: Training supervised machine learning models for inference on new visual domains requires annotations, which in most cases are difficult to obtain. Domain Adaptation techniques attempt to relax this need by either leveraging annotated data from different domains or repurposing models trained on data that differs from the new domain. In this talk, we explore two ideas for how to achieve this: exploiting local similarities between visual domains, and learning to selectively share layers from pre-trained networks that best relate to the new visual domain. We offer experimental evidence that both strategies are effective unsupervised domain adaptation techniques for both natural images and biomedical visual data.

29.05.2019 Isinsu Katircioglu

Title: Self-supervised Training of Proposal-based Segmentation via Background Prediction

Abstract: While supervised object detection methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to object detection and segmentation, able to work with monocular images captured with a moving camera. At the heart of our approach lies the observation that segmentation and background reconstruction are linked tasks, and the idea that, because we observe a structured scene, background regions can be re-synthesized from their surroundings, whereas regions depicting the object cannot.

We therefore encode this intuition as a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of object proposals, we develop a Monte Carlo-based training strategy that allows us to explore the large space of object proposals. Our experiments demonstrate that our approach yields accurate detections and segmentations in images that visually depart from those of standard benchmarks, outperforming existing self-supervised methods and approaching weakly supervised ones that exploit large annotated datasets.