Visual Computing Seminar (Fall 2018)

Food @ 11:50am
Talk @ 12:15pm
Delio Vicini

General information

The Visual Computing Seminar is a weekly seminar series on topics in Visual Computing.

Why: The motivation for creating this seminar is that EPFL has a critical mass of people working on subtly related topics in computational photography, computer graphics, geometry processing, human–computer interaction, computer vision and signal processing. A weekly point of interaction will provide exposure to interesting work in this area and increase awareness of our shared interests and other commonalities, such as the use of similar computational tools. Think of this as the visual computing edition of the “Know thy neighbor” seminar series.

Who: The target audience is faculty, students and postdocs in the visual computing disciplines, but the seminar is open to anyone and guests are welcome. There is no need to formally enroll in a course. The format is very flexible and will include 45-minute talks with Q&A, talks by external visitors, as well as shorter presentations. In particular, the seminar is also intended as a way for students to obtain feedback on shorter (~20-minute) talks ahead of a presentation at a conference. If you are a student or postdoc in one of the visual computing disciplines, you’ll probably receive an email from me soon about scheduling a presentation.

Where and when: every Wednesday in BC01 (note the changed location!). Food is served at 11:50, and the actual talk starts at 12:15.

How to be notified: If you want to be kept up to date with announcements, please send me an email and I’ll put you on the list. If you are working in LCAV, CVLAB, IVRL, LGG, LSP, IIG, CHILI, LDM or RGL, you are automatically subscribed to future announcements, so there is nothing you need to do.
You may add the seminar events to Google Calendar (click the '+' button in the bottom-right corner), or download the iCal file.


Date Lecturer Contents
19.09.2018 Delio Vicini
03.10.2018 Cyrille Favreau

Blue Brain Brayns: a platform for high-fidelity, large-scale and interactive visualization of scientific data and brain structures

The Blue Brain Project has made major efforts to create morphologically accurate neurons to simulate sub-cellular and electrical activities, for example, molecular simulations of neuron biochemistry or multi-scale simulations of neuronal function.

One of the keys to understanding how the brain works as a whole is visualizing how the individual cells function. In particular, the more morphologically accurate the visualization, the easier it is for experts in the biological field to validate cell structures; photo-realistic rendering is therefore important. Brayns is a visualization platform that can interactively perform high-quality, high-fidelity rendering of large neuroscience data sets. Thanks to its client/server architecture, Brayns can run in the cloud as well as on a supercomputer, and stream the rendering to any browser, either in a web UI or a Jupyter notebook.

At the Blue Brain Project, the Visualization team makes intensive use of Blue Brain Brayns to produce ultra-high-resolution (8K) movies and high-fidelity images for scientific publications. Brayns is also used to serve immersive visualization on large displays, as well as on unique devices such as the curved OpenDeck located at the Blue Brain office.

Brayns is also designed to accelerate scientific visualization and to adapt to a large number of environments. Thanks to its modular architecture, Brayns makes it easy to use various rendering back-ends, such as Intel's OSPRay (CPU) or NVIDIA's OptiX. Every scientific use case, such as DICOM, DTI or Blue Brain research, is a standalone plug-in that runs on top of Brayns, allowing scientists and researchers to benefit from a high-performance, high-fidelity, high-quality rendering system without having to deal with its technical complexity.

Brayns currently implements a number of basic primitives, such as meshes, volumes, point clouds and parametric geometries, and pioneers new rendering modalities for scientific visualization, like signed distance fields.
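The abstract does not describe how Brayns implements its signed distance field primitives. As a generic illustration of why SDFs are convenient for rendering, here is a minimal sphere-tracing sketch; the function names are hypothetical and not part of Brayns' API. Because an SDF bounds the distance to the nearest surface, a ray can always advance by the SDF value without skipping over geometry.

```python
import math

def sphere_sdf(p, center, radius):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-6, max_t=100.0):
    """March along a ray with a unit-length direction, stepping by the SDF value.
    Returns the hit distance along the ray, or None if nothing is hit."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t  # close enough to the surface: report a hit
        t += dist     # safe step: no surface is closer than `dist`
        if t > max_t:
            break
    return None

# Usage: a ray from z = -5 toward a unit sphere at the origin hits at t = 4.
unit_sphere = lambda p: sphere_sdf(p, (0.0, 0.0, 0.0), 1.0)
hit = sphere_trace((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), unit_sphere)
```

Swapping `sphere_sdf` for another distance function (for example, a capsule for a neurite segment) leaves the tracer unchanged, which is part of what makes SDFs attractive as a rendering modality.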
During this talk, I will explain the motivations behind the creation of the Brayns platform, and give some technical insight into the architecture of the system and the various techniques that we already use to render datasets. I will also describe how new datasets, as well as rendering components (engines, shaders, materials, etc.), can be added to the platform.

Links: https://github.com/BlueBrain/Brayns

10.10.2018 Kaicheng Yu

Overcoming neural brainwashing

We identify a phenomenon, which we dub neural brainwashing, that occurs when sequentially training multiple deep networks with partially shared parameters: the performance of previously trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters. To overcome this, we introduce a statistically justified weight plasticity loss that regularizes the learning of a model's shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and in neural architecture search. Adding weight plasticity to neural architecture search preserves the best models until the end of the search process, leading to improved results in both natural language processing and computer vision tasks.
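The abstract does not give the exact form of the weight plasticity loss. As an illustration only, the sketch below assumes a Fisher-weighted quadratic penalty in the style of Elastic Weight Consolidation (EWC): each shared parameter is pulled back toward its value after training the previous model, in proportion to an estimate of its importance (the diagonal of the empirical Fisher information). The function names are hypothetical.

```python
import numpy as np

def weight_plasticity_penalty(shared_params, prev_params, fisher_diag, lam=1.0):
    """Quadratic penalty (assumed EWC-style form) discouraging changes to shared
    parameters that were important for a previously trained model."""
    diff = shared_params - prev_params
    return 0.5 * lam * np.sum(fisher_diag * diff ** 2)

def total_loss(task_loss, shared_params, prev_params, fisher_diag, lam=1.0):
    """Loss of the model currently being trained plus the plasticity penalty."""
    return task_loss + weight_plasticity_penalty(shared_params, prev_params, fisher_diag, lam)

def estimate_fisher_diag(per_example_grads):
    """Empirical diagonal Fisher: mean of squared per-example gradients of the
    previous model's loss with respect to the shared parameters."""
    g = np.asarray(per_example_grads, dtype=float)
    return np.mean(g ** 2, axis=0)
```

Parameters with near-zero Fisher values are left free to change, so the new model can still adapt the shared weights that the previous model did not rely on.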

17.10.2018 Erhan Gündogdu

Good Features to Correlate for Visual Tracking

In this talk, I will mainly discuss the visual object tracking problem, which I worked on during my Ph.D. studies. As a secondary topic, I will mention my recent research activities on garment virtualization by deep learning.

Visual Tracking
Estimating object motion is one of the key components of video processing and the first step in applications that require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have become popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., template) that produces high correlation outputs around the actual object location and low correlation outputs at locations far from the object. Nevertheless, CFB visual tracking methods face many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main causes of these failures are forgetting the past poses of the object due to the simple update stages of CFB methods, a non-optimal model update rate, and features that are not invariant to appearance changes of the target object.
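For readers new to CFB tracking, the classic MOSSE formulation (Bolme et al., 2010) shows the basic idea. In the Fourier domain, the filter that maps a training patch F to a desired Gaussian response G has the closed form H* = (G ⊙ F*) / (F ⊙ F* + λ), and the model is refreshed with a fixed running-average update rate η, which is the kind of simple update stage criticized above. This is a minimal single-channel sketch of that generic baseline, not the speaker's method; the values of λ and η are illustrative.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the object location."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, target_response, lam=1e-2):
    """Closed-form filter in the Fourier domain, kept as numerator A and
    denominator B so that H* = A / B."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)
    A = G * np.conj(F)
    B = F * np.conj(F) + lam  # lam regularizes near-zero frequencies
    return A, B

def update_filter(A, B, patch, target_response, eta=0.125, lam=1e-2):
    """Running-average model update with a fixed learning rate eta."""
    A_new, B_new = train_filter(patch, target_response, lam)
    return (1 - eta) * A + eta * A_new, (1 - eta) * B + eta * B_new

def locate(A, B, patch):
    """Correlate the filter with a new patch; return the (row, col) of the response peak."""
    F = np.fft.fft2(patch)
    response = np.real(np.fft.ifft2((A / B) * F))
    return np.unravel_index(np.argmax(response), response.shape)

# Usage: train on a random patch, then track a circularly shifted copy.
rng = np.random.default_rng(0)
patch = rng.random((32, 32))
g = gaussian_response(patch.shape, (16, 16))
A, B = train_filter(patch, g)
shifted = np.roll(patch, (3, 5), axis=(0, 1))
peak = locate(A, B, shifted)
```

Shifting the patch moves the response peak by the same offset, which is how the tracker reads off the object's new position.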
To address the aforementioned disadvantages of CFB visual tracking methods, this work makes three major contributions. First, a spatial window learning method is proposed to improve the correlation quality. For this purpose, a window that is element-wise multiplied by the object observation (or the correlation filter) is learned by a novel gradient descent procedure. The learned window is capable of suppressing or highlighting the necessary regions of the object, and can improve tracking performance in the case of occlusions and object deformation. As the second contribution, an ensemble-of-trackers algorithm is proposed to handle the issues of a non-optimal learning rate and forgetting the past poses of the object. The trackers in the ensemble are organized in a binary tree, which stores individual expert trackers at its nodes. During tracking, the expert trackers most relevant to the most recent object appearance are activated and used in the localization and update stages. The proposed ensemble method significantly improves tracking accuracy, especially when the expert trackers are CFB trackers that use the proposed window learning method. The final contribution of this work addresses the feature learning problem, specifically focused on the CFB visual tracking loss function. For this loss function, a novel backpropagation algorithm is developed to train any fully convolutional deep neural network. The proposed gradient calculation, which is required for backpropagation, is performed efficiently in both the frequency and image domains, and has linear complexity in the number of feature maps.
The network is trained on carefully curated datasets that include well-known difficulties of visual tracking, e.g., occlusion, object deformation and fast motion. When the learned features are integrated into state-of-the-art CFB visual trackers, favourable tracking performance is obtained on benchmark datasets compared with CFB methods that employ hand-crafted features or deep features extracted from pre-trained classification models.

Garment simulation by deep learning
Garment simulation is a useful tool for virtual try-on, online shopping, the gaming industry, virtual reality and so forth. Realistic simulation of garments on different body shapes and poses with a physically-based simulation (PBS) is a computationally heavy task that requires special parameter tuning for different body shapes and motion types. Hence, data-driven methods that model PBS approaches for garments fitted on target bodies are preferable for both computational and generalization reasons. Concretely, a PBS approach with non-optimal parametrization can output simulation results with undesirable cloth-body interpenetration. However, a data-driven model such as a deep neural network can be trained with additional loss terms that prevent interpenetration. Our method presents a solution for 3D garment fitting on different target body shapes and poses without any of the post-processing steps (such as handling cloth-body interpenetration, tightness, or smoothing) that are required for PBS tools such as NvCloth. To counter foreseeable mistakes of the learned model, the corresponding constraints are included in the training loss function of the proposed network model. Hence, the network model seamlessly predicts the fitted garment given the input template garment and the target body in a certain pose.
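One way to express an interpenetration constraint as a training loss is a hinge penalty on garment vertices that fall behind the body surface. The sketch below assumes a simple nearest-body-vertex approximation with outward vertex normals and a brute-force neighbour search; it is an illustrative formulation with a hypothetical function name, not necessarily the loss used in the talk.

```python
import numpy as np

def interpenetration_loss(garment_verts, body_verts, body_normals, eps=0.0):
    """Squared hinge penalty on garment vertices lying inside the body.

    garment_verts: (G, 3), body_verts: (V, 3), body_normals: (V, 3) outward unit normals.
    For each garment vertex, the signed offset along the normal of its nearest body
    vertex is computed; offsets below the margin eps are penalized.
    """
    # pairwise squared distances, shape (G, V); fine for small meshes only
    d2 = np.sum((garment_verts[:, None, :] - body_verts[None, :, :]) ** 2, axis=-1)
    nearest = np.argmin(d2, axis=1)
    offset = np.sum((garment_verts - body_verts[nearest]) * body_normals[nearest], axis=1)
    return np.sum(np.maximum(0.0, eps - offset) ** 2)
```

A positive `eps` asks the garment to stay a small distance outside the skin rather than merely touching it.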

24.10.2018 Krishna Kanth Nakka

Deep Attentional Structured Representation Learning for Visual Recognition

Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective for tackling complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, while effective, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this work, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to jointly predict the image class label and an attention map in an end-to-end fashion, with no supervision other than the target label. As evidenced by our experiments, this consistently outperforms attention-less structured representation learning and yields state-of-the-art results on standard scene recognition and fine-grained categorization benchmarks.
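The abstract does not say which structured representation carries the attention. As an illustration, this sketch assumes a NetVLAD-style soft-assignment VLAD in which each local feature's residuals to the cluster centers are additionally scaled by a per-location attention weight, so regions with zero attention contribute nothing to the descriptor. The function name and the softness parameter `alpha` are hypothetical.

```python
import numpy as np

def attentional_vlad(features, centers, attention, alpha=10.0):
    """Attention-weighted soft-assignment VLAD.

    features: (N, D) local descriptors, centers: (K, D) cluster centers,
    attention: (N,) non-negative weights (e.g. a flattened attention map).
    Returns a (K * D,) L2-normalized descriptor.
    """
    # soft assignment: softmax over clusters of -alpha * squared distance
    d2 = np.sum((features[:, None, :] - centers[None, :, :]) ** 2, axis=-1)  # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # residuals weighted by soft assignment and by attention, summed over locations
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    weights = (assign * attention[:, None])[:, :, None]      # (N, K, 1)
    vlad = np.sum(weights * residuals, axis=0)                # (K, D)
    # intra-normalization per cluster, then global L2 normalization
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)
```

In an end-to-end model, `attention` would come from a learned branch of the network, which is what lets the class label alone supervise where to look.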

31.10.2018 Kevin Gonyop Kim

Expanding the experience of learners in vocational education

Vocational education and training (VET) that takes place in the dual contexts of school and workplace is a well-established secondary education system in Switzerland. Although it is known as an effective system for developing vocational competence, there is a gap between what learners are supposed to learn and what they practice at their workplaces. Workplace experiences are usually limited to concrete situations in particular environments, and their connections to the general knowledge learned at school are often weak.
The central hypothesis of the Dual-T project is that digital technologies can serve as “bridges” over this school-workplace gap and enhance the learning experiences of the learners. The goal of my research in this project is to design a way to expand the workplace experience so that the learner can explore a broader space of practice. It is exploratory research on how learners in VET accept socially and synthetically expanded experiences and how they explore them. In this presentation, I will present some of the ongoing applications for florist and gardener apprentices, as well as our previous work on logistics and carpentry.

07.11.2018 Peng Song

DESIA: A General Framework for Designing Interlocking Assemblies

Interlocking assemblies have a long history in the design of puzzles, furniture, architecture, and other complex geometric structures. The key defining property of interlocking assemblies is that all component parts are immobilized by their geometric arrangement, preventing the assembly from falling apart. Computer graphics research has recently contributed design tools that allow creating new interlocking assemblies. However, these tools focus on specific kinds of assemblies and explore only a limited space of interlocking configurations, which restricts their applicability for design.

In this talk, we present a new general framework for designing interlocking assemblies. The core idea is to represent part relationships with a family of base Directional Blocking Graphs and leverage efficient graph analysis tools to compute an interlocking arrangement of parts. This avoids the exponential complexity of brute-force search. Our algorithm iteratively constructs the geometry of assembly components, taking advantage of all existing blocking relations for constructing successive parts. As a result, our approach supports a wider range of assembly forms compared to previous methods and provides significantly more design flexibility. We show that our framework facilitates efficient design of complex interlocking assemblies, including new solutions that cannot be achieved by state-of-the-art approaches.
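The kind of graph analysis involved can be illustrated with a strong-connectivity test. Assuming the convention that an edge i → j in the blocking graph for direction d means part i is blocked by part j when translated along d, a group of parts S can slide out along d only if no edge leaves S. A directed graph is strongly connected exactly when every proper, non-empty subset of its vertices has an outgoing edge, so a strongly connected blocking graph certifies that no group of parts can escape along d. This is a minimal sketch of that check only (the function name is hypothetical); it is not DESIA's construction algorithm.

```python
from collections import defaultdict

def is_strongly_connected(num_parts, edges):
    """Check whether a directed graph on parts 0..num_parts-1 is strongly connected:
    every part must be reachable from part 0 in both the graph and its reverse."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for i, j in edges:
        fwd[i].append(j)
        rev[j].append(i)

    def reaches_all(adj):
        # iterative depth-first search from part 0
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == num_parts

    return reaches_all(fwd) and reaches_all(rev)
```

Two graph searches suffice, so the test runs in time linear in the number of parts and blocking relations, in contrast to enumerating subsets of parts.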

14.11.2018 Edoardo Remelli

Deep Shape Optimisation

Aerodynamic shape optimization has many industrial applications. Existing methods, however, are so computationally demanding that typical engineering practice is to either simply try a limited number of hand-designed shapes or restrict oneself to shapes that can be parameterized using only a few degrees of freedom. In this work, we introduce a new way to optimize complex shapes quickly and accurately. To this end, we train Geodesic Convolutional Neural Networks to emulate a fluid dynamics simulator. The key to making this approach practical is remeshing the original shape using a polycube map, which makes it possible to perform the computations on GPUs instead of CPUs. The neural net is then used to formulate an objective function that is differentiable with respect to the shape parameters, which can then be optimised using a gradient-based technique. This outperforms state-of-the-art methods by 5 to 20% on standard problems and, even more importantly, our approach applies to cases that previous methods cannot handle.
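The final step, optimizing shape parameters through a differentiable surrogate, amounts to gradient descent with the chain rule. This is a minimal sketch under stated assumptions: the shape is a hypothetical linear deformation `V = V0 + basis @ p`, and `surrogate_grad` stands in for the gradient of the trained network's prediction with respect to the vertices (a toy quadratic in the usage example). It is not the GCNN or polycube pipeline from the talk.

```python
import numpy as np

def optimize_shape(surrogate_grad, V0, basis, p0, lr=0.1, steps=200):
    """Gradient descent on shape parameters p.

    Vertices are V = V0 + basis @ p (hypothetical linear shape model), and
    surrogate_grad(V) returns d(objective)/dV from a differentiable surrogate.
    By the chain rule, d(objective)/dp = basis.T @ d(objective)/dV.
    """
    p = np.array(p0, dtype=float)  # copy so the caller's array is untouched
    for _ in range(steps):
        V = V0 + basis @ p
        p -= lr * (basis.T @ surrogate_grad(V))
    return p

# Usage: a toy quadratic "surrogate" that pulls vertices toward a target shape.
target = np.array([1.0, 2.0, 3.0, 4.0])
p_opt = optimize_shape(lambda V: V - target, np.zeros(4), np.eye(4)[:, :2], np.zeros(2))
```

Here the optimum is the projection of the target onto the span of the basis; with a real surrogate, each step only needs one forward and one backward pass through the network instead of a full fluid simulation.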

21.11.2018 Wenzel Jakob

Capturing and rendering the world of materials

One of the key ingredients of any realistic rendering system is a description of the way in which light interacts with objects, typically modeled via the Bidirectional Reflectance Distribution Function (BRDF). Unfortunately, real-world BRDF data remains extremely scarce due to the difficulty of acquiring it: a BRDF measurement requires scanning a four-dimensional domain at high resolution, an infeasibly time-consuming process.

In this talk, I'll showcase our ongoing work on assembling a large library of materials, including metals, fabrics, organic substances like wood or plant leaves, etc. The key idea to work around the curse of dimensionality is an adaptive parameterization, which automatically warps the 4D space so that most of the volume maps to “interesting” regions. Starting with a review of BRDF models and microfacet theory, I'll explain the new model, as well as the optical measurement apparatus that we used to conduct the measurements.
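The abstract does not name a specific microfacet model. As a concrete reference point, this sketch evaluates the widely used GGX (Trowbridge–Reitz) normal distribution D(h) = α² / (π((n·h)²(α² − 1) + 1)²) and numerically checks the normalization that any microfacet distribution must satisfy, ∫ D(h)(n·h) dω_h = 1 over the hemisphere. It is unrelated to the adaptive parameterization described in the talk, and the function names are hypothetical.

```python
import math

def ggx_ndf(cos_theta_h, alpha):
    """GGX / Trowbridge-Reitz microfacet normal distribution D(h) for a
    half-vector at polar angle theta_h, with roughness alpha."""
    c2 = cos_theta_h * cos_theta_h
    denom = c2 * (alpha * alpha - 1.0) + 1.0
    return alpha * alpha / (math.pi * denom * denom)

def projected_area_integral(alpha, n=100_000):
    """Integrate D(h) cos(theta_h) over the hemisphere with the midpoint rule.
    The azimuthal integral contributes a factor 2*pi; sin(theta) is the
    solid-angle Jacobian. A valid distribution integrates to 1."""
    d_theta = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * d_theta
        total += ggx_ndf(math.cos(theta), alpha) * math.cos(theta) * math.sin(theta)
    return 2.0 * math.pi * total * d_theta
```

Small roughness values concentrate D(h) sharply around the surface normal, which is one reason measured BRDFs of shiny materials need dense sampling in some regions of the 4D domain.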

28.11.2018 Jan Bednarík

Deformable surface reconstruction from a single view

Recent years have seen the development of mature solutions for reconstructing deformable surfaces from a single image, provided that they are relatively well-textured. By contrast, recovering the 3D shape of texture-less surfaces remains an open problem, and essentially relates to Shape-from-Shading (SfS). In this paper, we introduce a data-driven approach to this problem. We introduce a general framework that can predict diverse 3D representations, such as meshes, normals, and depth maps. Our experiments show that meshes are ill-suited to handle texture-less 3D reconstruction in our context. Furthermore, we demonstrate that our approach generalizes well to unseen objects, and that it yields higher-quality reconstructions than a state-of-the-art SfS technique, particularly in terms of normal estimates. Our reconstructions accurately model the fine details of the surfaces, such as the creases of a T-shirt worn by a person. Since 3D shape reconstruction from a single view is known to be subject to ambiguities, because various combinations of shape, lighting and material result in the same 2D observation, we further explore the possibility of predicting not just a single shape but multiple likely shapes, and we look into learning a better shape representation.
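Depth maps and normal maps, two of the representations the framework predicts, are linked: normals can be derived from depth by differentiation. A minimal sketch, assuming an orthographic camera and uniform pixel spacing, so that the depth map can be treated as a height field z(x, y) with normals proportional to (−∂z/∂x, −∂z/∂y, 1), oriented toward +z. A perspective camera would require back-projecting pixels first; the function name is hypothetical.

```python
import numpy as np

def normals_from_depth(depth, pixel_size=1.0):
    """Per-pixel unit normals of a height field z(x, y) via finite differences
    (orthographic camera assumed). Returns an array of shape (H, W, 3)."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x)
    dz_dy, dz_dx = np.gradient(depth, pixel_size)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

Because normals depend on depth derivatives, small depth errors can produce large normal errors, which is why normal accuracy is a demanding evaluation criterion for fine details such as creases.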

05.12.2018 Zheng Dang

Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses

Motion and pose estimation from 3D-to-2D correspondences can be solved by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. Unfortunately, while theoretically doable, this introduces numerical instability in the optimization process in practice.
In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate on several tasks, including keypoint matching and 3D pose estimation, that our approach is much more robust than explicit differentiation of the eigendecomposition: it has better convergence properties and yields state-of-the-art results on both tasks.
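The key observation can be checked in a few lines: for a unit vector e, eᵀAᵀAe = ‖Ae‖² is zero exactly when e lies in the null space of A, i.e. when e is an eigenvector of AᵀA with eigenvalue zero. Penalizing this quantity with the ground-truth e therefore supervises the predicted matrix A without differentiating an eigendecomposition. The sketch below assumes A is the network output and adds an illustrative term, α·exp(−β·tr(AᵀA)), to discourage the trivial solution A = 0; the exact regularizer and the function names are assumptions.

```python
import numpy as np

def eig_free_loss(A, e_gt, alpha=1.0, beta=1.0):
    """Zero-eigenvalue loss computed without eigendecomposition.
    The first term vanishes when e_gt lies in the null space of A; the second
    (an assumed regularizer) keeps A from collapsing to the zero matrix."""
    e = e_gt / np.linalg.norm(e_gt)
    M = A.T @ A
    return float(e @ M @ e + alpha * np.exp(-beta * np.trace(M)))

def eig_free_loss_grad(A, e_gt, alpha=1.0, beta=1.0):
    """Analytic gradient of eig_free_loss with respect to A."""
    e = e_gt / np.linalg.norm(e_gt)
    M = A.T @ A
    # d/dA ||A e||^2 = 2 (A e) e^T ; d/dA exp(-beta * ||A||_F^2) = -2 beta exp(.) A
    return 2.0 * np.outer(A @ e, e) - 2.0 * alpha * beta * np.exp(-beta * np.trace(M)) * A

# Usage: (1, 1, 0) is a null vector of this A, so only the regularizer remains.
A = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
loss = eig_free_loss(A, np.array([1.0, 1.0, 0.0]))  # equals exp(-3) since tr(A^T A) = 3
```

The gradient involves only matrix products, so it stays well-behaved even when eigenvalues of AᵀA are close together, which is exactly where explicit eigenvector derivatives become unstable.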

12.12.2018 Sena Kiciroglu

Active Drone-Based Human Pose Estimation

Reconstruction of 3D human pose has become a widely studied direction of research, with a recent interest in outdoor and drone-based capture. The rising popularity of commercially available drones with on-board cameras makes this form of motion capture (MoCap) accessible to the consumer market. However, little work has been done on controlling the drone to maximize reconstruction accuracy. Existing drone-based MoCap solutions use pre-defined controllers, such as following the person at a constant angle or at constant rotation. On the other hand, the robotics literature mostly covers the active reconstruction of static scenes. Our goal is to actively reposition a drone during MoCap so that it moves to the position where it will achieve the highest accuracy. Key to our method is estimating the expected reconstruction uncertainty in the presence of dynamic motion. Ultimately, our goal is to show that our active motion planning improves the pose estimation results, by comparing it against several baseline policies.

19.12.2018 Helge Rhodin

Neural Scene Decomposition for Human Motion Capture

Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet. However, when it comes to 3D reconstruction, those features learned on ImageNet are only of limited use.

We therefore propose an approach to learning representations that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: a bounding box; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. Our NSD model can be trained end-to-end without any 2D or 3D supervision by exploiting self-supervision from multiview data. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data. NSD is also well suited for CG applications, such as seamless transitions between two video perspectives and novel view synthesis.