Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

Yoel Levy1, David Shavin1, Itai Lang2, Sagie Benaim1
1The Hebrew University of Jerusalem 2University of Chicago

Abstract


Recent work has demonstrated the ability to leverage, or distill, 2D features obtained from large pre-trained 2D models into 3D features, enabling impressive 3D editing and understanding capabilities using 2D supervision only. Although impressive, existing models assume that 3D features are captured using a single feature field, and often make the simplifying assumption that features are view-independent. In this work, we propose instead to capture 3D features using multiple disentangled feature fields that capture different structural components of 3D features, involving view-dependent and view-independent components, which can be learned from 2D feature supervision only. Subsequently, each component can be controlled in isolation, enabling semantic and structural editing and understanding capabilities. For instance, using a user click, one can segment the 3D features that correspond to a given object, and then segment, edit, or remove their view-dependent (reflective) properties. We evaluate our approach on the task of 3D segmentation and demonstrate a set of novel understanding and editing tasks.

Method Overview


Segmentation

Our method can segment an object using only the view-independent part of the feature vector.
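
As a rough illustration of click-based segmentation, the sketch below selects every point whose feature is close to the feature at a clicked point. It is not the authors' implementation: the function name `segment_by_click`, the `(N, D)` feature layout, the use of cosine similarity, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def segment_by_click(features, click_index, threshold=0.8):
    """Return a boolean mask of points similar to the clicked point.

    features:    (N, D) array of view-independent feature vectors
                 (hypothetical layout, one row per 3D point).
    click_index: row index of the point selected by the user click.
    threshold:   cosine-similarity cutoff (illustrative value only).
    """
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)
    # Similarity of every point to the clicked point.
    similarity = unit @ unit[click_index]
    return similarity >= threshold

# Toy usage: three 2D features, click on the first point.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
toy_mask = segment_by_click(toy_features, click_index=0)
```

In this toy case the second feature points in nearly the same direction as the clicked one and is selected, while the orthogonal third feature is not.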


Highlight Segmentation

Using the view-dependent part of the feature vector, our method enables a novel capability: segmenting highlights and reflections.


Edit Color

Editing the view-independent color component, the view-dependent color component, or both, after segmentation.
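
To make the idea of editing each color component in isolation concrete, here is a minimal sketch. It assumes, purely for illustration, that the final color is the clipped sum of a view-independent and a view-dependent RGB term; the function `edit_color`, its parameters, and this additive composition are hypothetical and are not taken from the paper.

```python
import numpy as np

def edit_color(independent, dependent, mask,
               independent_tint=None, dependent_scale=1.0):
    """Edit color components inside a segmentation mask.

    independent, dependent: (N, 3) RGB arrays in [0, 1] (assumed layout).
    mask:             (N,) boolean array, e.g. from a prior segmentation.
    independent_tint: optional per-channel multiplier for the
                      view-independent color of masked points.
    dependent_scale:  multiplier for the view-dependent color of masked
                      points; 0.0 removes it entirely.
    """
    ind = independent.copy()
    dep = dependent.copy()
    if independent_tint is not None:
        ind[mask] = ind[mask] * np.asarray(independent_tint)
    dep[mask] = dep[mask] * dependent_scale
    # Assumed composition: final color is the clipped sum of both parts.
    return np.clip(ind + dep, 0.0, 1.0)

# Toy usage: tint the first point red and remove its highlights.
base = np.array([[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]])
spec = np.array([[0.3, 0.3, 0.3], [0.1, 0.1, 0.1]])
selected = np.array([True, False])
edited = edit_color(base, spec, selected,
                    independent_tint=(1.0, 0.0, 0.0), dependent_scale=0.0)
```

Points outside the mask keep both components unchanged, so only the selected object is affected.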


Edit Roughness and Highlight

Editing roughness and removing the view-dependent color after segmentation.