Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

Yoel Levy¹, David Shavin¹, Itai Lang², Sagie Benaim¹

¹The Hebrew University of Jerusalem ²University of Chicago

Abstract

Recent work has demonstrated the ability to leverage, or distill, 2D features obtained from large pre-trained 2D models into 3D features, enabling impressive 3D editing and understanding capabilities using 2D supervision only. Despite these results, existing models assume that 3D features are captured by a single feature field, and they often make the simplifying assumption that features are view-independent. In this work, we instead propose to capture 3D features using multiple disentangled feature fields that represent different structural components of 3D features, including view-dependent and view-independent components, and that can be learned from 2D feature supervision only. Each component can then be controlled in isolation, enabling semantic and structural editing and understanding capabilities. For instance, given a user click, one can segment the 3D features that correspond to a given object and then segment, edit, or remove their view-dependent (reflective) properties. We evaluate our approach on the task of 3D segmentation and demonstrate a set of novel understanding and editing tasks.
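To make the idea of controlling components in isolation concrete, the following is a toy sketch, not the paper's method. It assumes (our assumption, not a formulation stated above) that the full 3D feature at a point is the sum of a view-independent field `f_vi(x)` and a view-dependent field `f_vd(x, d)`. Random weights stand in for trained fields, and thresholding the cosine similarity to the clicked point's feature stands in for click-based segmentation. All names (`f_vi`, `f_vd`, `click_segment`, `remove_view_dependent`, `threshold`) are hypothetical.

```python
import numpy as np

# Toy stand-ins for learned feature fields (hypothetical, NOT the paper's networks).
# Assumption: full feature = view-independent part + view-dependent part.
rng = np.random.default_rng(0)
FEAT_DIM = 8
W_VI = rng.standard_normal((3, FEAT_DIM))  # hypothetical weights for f_vi
W_VD = rng.standard_normal((6, FEAT_DIM))  # hypothetical weights for f_vd


def f_vi(x):
    """View-independent features: depend on 3D position x only, shape (N, FEAT_DIM)."""
    return np.tanh(x @ W_VI)


def f_vd(x, d):
    """View-dependent features: depend on position x and unit view direction d."""
    return np.tanh(np.concatenate([x, d], axis=-1) @ W_VD)


def combined_feature(x, d, vd_scale=1.0):
    """Full feature; vd_scale=0 drops the view-dependent component."""
    return f_vi(x) + vd_scale * f_vd(x, d)


def click_segment(features, click_idx, threshold=0.9):
    """Mask points whose cosine similarity to the clicked point's feature >= threshold."""
    normed = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    sims = normed @ normed[click_idx]
    return sims >= threshold


def remove_view_dependent(x, d, mask):
    """Drop the view-dependent component only for points inside the mask."""
    full = combined_feature(x, d)
    static = combined_feature(x, d, vd_scale=0.0)
    return np.where(mask[:, None], static, full)


# Toy scene: 100 random points, each observed from a random unit view direction.
x = rng.uniform(-1.0, 1.0, size=(100, 3))
d = rng.standard_normal((100, 3))
d /= np.linalg.norm(d, axis=-1, keepdims=True)

# Segment with the view-independent (semantic) features from a "click" on point 0,
# then remove the view-dependent part inside the selected region only.
mask = click_segment(f_vi(x), click_idx=0)
edited = remove_view_dependent(x, d, mask)
print(int(mask.sum()), edited.shape)
```

Segmenting on the view-independent features alone keeps the selection stable across viewpoints, while the edit touches only the view-dependent term inside the mask; points outside the mask keep their full features.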

[Figure: Segmentation]