Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion

Zhifei Chen1,*, Tianshuo Xu1,*, Wenhang Ge1,*, Leyi Wu1, Dongyu Yan1, Jing He1, Luozhou Wang1, Lu Zeng3, Shunsi Zhang3, Ying-Cong Chen1,2,†
1HKUST(GZ), 2HKUST, 3Guangzhou Quwan Network Technology

* Equal contribution † Corresponding author

Teaser Image

Uni-Renderer empowers a generative model to function both as a renderer and an inverse renderer by approximating the rendering equation in a data-driven manner.

Abstract

Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation lies at the core of both tasks, serving as an ideal conditional distribution transfer function from intrinsic properties to RGB images. Despite the promising results of existing rendering methods, they merely approximate the ideal estimate for a specific scene and come with a high computational cost. Moreover, the inverse conditional distribution transfer is intractable due to inherent ambiguity. To address these challenges, we propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks within a single diffusion framework. Inspired by UniDiffuser, we use two distinct time schedules to model the two tasks and, with a tailored dual-stream module, achieve cross-conditioning between two pre-trained diffusion models. This unified approach, named Uni-Renderer, allows the two processes to facilitate each other through a cycle-consistent constraint, mitigating ambiguity by enforcing consistency between intrinsic properties and rendered images. Combined with a meticulously prepared dataset, our method effectively decomposes intrinsic properties and demonstrates a strong capability to recognize changes during rendering.

Our Method

Image 1

During training, both attribute and RGB images are fed into a unified model through pre-trained VAE encoders. The timestep selector plays a crucial role by adjusting the timestep of each branch: it ensures that one branch (either attribute or RGB) has a timestep of 0, while the other branch samples a timestep \( t \in [0, T] \). This mechanism allows the model to learn the conditional distributions \( q(\mathbf{x_0}|\mathbf{y_0}) \) and \( q(\mathbf{y_0}|\mathbf{x_0}) \) in alternating iterations. During rendering and inverse rendering, the corresponding conditions are input to the model with a timestep of 0, and the attributes/RGB images are generated from sampled noise. (The VAE encoder and decoder are omitted for simplicity.)
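
To make the alternating schedule concrete, below is a minimal training-step sketch, assuming a hypothetical dual-stream denoiser (dual_stream_unet) that accepts per-branch timesteps, a diffusion scheduler with an add_noise method, and VAE latents z_attr / z_rgb. These names are illustrative stand-ins, not the released implementation.

    import torch
    import torch.nn.functional as F

    T = 1000  # total number of diffusion steps

    def training_step(dual_stream_unet, scheduler, z_attr, z_rgb):
        """One alternating iteration: keep one branch clean (t = 0),
        noise the other with a timestep drawn from [0, T)."""
        b = z_attr.shape[0]
        t = torch.randint(0, T, (b,), device=z_attr.device)

        if torch.rand(()) < 0.5:
            # Learn q(x_0 | y_0): RGB branch is the clean condition (t = 0),
            # attribute branch is noised and denoised.
            noise = torch.randn_like(z_attr)
            noisy_attr = scheduler.add_noise(z_attr, noise, t)
            pred = dual_stream_unet(attr=noisy_attr, rgb=z_rgb,
                                    t_attr=t, t_rgb=torch.zeros_like(t))
            return F.mse_loss(pred.attr_noise, noise)
        else:
            # Learn q(y_0 | x_0): attribute branch is the clean condition,
            # RGB branch is noised and denoised.
            noise = torch.randn_like(z_rgb)
            noisy_rgb = scheduler.add_noise(z_rgb, noise, t)
            pred = dual_stream_unet(attr=z_attr, rgb=noisy_rgb,
                                    t_attr=torch.zeros_like(t), t_rgb=t)
            return F.mse_loss(pred.rgb_noise, noise)

Sampling mirrors this setup: the conditioning branch is held at timestep 0 while the other branch is iteratively denoised from pure noise.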

Our Application

Image 2

We demonstrate smooth changes via rendering for different metallic and roughness strengths. Rendering is performed given different combinations of the attributes. When the roughness value is set to 1, the cake and clock cases shown in the top left appear without specular highlights. When the metallic value is set to 1, the orange and baseball cases appear metallic and reveal the surrounding illumination. Best viewed in color.
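
Such a sweep could be scripted as in the hypothetical helper below; render_from_attributes stands in for the rendering-direction sampler (attributes conditioned at timestep 0, RGB denoised from noise) and is not a function from the paper's code.

    import torch

    def sweep_material(render_from_attributes, base_attrs, key="roughness", steps=5):
        # Keep albedo, normal, and lighting fixed; vary one scalar material property.
        images = []
        for v in torch.linspace(0.0, 1.0, steps):
            attrs = dict(base_attrs)
            attrs[key] = v.item()   # e.g. key="metallic" for the metallic sweep
            images.append(render_from_attributes(attrs))
        return images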

Comparison with other methods

Image 2

Albedo comparison of Uni-Renderer with baseline methods. We compare four learning-based methods and two optimization-based methods. Among all, Uni-Renderer yields the most realistic results. Best viewed in color.

Image 2

Relighting comparison. The relighting comparison is performed on validation objects. We first inverse render the input RGB to acquire the intrinsics, and then update the lighting information to obtain the relighting results. The leftmost column is the reference environment lighting. Best viewed in color.
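
The relighting procedure can be summarized as a short, hypothetical pipeline; inverse_render and render_from_attributes are stand-ins for the inverse-rendering and rendering samplers of the model, and the attribute keys are illustrative.

    def relight(inverse_render, render_from_attributes, rgb_image, target_envmap):
        # Inverse rendering: recover intrinsics (albedo, normal, roughness, metallic, lighting).
        attrs = inverse_render(rgb_image)
        # Swap in the reference environment lighting.
        attrs["lighting"] = target_envmap
        # Rendering: synthesize the relit image from the updated intrinsics.
        return render_from_attributes(attrs)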

Image 2

Normal comparison of Uni-Renderer with other methods. Best viewed in color.

Real-World Inverse Rendering

Image 2

We present real-world inverse rendering cases under different lighting conditions. Best viewed in color.

BibTeX

@article{chen2024uni,
      title={Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion},
      author={Chen, Zhifei and Xu, Tianshuo and Ge, Wenhang and Wu, Leyi and Yan, Dongyu and He, Jing and Wang, Luozhou and Zeng, Lu and Zhang, Shunsi and Chen, Yingcong},
      journal={arXiv preprint arXiv:2412.15050},
      year={2024}
    }