ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

1Amazon, 2The University of Sydney, 3The University of Adelaide

ViewFusion generates novel-view images with multi-view consistency from a single-view RGB image.

Abstract

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, the independent image generation process in these prevailing methods makes it challenging to maintain multi-view consistency.

To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.

Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for next-view generation, ensuring robust multi-view consistency during the novel-view generation process. Through a diffusion process that fuses known-view information via interpolated denoising, our framework extends single-view conditioned models to multi-view conditioned settings without any additional fine-tuning. Extensive experimental results demonstrate the effectiveness of ViewFusion in generating consistent and detailed novel views.
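The auto-regressive fusion above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's actual implementation): at each denoising step, per-view noise predictions conditioned on every previously generated view are interpolated with a weighted sum, and each finished view is appended to the conditioning set for the next one. The `eps_model` callable, the geometric weight decay, and the simplified update rule are all assumptions for illustration.

```python
import numpy as np

def interpolated_denoise(eps_model, x_t, t, cond_views, weights):
    """Fuse per-view noise predictions via a weighted (interpolated) sum.

    eps_model(x_t, t, cond) -> noise prediction conditioned on one view
    (a stand-in for a pre-trained single-view conditioned diffusion model).
    weights: per-view weights summing to 1 (hypothetical decay scheme).
    """
    return sum(w * eps_model(x_t, t, c) for w, c in zip(weights, cond_views))

def autoregressive_generate(eps_model, n_views, n_steps, shape, decay=0.5, seed=0):
    """Generate n_views novel views, conditioning each on all previous ones."""
    rng = np.random.default_rng(seed)
    views = [rng.standard_normal(shape)]  # stand-in for the input RGB view
    for _ in range(n_views):
        x = rng.standard_normal(shape)  # start each view from Gaussian noise
        # older views get geometrically smaller weights; normalize to sum to 1
        w = np.array([decay ** k for k in range(len(views))][::-1], dtype=float)
        w /= w.sum()
        for t in range(n_steps, 0, -1):
            eps = interpolated_denoise(eps_model, x, t, views, w)
            x = x - eps / n_steps  # highly simplified denoising update
        views.append(x)  # the new view becomes context for later views
    return views[1:]
```

Because the fusion happens inside the per-step noise prediction, no retraining is needed: the single-view model is queried once per conditioning view and the predictions are simply interpolated.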

3D shapes can be extracted from the generated multi-view images by training a NeRF model.

Using ViewFusion, you can render novel viewpoints of an everyday object and reconstruct its 3D model.

Related Links

There's a lot of excellent work that was introduced before and around the same time as ours.

3DiM introduces diffusion models to novel-view synthesis.

Zero-1-to-3 trains a view-conditioned diffusion model on the larger Objaverse dataset and demonstrates strong generalization ability.

SyncDreamer synchronizes multi-view noise predictions in 3D space to maintain multi-view consistency.

Wonder3D leverages texture and normal information, and facilitates information exchange across views and modalities via a cross-attention mechanism.

There are probably many more by the time you are reading this, such as One-2-3-45, One-2-3-45++, and Zero123++.

Feel free to contact me if you have any insights about our work or the broader 3D generation and AIGC community. I'm really eager to talk about it!

BibTeX

@misc{yang2024viewfusion,
      title={ViewFusion: Towards Multi-View Consistency via Interpolated Denoising},
      author={Xianghui Yang and Yan Zuo and Sameera Ramasinghe and Loris Bazzani and Gil Avraham and Anton van den Hengel},
      year={2024},
      eprint={2402.18842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
  }