ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

1Amazon, 2The University of Sydney, 3The University of Adelaide

ViewFusion generates novel-view images with multi-view consistency from a single-view RGB image.

Abstract

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, the independent image generation process in these prevailing methods makes it challenging to maintain multi-view consistency.

To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.

Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for next-view generation, ensuring robust multi-view consistency during the novel-view generation process. Through a diffusion process that fuses known-view information via interpolated denoising, our framework extends single-view conditioned models to multi-view conditioned settings without any additional fine-tuning. Extensive experimental results demonstrate the effectiveness of ViewFusion in generating consistent and detailed novel views.
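The auto-regressive fusion above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's actual implementation): at each denoising step, per-view noise predictions conditioned on every previously generated view are interpolated with a weighted sum, and each finished view is appended to the conditioning set for the next one. The `eps_model` callable, the geometric weight decay, and the simplified update rule are all assumptions for illustration.

```python
import numpy as np

def interpolated_denoise(eps_model, x_t, t, cond_views, weights):
    """Fuse per-view noise predictions via a weighted (interpolated) sum.

    eps_model(x_t, t, cond) -> noise prediction conditioned on one view
    (a stand-in for a pre-trained single-view conditioned diffusion model).
    weights: per-view weights summing to 1 (hypothetical decay scheme).
    """
    return sum(w * eps_model(x_t, t, c) for w, c in zip(weights, cond_views))

def autoregressive_generate(eps_model, n_views, n_steps, shape, decay=0.5, seed=0):
    """Generate n_views novel views, conditioning each on all previous ones."""
    rng = np.random.default_rng(seed)
    views = [rng.standard_normal(shape)]  # stand-in for the input RGB view
    for _ in range(n_views):
        x = rng.standard_normal(shape)  # start each view from Gaussian noise
        # older views get geometrically smaller weights; normalize to sum to 1
        w = np.array([decay ** k for k in range(len(views))][::-1], dtype=float)
        w /= w.sum()
        for t in range(n_steps, 0, -1):
            eps = interpolated_denoise(eps_model, x, t, views, w)
            x = x - eps / n_steps  # highly simplified denoising update
        views.append(x)  # the new view becomes context for later views
    return views[1:]
```

Because the fusion happens inside the per-step noise prediction, no retraining is needed: the single-view model is queried once per conditioning view and the predictions are simply interpolated.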

3D shapes can be extracted from the generated multi-view images by training a NeRF model.

Using ViewFusion, you can render novel viewpoints of an everyday object and reconstruct its 3D model.

Related Links

There's a lot of excellent work that was introduced before and around the same time as ours.

3DiM introduces diffusion models to novel-view synthesis.

Zero-1-to-3 trains a view-conditioned diffusion model on the larger Objaverse dataset and demonstrates strong generalization ability.

SyncDreamer synchronizes multi-view noise predictions in 3D space to maintain multi-view consistency.

Wonder3D leverages texture and normal information, and facilitates information exchange across views and modalities via a cross-attention mechanism.

There are probably many more by the time you are reading this, such as One-2-3-45, One-2-3-45++, and Zero123++.

Feel free to contact me if you have any insights about our work or the broader 3D generation and AIGC community. I'm really eager to talk about it!

BibTeX

@misc{yang2024viewfusion,
      title={ViewFusion: Towards Multi-View Consistency via Interpolated Denoising},
      author={Xianghui Yang and Yan Zuo and Sameera Ramasinghe and Loris Bazzani and Gil Avraham and Anton van den Hengel},
      year={2024},
      eprint={2402.18842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
  }