Xianghui Yang is a Ph.D. student in the School of Electrical & Information Engineering, The University of Sydney, where he works at the USYD-Vision Lab under the supervision of Prof. Luping Zhou, Prof. Guosheng Lin, and Prof. Wanli Ouyang. Before that, he received a B.Sc. degree in Physics from the School of Physics, Nanjing University, in 2019.
We propose Hunyuan3D-1.0, a two-stage approach available in a lite version and a standard version, both of which support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB images in approximately 4 seconds. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in approximately 7 seconds.