
Zhiwen ("Aaron") Fan

University of Texas at Austin

About Me

I am a Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Prof. Atlas Wang in the VITA group.
I work closely with Prof. Marco Pavone and Prof. Yue Wang on 3D end-to-end models with robust generalization capabilities; with Prof. Achuta Kadambi on recovering 3D/4D signals that capture the space-time structure of our world from casually captured data; and with Prof. Callie Hao on hardware-software co-design. I was an awardee of the Qualcomm Innovation Fellowship 2022.

Recent News

  • Our NeurIPS'24 paper (LightGaussian) was selected as a spotlight presentation.
  • Our Symbolic Visual RL paper was accepted by IEEE Trans. PAMI.
  • Our IROS'24 paper (Multi-modal 3DGS SLAM) was selected as an oral pitch finalist presentation.
  • Our CVPR'24 paper (Feature-3DGS) was selected as a highlight presentation.
  • Our CVPR'23 paper (NeuralLift-360) was selected as a highlight presentation.
  • I was one of the awardees of the Qualcomm Innovation Fellowship (North America) 2022 (QIF 2022). Innovation title: "Real-time Visual Processing for Autonomous Driving via Video Transformer with Data-Model-Accelerator Tri-design".
  • We won 3rd place in the University Demo Best Demonstration at the 59th Design Automation Conference (DAC 2022), demonstrating a multi-task vision transformer on FPGA.
  • Our CVPR'22 paper (CADTransformer) was selected as an oral presentation.
  • Our CVPR'20 paper (Cascade Cost Volume) was selected as an oral presentation.

Commitment

  • I am dedicated to collecting high-quality, accessible online courses in artificial intelligence, machine learning, and computer vision. Check the note page: LINK.
  • I dedicate one hour weekly (two 30-minute slots, 2:00pm-2:30pm and 2:30pm-3:00pm PST on Fridays) to mentoring and guiding students from underrepresented groups or those in need. Interested? Fill out this form.

Research and Projects


Humans possess the remarkable ability to perceive and interact with their environment. This ability is driven by an internal understanding of how the scene's structure is formed and of the inherent properties of the environment. To equip future intelligent machines with this capability, it is essential to perceive geometry directly from visual inputs for interaction with the physical world, rather than relying on offline algorithms to preprocess camera poses, which limits the scalability of foundation models for safe and reliable planning.
Next-generation learning algorithms equipped with visual sensors should inherently perceive geometric structure. To that end, my research aims to pre-train 3D foundation models that leverage Internet-scale video data; to align them with existing VLMs for reliable reasoning and planning grounded in physical geometry from visual inputs; and to investigate novel architectures that can efficiently process, interpret, and reason over high-resolution visual streams in the temporal dimension.

End-to-End 3D

The emergence of reasoning and strong generalization capabilities within foundation models is built upon the ability to process and compress large-scale data into well-designed, scalable models. However, 3D learning typically requires lengthy, modular, and non-differentiable pipelines for calibrating image or video data. This paradigm significantly hinders scaling 3D models to learn from web-scale, unannotated video data that lacks image annotations and camera poses. My research addresses a practical and compelling challenge: 3D reconstruction from pose-free, unannotated image data.
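To make the paradigm concrete, the sketch below shows the general shape of such an end-to-end model: a single feed-forward network mapping unposed images directly to dense 3D points and camera poses, with no SfM preprocessing. This is a minimal illustrative sketch in PyTorch; the class and layer names (PoseFreeReconstructor, point_head, pose_head) are hypothetical, not the actual architecture of any of my papers.

    import torch
    import torch.nn as nn

    class PoseFreeReconstructor(nn.Module):
        """Toy feed-forward model: unposed images -> 3D point maps + poses."""
        def __init__(self, dim: int = 256):
            super().__init__()
            self.encoder = nn.Conv2d(3, dim, kernel_size=8, stride=8)  # patchify frames
            self.point_head = nn.Conv2d(dim, 3, kernel_size=1)         # XYZ per patch
            self.pose_head = nn.Linear(dim, 7)                         # quaternion + translation

        def forward(self, images: torch.Tensor):
            # images: (batch, views, 3, H, W) -- no camera poses required
            b, v = images.shape[:2]
            feats = self.encoder(images.flatten(0, 1))                 # (b*v, dim, h', w')
            points = self.point_head(feats)                            # dense 3D point map
            poses = self.pose_head(feats.mean(dim=(-2, -1)))           # one pose per view
            return points.unflatten(0, (b, v)), poses.unflatten(0, (b, v))

    # The entire mapping is one differentiable forward pass, so it can be
    # trained directly on unannotated video at scale.
    model = PoseFreeReconstructor()
    points, poses = model(torch.randn(1, 4, 3, 64, 64))  # four unposed frames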

Multi-modal 3D Asset Creation

My research tackles the challenge of data sparsity in photorealistic 3D digital environments, where dense scene capture with annotated poses is often unavailable. By combining geometric principles with generative priors learned from large datasets, my research fills in missing information using statistical patterns in shape and appearance. This approach leverages both the deterministic nature of geometry and the probabilistic power of generative models, enabling architectures to learn effectively from limited data.
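As a rough illustration of how the two ingredients combine, the sketch below fits a 3D scene to a handful of observed views with a deterministic photometric loss, while a frozen generative model scores the plausibility of the renderings where observations are missing. All names here (fit_scene, render, generative_prior) are hypothetical placeholders under stated assumptions, not an actual published method.

    import torch

    def fit_scene(render, scene_params, target_views, generative_prior,
                  steps: int = 1000, lam: float = 0.1):
        """Fit scene_params to sparse observations plus a learned prior.

        render:           differentiable renderer, scene_params -> images
        generative_prior: frozen model scoring image plausibility (higher = better)
        """
        opt = torch.optim.Adam([scene_params], lr=1e-2)
        for _ in range(steps):
            rendered = render(scene_params)
            # Deterministic geometry: match the few observed views.
            photometric = (rendered - target_views).pow(2).mean()
            # Probabilistic prior: keep renderings statistically plausible
            # where no observation constrains them.
            plausibility = -generative_prior(rendered).mean()
            loss = photometric + lam * plausibility
            opt.zero_grad()
            loss.backward()
            opt.step()
        return scene_params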

My research has been demonstrated on platforms such as Quest 3, implemented within IARPA projects, and integrated into multiple commercial products.

Selected Publications
Full publication list at Google Scholar

* denotes equal contribution, † denotes project lead.
Large Spatial Model: Real-time Unposed Images to Semantic 3D
Zhiwen Fan*, Jian Zhang*, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, Yue Wang
NeurIPS 2024 
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS
Zhiwen Fan*, Kevin Wang*, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
NeurIPS 2024 (Spotlight)
Expressive Gaussian Human Avatars from Monocular RGB Video
Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang
NeurIPS 2024 
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
IROS 2024 (Oral Pitch Highlight) 
4K4DGen: Panoramic 4D Generation at 4K Resolution
Renjie Li, Panwang Pan, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhiwen Fan
Preprint 
InstantSplat: Sparse-view Pose-free Gaussian Splatting in Seconds
Zhiwen Fan*, Wenyan Cong*, Kairun Wen*, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang
Preprint 
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou*, Zhiwen Fan*, Dejia Xu*, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
ECCV 2024 
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
CVPR 2024 (Highlight, 2.8% of 11,532 submissions)
NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex Real-World Scenes
Zhiwen Fan, Peihao Wang, Yifan Jiang, Xinyu Gong, Dejia Xu, Zhangyang Wang
ICLR 2023
NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views
Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang
CVPR 2023 (Highlight, 2.5% of 9,155 submissions)
Unified Implicit Neural Stylization
Zhiwen Fan*, Yifan Jiang*, Peihao Wang*, Xinyu Gong, Dejia Xu, Zhangyang Wang
ECCV 2022
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Zhiwen Fan*, Hanxue Liang*, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
NeurIPS 2022 (QIF 2022 Award & DAC 3rd Best Demo)
CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings
Zhiwen Fan, Tianlong Chen, Peihao Wang, Zhangyang Wang
CVPR 2022 (Oral Presentation, top 5% of all submissions)
Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching
Zhiwen Fan*, Xiaodong Gu*, Siyu Zhu, Zuozhuo Dai, Feitong Tan, Ping Tan
CVPR 2020 (Oral Presentation, top 5% of all submissions)
Joint CS-MRI Reconstruction and Segmentation with a Unified Deep Network
Liyan Sun*, Zhiwen Fan*, Xinghao Ding, Yue Huang, John Paisley
IPMI 2019

Invited Talks

  • Empowering Machines to Understand 3D @ Stanford, ASU, JHU. October 2024.
  • 3D Computer Vision @ TAMU Guest Lecture. October 2024.
  • From Efficient 3D Learning to 3D Foundation Models @ UCLA and Caltech. October 2024.
  • Towards Universal, Real-Time 3D Construction and Interaction @ TAMU AI Lunch. October 2024.
  • Spatial Intelligence via Reconstruction, Distillation, and Generation @ Shanghai AI Lab. July 2024.
  • "Streamlined 3D/4D: From Hours to Seconds to Millisecond @ Google Research, VALSE Webinar . May 2024.
  • "Streamlined 3D/4D: From Hours to Seconds to Millisecond @ Google Research. May 2024.
  • "Real-Time Few-shot View Synthesis w/ Gaussian Splatting @ IARPA WRIVA Workshop. April 2024.
  • "Data-efficient and Rendering-efficient Neural Rendering @ IFML Workshop on Gen AI. November 2023.
  • "Unified Implicit Neural Stylization @ Xiamen University; Kungfu.ai.. July 2022.

Experience

  • Meta Reality Labs, Burlingame:
    Research Intern (2024).
  • NVIDIA Research (remote):
    Research Intern (2024).
  • Meta Reality Labs, Redmond:
    Research Intern (2023).
  • Google AR, San Francisco:
    Research Intern (2022).
  • Alibaba Group, Hangzhou:
    Senior Algorithm Engineer (2019 - 2021).

Services

  • Journal Reviewer:
    TPAMI, TIP, IJCV, Neurocomputing.
  • Conference Reviewer:
    NeurIPS 22/23, ICML 22/23, CVPR 22/23, ICCV 21/23, AAAI 21, ICME 19.