I am a Ph.D. student in Electrical Engineering at The University of Texas at Austin, advised by Prof. Zhangyang Wang in the VITA group. Previously, I was a senior algorithm engineer at Alibaba Cloud, where I worked with Prof. Ping Tan and Siyu Zhu.
CV / Email / Google Scholar
Synthesizing photo-realistic images has long been one of the most essential goals in computer vision.
NeRF Augmentations: NeRF often yields inconsistent and visually non-smooth geometric results due to the generalization gap between seen and unseen views. We propose to blend worst-case perturbations, injected at three distinct levels of the NeRF training pipeline with physical grounding. In our Aug-NeRF paper, we show that this effectively boosts NeRF's accuracy in both novel view synthesis (up to 1.5 dB PSNR gain) and underlying geometry reconstruction.
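The inner worst-case search behind such augmentations can be approximated with a few steps of projected gradient ascent. Below is a minimal, framework-agnostic sketch; the `worst_case_perturbation` helper and the quadratic toy loss are illustrative assumptions, not our actual training code:

```python
import numpy as np

def worst_case_perturbation(x, grad_fn, eps=0.1, steps=5, alpha=0.04):
    """PGD-style inner maximization: find a delta within an L-inf ball of
    radius eps that (approximately) maximizes the loss at x + delta."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)               # gradient of the loss w.r.t. input
        delta = delta + alpha * np.sign(g)   # ascend the loss
        delta = np.clip(delta, -eps, eps)    # project back into the eps-ball
    return delta

# Toy example: loss(x) = ||x - t||^2, so grad = 2 * (x - t)
t = np.array([1.0, -1.0])
grad_fn = lambda x: 2.0 * (x - t)
x = np.array([0.0, 0.0])
delta = worst_case_perturbation(x, grad_fn, eps=0.1)
# delta pushes x away from t, i.e. in the loss-increasing direction
```

In training, the model is then optimized on the perturbed inputs, which smooths the learned field around the observed samples.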
Single-view NeRF: NeRF is impeded by its stringent requirement of dense views captured from multiple well-calibrated cameras, whereas it can be challenging or even infeasible to collect sufficiently dense coverage of a scene. In our SinNeRF paper, we push the sparse-view setting to the extreme by training a neural radiance field on only one view with depth information. By generating pseudo labels from the available single view, the learned radiance field produces more satisfying synthesized results on novel views.
NeRF Stylization: In addition to improving synthesized image quality, in our INS paper we conduct a pilot study on training stylized implicit representations (e.g., SIREN, NeRF, SDF). We propose to decouple the ordinary implicit function into a style implicit module and a content implicit module, which separately encode the representations from the style image and the input scene. An amalgamation module then aggregates this information and synthesizes the stylized output. Consequently, we can synthesize faithful stylizations for SIREN, NeRF, and SDF. In addition, we can interpolate between different styles and generate images with new, mixed styles.
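As a toy illustration of this decoupled design, the sketch below uses random-weight NumPy MLPs standing in for trained implicit modules; all module names and sizes here are hypothetical, chosen only to show the content/style/amalgamation data flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP (a toy stand-in for a trained implicit module)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)   # ReLU on hidden layers
    return x

content_net = mlp([3, 64, 64])    # scene coordinates -> content features
style_net = mlp([16, 64, 64])     # style code -> style features
amalgam_net = mlp([128, 64, 3])   # fused features -> RGB

coords = rng.standard_normal((1024, 3))   # sampled 3D query points
style_code = rng.standard_normal((16,))   # e.g., derived from a style image

c_feat = forward(content_net, coords)                           # (1024, 64)
s_feat = np.broadcast_to(forward(style_net, style_code), c_feat.shape)
rgb = forward(amalgam_net, np.concatenate([c_feat, s_feat], axis=1))
# rgb: one stylized color per queried point, shape (1024, 3)
```

Because the style code enters only through the style module, swapping or interpolating style codes changes the stylization without retraining the content module.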
CAD symbol spotting can be used in the architecture, engineering, and construction (AEC) industries to accelerate 3D modeling from CAD drawings.
We release the first large-scale real-world dataset of over 15,000 CAD drawings with line-grained annotations (35 classes), covering various types of buildings.
We introduce the new task of Panoptic Symbol Spotting, a relaxation of the traditional symbol spotting problem: it spots and parses both countable object instances (windows, doors, tables, etc.) and uncountable stuff (walls, railings, etc.) from CAD drawings.
Moreover, we adopt Panoptic Quality (PQ) as the evaluation metric for panoptic symbol spotting results.
To tackle the newly proposed problem, we first present a CNN-GCN method in our ICCV 2021 paper, which unifies a GCN head and a detection head for semantic and instance symbol spotting, respectively.
More recently, in our CVPR 2022 paper we present a transformer-based framework named CADTransformer, which painlessly adapts existing vision transformer (ViT) backbones to the panoptic symbol spotting task. It boosts PQ from 0.595 (the CNN-GCN baseline) to a new state of the art of 0.685.
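For reference, PQ combines segmentation quality (the average IoU of matched pairs) and recognition quality (an F1-style detection score). A minimal sketch of the standard formula follows; the `panoptic_quality` helper and the toy numbers are illustrative, not our evaluation code:

```python
def panoptic_quality(match_ious, num_pred, num_gt, iou_thresh=0.5):
    """match_ious: IoU values of candidate (prediction, ground-truth) pairs,
    at most one candidate per prediction and per ground truth. A pair is a
    true positive when IoU > iou_thresh (0.5 guarantees a unique matching)."""
    tp_ious = [iou for iou in match_ious if iou > iou_thresh]
    tp = len(tp_ious)
    fp = num_pred - tp                    # unmatched predictions
    fn = num_gt - tp                      # unmatched ground-truth symbols
    if tp + fp + fn == 0:
        return 0.0
    sq = sum(tp_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)   # recognition quality
    return sq * rq                         # PQ = SQ * RQ

# Toy example: 3 predictions, 3 ground truths, two confident matches
pq = panoptic_quality([0.9, 0.8, 0.3], num_pred=3, num_gt=3)
```

Here SQ = (0.9 + 0.8) / 2 = 0.85 and RQ = 2 / (2 + 0.5 + 0.5) = 2/3, so PQ ≈ 0.567.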
To tackle the high computational cost of existing cost-volume-based deep MVS and stereo matching methods, we propose a memory- and run-time-efficient cost volume formulation built upon a standard feature pyramid, encoding geometry and context at gradually finer scales. Within our new design, we narrow the depth (or disparity) range of each stage using the depth (or disparity) map predicted at the previous stage, recovering the output in a coarse-to-fine manner.
By applying the cascade cost volume to the representative MVSNet, we obtain a 23.1% improvement on the DTU dataset, with 50.6% and 74.2% reductions in GPU memory and run time, respectively. It also ranks first among all learning-based methods on the Tanks and Temples benchmark.
In addition, we adapt GwcNet with our proposed cost volume design, and its accuracy ranking rises from 29th to 17th with a 37.0% memory reduction on the KITTI 2015 test set. See our CVPR 2020 paper for more details.
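The per-stage range narrowing can be sketched as follows. This is a simplified illustration: the function name, hypothesis count, and interval are assumptions, and the real cascade also shrinks the interval at each finer stage:

```python
import numpy as np

def next_stage_hypotheses(depth_prev, num_hyp, interval):
    """Build a narrowed per-pixel set of depth (or disparity) hypotheses
    centered on the map predicted by the previous, coarser stage."""
    offsets = (np.arange(num_hyp) - num_hyp / 2) * interval  # symmetric offsets
    # depth_prev: (H, W)  ->  hypotheses: (num_hyp, H, W)
    return depth_prev[None, :, :] + offsets[:, None, None]

# Stage 1 sweeps a wide range; stage 2 samples a narrow band around its output
coarse_depth = np.full((4, 4), 5.0)   # toy coarse-stage prediction
hyps = next_stage_hypotheses(coarse_depth, num_hyp=8, interval=0.1)
# hyps covers only [4.6, 5.3] per pixel instead of the full depth range
```

Because later stages warp features against far fewer, tightly spaced hypotheses, the cost volume shrinks accordingly, which is the source of the memory and run-time savings.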
Before 2019, I also worked on low-level computer vision tasks (e.g., compressed sensing MRI and single-image deraining) using deep neural networks. See IPMI 2019, ACM MM 2019, ECCV 2018, AAAI 2018, TIP 2019, and two MRI 2019 papers for details.
I'm interested in developing neural radiance fields, efficient 3D models, graph neural networks for vector graphics and 3D data, and low-level computer vision.
Research Intern
05/2022--08/2022
Senior Algorithm Engineer
07/2019--08/2021