We present a unified method for stylizing implicit representations, producing view-consistent stylized scenes with visually rich texture details across novel view synthesis, signed distance functions, and 2D coordinate-based mapping functions.
Our algorithm is a fully-connected (non-convolutional) deep network whose input is a single continuous coordinate together with a pre-defined one-hot style embedding, and whose output is a stylized color value.
We introduce a Style Implicit Module (SIM) alongside the ordinary implicit representation, which we refer to as the Content Implicit Module (CIM) in our framework. During training, the style information and the content scene are each encoded as continuous representations and then fused by an Amalgamation Module (AM); a minimal sketch of this design is given below.
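To make the architecture concrete, here is a minimal PyTorch sketch of this coordinate-based design, assuming a 3D input coordinate, 256-wide MLPs, and simple concatenation inside the Amalgamation Module. The class name `INSSketch`, the layer counts, widths, and the fusion scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class INSSketch(nn.Module):
    """Illustrative sketch of the CIM + SIM + AM forward pass.

    Widths, depths, and the concatenation-based fusion are assumptions
    made for clarity, not the configuration used in the paper.
    """

    def __init__(self, coord_dim=3, num_styles=8, hidden=256):
        super().__init__()
        # Content Implicit Module (CIM): the ordinary coordinate-based MLP.
        self.cim = nn.Sequential(
            nn.Linear(coord_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Style Implicit Module (SIM): encodes the one-hot style embedding.
        self.sim = nn.Sequential(
            nn.Linear(num_styles, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Amalgamation Module (AM): fuses content and style features
        # and outputs a stylized RGB color.
        self.am = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, coords, style_onehot):
        content_feat = self.cim(coords)           # (N, hidden)
        style_feat = self.sim(style_onehot)       # (N, hidden)
        fused = torch.cat([content_feat, style_feat], dim=-1)
        return self.am(fused)                     # (N, 3) stylized color


# Example: query stylized colors for a batch of 3D points with style index 2.
model = INSSketch(coord_dim=3, num_styles=8)
pts = torch.rand(1024, 3)
style = torch.zeros(1024, 8)
style[:, 2] = 1.0
rgb = model(pts, style)  # (1024, 3)
```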
Below are stylization results on the NeRF-Synthetic and LLFF datasets. INS generates faithful and view-consistent results at new viewpoints, with rich textures across scenes and styles.
We provide more detailed comparisons between INS on NeRF and a neural-radiance-based method (Style3D), a single-image style transfer method (Perceptual Loss), and video-based methods (MCCNet and ReReVST). Image-based methods produce noisy and inconsistent stylizations because they transfer style on each image independently. Video-based methods achieve moderate view consistency but fail to generate faithful textures. Style3D produces blurry results, as it still relies on convolutional networks (a hypernetwork) to generate the MLP weights for the subsequent volume rendering. Unlike our purely implicit representation, Style3D requires CNNs and a hypernetwork to generate the model weights, which demands much more storage than ours (125MB vs. 14MB).
@article{fan2022unified,
title={Unified Implicit Neural Stylization},
author={Fan, Zhiwen and Jiang, Yifan and Wang, Peihao and Gong, Xinyu and Xu, Dejia and Wang, Zhangyang},
journal={arXiv preprint arXiv:2204.01943},
year={2022}
}