Neural Radiance Fields (NeRF) greatly improved the quality of novel view synthesis when they were first introduced. NeRF was originally proposed as a way to reconstruct a static scene from a set of posed photographs, but it has quickly been extended to dynamic and uncalibrated scenes. With the help of large controlled datasets, recent work has also focused on animating these human radiance field models, broadening the application domain of radiance-field-based modeling toward augmented reality experiences. In this study, the authors focus on the case where only a single video is given. They aim to reconstruct models of the human and the static scene and to enable novel-pose rendering of the person without expensive multi-camera setups or manual annotations.
Neural Actor can generate novel human poses, but it requires multiple videos. Even with the latest advances in NeRF techniques, this is far from trivial: NeRF models typically must be trained with many cameras, constant lighting and exposure, clean backgrounds, and accurate human geometry. As the paper's comparison table shows, HyperNeRF reconstructs a dynamic scene from a single video but cannot be controlled by human poses. ST-NeRF uses many cameras and reconstructs each person with a time-dependent NeRF model, but editing is limited to changing the bounding box. HumanNeRF builds a human model from a single video with carefully annotated masks, yet it does not demonstrate generalization to novel poses.
Vid2Actor can generate novel human poses with a model trained on a single video, but it cannot model the surroundings. The authors address these issues with NeuMan, a framework that reconstructs both the person and the scene from a single in-the-wild video and can render novel human poses from novel viewpoints. NeuMan trains NeRF models for both the human and the scene, enabling the high-quality pose-driven rendering shown in Figure 1. From the video of a moving camera, they first estimate the camera poses, a sparse scene model, depth maps, the human pose, the human shape, and the human masks.
Next, two NeRF models are trained, one for the human and one for the scene, both guided by segmentation masks computed with Mask R-CNN. The scene NeRF model is additionally regularized with depth estimates from both multi-view reconstruction and monocular depth regression. The human NeRF model is trained in a pose-independent canonical volume using a statistical human shape and pose model (SMPL). To better serve training, the authors refine the SMPL estimates from ROMP; since even these refined estimates are not perfect, they jointly optimize the SMPL estimates and the human NeRF model end to end.
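Rendering a frame with both models requires compositing the human and scene radiance fields along each camera ray. The sketch below is a minimal numpy illustration of one common way to do this, merging the samples of two fields and volume-rendering them together; it is an assumption for illustration, not the authors' actual implementation, and the function name `composite_render` is hypothetical.

```python
import numpy as np

def composite_render(t_scene, sigma_scene, rgb_scene,
                     t_human, sigma_human, rgb_human):
    """Alpha-composite samples from two radiance fields along one ray.

    Hypothetical sketch: samples from a scene field and a human field
    are merged, sorted by depth, and volume-rendered jointly.
    t_*: (N,) sample depths; sigma_*: (N,) densities; rgb_*: (N, 3) colors.
    """
    t = np.concatenate([t_scene, t_human])
    sigma = np.concatenate([sigma_scene, sigma_human])
    rgb = np.concatenate([rgb_scene, rgb_human], axis=0)

    # Sort all samples from both fields by distance along the ray.
    order = np.argsort(t)
    t, sigma, rgb = t[order], sigma[order], rgb[order]

    # Distance between consecutive samples (last interval capped large).
    deltas = np.append(np.diff(t), 1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)
```

Because the samples are sorted by depth before compositing, occlusions between the person and the scene fall out of the standard volume-rendering equation with no extra machinery.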
Because the static canonical human NeRF cannot capture dynamics that are not represented by the SMPL model, the authors also build an error-correction network to compensate. During training, the SMPL estimates and the error-correction network are tuned jointly. In summary, the authors propose a framework for neural rendering of a human and a scene from a single video without any additional devices or annotations; they demonstrate that their method allows high-quality rendering of the human in novel poses and from novel viewpoints together with the scene; they introduce an end-to-end SMPL optimization and an error-correction network that enable training with inaccurate estimates of the human geometry; and finally, their method enables the composition of the human and the scene.
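One way to picture the error-correction idea is as a small residual network applied after the SMPL-driven warp into the canonical volume: the warp explains the gross articulation, and a learned offset absorbs what it cannot. The numpy sketch below is a hypothetical toy version under that assumption; `correction_net`, `to_canonical`, the 4x4 `rigid_inv` stand-in for inverse skinning, and the per-frame latent code are all illustrative names, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "error-correction" MLP weights: input is a 3D point plus a
# 5-dim per-frame latent code (8 values total), output a 3D offset.
W1 = rng.normal(scale=0.1, size=(8, 16))
W2 = rng.normal(scale=0.1, size=(16, 3))

def correction_net(x_canon, frame_code):
    """Predict a small residual offset for a canonical-space point."""
    h = np.tanh(np.concatenate([x_canon, frame_code]) @ W1)
    return h @ W2

def to_canonical(x_posed, rigid_inv, frame_code):
    """Warp a posed-space point into the canonical volume.

    rigid_inv is a hypothetical 4x4 inverse bone transform standing in
    for SMPL-driven inverse skinning; the learned residual is added on
    top, mimicking how a correction network can absorb dynamics the
    SMPL warp cannot explain (e.g. loose clothing).
    """
    x_h = np.append(x_posed, 1.0)           # homogeneous coordinates
    x_canon = (rigid_inv @ x_h)[:3]         # rigid inverse warp
    return x_canon + correction_net(x_canon, frame_code)
```

In a real system both the SMPL parameters behind `rigid_inv` and the correction weights would receive gradients from the rendering loss, which is what joint, end-to-end tuning refers to here.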
The code for this research paper is freely available on Apple's GitHub.
This article is written as a research summary by Marktechpost staff based on the research paper 'NeuMan: Neural Human Radiance Field from a Single Video'. All credit for this research goes to the researchers on this project. Check out the paper and the GitHub link.