Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop

Nikos Kolotouros*1 Georgios Pavlakos*1 Michael J. Black2 Kostas Daniilidis1
1University of Pennsylvania 2Max Planck Institute for Intelligent Systems
*Equal contribution

Abstract

Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, which use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel-accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization, making the fitting faster and more accurate. Similarly, a pixel-accurate fit from iterative optimization can act as strong supervision for the network. This is the core of our proposed approach SPIN (SMPL oPtimization IN the loop). The deep network initializes an iterative optimization routine that fits the body model to 2D joints within the training loop, and the fitted estimate is subsequently used to supervise the network. Our approach is self-improving by nature, since better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network. We demonstrate the effectiveness of our approach in different settings, where 3D ground truth is scarce or not available, and we consistently outperform the state-of-the-art model-based pose estimation approaches by significant margins.


SPIN: SMPL oPtimization IN the loop

Given an input image containing a person, a neural network regresses the full 3D shape of the person. This regressed shape is used to initialize an iterative optimization procedure that fits the body model to the 2D joints within the training loop. Starting from a reasonable initialization, the fitting procedure is faster and more accurate. The output of the optimization is then used to supervise the network with a direct parameter loss. Our approach is self-improving by nature, since better network estimates lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network.
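
The loop described above can be illustrated with a deliberately small toy, written in plain Python. This is only a sketch under strong assumptions and not the SPIN implementation: the "body model" is a hypothetical fixed linear map `A` from two parameters to three scalar "joints" (standing in for SMPL and its 2D projection), the "network" is a 2x2 linear regressor `W`, and the fitting routine is plain gradient descent on the reprojection error (standing in for the iterative model fitting). All names here (`A`, `W_TRUE`, `fit_in_the_loop`, `train_toy_spin`) are invented for illustration.

```python
import random

# Hypothetical linear "body model": 2 parameters -> 3 scalar "joint" coordinates.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical true mapping from image features to parameters. It is used only
# to synthesize the 2D observations and to measure error, never as supervision.
W_TRUE = [[0.5, -1.0], [2.0, 0.3]]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fit_in_the_loop(theta_init, joints, steps=20, lr=0.1):
    """Gradient descent on the squared reprojection error, starting at theta_init."""
    theta = list(theta_init)
    for _ in range(steps):
        residual = [p - j for p, j in zip(matvec(A, theta), joints)]
        grad = [2.0 * sum(A[k][i] * residual[k] for k in range(len(A)))
                for i in range(len(theta))]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def mean_param_error(W, features):
    """Mean squared distance between regressed and true parameters."""
    total = 0.0
    for x in features:
        pred, true = matvec(W, x), matvec(W_TRUE, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, true))
    return total / len(features)


def train_toy_spin(epochs=30, n_samples=40, lr=0.1, seed=0):
    rng = random.Random(seed)
    features = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_samples)]
    # Only 2D "joint" observations are available to the training loop.
    joints_2d = [matvec(A, matvec(W_TRUE, x)) for x in features]
    W = [[0.0, 0.0], [0.0, 0.0]]
    err_before = mean_param_error(W, features)
    for _ in range(epochs):
        for x, joints in zip(features, joints_2d):
            theta_reg = matvec(W, x)                        # 1. regress parameters
            theta_opt = fit_in_the_loop(theta_reg, joints)  # 2. fit, initialized by the regressor
            # 3. SGD step on the parameter loss ||theta_reg - theta_opt||^2,
            #    treating the fitted theta_opt as a fixed supervision target.
            for i in range(2):
                for j in range(2):
                    W[i][j] -= lr * 2.0 * (theta_reg[i] - theta_opt[i]) * x[j]
    return W, err_before, mean_param_error(W, features)


if __name__ == "__main__":
    W, err_before, err_after = train_toy_spin()
    print("parameter error before:", err_before, "after:", err_after)
```

In this toy the regressor never sees the true parameters, only the fits produced from its own estimates, yet its parameter error shrinks over training. The toy problem is convex, so it does not reproduce the sensitivity to initialization that motivates the approach on real body models; it only shows how the regression, fitting, and supervision steps connect.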


Results

Here we show some results on videos from the 3DPW dataset. We note that our network was not trained with data from this dataset. We also do not perform any post-processing, e.g., temporal smoothing.


Acknowledgements

NK, GP and KD gratefully acknowledge support through the following grants: NSF-IIP-1439681 (I/UCRC), NSF-IIS-1703319, NSF MRI 1626008, ARL RCTA W911NF-10-2-0016, ONR N00014-17-1-2093, ARL DCIST CRA W911NF-17-2-0181, and the DARPA-SRC C-BRIC, as well as support from Honda Research Institute and a Google Daydream Research Award.

Disclosure: MJB has received research gift funds from Intel, Nvidia, Adobe, Facebook, and Amazon. While MJB is a part-time employee of Amazon, his research was performed solely at, and funded solely by, MPI. MJB has financial interests in Amazon and Meshcapade GmbH.
