ZeDO
Back to Optimization: Diffusion-based
Zero-Shot 3D Human Pose Estimation

Zhongyu Jiang*, Zhuoran Zhou*, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang
University of Washington, University of Copenhagen
WACV 2024

*Indicates Equal Contribution

The optimization process of ZeDO. Red skeletons are ground truth poses, and blue skeletons are optimized poses.

Abstract

Learning-based methods have dominated 3D human pose estimation (HPE), performing significantly better than traditional optimization-based methods on most benchmarks. Nonetheless, 3D HPE in the wild remains the biggest challenge for learning-based models, whether they use 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. Optimization-based methods, on the other hand, estimate results case-by-case and can therefore predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm, without training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset, with a PA-MPJPE of 40.3 mm in cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.
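
The metrics quoted in the abstract can be illustrated with a short NumPy sketch. This is not the authors' evaluation code: the function names are hypothetical, poses are assumed to be (J, 3) arrays in millimetres, and minMPJPE is assumed to mean the error of the best hypothesis per pose (definitions vary between papers).

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error between two (J, 3) poses."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform
    (scale, rotation, translation) in the least-squares sense."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Cross-covariance between the centred point sets.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: error after similarity alignment."""
    return mpjpe(procrustes_align(pred, gt), gt)

def min_mpjpe(hypotheses, gt):
    """Lowest MPJPE among (H, J, 3) hypotheses (assumed definition)."""
    return min(mpjpe(h, gt) for h in hypotheses)
```

PA-MPJPE removes global scale, rotation, and translation before measuring error, so it is never larger than the MPJPE of the best similarity-aligned pose and isolates errors in the articulated pose itself.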

Our follow-up paper on 3D infant pose estimation

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation

Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li, Jenq-Neng Hwang
University of Washington, University of Copenhagen
WACV Workshop 2024

Abstract

Although 3D human pose estimation has seen impressive development in recent years, only a few works focus on infants, who have different bone lengths and for whom only limited data is available. Directly applying adult pose estimation models typically yields low performance in the infant domain and suffers from out-of-distribution issues. Moreover, the difficulty of collecting infant pose data heavily constrains the efficiency of learning-based models that lift 2D poses to 3D. Domain adaptation and data augmentation are commonly used techniques for dealing with small datasets. Following this paradigm, we take advantage of an optimization-based method that utilizes generative priors to predict 3D infant keypoints from 2D keypoints without the need for large amounts of training data. We further apply a guided diffusion model to domain-adapt 3D adult poses to infant poses in order to supplement small datasets. We also prove that our method, ZeDO-i, can attain efficient domain adaptation even when only a small amount of data is given. Quantitatively, we claim that our model attains state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.
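
Why predicting 3D keypoints from 2D keypoints benefits from a prior can be seen in a minimal sketch of the reprojection term that such optimization typically minimizes. This is an illustration only, not the authors' pipeline: it assumes a simple pinhole camera, and `focal`, `center`, and the function names are hypothetical.

```python
import numpy as np

def project(points_3d, focal, center):
    """Pinhole projection of (J, 3) camera-space points to (J, 2) pixels."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]
    return focal * xy + center

def reprojection_error(points_3d, keypoints_2d, focal, center):
    """Mean pixel distance between projected 3D joints and 2D keypoints."""
    diff = project(points_3d, focal, center) - keypoints_2d
    return float(np.linalg.norm(diff, axis=-1).mean())

# Toy pose: 17 joints placed a few metres in front of the camera.
rng = np.random.default_rng(0)
pose = rng.normal(scale=0.3, size=(17, 3)) + np.array([0.0, 0.0, 4.0])
focal, center = 1000.0, np.array([500.0, 500.0])
keypoints = project(pose, focal, center)

# A pose twice as large and twice as far away projects to the same pixels,
# so the 2D keypoints alone cannot determine bone lengths or depth.
err_true = reprojection_error(pose, keypoints, focal, center)
err_scaled = reprojection_error(2.0 * pose, keypoints, focal, center)
```

Both errors are zero up to floating-point precision, so the reprojection term cannot tell the two poses apart. This ambiguity is one reason a prior over plausible poses, here adapted to infant body proportions, is needed to pick a sensible solution.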

BibTeX


@inproceedings{Jiang2024ZeDO,
  title={Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation},
  author={Jiang, Zhongyu and Zhou, Zhuoran and Li, Lei and Chai, Wenhao and Yang, Cheng-Yen and Hwang, Jenq-Neng},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  year={2024}
}
      
@misc{zhou2023efficient,
  title={Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation},
  author={Zhuoran Zhou and Zhongyu Jiang and Wenhao Chai and Cheng-Yen Yang and Lei Li and Jenq-Neng Hwang},
  year={2023},
  eprint={2311.12043},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}