
Zeping Ren

Master's Student, Tsinghua University
Hometown: Chongqing, China
Email: rzp22@mails.tsinghua.edu.cn


Bio: I am a master's student at Tsinghua Shenzhen International Graduate School, advised by Prof. Xiu Li. I received my undergraduate degree from the Department of Automation, Tsinghua University, under the supervision of Prof. Yebin Liu. My research focuses on 3D pose estimation and motion generation.


Research

Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
ECCV 2024.
@article{ren2023realistic,
    title={Realistic Human Motion Generation with Cross-Diffusion Models},
    author={Ren, Zeping and Huang, Shaoli and Li, Xiu},
    journal={arXiv preprint arXiv:2312.10993},
    year={2023}
}

We introduce the Cross Human Motion Diffusion Model (CrossDiff), a novel approach for generating high-quality human motion based on textual descriptions. Our method integrates 3D and 2D information using a shared transformer network within the training of the diffusion model, unifying motion noise into a single feature space. This enables cross-decoding of features into both 3D and 2D motion representations, regardless of their original dimension. The primary advantage of CrossDiff is its cross-diffusion mechanism, which allows the model to reverse either 2D or 3D noise into clean motion during training. This capability leverages the complementary information in both motion representations, capturing intricate human movement details often missed by models relying solely on 3D information. Consequently, CrossDiff effectively combines the strengths of both representations to generate more realistic motion sequences. In our experiments, our model demonstrates competitive state-of-the-art performance on text-to-motion benchmarks. Moreover, our method consistently provides enhanced motion generation quality, capturing complex full-body movement intricacies. Additionally, our approach accommodates using 2D motion data without 3D motion ground truth during training to generate 3D motion, highlighting its potential for broader applications and efficient use of available data resources.
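For intuition, below is a minimal PyTorch sketch of the cross-decoding idea: 2D and 3D noisy motion are projected into one shared feature space, and both outputs can be decoded from either input. The dimensions and module layout are illustrative assumptions (e.g. 263-dim 3D features as in HumanML3D, a hypothetical 134-dim 2D representation), not the paper's released implementation; text conditioning and the diffusion timestep are omitted for brevity.

import torch
import torch.nn as nn

class CrossDiffSketch(nn.Module):
    def __init__(self, dim_3d=263, dim_2d=134, d_model=512, n_layers=4):
        super().__init__()
        # Separate projections unify 3D and 2D noisy motion into one feature space.
        self.embed_3d = nn.Linear(dim_3d, d_model)
        self.embed_2d = nn.Linear(dim_2d, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Cross-decoding heads: either representation is recoverable from the shared features.
        self.head_3d = nn.Linear(d_model, dim_3d)
        self.head_2d = nn.Linear(d_model, dim_2d)

    def forward(self, x, source):
        h = self.embed_3d(x) if source == "3d" else self.embed_2d(x)
        h = self.backbone(h)  # shared transformer over the unified feature space
        return self.head_3d(h), self.head_2d(h)

model = CrossDiffSketch()
noisy_2d = torch.randn(2, 60, 134)        # (batch, frames, 2D pose features)
pred_3d, pred_2d = model(noisy_2d, "2d")  # 2D noise cross-decoded into both outputs
print(pred_3d.shape, pred_2d.shape)       # torch.Size([2, 60, 263]) torch.Size([2, 60, 134])

The point of the sketch is that both projections feed the same backbone, so 2D-only training data can still supervise the shared features that the 3D head decodes from.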

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
Ronghui Li*, Junfan Zhao*, Yachao Zhang, Mingyang Su, Zeping Ren, Han Zhang, Yansong Tang, Xiu Li
ICCV 2023.
@InProceedings{Li_2023_ICCV,
    author    = {Li, Ronghui and Zhao, Junfan and Zhang, Yachao and Su, Mingyang and Ren, Zeping and Zhang, Han and Tang, Yansong and Li, Xiu},
    title     = {FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {10234-10243}
}

Generating full-body and multi-genre dance sequences from given music is a challenging task, due to the limitations of existing datasets and the inherent complexity of fine-grained hand motion and dance genres. To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate postures. To the best of our knowledge, FineDance is the largest music-dance paired dataset with the most dance genres. Additionally, to address the monotonous and unnatural hand movements produced by previous methods, we propose a full-body dance generation network that exploits the diverse generation capability of the diffusion model to counter monotony and uses expert networks to counter unnaturalness. To further enhance the genre matching and long-term stability of generated dances, we propose a Genre&Coherent aware Retrieval Module. Besides, we propose a new metric named Genre Matching Score to measure the genre match between dance and music. Quantitative and qualitative experiments demonstrate the quality of FineDance and the state-of-the-art performance of FineNet.
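As a rough illustration of the body-plus-expert design mentioned above, the sketch below refines coarse hand motion with a small expert network conditioned on the generated body. All dimensions and the residual fusion scheme are assumptions for illustration (69 body and 90 hand parameters loosely follow SMPL-X axis-angle layouts), not FineNet's actual architecture.

import torch
import torch.nn as nn

class HandExpert(nn.Module):
    """Refines coarse hand motion conditioned on the generated body motion."""
    def __init__(self, body_dim=69, hand_dim=90, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(body_dim + hand_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hand_dim),
        )

    def forward(self, body, coarse_hands):
        # Predict a residual so the expert only corrects the hands,
        # rather than replacing the diffusion model's output outright.
        return coarse_hands + self.net(torch.cat([body, coarse_hands], dim=-1))

body = torch.randn(2, 120, 69)          # (batch, frames, body pose params)
coarse_hands = torch.randn(2, 120, 90)  # coarse hands from the diffusion stage
refined = HandExpert()(body, coarse_hands)
print(refined.shape)  # torch.Size([2, 120, 90])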

Real-time Sparse-view Multi-person Total Motion Capture
Yuxiang Zhang, Zeping Ren, Liang An, Hongwen Zhang, Tao Yu, Yebin Liu
2022.

Real-time multi-person total motion capture is one of the most challenging tasks in human motion capture. A multi-view configuration reduces occlusion and depth ambiguity, yet further complicates the problem by requiring cross-view association. In this paper, we contribute the first real-time multi-person total motion capture system under sparse views. To enable full-body cross-view association in real time, we propose a highly efficient association algorithm, named Clique Unfolding, by reformulating the widely used fast unfolding algorithm for community detection. Moreover, an adaptive motion prior based on human motion prediction is proposed to improve SMPL-X fitting in the final step. Benefiting from the proposed association and fitting methods, our system achieves robust, efficient, and accurate multi-person total motion capture. Experiments demonstrate the efficiency and effectiveness of the proposed method.
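For intuition about the association step, here is a toy sketch of the general idea: greedily merge per-view detections into cliques (at most one detection per view per person) by pairwise affinity, loosely analogous to the greedy modularity merges of the fast unfolding (Louvain) community-detection algorithm. The data layout and merge rule are illustrative assumptions, not the paper's Clique Unfolding algorithm.

import itertools

def associate(detections, affinity, threshold=0.5):
    """detections: list of (view_id, det_id); affinity: dict keyed by frozenset pairs."""
    cliques = [{d} for d in detections]  # start from singleton groups
    while True:
        best, best_score = None, threshold
        for a, b in itertools.combinations(range(len(cliques)), 2):
            # Clique constraint: a person appears at most once per view.
            if {v for v, _ in cliques[a]} & {v for v, _ in cliques[b]}:
                continue
            # Average pairwise affinity between the two groups.
            pairs = [affinity.get(frozenset({x, y}), 0.0)
                     for x in cliques[a] for y in cliques[b]]
            score = sum(pairs) / len(pairs)
            if score > best_score:
                best, best_score = (a, b), score
        if best is None:
            return cliques  # no merge exceeds the threshold; done
        a, b = best
        cliques[a] |= cliques[b]
        del cliques[b]

dets = [("cam0", 0), ("cam0", 1), ("cam1", 0), ("cam1", 1)]
aff = {frozenset({("cam0", 0), ("cam1", 0)}): 0.9,
       frozenset({("cam0", 1), ("cam1", 1)}): 0.8}
print(associate(dets, aff))  # two cliques, one per person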


More

I'm not a magician, but I never stop creating miracles.