Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences

ACM MM 2023

Yubin Wang, Huimin Yu, Yuming Yan, Shuyi Song, Biyang Liu, Yichong Lu
Zhejiang University

Comparison of different multi-modal auxiliary information for CC-ReID.

Introduction Image.

Abstract

Cloth-Changing Person Re-Identification (CC-ReID) is a common and realistic problem, since fashion constantly changes over time and people's aesthetic preferences are not set in stone. Most existing cloth-changing ReID methods focus on learning cloth-agnostic identity representations from coarse semantic cues (e.g., silhouettes and part segmentation maps), but they neglect the continuous shape distributions at the pixel level.

In this paper, we propose Continuous Surface Correspondence Learning (CSCL), a new shape embedding paradigm for cloth-changing ReID. CSCL establishes continuous correspondences between a 2D image plane and a canonical 3D body surface via pixel-to-vertex classification, which naturally aligns a person image to the surface of a 3D human model and simultaneously obtains pixel-wise surface embeddings. We further extract fine-grained shape features from the learned surface embeddings and then integrate them with global RGB features via a carefully designed cross-modality fusion module. The shape embedding paradigm based on 2D-3D correspondences remarkably enhances the model's global understanding of human body shape.

To promote the study of ReID under clothing change, we construct 3D Dense Persons (DP3D), which is the first large-scale cloth-changing ReID dataset that provides densely annotated 2D-3D correspondences and a precise 3D mesh for each person image, while containing diverse cloth-changing cases over all four seasons. Experiments on both cloth-changing and cloth-consistent ReID benchmarks validate the effectiveness of our method.

Model Architecture

Model Arch Image.
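To make the paradigm concrete, below is a minimal PyTorch sketch of the two components described in the abstract: a pixel-to-vertex classification head that produces pixel-wise surface embeddings together with 2D-3D correspondence logits, and a fusion of the global RGB feature with shape tokens. All names, feature dimensions, the vertex count, and the use of single-head cross-attention for fusion are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn


class PixelToVertexHead(nn.Module):
    """Classifies each pixel of a backbone feature map into one of V canonical
    body-surface vertices (plus background). The argmax over the logits gives
    2D-3D correspondences; the projected features serve as pixel-wise surface
    embeddings. Dimensions and vertex count are illustrative."""

    def __init__(self, in_ch=256, embed_dim=64, num_vertices=858):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, embed_dim, kernel_size=1)
        # One learnable embedding per canonical vertex (+1 for background);
        # each pixel is assigned to its most similar vertex.
        self.vertex_embed = nn.Embedding(num_vertices + 1, embed_dim)

    def forward(self, feat):                       # feat: (B, C, H, W)
        pix = self.embed(feat)                     # (B, D, H, W) surface embeddings
        logits = torch.einsum('bdhw,vd->bvhw', pix, self.vertex_embed.weight)
        return pix, logits                         # logits: (B, V + 1, H, W)


class CrossModalityFusion(nn.Module):
    """Fuses the global RGB feature with shape tokens pooled from the surface
    embeddings. Single-head cross-attention is an assumption here; the paper's
    fusion module is more carefully designed."""

    def __init__(self, rgb_dim=2048, shape_dim=64, out_dim=2048):
        super().__init__()
        self.q = nn.Linear(rgb_dim, out_dim)
        self.k = nn.Linear(shape_dim, out_dim)
        self.v = nn.Linear(shape_dim, out_dim)
        self.proj = nn.Linear(out_dim, out_dim)

    def forward(self, rgb_feat, shape_tokens):     # (B, rgb_dim), (B, N, shape_dim)
        q = self.q(rgb_feat).unsqueeze(1)          # RGB feature queries the shape tokens
        k, v = self.k(shape_tokens), self.v(shape_tokens)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        fused = (attn @ v).squeeze(1)
        return self.proj(fused) + rgb_feat         # residual keeps the RGB identity cue

In training, the correspondence logits would be supervised with the annotated pixel-to-vertex labels (e.g., via cross-entropy), while the surface embeddings feed the shape branch.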

3D Dense Persons (DP3D)

Data Collection

DP3D comprises 39,100 person images of 413 different persons, captured over the course of a year across all four seasons. A total of 15 cameras were used: 5 with 4K resolution, 2 with 2K resolution, and the remaining 8 set to 1080P.

DP3D Person Images.

Annotation Pipeline

We annotated dense 2D-3D correspondences for each person image via a carefully designed annotation system, yielding 80 to 125 annotated points per image. The sampling method avoids seams between body parts and guarantees a sufficient number of sampling points for smaller parts; a sketch of such part-aware sampling is given after the figure below.

DP3D Annotation Process.
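For illustration only, the following sketch shows one way such part-aware sampling could be implemented, assuming a per-pixel part segmentation map is available: each part mask is eroded so sampled points stay away from seams, and every part receives a minimum quota so small parts are not under-sampled. The function name, seam margin, and quotas are hypothetical, not the dataset's actual settings.

import numpy as np
from scipy.ndimage import binary_erosion


def sample_annotation_points(part_seg, total=100, min_per_part=3,
                             seam_margin=2, rng=None):
    """part_seg: (H, W) int array with 0 = background, 1..P = body parts.
    Returns a list of (y, x, part_id) candidate annotation points."""
    if rng is None:
        rng = np.random.default_rng()
    parts = [p for p in np.unique(part_seg) if p != 0]
    struct = np.ones((2 * seam_margin + 1, 2 * seam_margin + 1), dtype=bool)
    # Split the budget proportionally to part area, with a floor for small
    # parts, so the final count lands roughly (not exactly) at `total`.
    areas = np.array([(part_seg == p).sum() for p in parts], dtype=float)
    quotas = np.maximum(min_per_part,
                        np.round(total * areas / areas.sum())).astype(int)
    points = []
    for p, n in zip(parts, quotas):
        mask = part_seg == p
        interior = binary_erosion(mask, structure=struct)  # keep clear of seams
        ys, xs = np.nonzero(interior if interior.any() else mask)
        idx = rng.choice(len(ys), size=min(n, len(ys)), replace=False)
        points.extend((int(ys[i]), int(xs[i]), int(p)) for i in idx)
    return points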

BibTeX

@InProceedings{wang2023cscl,
  author    = {Wang, Yubin and Yu, Huimin and Yan, Yuming and Song, Shuyi and Liu, Biyang and Lu, Yichong},
  title     = {Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences},
  booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
  year      = {2023}
}