Multimodal Compatibility Modeling via Exploring the Consistent and Complementary Correlations

Publisher:
ACM
Publication Type:
Conference Proceeding
Citation:
MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 2299-2307
Issue Date:
2021-10-17
Filename: 3474085.3475392.pdf
Description: Published version
Size: 1.8 MB
Format: Adobe PDF
Abstract:
Existing methods for outfit compatibility modeling seldom explicitly consider multimodal correlations. In this work, we explore consistent and complementary correlations between modalities for better compatibility modeling. This is, however, non-trivial due to the following challenges: 1) how to separate and model these two kinds of correlations; 2) how to leverage the derived complementary cues to strengthen the text- and vision-oriented representations of a given item; and 3) how to reinforce the compatibility modeling with the text- and vision-oriented representations. To address these challenges, we present a comprehensive multimodal outfit compatibility modeling scheme. It first nonlinearly projects each modality into separable consistent and complementary spaces via multi-layer perceptrons, and then models the consistent and complementary correlations between the two modalities by parallel and orthogonal regularization. Thereafter, we strengthen the visual and textual representations of items with complementary information, and conduct both text-oriented and vision-oriented outfit compatibility modeling. We ultimately employ a mutual learning strategy to reinforce the final compatibility modeling performance. Extensive experiments demonstrate the superiority of our scheme.
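To make the abstract's pipeline concrete, below is a minimal, illustrative sketch (not the authors' released code) of the main ideas: each modality is nonlinearly projected into a consistent and a complementary subspace with small MLPs; a parallel regularizer aligns the consistent projections of the two modalities, an orthogonal regularizer disentangles complementary features from consistent ones, and a mutual learning term couples the text-oriented and vision-oriented compatibility predictions. Names such as ModalityProjector, parallel_loss, and orthogonal_loss, the specific loss forms, and all dimensions are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityProjector(nn.Module):
    """Nonlinearly projects one modality into consistent and complementary spaces."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 256):
        super().__init__()
        self.consistent = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )
        self.complementary = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.consistent(x), self.complementary(x)


def parallel_loss(a_cons, b_cons):
    # Assumed form: pull the consistent projections of the two modalities toward
    # being parallel by maximizing their cosine similarity.
    return (1.0 - F.cosine_similarity(a_cons, b_cons, dim=-1)).mean()


def orthogonal_loss(cons, comp):
    # Assumed form: push complementary features to be orthogonal to consistent ones
    # by penalizing the squared cosine between them.
    return F.cosine_similarity(cons, comp, dim=-1).pow(2).mean()


# Example usage with random item embeddings (dimensions are placeholders).
vis_proj, txt_proj = ModalityProjector(2048, 128), ModalityProjector(300, 128)
img_feat, txt_feat = torch.randn(8, 2048), torch.randn(8, 300)

v_cons, v_comp = vis_proj(img_feat)
t_cons, t_comp = txt_proj(txt_feat)

# Strengthen each modality-oriented representation with the other modality's
# complementary cues (simple concatenation as an assumed fusion choice).
vision_oriented = torch.cat([v_cons + v_comp, t_comp], dim=-1)
text_oriented = torch.cat([t_cons + t_comp, v_comp], dim=-1)

reg = (parallel_loss(v_cons, t_cons)
       + orthogonal_loss(v_cons, v_comp)
       + orthogonal_loss(t_cons, t_comp))

# Mutual learning between the two compatibility heads (assumed: symmetric KL
# divergence between their predicted compatibility distributions; the logits
# here are stand-ins for the outputs of the two heads).
p_vis = torch.softmax(torch.randn(8, 2), dim=-1)
p_txt = torch.softmax(torch.randn(8, 2), dim=-1)
mutual = (F.kl_div(p_vis.log(), p_txt, reduction="batchmean")
          + F.kl_div(p_txt.log(), p_vis, reduction="batchmean"))

In practice the regularization and mutual learning terms would be added to a supervised compatibility loss and optimized jointly; the sketch only shows how the consistent/complementary separation and the two regularizers could be wired together.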