Empowering the Captioning of Fashion Attributes from Asian Fashion Images
Abstract
Fashion image captioning, an evolving field in AI and computer vision, generates descriptive captions for
fashion images. Existing studies focus predominantly on Western fashion; this paper addresses that bias
by incorporating Asian fashion into the analysis, with the aim of developing more inclusive AI technologies for the
fashion industry and bridging the gap between Western and Asian fashion in image captioning. We leverage transfer learning
techniques, combining the DeepFashion dataset (primarily Western fashion) with a newly curated Asian fashion dataset.
Our approach uses deep learning models for the encoder and decoder components to generate
captions that capture fashion attributes such as style, color, and garment type, tailored specifically to Asian fashion
trends. The model achieves accuracies of 93.63% for gender, 83.42%
for article type, and 61.34% for base color on the training dataset, and 94.13%, 79.25%, and 59.71%, respectively, on the
validation dataset. These findings highlight the importance of inclusivity and diversity in AI research, advancing the field
of fashion image captioning.
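
The abstract reports a separate accuracy for each attribute but does not state how it is computed. As a minimal sketch, assuming accuracy means the percentage of images whose predicted attribute value exactly matches the labeled value, the per-attribute figures could be obtained as follows. The function name, the attribute keys (`gender`, `article_type`, `base_color`), and the toy samples are hypothetical and are not taken from the paper's data or code.

```python
def per_attribute_accuracy(predictions, references, attributes):
    """Return {attribute: percentage of samples with an exact label match}.

    Assumption: one dict of attribute values per image, and accuracy is
    exact-match accuracy computed independently for each attribute.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")

    scores = {}
    for attr in attributes:
        # Count images where the predicted value equals the labeled value.
        correct = sum(
            1 for pred, ref in zip(predictions, references)
            if pred.get(attr) == ref[attr]
        )
        scores[attr] = 100.0 * correct / len(references)
    return scores


# Hypothetical toy data, for illustration only.
references = [
    {"gender": "women", "article_type": "kurta", "base_color": "red"},
    {"gender": "men", "article_type": "shirt", "base_color": "blue"},
    {"gender": "women", "article_type": "saree", "base_color": "green"},
    {"gender": "men", "article_type": "jeans", "base_color": "black"},
]
predictions = [
    {"gender": "women", "article_type": "kurta", "base_color": "maroon"},
    {"gender": "men", "article_type": "shirt", "base_color": "blue"},
    {"gender": "women", "article_type": "dress", "base_color": "green"},
    {"gender": "men", "article_type": "jeans", "base_color": "navy"},
]

scores = per_attribute_accuracy(
    predictions, references, ["gender", "article_type", "base_color"]
)
for attr, value in scores.items():
    print(f"{attr}: {value:.2f}%")
```

Scoring each attribute independently makes it visible when a fine-grained attribute such as base color lags behind a coarse one such as gender, which is the pattern in the reported results.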