- Title
Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction.
- Authors
He, Wentao; Ma, Hanjie; Li, Shaohua; Dong, Hui; Zhang, Haixiang; Feng, Jie
- Abstract
Multimodal Relation Extraction (MRE) is a core task for constructing Multimodal Knowledge Graphs (MKGs). Most current research is based on fine-tuning small-scale single-modal image and text pre-trained models, but we find that image-text datasets from network media suffer from data scarcity, simplistic text, and abstract image information, which require substantial external knowledge for supplementation and reasoning. We use Multimodal Relation Data augmentation (MRDA) to address the data scarcity problem in MRE, and propose a Flexible Threshold Loss (FTL) to handle the imbalanced entity pair distribution and long-tailed classes. After obtaining prompt information from the small model, which serves as a guide model, we employ a Large Language Model (LLM) as a knowledge engine to acquire common sense and reasoning abilities. Notably, both stages of our framework are flexibly replaceable: the first stage adapts to multimodal classification tasks for small models, and the second stage can be replaced by more powerful LLMs. Through experiments, our EMRE2llm framework achieves state-of-the-art performance on the challenging MNRE dataset, reaching an 82.95% F1 score on the test set.
- Subjects
LANGUAGE models; DATA augmentation; DOCUMENT imaging systems; COMMON sense
- Publication
Applied Sciences (2076-3417), 2023, Vol 13, Issue 22, p12208
- ISSN
2076-3417
- Publication type
Article
- DOI
10.3390/app132212208