Textual Entailment--Based Figure Summarization for Biomedical Articles
مقال من تأليف: Saini, Naveen ; Saha, Sriparna ; Bhattacharyya, Pushpak ; Tuteja, Himanshu ;
ملخص: This article proposes a novel unsupervised approach (FigSum++) for automatic figure summarization in biomedical scientific articles using a multi-objective evolutionary algorithm. The problem is treated as an optimization problem where relevant sentences in the summary for a given figure are selected based on various sentence scoring features (or objective functions), such as the textual entailment score between sentences in the summary and a figure's caption, the number of sentences referring to that figure, semantic similarity between sentences and a figure's caption, and the number of overlapping words between sentences and a figure's caption. These objective functions are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE consists of a set of solutions, and each solution represents a subset of sentences to be selected in the summary. MBDE generally uses a single differential evolution variant, but in the current study, an ensemble of two different differential evolution variants measuring diversity among solutions and convergence toward global optimal solution, respectively, is employed for efficient search. Usually, in any summarization system, diversity among sentences (called anti-redundancy ) in the summary is a very critical feature, and it is calculated in terms of similarity (like cosine similarity) among sentences. In this article, a new way of measuring diversity in terms of textual entailment is proposed. To represent the sentences of the article in the form of numeric vectors, the recently proposed BioBERT pre-trained language model in biomedical text mining is utilized. An ablation study has also been presented to determine the importance of different objective functions. For evaluation of the proposed technique, two benchmark biomedical datasets containing 91 and 84 figures are considered. Our proposed system obtains 5% and 11% improvements in terms of the F -measure metric over two datasets, compared to the state-of-the-art unsupervised methods.
لغة:
إنجليزية