
Detailed record

Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition

Article written by: Agrawal, Ankit; Tripathi, Sarsij; Vardhan, Manu

Abstract: Large amounts of new data are now readily available from many sources to collect and store. One of the main problems is labeling these data correctly for machine learning applications. Active learning is a form of machine learning widely used to address this problem by significantly reducing the need for labeled data. It selects the most informative samples from the unlabeled data, has them labeled by an oracle, and passes them to the active learner for incremental training. Several query sampling strategies exist for selecting these samples. A major drawback of the active learning approach is that it is very time-consuming. This research work therefore proposes a new multi-core-based algorithm that speeds up active learning by using all of the computational resources available in the system. The experiments address named entity recognition, the task of labeling sequences of words in unstructured text by classifying them into predefined categories. The proposed algorithm is evaluated in terms of both performance and execution time on three named entity recognition corpora from distinct biomedical domains. The evaluation results show a considerable improvement in execution time for the proposed active learning algorithm over the existing active learning approach.
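As an illustration only, the least confidence strategy named in the title can be sketched as follows: each unlabeled sample is scored as 1 minus the probability of its most likely label, and the highest-scoring samples are sent to the oracle. The sketch below is not the authors' algorithm; it assumes that per-sample label probabilities are already available from some model, and the names `least_confidence`, `score_chunk`, `split`, and `select_queries` are hypothetical. Scoring is spread across processes with Python's standard `concurrent.futures.ProcessPoolExecutor`.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty of one sample: 1 minus the probability of its most likely label."""
    return 1.0 - max(probs)


def score_chunk(chunk: List[Sequence[float]]) -> List[float]:
    """Score every sample in one chunk (this is the unit of work sent to a worker)."""
    return [least_confidence(p) for p in chunk]


def split(items: list, parts: int) -> List[list]:
    """Split items into at most `parts` contiguous chunks, preserving order."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def select_queries(pool_probs: List[Sequence[float]], k: int, workers: int = 1) -> List[int]:
    """Return indices of the k least confident samples in the unlabeled pool.

    Assumption: pool_probs[i] holds the model's label probabilities for sample i.
    With workers <= 1 the scoring runs serially; otherwise it runs in a process pool.
    """
    if workers <= 1:
        scores = score_chunk(pool_probs)
    else:
        chunks = split(pool_probs, workers)
        with ProcessPoolExecutor(max_workers=workers) as executor:
            # executor.map keeps chunk order, so scores line up with pool indices.
            scores = [s for part in executor.map(score_chunk, chunks) for s in part]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up probabilities for four unlabeled samples over three labels.
    pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1], [0.34, 0.33, 0.33]]
    print(select_queries(pool, k=2, workers=2))
```

In this toy form the per-sample scoring is trivial, so process start-up cost would outweigh any gain; a speed-up is plausible only when scoring each sample is expensive, for example when it requires running a sequence model over every unlabeled sentence.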


Language: English