Detailed record

An object detection-based few-shot learning approach for multimedia quality assessment

Article written by: Chatterjee, Rajdeep; Khurram Khan, Muhammad; Chatterjee, Ankita; Islam, SK Hafizul

Abstract: A large portion of the global population generates multimedia data such as text, images, and videos. Visual multimedia content is among the categories that most strongly influence the public at large. Through social media platforms (e.g., WhatsApp, Twitter, Facebook, Instagram, and YouTube), this material circulates without censorship and across national boundaries. Multimedia data containing violent or vulgar objects could trigger public unrest and is therefore a serious threat to law and order. Children and teenagers use social media more than any previous generation and create large amounts of multimedia data. It is important to assess the quality of multimedia content without bias or prejudice. Although mainstream social media platforms apply various filters and moderation by human experts, it is impossible to manually verify the terabytes of uploaded images and videos. It is therefore necessary to automate the content assessment phase without increasing upload time. This study aims to block the upload of, or to tag, an image or video in which a gun makes up a reasonable percentage of the content. In this paper, the object detection architectures Faster R-CNN, EfficientDet, and YOLOv5 are used to demonstrate how such techniques can efficiently detect human faces and different types of guns in multimedia data (images and videos). The models are tested on various test images and video clips, and a comparative analysis based on mean average precision and frames per second is presented. YOLOv5 gives the best-performing results, as high as 80.39% and 35.22% at [...] and [...], respectively. Face recognition usually requires thousands of samples, because the usual deep learning models are data-driven. In contrast, a few-shot learning approach is implemented here to recognize the detected faces, categorizing the content as real or reel.
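
The mean average precision metric named in the abstract is conventionally built on matching each predicted bounding box to a ground-truth box by intersection over union (IoU). The following is a minimal sketch of that matching step only, not the authors' evaluation code; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.5):
    """Count a detection as correct when its IoU meets the threshold.

    The 0.5 default is an assumed, commonly used value, not one taken
    from this record.
    """
    return iou(pred_box, truth_box) >= threshold
```

Under this convention, a stricter threshold accepts fewer detections as correct, so the same model generally scores lower as the required overlap grows.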


Language: English