Detailed record

Printed Ottoman text recognition using synthetic data and data augmentation

Article written by: Bilgin Tasdemir, Esma F.

Abstract: The Ottoman script, which was in use for over five centuries, is an Arabic alphabet-based writing system. It became obsolete after the change of alphabet in Turkey. There are plenty of Ottoman documents, overwhelmingly printed in Naskh style. This work presents a deep learning-based character recognition system for the printed Ottoman script. We first generate a synthetic text image dataset from a text corpus and then augment it using image processing methods. We develop a hybrid convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) recognizer and train it with the original and the augmented datasets. Finally, we apply a transfer learning procedure to adapt the system to real image data. The proposed system obtains a character error rate (CER) of 0.11 on synthetic data and 0.16 on real data comprising line images from a printed historical Ottoman book.
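
The reported CER figures can be read against the usual definition of the metric: the character-level edit distance between the recognized text and the ground truth, divided by the number of ground-truth characters. The sketch below is a minimal illustration of that definition, not the authors' evaluation code. The function names `levenshtein` and `cer` are hypothetical, and pooling edits and characters over all lines before dividing is an assumption, since the abstract does not say how the score was aggregated.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character error rate pooled over all lines (an assumed aggregation):
    total edit distance divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Toy strings, not data from the article.
    print(cer(["abcd"], ["abed"]))
```

Because Python strings are sequences of Unicode code points, the same functions apply to Arabic-script text, although combining marks are then counted as separate characters.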


Language: English