Please use this identifier to cite or link to this item: http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/810
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ramanan, M. | -
dc.contributor.author | Ramanan, A. | -
dc.contributor.author | Charles Eugene, Y. | -
dc.date.accessioned | 2016-01-02T11:11:21Z | -
dc.date.accessioned | 2022-06-28T04:51:43Z | -
dc.date.available | 2016-01-02T11:11:21Z | -
dc.date.available | 2022-06-28T04:51:43Z | -
dc.date.issued | 2014-12-15 | -
dc.identifier.isbn | 978-1-4799-6499-4 | -
dc.identifier.uri | http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/810 | -
dc.description.abstract | Optical Character Recognition (OCR) deals with the automated recognition of characters in digital images. OCR refers to the process by which scanned images are electronically processed and converted into an editable document. Handwritten and printed texts are the primary research areas of OCR. Many OCR systems are commercially available for English and Arabic characters, but there is still no recognition system available that yields a higher recognition rate, even when the scanned images are of high quality. The general framework of a Tamil OCR in the literature involves preprocessing, line segmentation, word segmentation, character segmentation, feature extraction, and character recognition. OCR for printed Tamil documents is challenging owing to multiple font styles within a single line, the presence of pictures, multiple columns, touching adjacent characters, broken characters, low print quality, and complex layouts. Furthermore, compared with the 26 letters of the English alphabet, the Tamil script has 247 characters, which makes recognition more difficult. A few OCRs for Tamil are freely available with moderate recognition rates, but performance comparisons of such OCRs on a benchmark dataset are not available. In this paper we compare OCRs for printed Tamil texts on four types of documents: books, magazines, newspapers, and pamphlets. Furthermore, we propose a post-processing error correction technique for the tested OCRs that reduces the overall mean error rate by nearly 10% across those four categories. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.subject | Google tesseract, i2ocr, newocr, ponvizhi, TamilOCR | en_US
dc.title | A Performance Comparison and Post-processing Error Correction Technique to OCRs for Printed Tamil Texts | en_US
dc.type | Article | en_US
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Postprocessing-Error-Correction.pdf | | 293.07 kB | Adobe PDF

