Authors
Sergiu Gordea, George Cristian Cotea, Frank Drauschke, Jasmin Salesevic, Marthe Lamote and Tom Vanallemeersch
Abstract
This paper presents a workflow for generating Handwritten Text Recognition (HTR) datasets in Romanian using Transcribathon’s Correct HTR feature. By leveraging citizen-science transcriptions aligned with Transkribus outputs, our case study on Jurnalul lui Dumitru Nistor reduced the character error rate (CER) from 15.26% to 0.13%. The approach enables efficient dataset curation and supports scalable model development in low-resource languages.
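The CER figures above are conventionally computed as the character-level edit distance (substitutions, deletions and insertions) between a reference transcription and the HTR output, divided by the number of characters in the reference. The sketch below illustrates that standard definition; the function names `levenshtein` and `cer` are hypothetical, and this is not necessarily the evaluation tool used in the case study.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn one string into the other (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete ref character
                            curr[j - 1] + 1,      # insert hyp character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of eight in the reference -> CER of 0.125 (12.5%).
    print(cer("Jurnalul", "Jurnalui"))
```

Evaluation tools can differ in how they normalise whitespace, punctuation and Unicode (for example, precomposed versus combining Romanian diacritics), so CER values are only directly comparable when computed under the same normalisation.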
This paper will appear in the proceedings of IDCC 2026 (International Digital Curation Conference).

