Abstract
The OCCAM project (OCR, ClassificAtion & Machine Translation, 2019-2021), funded by
the Connecting Europe Facility programme of the European Commission, aims to
combine classification, optical character recognition (OCR) and translation technologies
(machine translation and translation memories) in a single application that supports
multilingual access to scanned documents. The OCCAM application, which currently supports
five languages (Dutch, French, German, Czech and English), is based on machine learning,
open source, fully customisable and accessible through a user-friendly interface. The
OCCAM workflow is relevant wherever there is an urgent need to make
non-digitised texts accessible in a multilingual context. We illustrate this with two
use cases: the processing of image-based documents in business registers, and the
combined digitisation and translation of historical texts in the digital humanities.