Authors
Abstract
The OCCAM project (OCR, ClassificAtion & Machine Translation, 2019-2021), funded by
the Connecting Europe Facility programme of the European Commission, aims to
combine classification, optical character recognition (OCR) and translation technologies
(i.e. machine translation and translation memories) into a single application to support
multilingual access to scanned documents. The OCCAM application, currently supporting
five languages (Dutch, French, German, Czech and English), is based on machine learning,
open source, fully customisable and accessible through a user-friendly interface. The
OCCAM workflow is relevant wherever non-digitised texts urgently need to be made
accessible in a multilingual context. We illustrate this with two use cases: the
processing of image-based documents in business registers, and the combined
digitisation and translation of historical texts in the digital humanities.

