Quality Assessment Method for Warping and Cropping Error Detection in Digital Repositories

  • Roman Graf, AIT Austrian Institute of Technology, Vienna, Austria
  • Ross King, AIT Austrian Institute of Technology, Vienna, Austria
  • Martin Suda, AIT Austrian Institute of Technology, Vienna, Austria

Abstract

Two common challenges in the mass digitisation of book collections are cropping pages correctly (removing unnecessary borders from the digital image) and detecting documents that were warped during the automated scanning process. This paper presents a method that supports the analysis of digital collections (e.g. JPG files) to detect common problems such as text shifted to the edge of the image, unwanted page borders, rotated text, unwanted text from a previous page appearing in the image, and physical or optical warping of the document. One contribution of this work is a definition of evaluation use cases that assess the extent of warping and cropping problems. A second contribution is a reliable expert tool for detecting document warping and cropping errors, based on image processing techniques. This tool can be applied in quality assessment workflows for digital book collections. The proposed method employs evaluation parameters that can be defined for each book. By means of these parameters, a preservation expert can express institutional preferences for collection analysis. We target the annotation of text position and do not aim at geometric undistortion of the text. The tool works independently of image size, format and colour. We analysed two real-world collections containing correct and corrupted images, and the tool demonstrated good recall and precision rates for both corrupted and correct images.
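
The abstract does not specify how the tool measures text position, so the following is only an illustrative sketch of how per-book evaluation parameters might flag text shifted to an edge or an unwanted border. It assumes the page has already been binarised into rows of 0/1 pixels (1 = ink); the names MarginLimits, text_bounding_box and check_margins, and the threshold values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MarginLimits:
    """Hypothetical per-book parameters, as fractions of page width/height."""
    min_margin: float = 0.03  # a smaller margin counts as "text shifted to edge"
    max_margin: float = 0.25  # a larger margin counts as "unwanted border"


def text_bounding_box(page: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of all ink pixels, or None if blank."""
    rows = [r for r, row in enumerate(page) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(page[0])) if any(row[c] for row in page)]
    return rows[0], cols[0], rows[-1], cols[-1]


def check_margins(page: List[List[int]], limits: MarginLimits) -> List[str]:
    """Compare the margin on each side with the per-book limits."""
    box = text_bounding_box(page)
    if box is None:
        return ["no text found"]
    height, width = len(page), len(page[0])
    top, left, bottom, right = box
    margins = {
        "top": top / height,
        "left": left / width,
        "bottom": (height - 1 - bottom) / height,
        "right": (width - 1 - right) / width,
    }
    issues = []
    for side, margin in margins.items():
        if margin < limits.min_margin:
            issues.append(f"text shifted to {side} edge")
        elif margin > limits.max_margin:
            issues.append(f"unwanted border on {side}")
    return issues


def make_page(size: int, rows: range, cols: range) -> List[List[int]]:
    """Build a square test page with an ink block at the given rows/columns."""
    return [[1 if r in rows and c in cols else 0 for c in range(size)]
            for r in range(size)]
```

Because the limits are fractions of the page dimensions, the check does not depend on image size, and a preservation expert could tighten or relax MarginLimits for each book. Detecting rotation or warping would need additional measurements that this sketch does not attempt.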

Published
2017-07-02
How to Cite
GRAF, Roman; KING, Ross; SUDA, Martin. Quality Assessment Method for Warping and Cropping Error Detection in Digital Repositories. Qualitative and Quantitative Methods in Libraries, [S.l.], v. 4, n. 4, p. 811-820, July 2017. ISSN 2241-1925. Available at: <http://qqml-journal.net/index.php/qqml/article/view/307>. Date accessed: 23 Aug. 2019.