Quality Assessment Method for Warping and Cropping Error Detection in Digital Repositories
Two common challenges in the mass digitisation of book collections are cropping pages correctly (removing unnecessary borders from the digital image) and detecting documents that were warped during the automated scanning process. This paper presents a method that supports the analysis of digital collections (e.g. JPG files) to detect common problems such as text shifted to the edge of the image, unwanted page borders, rotated text, unwanted text from a previous page appearing on the image, and pages that are physically or optically warped. One contribution of this work is a definition of evaluation use cases for assessing the extent of warping and cropping problems. A second contribution is a reliable expert tool, based on image processing techniques, for detecting document warping and cropping errors. This tool can be applied in quality assessment workflows for digital book collections. The method employs evaluation parameters that can be defined for each book; by means of these parameters a preservation expert can express institutional preferences for collection analysis. The method targets the annotation of text position; it does not attempt geometric undistortion of the text. The tool works independently of image size, format and colour. We have analysed two real-world collections containing correct and corrupted images, and the tool has demonstrated good recall and precision rates for both corrupted and correct images.
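To make the idea of per-book evaluation parameters more concrete, the following is a minimal sketch of one possible cropping check, not the tool described in this paper. It assumes a grayscale page given as rows of pixel values (0 = black, 255 = white), treats every pixel darker than a threshold as content, and compares the margin on each side against a lower and an upper bound. All names (`content_box`, `margin_ratios`, `check_cropping`) and the default values of `min_margin`, `max_margin` and `dark` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a page is a list of rows of grayscale values (0 = black, 255 = white).
Image = List[List[int]]


def content_box(img: Image, dark: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of all pixels darker than `dark`, or None."""
    rows = [y for y, row in enumerate(img) if any(p < dark for p in row)]
    cols = [x for x in range(len(img[0])) if any(row[x] < dark for row in img)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def margin_ratios(img: Image, dark: int = 128) -> Optional[Dict[str, float]]:
    """Return the empty margin on each side as a fraction of the image size."""
    box = content_box(img, dark)
    if box is None:
        return None
    height, width = len(img), len(img[0])
    top, left, bottom, right = box
    return {
        "top": top / height,
        "left": left / width,
        "bottom": (height - 1 - bottom) / height,
        "right": (width - 1 - right) / width,
    }


def check_cropping(
    img: Image,
    min_margin: float = 0.02,  # hypothetical per-book parameter
    max_margin: float = 0.25,  # hypothetical per-book parameter
    dark: int = 128,
) -> List[str]:
    """List suspected cropping problems; an empty list means none were found."""
    margins = margin_ratios(img, dark)
    if margins is None:
        return ["no content detected"]
    issues = []
    for side, ratio in margins.items():
        if ratio < min_margin:
            issues.append(f"text too close to {side} edge")
        elif ratio > max_margin:
            issues.append(f"wide {side} border")
    return issues
```

Because the margins are expressed as fractions of the image dimensions, the check does not depend on image size. A dark scanner background would be counted as content in this simplified form, so a practical check would need a more robust way of locating the text region; rotation and warping are not addressed by this sketch at all.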