Text and Data Mining for the National Library of Greece in consideration of Internet Security and GDPR
Text and Data Mining (TDM) is a technological option commonly leveraged by large libraries worldwide in the technologically enhanced processes of web harvesting and web archiving. Its aim is to collect, download, archive, and preserve content and works available on the Internet. TDM is used to index, analyze, evaluate, and interpret massive quantities of works, including texts, sounds, images, or data, through an automated "tracking and pulling" process applied to online material. Access to web content and works available online is subject to legal restrictions, especially under laws pertaining to Copyright, Industrial Property Rights, and Data Privacy. As far as Data Privacy is concerned, the application of the General Data Protection Regulation (GDPR) is considered an issue of vital importance for the smooth operation of the TDM services offered by national libraries, mostly in the EU Member States. Among other requirements, the GDPR mandates the adoption of privacy by design and advanced security techniques. This article focuses on the TDM deployed by the National Library of Greece (NLG) and on considerations for applied Internet Security solutions that take GDPR requirements into account. NLG has deployed TDM since February 2017 in accordance with the provision of Art. 4(4)(b) of Law 4452/2017, as well as the provisions of Regulation (EU) 2016/679 (GDPR). Art. 4(4)(b) of Law 4452/2017 places TDM activity in Greece under the responsibility of NLG, which is appointed as the organization to undertake, allocate, and coordinate the archiving of the Hellenic web, i.e., as the organization responsible for text and data analysis at national level in Greece. The deployment of TDM by NLG, presented by the authors, provides a framework of technical and legal considerations so that the electronic service built on the TDM operation complies with the data protection requirements set by the new EU legislation.
While the presentation elaborates on the minimum set of technical Internet Security measures considered by NLG for achieving GDPR compliance, the paper (to be published) focuses on TDM and GDPR issues specifically in relation to Art. 89 GDPR, titled "Safeguards and derogations relating to processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes". This article is a key provision governing the operation of NLG in compliance with the GDPR.