Conference papers

Image Retrieval in Digital Libraries - A Large Scale Multicollection Experimentation of Machine Learning techniques

Abstract: Digital heritage libraries were first built in image mode, but they quickly took advantage of OCR technology to index printed collections and thereby improve the scope and performance of the information retrieval services offered to users. Access to iconographic resources has not progressed in the same way, and these resources remain in the shadows: manual indexing is incomplete and heterogeneous, and data is siloed by iconographic genre. Today, however, it is possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades, and thus to promote these engravings, drawings, photographs, maps, etc., both for their own value and as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in printed materials (dailies, magazines, monographs); transform, harmonize and enrich the image descriptive metadata (in particular with machine learning classification tools); and load it all into a web app dedicated to image retrieval. The approach is pragmatically dual, since it involves leveraging both existing digital resources and (virtually) off-the-shelf technologies.
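The three ETL stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the `GENRE_MAP` vocabulary, the `keyword_classifier` stand-in for a machine learning classifier, and the in-memory index standing in for the web app's search backend are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Illustration:
    doc_id: str
    source_type: str              # e.g. "printed" or "image_collection"
    caption: str = ""             # e.g. OCR text found near the image
    genre: str = "unknown"
    tags: list[str] = field(default_factory=list)


# Hypothetical mapping from heterogeneous catalogue labels to one vocabulary.
GENRE_MAP = {"photo": "photograph", "photographie": "photograph",
             "gravure": "engraving", "carte": "map", "dessin": "drawing"}


def extract(documents: Iterable[dict]) -> Iterator[Illustration]:
    """Extract: yield one record per illustration, whatever the document type."""
    for doc in documents:
        for ill in doc.get("illustrations", []):
            yield Illustration(doc_id=doc["id"], source_type=doc["type"],
                               caption=ill.get("caption", ""),
                               genre=ill.get("genre", "unknown"))


def transform(records: Iterable[Illustration],
              classify: Callable[[Illustration], list[str]]
              ) -> Iterator[Illustration]:
    """Transform: harmonize genre labels, then enrich with classifier tags."""
    for rec in records:
        label = rec.genre.lower()
        rec.genre = GENRE_MAP.get(label, label)
        rec.tags = classify(rec)
        yield rec


def load(records: Iterable[Illustration],
         index: dict[str, list[Illustration]]) -> dict[str, list[Illustration]]:
    """Load: file each record under its genre and tags in a simple index."""
    for rec in records:
        for term in {rec.genre, *rec.tags}:
            index.setdefault(term, []).append(rec)
    return index


def keyword_classifier(rec: Illustration) -> list[str]:
    """Placeholder for a trained classifier: tags taken from caption words."""
    vocabulary = {"portrait", "aircraft", "map"}
    return sorted(vocabulary & set(rec.caption.lower().split()))


# Made-up sample documents, for illustration only.
documents = [
    {"id": "doc-1", "type": "printed",
     "illustrations": [{"caption": "Portrait of a soldier", "genre": "gravure"}]},
    {"id": "doc-2", "type": "image_collection",
     "illustrations": [{"caption": "Aircraft over the front", "genre": "Photo"}]},
]
index = load(transform(extract(documents), keyword_classifier), {})
```

Keeping each stage as a generator means illustrations from printed materials and from image collections flow through the same harmonization and enrichment steps, and the classifier can be swapped without touching extraction or loading.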
Contributor: Jean-Philippe Moreux
Submitted on: Thursday, April 26, 2018 - 5:35:32 PM
Last modification on: Thursday, May 12, 2022 - 3:35:58 PM


Publisher files allowed on an open archive


  • HAL Id : hal-01779654, version 1



Jean-Philippe Moreux, Guillaume Chiron. Image Retrieval in Digital Libraries - A Large Scale Multicollection Experimentation of Machine Learning techniques. IFLA News Media Section, May 2017, Dresden, Germany. ⟨hal-01779654⟩


