Image Retrieval in Digital Libraries - A Large Scale Multicollection Experimentation of Machine Learning techniques

Abstract: Digital heritage libraries were historically first delivered in image mode, but they quickly took advantage of OCR technology to index printed collections and thereby improve the scope and performance of the information retrieval services offered to users. Access to iconographic resources has not progressed in the same way, and these resources remain in the shadows: indexing is manual, incomplete and heterogeneous, and data is siloed by iconographic genre. Today, however, it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced over the last two decades, and thus promote these engravings, drawings, photographs, maps, etc., both for their own value and as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: (1) identify and extract iconography wherever it may be found, in image collections but also in printed materials (dailies, magazines, monographs); (2) transform, harmonize and enrich the image descriptive metadata (in particular with machine learning classification tools); (3) load it all into a web app dedicated to image retrieval. The approach is pragmatic in two respects, since it involves leveraging existing digital resources and (virtually) off-the-shelf technologies.

https://hal-bnf.archives-ouvertes.fr/hal-01779654
Contributor: Jean-Philippe Moreux
Submitted on: Thursday, April 26, 2018 - 5:35:32 PM
Last modification on: Wednesday, May 16, 2018 - 1:02:13 AM

File

000-moreux-chiron_IFLA-Dresden...
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01779654, version 1

Citation

Jean-Philippe Moreux, Guillaume Chiron. Image Retrieval in Digital Libraries - A Large Scale Multicollection Experimentation of Machine Learning techniques. IFLA News Media Section, May 2017, Dresden, Germany. ⟨hal-01779654⟩

Metrics

Record views: 96
File downloads: 130