Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns: Presentation of the Main Results

Abstract : Gallica is one of the major digital libraries freely available on the Internet. It provides access to millions of documents of all types and receives around 1.5 million visits per month. As part of a research partnership between the BnF and Télécom ParisTech, the connection logs of the Gallica servers were analysed using machine-learning methods. The aim was not to collect information on users or their profiles but to use the logs, which act as records of usage, as a basis for identifying typical clickstreams. Over 15 months, a clustering algorithm was developed that groups Gallica sessions with similar sequencing and duration of actions. The logs analysed covered periods ranging from a week to a month, and the stability of the resulting models was checked systematically. Such learning methods take advantage of the very factor that undermines traditional methods for gathering information on usage: the extremely high number of connections. Despite the power of the algorithms involved, machine learning also requires numerous decisions to be taken, which in turn calls for other sources of knowledge about usage and users. For this reason, the chosen approach was to bring the statistical models into dialogue with results obtained by other means (ethnographic observations, interviews, etc.). The value of the work carried out on the Gallica logs persuaded the BnF and Télécom ParisTech to add a further stage to the research, devoted to the Data BnF logs and to clickstreams between Gallica, Data BnF and the BnF General Catalogue.
Contributor : Philippe Chevallier
Submitted on : Thursday, January 3, 2019 - 11:14:47 AM
Last modification on : Wednesday, July 3, 2019 - 3:02:03 PM
Long-term archiving on : Thursday, April 4, 2019 - 2:06:34 PM




  • HAL Id : hal-01968742, version 1


Florence d'Alché-Buc, Valérie Beaudouin, Emmanuelle Bermès, Philippe Chevallier, Aude Le Moullec-Rieux, et al. Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns: Presentation of the Main Results. [Research Report] Bibliothèque nationale de France (Paris); Télécom ParisTech. 2017. ⟨hal-01968742v1⟩


