Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns - BnF - Bibliothèque nationale de France
Report (Research Report) Year: 2017

Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns

Abstract

Gallica (http://gallica.bnf.fr) is one of the major digital libraries available for free via the Internet. It provides access to millions of documents of all types and receives around 1.5 million visits per month. In the context of a research partnership between the BnF and Télécom ParisTech, the connection logs of the Gallica servers were analysed using machine-learning methods. The aim was not to collect information on users or their profiles but to use the logs, which act as records of usage, as a basis for identifying typical clickstreams. Over 15 months, a data-clustering algorithm was developed to group Gallica sessions that are similar in the sequencing and duration of their actions. The logs analysed covered periods ranging from a week to a month, and the stability of the resulting models was checked systematically. Such learning methods take advantage of the very factor that undermines traditional methods for gathering information on usage: the extremely high number of connections. Despite the power of the algorithms involved, machine learning also requires numerous decisions to be made, which calls for other sources of knowledge about usage and users. For this reason, the preferred methodological choice was to bring the statistical models into dialogue with results obtained from other approaches (ethnographic observations, interviews, etc.). The interest of the work carried out on the Gallica logs persuaded the BnF and Télécom ParisTech to add a further stage to the research, devoted to Data BnF logs and to clickstreams between Gallica, Data BnF and the BnF General Catalogue.
Main file
Analysis of Gallica and Data BnF logs and modelling of behaviour patterns.pdf (238.75 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01968742, version 1 (03-01-2019)
hal-01968742, version 2 (18-04-2019)

Identifiers

  • HAL Id: hal-01968742, version 2

Cite

Florence d'Alché-Buc, Valérie Beaudouin, Emmanuelle Bermès, Philippe Chevallier, Aude Le Moullec-Rieux, et al. Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns: Presentation of the Main Results. [Research Report] Bibliothèque nationale de France (Paris); Télécom ParisTech. 2017. ⟨hal-01968742v2⟩
510 Views
263 Downloads
