Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns: Presentation of the Main Results

Abstract: Gallica (http://gallica.bnf.fr) is one of the major digital libraries available for free via the Internet. It provides access to millions of documents of all types and receives around 1.5 million visits per month. In the context of a research partnership between the BnF and Télécom ParisTech, an analysis of the Gallica servers' connection logs was carried out using machine-learning methods. The aim was not to collect information on users or their profiles but rather to use logs, which act as records of usage, as a basis for identifying typical clickstreams. Over 15 months, a data clustering algorithm was developed to group Gallica sessions that are similar in the sequencing and duration of their actions. The logs analysed covered periods ranging from a week to a month, and the stability of the resulting models was checked systematically. Such learning methods take advantage of the very factor that undermines traditional methods for gathering information on usage: the extremely high number of connections. Despite the power of the algorithms involved, machine learning also requires numerous decisions to be taken, which calls for other sources of knowledge on usages and users. For this reason, the preferred methodological choice was to bring the statistical models into dialogue with results obtained from other approaches (ethnographic observations, interviews, etc.). The interest of the work carried out on the Gallica logs persuaded the BnF and Télécom ParisTech to add a further stage to the research, devoted to the Data BnF logs as well as to clickstreams between Gallica, Data BnF and the BnF General Catalogue.
Document type:
Report
[Research Report] Bibliothèque nationale de France (Paris); Télécom ParisTech. 2017

https://hal.archives-ouvertes.fr/hal-01968742
Contributor: Philippe Chevallier
Submitted on: Thursday, 3 January 2019 - 11:14:47
Last modified on: Thursday, 7 February 2019 - 15:38:57

File

Analysis of Gallica and Data B...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01968742, version 1

Citation

Florence D'Alché-Buc, Emmanuelle Bermès, Aude Le Moullec-Rieux, Christophe Prieur, Valérie Beaudouin, et al. Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns: Presentation of the Main Results. [Research Report] Bibliothèque nationale de France (Paris); Télécom ParisTech. 2017. 〈hal-01968742〉

Metrics

Record views: 21

File downloads: 12