Harvesting digital newspapers at the Bibliothèque nationale de France: All we need is news preservation

Abstract: Acquiring, promoting and giving access to press collections is a major objective for heritage institutions, which need to address the accelerating shift from analogue to digital documentation in order to maintain the continuity of their missions. At the National Library of France (Bibliothèque nationale de France, or BnF), this mission has mainly been performed within the framework of legal deposit. In 2006, a new copyright law extended legal deposit to the internet: its scope covers all kinds of news websites, from digital equivalents of printed newspapers to journalists' blogs and news aggregation portals. Over the last ten years, the BnF has experimented with two different approaches to ensure the preservation of online news: direct deposit of electronic publications and web harvesting of freely accessible news websites; the latter has been more successful than the former. In order to cover subscription-based content, the BnF is currently experimenting with a third solution that combines what worked in the first two approaches: web harvesting through agreements with producers. This paper presents this third approach and explains how the BnF tried to implement it through a dedicated project, the "subscription-based press project". This project, launched in late 2012, relies on the possibility of giving the robot a login and a password so that it can identify itself as a subscriber. The robot is then able to access and copy the protected content. Even though the crawling part was technically the most critical one, the project covered all parts of the documentary lifecycle: from selection to long-term preservation, including quality control, cataloguing and access in reading rooms. The paper presents the different steps of the project, its successes and achievements (in terms of collection, technical innovation and human resources) and its limits, and considers its future evolution.
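The login-based harvesting described in the abstract can be illustrated with a minimal sketch. It is not the BnF's crawler and makes several assumptions that are not in the paper: that the news site uses a plain HTML form login, that the form fields are called `login` and `password`, and that the session is kept in cookies. The function names and example URLs are hypothetical.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username, password):
    """Build a POST request that submits the subscriber credentials.

    The field names "login" and "password" are assumptions; a real site
    defines its own form fields.
    """
    body = urlencode({"login": username, "password": password}).encode("utf-8")
    return Request(login_url, data=body, method="POST")


def make_session():
    """Return an opener that stores cookies, so the login persists."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def harvest(session, login_request, article_urls):
    """Log in once, then copy each protected page with the same session."""
    session.open(login_request, timeout=30).close()
    pages = {}
    for url in article_urls:
        with session.open(url, timeout=30) as response:
            pages[url] = response.read()
    return pages
```

A call such as `harvest(make_session(), build_login_request("https://news.example/login", "robot", "secret"), ["https://news.example/article/1"])` would return the raw bytes of each page keyed by URL, provided the hypothetical site accepts the credentials.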
Document type:
Conference paper
IFLA World Library and Information Congress, Aug 2014, Lyon, France. 〈http://conference.ifla.org/past-wlic/2014/ifla80.html〉

https://hal-bnf.archives-ouvertes.fr/hal-01098523
Contributor: Clément Oury
Submitted on: Friday, 26 December 2014 - 12:44:05
Last modified on: Thursday, 19 October 2017 - 14:36:03
Document(s) archived on: Friday, 27 March 2015 - 11:05:50

Files

Oury-IFLA-2014-en.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id: hal-01098523, version 1

Citation

Clément Oury. Harvesting digital newspapers at the Bibliothèque nationale de France: All we need is news preservation. IFLA World Library and Information Congress, Aug 2014, Lyon, France. 〈http://conference.ifla.org/past-wlic/2014/ifla80.html〉. 〈hal-01098523〉
