Legal deposit of the French Web: harvesting strategies for a national domain

Abstract: Under the French copyright law passed on August 1st, 2006, the Bibliothèque nationale de France ("BnF", or "the Library") is in charge of collecting and preserving the French Internet. The Library has established a "mixed model" of Web archiving, which combines broad crawls of the .fr domain, focused crawls and e-deposits. Thanks to its research partnership with the Internet Archive, BnF has performed four annual broad crawls since 2004. The most recent one differed noticeably from its predecessors: one of the most important changes was the use of the comprehensive list of .fr domain names, provided to BnF by AFNIC ("Association française pour le nommage Internet en coopération", the registry for .fr) under an agreement signed between the two institutions in September 2007. The technical choices made before and during a crawl have a decisive impact on the future shape of the collection. These decisions must therefore be made within the legal and intellectual framework in which the crawl is performed: for BnF, this is the five-century-old tradition of legal deposit. To assess the consequences and outcomes of the different technical solutions available, we analyze the results of BnF's most recent crawl and compare them with those of previous harvests. These studies also prove useful in our attempt to characterize the French Web of 2007.
Document type:
Conference paper
International Web Archiving Workshop, Sep 2008, Aarhus, Denmark

Cited literature: 21 references

https://hal-bnf.archives-ouvertes.fr/hal-01098538
Contributor: Clément Oury
Submitted on: Friday, December 26, 2014 - 14:00:53
Last modified on: Thursday, October 19, 2017 - 14:36:03
Document(s) archived on: Friday, March 27, 2015 - 12:35:10

Files

LasfarguesOuryWendland-IWAW-20...
Files produced by the author(s)

License


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id: hal-01098538, version 1

Citation

France Lasfargues, Clément Oury, Bert Wendland. Legal deposit of the French Web: harvesting strategies for a national domain. International Web Archiving Workshop, Sep 2008, Aarhus, Denmark. 〈hal-01098538〉

Metrics

Record views

411

File downloads

1175