Conference papers

Legal deposit of the French Web: harvesting strategies for a national domain

Abstract: Under the French copyright law passed on August 1st, 2006, the Bibliothèque nationale de France ("BnF", or "the Library") is in charge of collecting and preserving the French Internet. The Library has established a "mixed model" of Web archiving, which combines broad crawls of the .fr domain, focused crawls and e-deposits. Thanks to its research partnership with the Internet Archive, BnF has performed four annual broad crawls since 2004. The most recent one differed noticeably from its predecessors: one of the most important changes was the use of the comprehensive list of .fr domain names, given to BnF by the AFNIC ("Association française pour le nommage Internet en coopération", the registry for .fr) after the two institutions signed an agreement in September 2007. The technical choices made before and during a crawl have a decisive impact on the future shape of the collection. These decisions must therefore be made in accordance with the legal and intellectual framework within which the crawl is performed: for BnF, this is the five-century-old tradition of legal deposit. To assess the consequences and outcomes of the different technical solutions available, we analyze the results of BnF's latest crawl and compare them with those of previous harvests. These studies also prove useful in our attempt to characterize the French Web of 2007.

Cited literature: 21 references
Contributor: Clément Oury
Submitted on : Friday, December 26, 2014 - 2:00:53 PM
Last modification on : Monday, October 19, 2020 - 11:04:51 AM
Long-term archiving on: Friday, March 27, 2015 - 12:35:10 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-01098538, version 1



France Lasfargues, Clément Oury, Bert Wendland. Legal deposit of the French Web: harvesting strategies for a national domain. International Web Archiving Workshop, Sep 2008, Aarhus, Denmark. ⟨hal-01098538⟩


