Legal deposit of the French Web: harvesting strategies for a national domain

Abstract: Under the French copyright law passed on August 1st, 2006, the Bibliothèque nationale de France ("BnF", or "the Library") is in charge of collecting and preserving the French Internet. The Library has established a "mixed model" of Web archiving, which combines broad crawls of the .fr domain, focused crawls and e-deposits. Thanks to its research partnership with the Internet Archive, BnF has performed four annual broad crawls since 2004. The most recent one differed noticeably from its predecessors: one of the most important changes was the use of the comprehensive list of .fr domain names, given to BnF by AFNIC ("Association française pour le nommage Internet en coopération", the registry for .fr) after an agreement was signed between the two institutions in September 2007. The technical choices made before and during a crawl have a decisive impact on the future shape of the collection. These decisions must therefore be made in accordance with the legal and intellectual framework within which the crawl is performed: for BnF, this is the five-century-old tradition of legal deposit. To assess the consequences and outcomes of the different technical solutions available, we analyze the results of BnF's last crawl and compare them with those of previous harvests. These studies also prove useful in our attempt to characterize the 2007 French Web.
Document type :
Conference papers

Cited literature: 21 references

https://hal-bnf.archives-ouvertes.fr/hal-01098538
Contributor : Clément Oury
Submitted on : Friday, December 26, 2014 - 2:00:53 PM
Last modification on : Thursday, October 19, 2017 - 2:36:03 PM
Long-term archiving on : Friday, March 27, 2015 - 12:35:10 PM

Files

LasfarguesOuryWendland-IWAW-20...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01098538, version 1

Citation

France Lasfargues, Clément Oury, Bert Wendland. Legal deposit of the French Web: harvesting strategies for a national domain. International Web Archiving Workshop, Sep 2008, Aarhus, Denmark. ⟨hal-01098538⟩

Metrics

Record views

463

File downloads

1336