Large-scale collections under the magnifying glass

Clément Oury

Conference paper, Year: 2010

Des collections de grande échelle à la loupe


Role: Author
PersonId : 1906
IdHAL : clement-oury
ORCID : 0000-0002-0313-9919
IdRef : 113898312

Bibliothèque nationale de France

Bibliothèque nationale de France, Département du Dépôt légal

Abstract

Institutions that crawl the web to build heritage collections hold millions, or even billions, of files encoded in thousands of different formats about which they know very little. Many of these heritage institutions are members of the International Internet Preservation Consortium, whose Preservation Working Group decided to address the issues related to format identification in web archives. Its first goal is to produce an overview of the formats found in different types of collections (large-scale, small-scale, etc.) over time. This overview suggests that the web is becoming a more standardized space: a small number of formats, frequently open ones, cover 90 to 95% of web archive collections, and we can reasonably hope to find preservation strategies for them. However, the survey relies mainly on a single source, the MIME type of the file sent in the server response, which gives good statistical trends but is not fully reliable for every file. It therefore appears necessary to study how identification tools developed for other kinds of digital assets can be used for web archives.
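The gap between a server-declared MIME type and the actual content of a file can be illustrated with a minimal Python sketch. It is not the survey's method or any existing tool: the signature table, the `(declared_mime, payload)` record layout, and the function names `sniff_mime`, `normalize` and `survey` are all assumptions made for this illustration. It tallies declared MIME types, as a format overview would, and flags records whose leading bytes point to a different format.

```python
from collections import Counter

# Hypothetical, deliberately tiny signature table: leading "magic" bytes -> MIME type.
# Real identification tools rely on far larger signature registries.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_mime(payload: bytes):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in MAGIC_SIGNATURES.items():
        if payload.startswith(magic):
            return mime
    return None


def normalize(declared: str) -> str:
    """Strip parameters such as '; charset=UTF-8' and lowercase the type."""
    return declared.split(";", 1)[0].strip().lower()


def survey(records):
    """Tally declared MIME types and list records whose bytes disagree.

    `records` is an iterable of (declared_mime, payload) pairs (assumed layout).
    """
    counts = Counter()
    mismatches = []
    for declared, payload in records:
        declared = normalize(declared)
        counts[declared] += 1
        sniffed = sniff_mime(payload)
        # Only report a mismatch when a signature positively identifies the bytes.
        if sniffed is not None and sniffed != declared:
            mismatches.append((declared, sniffed))
    return counts, mismatches


# Made-up sample records for illustration only.
records = [
    ("text/html; charset=UTF-8", b"<html><body>hi</body></html>"),
    ("image/jpeg", b"\x89PNG\r\n\x1a\n...."),  # server mislabels a PNG as JPEG
    ("application/pdf", b"%PDF-1.4 ..."),
]
counts, mismatches = survey(records)
```

In this sketch the declared types still give a usable tally in `counts`, while `mismatches` isolates the individual files for which the server response cannot be trusted.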

Keywords

Web archives; file formats; digital preservation

Domains

Information and communication sciences

Main file

FormatWebArchives_Oury_ipres2010.pdf (114.33 KB)

Origin: Files produced by the author(s)


https://bnf.hal.science/hal-00769091

Submitted on: Friday, 28 December 2012, 10:45:38

Last modified on: Thursday, 16 November 2023, 03:17:53

Long-term archiving on: Friday, 29 March 2013, 03:48:50

Dates and versions

hal-00769091, version 1 (28-12-2012)

Identifiers

HAL Id: hal-00769091, version 1

Cite

Clément Oury. Large-scale collections under the magnifying glass: Format identification for web archives. 7th International Conference on Preservation of Digital Objects (iPRES), Sep 2010, Vienna, Austria. 8 p. ⟨hal-00769091⟩


Collections

BNF

419 views

481 downloads
