Large-scale collections under the magnifying glass: Format identification for web archives

Abstract: Institutions that crawl the web to build heritage collections hold millions, or even billions, of files encoded in thousands of different formats about which they know very little. Many of these heritage institutions are members of the International Internet Preservation Consortium, whose Preservation Working Group decided to address the issues related to format identification in web archives. Its first goal is to produce an overview of the formats found in different types of collections (large-scale, small-scale, etc.) over time. This overview shows that the web seems to be becoming a more standardized space. A small number of formats, frequently open ones, cover 90 to 95% of web archive collections, and we can reasonably hope to find preservation strategies for them. However, this survey relies mainly on one source, the MIME type of the file sent in the server response, which gives good statistical trends but is not fully reliable for every file. It therefore appears necessary to study how identification tools developed for other kinds of digital assets can be used for web archives.
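To illustrate why a server-declared MIME type may not be reliable for every file, here is a minimal Python sketch that compares the declared type with a type guessed from the first bytes of the payload. It is not the method used in the paper: the signature table, the function names and the three-way result ("match", "mismatch", "unknown") are assumptions made for this example, and the table covers only a handful of well-known magic numbers.

```python
# Hypothetical, deliberately tiny signature table (illustration only).
# Each entry maps a leading byte sequence to the MIME type it suggests.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def normalize_mime(header_value):
    """Drop parameters such as '; charset=utf-8' and lowercase the type."""
    return header_value.split(";", 1)[0].strip().lower()


def sniff_format(payload):
    """Return the MIME type suggested by the leading bytes, or None."""
    for magic, mime in MAGIC_SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None


def check_declared_type(content_type_header, payload):
    """Compare the server-declared type with the sniffed one.

    Returns "match", "mismatch", or "unknown" when no signature in the
    (hypothetical) table applies to the payload.
    """
    declared = normalize_mime(content_type_header)
    sniffed = sniff_format(payload)
    if sniffed is None:
        return "unknown"
    return "match" if sniffed == declared else "mismatch"
```

For instance, a PDF served with a `text/html` header would be reported as a mismatch, which is the kind of per-file error that aggregate MIME statistics can hide.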
Document type:
Conference paper
7th International Conference on Preservation of Digital Objects (iPRES), Sep 2010, Vienna, Austria. 8 p., 2010
Cited literature [8 references]

https://hal-bnf.archives-ouvertes.fr/hal-00769091
Contributor: Clément Oury
Submitted on: Friday, 28 December 2012 - 10:45:38
Last modified on: Thursday, 19 October 2017 - 14:36:03
Document(s) archived on: Friday, 29 March 2013 - 03:48:50

File

FormatWebArchives_Oury_ipres20...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00769091, version 1

Citation

Clément Oury. Large-scale collections under the magnifying glass: Format identification for web archives. 7th International Conference on Preservation of Digital Objects (iPRES), Sep 2010, Vienna, Austria. 8 p., 2010. 〈hal-00769091〉

Metrics

Record views

678

File downloads

322