Digital Archiving in the Hungarian Széchényi Library

The story and the plans of the Hungarian Electronic Library

István Moldován

moldovan@oszk.hu

Rome, 21 October 2002

 

 

Ladies and Gentlemen...

As we have all known and felt for a couple of years, we are in the middle of a revolution. Libraries and librarians have to adapt their skills to the era of new, digital documents and information sources. In my presentation I will try to give you a brief overview of the Hungarian Electronic Library - a department of our National Library, responsible for archiving Hungary-related digital documents.

 

Overview

The main points of this presentation are:

- the importance of digital preservation

- the difference between digitisation and digital archiving

- possible approaches to the archiving of digital objects

- the Hungarian Electronic Library (how it is organised and how it works)

- and finally: a brief summary.

 

Digital preservation

In June 2000 the Conference of Directors of the National Libraries set up a “Committee on Digital Preservation”. One year later this Committee proposed a “UNESCO Resolution on Digital Preservation”, which urges the member states of UNESCO to take appropriate action to safeguard the world’s growing digital heritage. Here is a brief citation from the final resolution: “The world’s cultural, educational, scientific, public and administrative resources ... are increasingly produced, distributed and accessed only in digital form. ... Digital information is highly susceptible to technical obsolescence and physical decay and maintaining ongoing access to digital resources requires long-term commitment.” The initiating and organising role of the national libraries is especially important in this field, as this document stresses: http://www.unesco.org/webworld/portal_archives/analysis_131101.shtml

 

Digitisation and archiving

There are many digitisation projects both inside and outside libraries. More and more old paper documents are being converted to electronic form, but far more new documents are produced and published on computers in the first place. In the developed countries there are big projects and financial resources for digitising the most important parts of the cultural heritage, but this is not enough. Collecting and preserving the original electronic documents is just as important as digitisation, because information on the Internet changes fast or simply disappears; off-line storage media become obsolete after a couple of years, as do document formats; publishers often do not keep the electronic versions of their publications; and so on.

 

Possible approaches in the world

Here are three different approaches to this problem in three different countries:

Netarchive.dk

It was a one-year project (from August 1, 2001 to July 31, 2002) investigating strategies for collecting and archiving Danish Internet materials, which were simultaneously evaluated with regard to their research value: http://netarchive.dk

The Internet Archive

The Internet Archive’s collections include Web pages (since October 1996, 40 terabytes), Usenet bulletin boards and FTP sites: http://www.archive.org

Kulturarw3

The Royal Library of Sweden has started a project aimed at the long-term preservation of electronic information. The goal is to collect, preserve and make available Swedish documents from the Internet: http://www.kb.se/kw3/ENG/Default.htm

 

One possible approach: The story of the Hungarian Electronic Library

In Hungary the Internet appeared in the early nineties, and in 1994 a few librarians launched the Hungarian Electronic Library (abbreviated in Hungarian: “MEK”), with the support of the Information Infrastructure Development Project (abbreviated: “IIF”). The main goal was “to collect and organise Hungarian and Hungary-related electronic documents that are freely available for scientific, educational or culture-related activities.” Homepage: http://mek.iif.hu

 

The organisation

Between 1994 and 1999 the MEK project was a civil initiative and later an association, maintained solely by enthusiastic volunteers: librarians, computer experts, students and other networkers. In 1999 the project was taken over by the Hungarian National Library, and currently the MEK Department consists of 5 employees, 7-8 part-time contributors and still many unpaid volunteers.

 

Acquisition sources

We get electronic documents from various sources in various formats:

- from Web-sites (homepages of libraries, universities, research institutes, or from personal pages);

- from CD-ROMs (sometimes we buy large full-text databases from CD-ROM publishers);

- directly from authors (as the MEK becomes more popular, more and more authors send their works directly to us by e-mail or on a floppy disc, or sometimes just in printed form, and we digitise them);

- directly from universities, scientific institutes, libraries (some institutions regularly send their publications in electronic form);

- directly from publishers (occasionally we get electronic texts from ordinary book-publishers, frequently in special DTP formats);

- from volunteers (digitisation was not the original goal of the MEK, but the growing collection of the electronic library has encouraged many people on the Internet to digitise their favourite books - mostly classical Hungarian literature - for us).

 

The content of the collection

We collect all kinds of electronic texts and some non-text document types as well; the only requirement is that the document has scientific, cultural or educational value. We have for example:

- reference books: lexicons, bibliographies, dictionaries

- classical and contemporary Hungarian literature (novels, poems, short stories)

- scientific literature (articles, books, conference or research papers)

- Hungarian literature in foreign languages

- maps, music scores...

 

The size of the collection

Currently there are more than 4,500 documents with metadata in the collection. The total size of the files is about 2.5 gigabytes. Most of the documents are available in two or more alternative formats for on-line browsing or for downloading and printing (plain text, HTML, PDF, Word, RTF etc.).

Statistics

The Hungarian Electronic Library is very popular and is one of the best-known Hungarian information sources on the Internet. We have about 4,500 visitors a day from a hundred countries, and the main Web-server gets more than 100,000 hits every day. Some of the main user groups are: students, teachers, parents, blind people, Hungarians living in foreign countries...

 

Copyright

Since 1999 there has been a new copyright law in Hungary which covers the Internet as well. It treats the on-line publication of electronic documents similarly to television or radio broadcasting. The MEK has a special contract with the Hungarian Bureau for the Protection of Authors’ Rights (ARTISJUS) and has received a general permission from it to make novels, short stories and poetry publicly available over the Internet. For scientific literature we have to ask for individual permission from the authors or copyright owners.

 

The process of acquisition

When we find, receive or buy a document, we download it into a temporary storage area. Then we ask the owner of the document for permission if it falls outside the general licence we have from ARTISJUS. The next step is quality control (based on the printed edition, if one exists), followed by format conversion. We convert Word, QuarkXPress, Ventura, LaTeX, Folio etc. documents to HTML, PDF, RTF and - recently - the LIT e-book format (XML is planned for the near future), and we use the ISO-8859-2 standard for the Hungarian accented letters and Unicode for other foreign or special characters. Finally we add the bibliographical metadata and other supplementary elements (cover image, book review, related Web-sites etc.) and upload the document to the public collection on the central MEK server.
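To illustrate the character-set step described above, here is a minimal Python sketch; it is not the MEK's actual conversion tooling. It assumes that characters outside ISO-8859-2 may be written into the HTML output as numeric character references, which is one common way of carrying Unicode characters in a Latin-2 page. The function names and the sample string are hypothetical.

```python
import html


def to_latin2_html(text: str) -> bytes:
    """Encode text as ISO-8859-2 bytes.

    Hungarian accented letters (including ő and ű) have their own code
    points in ISO-8859-2; any other character is written as an HTML
    numeric character reference such as &#937;.
    """
    return text.encode("iso-8859-2", errors="xmlcharrefreplace")


def from_latin2_html(data: bytes) -> str:
    """Reverse the step above.

    Assumption: the original text contained no entity-like sequences of
    its own, otherwise they would be unescaped here as well.
    """
    return html.unescape(data.decode("iso-8859-2"))


# Hypothetical sample: Hungarian letters plus two non-Latin-2 characters.
sample = "Petőfi Sándor, Ω, €"
encoded = to_latin2_html(sample)
restored = from_latin2_html(encoded)
```

In this sketch the Hungarian letters stay single bytes, so the file remains readable in a Latin-2 editor, while rarer characters survive the conversion without loss.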

 

A new document type: the Electronic Periodical Archive

Besides monographic documents, we have plans and a pilot project for the digital preservation of electronic periodicals: journals, newspapers, magazines, newsletters... The planned Electronic Periodical Archive (abbreviated: EPA) will have three levels:

1. Collection of the URLs and bibliographical data of every Hungarian electronic periodical (currently we have more than 1,200 addresses sorted by subject).

2. Selective archive of the most important scientific and cultural journal issues (now there is a small collection of them here: http://epa.oszk.hu ).

3. Full-text archive of selected journal articles with detailed metadata and full-text search (planned).

 

Plans for the future

The new, second version of the Hungarian Electronic Library will be opened in 2003, but you can already test it at http://mek.oszk.hu. We are developing an integrated digital library system for this new service using only open standards and free software. The main goals are:

- interoperability with other Internet sources, to be suitable for common search and distributed work,

- giving each document an identifier which does not change and can be referred to from outside the system as well,

- improving accessibility for blind people and for people with poor Internet connection.

The most important improvements are: Dublin Core metadata for each document, data exchange in MARC format with other library catalogues, Z39.50 server support for common search with other OPACs, Open Archives Initiative compatibility for common search with other Hungarian and international electronic document archives, persistent location identifiers using the URN server in the National Library, a special interface for blind or partially sighted people with on-line text-to-speech conversion, chat forums for the visitors...
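As an illustration of what Dublin Core metadata for a single document might look like, here is a minimal Python sketch. It is not the MEK's actual record format: the wrapping "record" element, the helper name and the field values are assumptions, and the URN shown is a placeholder rather than an identifier issued by the National Library. Only the Dublin Core element namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 basic Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record with one dc:* element per (name, value) pair.

    Repeated names are allowed, because Dublin Core elements are repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Illustrative values only; the identifier is a placeholder URN.
example = dublin_core_record([
    ("title", "Az ember tragédiája"),
    ("creator", "Madách, Imre"),
    ("language", "hu"),
    ("format", "text/html"),
    ("identifier", "urn:example:mek-00000"),
])
xml_text = ET.tostring(example, encoding="unicode")
```

A record of this kind carries the persistent identifier together with the descriptive fields, so an outside catalogue can refer to the document without knowing where its files are stored.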

 

Summary

As you can see, ladies and gentlemen, there are big challenges and great demands in the field of digital preservation. The task of collecting, processing and servicing various types of electronic documents can be undertaken effectively only by national libraries. I would like to emphasise the need for selection, quality control and post-processing of these documents. These are new duties for librarians, formerly done by editors, publishers and typographers. The long-term preservation of these e-documents and their easy availability in the present are equally important, and handling the various formats requires more and more informatics skills from librarians as well. I hope we will all be able to cope with this challenge, and that libraries will still be with us at the end of this century too - even in electronic form.

Thank you for your attention!