The authors present a general architecture for a domain- and language-specific application that concisely stores and presents information retrieved from a wide spectrum of web-based information sources. The architecture was shaped by the particular challenges of a knowledge-intensive domain: gathering relevant documents from the internet, mining the knowledge content of unstructured text, and meeting demands for context-driven, multi-faceted, up-to-date querying and presentation of the required information. It was also shaped by the intricacies of the Hungarian language, which call for special solutions to a number of linguistic problems.

The proposed system is being developed within the international EUREKA project Information and Knowledge Fusion (IKF), which aims at the design and implementation of new intelligent knowledge warehousing environments that allow advanced knowledge management in various application domains [1]. The Hungarian IKF project (IKF-H) concentrates on developing a financial advisory application that transforms data from heterogeneous, unstructured Hungarian-language information sources into an integrated internal knowledge repository. This repository is intended to serve as a decision support system for Hungarian banks and revenue services.

In order to surpass the performance of a typical information retrieval system, the way humans retrieve information is being studied and, at least partially, followed. Even the shallowest analysis of human performance shows that its advantage lies mainly in (1) the use of linguistic competence and (2) the benefits of background knowledge. Since linguistic techniques are rapidly being added to implemented information retrieval systems, mapping and incorporating background knowledge becomes the biggest challenge. A suitable and efficient way to represent part of a human's background knowledge is the use of ontologies [2]. One of the main goals of the IKF information retrieval system is to create a well-defined ontology that can be integrated with several document analysis techniques (indexing and searching, linguistic parsing, etc.) to increase the performance of the whole information retrieval and extraction process.
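As a rough illustration of how an ontology might be combined with indexing and searching, the following minimal Python sketch expands a query with related terms before looking them up in a keyword index. It is not the IKF implementation: the toy ontology, the sample documents, and the names build_index, expand_query and search are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy ontology: concept -> related or narrower terms.
ONTOLOGY = {
    "loan": ["mortgage", "credit"],
    "bank": ["lender"],
}

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def expand_query(terms, ontology):
    """Return the query terms plus the terms the ontology relates to them."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded.update(ontology.get(term, []))
    return expanded

def search(terms, index, ontology=None):
    """Return ids of documents matching any (optionally expanded) term."""
    if ontology:
        query = expand_query(terms, ontology)
    else:
        query = {t.lower() for t in terms}
    hits = set()
    for term in query:
        hits |= index.get(term, set())
    return hits

# Hypothetical sample documents, invented for this sketch.
DOCS = {
    1: "the lender raised mortgage rates",
    2: "new credit rules for small firms",
    3: "football results from the weekend",
}
INDEX = build_index(DOCS)
```

In this sketch, search(["loan"], INDEX) finds nothing because no sample document contains the literal word, whereas search(["loan"], INDEX, ONTOLOGY) also matches documents that mention "mortgage" or "credit". This is the kind of gain in recall that background knowledge is expected to provide.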

Another way to utilise background knowledge in the retrieval system is to model some aspects of how a human searches for and extracts information. The proposed system contains an autonomous document retrieval and analysis subsystem that combines efficient document searching and information extraction techniques with a source-model-based approach [3].

In the framework of the Hungarian IKF project, a prototype system is under development in order to implement our ideas in a real-world application.

 

[1] EUREKA PROJECT "IKF - Information and Knowledge Fusion", March 2000.

[2] N. Guarino, "Formal Ontology in Information Systems", in N. Guarino (ed.), Formal Ontology in Information Systems: Proceedings of FOIS'98, Trento, Italy, June 6-8, 1998. IOS Press, Amsterdam, pp. 3-15.

[3] P. Varga, T. Mészáros, Cs. Dezsényi, T.P. Dobrowiecki, "An Ontology-based Information Retrieval System", The 16th International Conference on Industrial & Engineering Applications of Artificial Intelligence and Expert Systems, Loughborough, UK, June 23-26, 2003.