Multilingual Indexing, Navigation and Editing Extensions
for the World-Wide Web
Gregor Erbach, Günter Neumann, Hans Uszkoreit
Language Technology Lab
This paper gives an overview of the project MULINEX, which is a "leading-edge application project" funded under the Telematics Application Programme (Language Engineering Sector) of the European Union. The goal of the project is the development of a set of tools that allow cross-language text retrieval for the WWW, concept-based indexing, navigation tools and website management facilities for multilingual WWW sites. The project takes a user-centered approach in which user needs drive the development activities and set the research agenda.
1 Overview and Objectives
MULINEX is a "leading-edge application project" which addresses the requirements of two kinds of users: web content providers and service operators who wish to provide multilingual information, and the customers of such multilingual information services (henceforth referred to as end users). The objective of the project is to provide multilingual search, retrieval and navigation functionalities for the WWW.
Leading-edge application projects aim at advanced applications based on existing or emerging IC components and novel Language Engineering technologies. The goal is to meet user requirements dictated by socio-economic changes over the next few years. (From the call for project proposals for the Telematics Application Programme.)
The socio-economic changes addressed by the MULINEX project are the emergence and widespread acceptance of the WWW, the increasing availability of gigabytes of information in different languages, and the increasing number of people with different mother tongues who need to find information on the web.
Providers of web search engines are already producing localised versions for different countries (e.g., lycos.de for Germany), but so far these localise only the user interface and the advertisements; the search and retrieval process itself is not language-aware.
The technologies to be used in the project include a state-of-the-art information retrieval system, advanced linguistic processing tools (morphological analysis, information extraction, lexical semantics), algorithms for alignment of translated texts and terminology extraction, and machine translation systems.
The intended prototype application can run entirely on the server of a content provider or search service operator, so that the end user needs only a standard web browser such as Netscape Navigator, Alis Tango or Microsoft Internet Explorer. The project is committed to supporting open web standards and will avoid dependence on proprietary formats and solutions, in order to make the results applicable to a wider user base. The application will be realised as a group of interacting tools which improve access to information (search and navigation) in multilingual web document collections, and support the creation and maintenance of multilingual content for the web by information providers. The set of tools will provide the following search, retrieval and navigation functionality for the end user:
1. search by a combination of keywords, phrases, and concepts
2. retrieval of documents in different languages with one monolingual query through multilingual indexing
3. online generation and presentation of navigation maps or menus for supporting interactive refinement of query and search
4. exploitation of context and user profiling information for selecting relevant documents
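The second item above, retrieving documents in several languages from one monolingual query, can be illustrated with a minimal sketch. It assumes a dictionary-based query expansion over a language-tagged inverted index; the paper does not state that MULINEX works this way, and the lexicon, the document collection and the function names (`build_index`, `search`) are all hypothetical toy values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy translation lexicon: English term -> equivalents per language.
LEXICON = {
    "house": {"en": ["house"], "de": ["haus"], "fr": ["maison"]},
    "price": {"en": ["price"], "de": ["preis"], "fr": ["prix"]},
}

# Hypothetical toy document collection: document id -> (language, text).
DOCS = {
    "d1": ("en", "house price index"),
    "d2": ("de", "haus und garten"),
    "d3": ("fr", "prix de la maison"),
    "d4": ("de", "wetter heute"),
}


def build_index(docs):
    """Build an inverted index keyed by (language, token)."""
    index = defaultdict(set)
    for doc_id, (lang, text) in docs.items():
        for token in text.lower().split():
            index[(lang, token)].add(doc_id)
    return index


def search(query, index, lexicon):
    """Expand each query term into all languages, then rank documents
    by the number of distinct query terms they match."""
    scores = defaultdict(int)
    for term in query.lower().split():
        matched = set()
        # Terms missing from the lexicon are looked up untranslated.
        for lang, variants in lexicon.get(term, {"en": [term]}).items():
            for variant in variants:
                matched |= index.get((lang, variant), set())
        for doc_id in matched:
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
RESULTS = search("house price", INDEX, LEXICON)
print(RESULTS)
```

In this toy setting the English query "house price" returns English, French and German documents, while the unrelated German document is left out. A real system would replace the exact token match with morphological analysis and would need to handle translation ambiguity, which this sketch ignores.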
In addition, the tool set will offer functionalities for the management of multilingual websites. These will only be