

Generic Engine for Transposable Association (GETA)


last update: Dec 19, 2003

 Introduction

Generic Engine for Transposable Association (GETA) was developed under the auspices of Dokusouteki Jouhougijutu Ikusei Jigyo (Innovative Information Technology Incubation Project) of IPA (Information-technology Promotion Agency of Japan). It aims to be an efficient tool for manipulating very high-dimensional sparse matrices, which typically appear as index files for large-scale text retrieval.

This engine can be used directly to build associative search systems, which accept a group of texts as a query and return highly related texts in order of relevance. The usefulness of this type of search has long been recognized, because it covers a weakness of ordinary keyword search, which sometimes returns no hits and sometimes returns too many. A serious problem of associative search, however, is its very high computational cost, which has long prevented it from becoming a standard search method.
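As an illustration of the idea only, the sketch below shows associative search over a sparse term-by-document matrix in plain Python. It is not GETA's API or its algorithm: the function names (`build_index`, `associative_search`), the whitespace tokenization, the TF-IDF weighting, and the cosine ranking are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a sparse matrix stored as term -> {doc_id: weight}.

    Hypothetical helper: TF-IDF weights are an assumption, not GETA's scheme.
    """
    n_docs = len(docs)
    term_freqs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    doc_freq = Counter(term for tf in term_freqs.values() for term in tf)
    index = defaultdict(dict)
    for doc_id, tf in term_freqs.items():
        for term, count in tf.items():
            idf = math.log(1 + n_docs / doc_freq[term])
            index[term][doc_id] = count * idf
    return index


def associative_search(index, query_ids, top_k=3):
    """Return ids of the texts most related to the group of query texts."""
    query_ids = set(query_ids)

    # The query vector is the sum of the query documents' term weights.
    query = Counter()
    for term, row in index.items():
        for doc_id, weight in row.items():
            if doc_id in query_ids:
                query[term] += weight

    # Squared vector length of every document, for cosine normalization.
    sq_norms = defaultdict(float)
    for row in index.values():
        for doc_id, weight in row.items():
            sq_norms[doc_id] += weight * weight

    # Only rows for terms present in the query are visited (sparsity).
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, weight in index[term].items():
            if doc_id not in query_ids:
                scores[doc_id] += q_weight * weight

    q_norm = math.sqrt(sum(w * w for w in query.values()))
    ranked = sorted(
        ((s / (q_norm * math.sqrt(sq_norms[d])), d) for d, s in scores.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in ranked[:top_k]]


docs = {
    "a": "sparse matrix index retrieval",
    "b": "sparse matrix retrieval engine",
    "c": "cooking pasta recipes",
    "d": "pasta sauce recipes",
}
index = build_index(docs)
print(associative_search(index, ["a"]))  # ['b']
```

Even in this toy form, the cost grows with the number of non-zero entries touched per query, which is the cost problem described above when the collection holds millions of texts.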

GETA is expected to solve this problem. We built an experimental associative search system with GETA over about one million texts, and verified that real-time associative search (response time under 10 seconds) is possible on an ordinary single-CPU PC.

For higher scalability, we have also developed a parallel-processing version of GETA, with which real-time associative search is possible for up to about 10 million texts, using, for example, an 8-node PC cluster.

The use of GETA is not limited to associative search; it can be applied to a wide variety of text processing techniques, such as text categorization, text clustering, and text summarization. We hope this tool will help accelerate research on, and practical application of, these and other text processing techniques.

GETA was developed in cooperation with the following institutions:

  • Central Research Lab, Hitachi, Ltd.
  • National Institute of Informatics
  • Tokyo Institute of Technology
  • Japan Advanced Institute of Science and Technology
  • The National Institute of Japanese Literature

 Software Package

  • GETA Ver. 3 (for multiple CPUs across multiple PCs)
    dw.082803.tar.gz    (2003/04/10; updated 08/28 with minor bug fixes)

  • GETA Ver. 2 (for single PC)
    geta2_200202.tgz

    All attached documents are in Japanese (sorry).

    OS: FreeBSD is recommended. (Major Linux distributions also work.)

 Remarks

    The above packages are freely available.
    For commercial use, or use on publicly accessible web sites, please display the following logo in a prominent place on the product or on the main web page.


    The following logos are also available.

    We would greatly appreciate it if you sent us a brief note about your use of GETA (see the contact address at the bottom of this page).

 Related Paper (on what can be done using GETA, rather than what GETA is)


For more information, please contact aki@nii.ac.jp (Takano @ National Institute of Informatics)
or yniwa@harl.hitachi.co.jp (Niwa @ Central Research Lab. of Hitachi, Ltd).