Generic Engine for Transposable Association (GETA) has been developed under the auspices of Dokusouteki Jouhougijutu Ikusei Jigyo (Innovative Information Technology Incubation Project) of IPA (Information-technology Promotion Agency of Japan). It aims to be an efficient tool for manipulating very high-dimensional sparse matrices, which typically appear as index files for large-scale text retrieval. The engine can be used directly to build associative search systems, which accept a group of texts as a query and return highly related texts in order of relevance.

The usefulness of this type of associative search has long been recognized, because it covers the weakness of ordinary keyword search, which sometimes returns no hits and sometimes returns too many. A serious problem with associative search, however, is its very high computational cost, which has long prevented it from becoming a standard search method. GETA is expected to solve this problem. We built an experimental associative search system using GETA for about one million texts and verified that real-time associative search (response time under 10 seconds) is possible on an ordinary single-CPU PC. For higher scalability, we have also developed a parallel-processing version of GETA, with which real-time associative search is possible for up to about 10 million texts using, for example, an 8-node PC cluster.

The use of GETA is not limited to associative search; it can be applied to a wide variety of text processing techniques, such as text categorization, text clustering, and text summarization. We hope this tool will be used to accelerate research on, and practical application of, these and other text processing techniques.

GETA was developed through the cooperation of the following institutions:
For more information, please contact aki@nii.ac.jp (Takano @ National Institute of Informatics) or yniwa@harl.hitachi.co.jp (Niwa @ Central Research Lab. of Hitachi, Ltd.).
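
To make the idea of associative search over a sparse index more concrete, here is a minimal sketch in Python. It does not use GETA or its API. The function names (build_index, associative_search), the TF-IDF weighting, and the cosine scoring are all assumptions chosen for illustration, since the text above does not specify how relevance is computed. The sketch stores the term-document matrix twice, once by document and once by term (its transpose), so that a group of query documents can be turned into a term profile and then mapped back to related documents.

```python
import math
from collections import Counter, defaultdict


def build_index(texts):
    """Return (doc_vectors, postings) for a dict of {doc_id: text}.

    doc_vectors maps doc_id -> {term: weight}   (rows of the sparse matrix)
    postings    maps term   -> {doc_id: weight} (the same matrix, transposed)
    Weights are TF-IDF values, an assumption made for this sketch.
    """
    counts = {d: Counter(text.lower().split()) for d, text in texts.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n_docs = len(texts)

    doc_vectors = {}
    postings = defaultdict(dict)
    for doc_id, c in counts.items():
        vec = {t: tf * math.log(1 + n_docs / doc_freq[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        # Normalize to unit length so a dot product equals cosine similarity.
        vec = {t: w / norm for t, w in vec.items()}
        doc_vectors[doc_id] = vec
        for t, w in vec.items():
            postings[t][doc_id] = w
    return doc_vectors, postings


def associative_search(query_ids, doc_vectors, postings, top_k=5):
    """Rank documents by similarity to a group of query documents."""
    # Step 1 (rows): sum the query documents' term vectors into one profile.
    profile = defaultdict(float)
    for doc_id in query_ids:
        for t, w in doc_vectors[doc_id].items():
            profile[t] += w
    # Step 2 (transpose): follow each profile term back to the documents
    # that contain it, accumulating a relevance score per document.
    scores = defaultdict(float)
    for t, qw in profile.items():
        for doc_id, w in postings[t].items():
            if doc_id not in query_ids:
                scores[doc_id] += qw * w
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


texts = {
    "d1": "sparse matrix index for text retrieval",
    "d2": "associative search over a sparse matrix index",
    "d3": "text clustering and text categorization",
    "d4": "cooking pasta with tomato sauce",
}
doc_vectors, postings = build_index(texts)
results = associative_search({"d1"}, doc_vectors, postings)
print(results)
```

With d1 as the query, d2 ranks ahead of d3 because it shares three terms with d1, and d4 is not returned at all because it shares none. Only documents that share at least one term with the query profile are ever touched, which is why a sparse, transposable representation matters for the computational cost discussed above.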