6th Natural Language Processing Pacific Rim Symposium Post-Conference Workshop

Language Resources in Asia

November 30, 2001
National Center of Sciences
Tokyo, Japan

Preface

This volume contains the papers presented at the workshop on Language Resources in Asia, held on 30 November 2001 in conjunction with the 6th Natural Language Processing Pacific Rim Symposium (NLPRS 2001).

Language resources play an essential role in empirical approaches to natural language processing (NLP). Earlier concerted efforts to construct language resources, particularly in the US and EU, have laid a solid foundation for the pioneering NLP research in these two communities over the last decade. In comparison, the availability and accessibility of Asian language resources remain very limited, even though Asia can boast much richer linguistic content in terms of cultural, historical, and structural variation.

The purpose of this workshop is to provide an opportunity to investigate and discuss the many problems related to the construction and dissemination of Asian language resources, and to NLP research based on them. As both the demand for multilingual NLP and the size of language resources increase, it is very important for us to look for ways of collaborating among Asian countries in developing, sharing, and exchanging Asian language resources. We hope this first workshop can contribute to solving these issues.


Program Committee


Table of Contents

A multilingual news database and its application to a translation memory system
Isao Goto, Naoto Kato and Terumasa Ehara
..........1

The language resources development and language processing service for Thai
Asanee Kawtrakul, Yuen Poovorawan, Frederic Andres, Mukda Suktarajarn, Patcharee Varasrai, Nithiwat Kampanya, Supavat Vongwatthaporn, Nattakan Pengphon and Chaiwat Ketsuvarn
..........7

Development of very large corpora in Thailand
Rachod Thongprasirt, Thatsanee Charoenporn, Wasin Sinthupinyo and Virach Sornlertlamvanich
..........15

Japanese-English paraphrase corpus
Satoshi Shirai, Kazuhide Yamamoto and Francis Bond
..........23

The open language archives community and Asian language resources
Steven Bird, Gary Simons and Chu-Ren Huang
..........31

A bilingual corpus in the legal domain and its applications
Oi Yee Kwong, Benjamin K. Tsou, Tom B.Y. Lai, Robert W.P. Luk, Lawrence Y.L. Cheung and Francis C.Y. Chik
..........39

Defining principled but practically manageable lexical units in Japanese textual corpora
Maho Okada, Koichi Takeuchi, Masaharu Yoshioka, Kyo Kageura and Teruo Koyama
..........47

Towards a reference tagset for Japanese
Yasuhiro Kawata
..........55

Using multiple pivots to align Korean and Japanese lexical resources
Kyonghee Paik, Francis Bond and Satoshi Shirai
..........63

LERIL: Collaborative effort for creating lexical resources
Akshar Bharati, Dipti M Sharma, Vineet Chaitanya, Amba P Kulkarni and Rajeev Sangal
..........71

Combining the lexicon knowledge base with Chinese corpus processing
Duan Huiming, Hu Junfeng, Zhu Xuefeng and Yu Shiwen
..........81