research abstract: "With the information overload in genome-related fields, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (the GENIA corpus). We are building the ontology and the corpus simultaneously, using each to inform the other. In this paper we report on our new corpus, its ontological basis, annotation scheme, and statistics of annotated objects. We also describe the tools used for corpus annotation and management."

