START Conference Manager    

Identifying the Epistemic Value of Discourse Segments in Biology Texts

Anita de Waard, Paul Buitelaar and Thomas Eigner

Eighth International Conference on Computational Semantics (IWCS-8 2009)
Tilburg University, Netherlands, January 7-9, 2009


Our research concerns the classification of sentences in biology texts by 'epistemic segment type', with the aim of enabling better summarization, mining and comparison of statements within biology texts. This paper describes a first attempt to do so computationally. Identifying the segment type is useful for ascertaining the epistemic value of a specific segment. For example, Fact segments are taken from another source of knowledge (explicitly referred to or presumed to be known) and are therefore not experimentally ascertained in the research paper, whereas Result segments are obtained by measurements discussed in the paper itself. To investigate whether a set of manually defined markers could be used to identify segment type automatically, we applied them to an independently developed data set of PubMed articles. We then randomly selected and evaluated 100 sentences to which one of five segment types (Hypothesis, Implication, Method, Goal, Result) had been assigned. The results were encouraging: only 30 of the 100 assignments were incorrect, i.e. 70 were correct.
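
As a rough illustration of marker-based segment typing, the sketch below assigns one of the five segment types by counting which type's lexical markers match a sentence. It is only a minimal sketch under stated assumptions: the marker patterns, the `MARKERS` table and the `classify_segment` function are hypothetical and invented for this example, since the abstract does not list the manually defined markers or describe how they were applied.

```python
import re
from typing import Optional

# Hypothetical marker patterns, invented for illustration only.
# The manually defined markers used in the paper are not given in the abstract.
MARKERS = {
    "Hypothesis": [r"\bwe hypothesi[sz]ed?\b", r"\bmight\b", r"\bmay\b"],
    "Implication": [r"\bthese (results|data) (suggest|indicate)\b",
                    r"\bsuggesting that\b"],
    "Method": [r"\bwas performed\b",
               r"\bwere (incubated|transfected|treated)\b",
               r"\busing\b"],
    "Goal": [r"\bto (investigate|determine|test) (whether|if)\b",
             r"\bin order to\b"],
    "Result": [r"\bwe (found|observed)\b",
               r"\b(increased|decreased|showed)\b"],
}


def classify_segment(sentence: str) -> Optional[str]:
    """Return the segment type with the most matching markers, or None.

    Ties are broken by the order of the types in MARKERS (an arbitrary
    choice made for this sketch).
    """
    counts = {
        label: sum(1 for p in patterns
                   if re.search(p, sentence, re.IGNORECASE))
        for label, patterns in MARKERS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    examples = [
        "To investigate whether p53 is required for arrest, cells were analysed.",
        "We found that expression increased after treatment.",
        "The weather is nice.",
    ]
    for sentence in examples:
        print(classify_segment(sentence), "|", sentence)
```

A sentence that matches no marker is left unlabelled rather than forced into a type, which keeps the manual evaluation of the assigned labels separate from the question of coverage.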
