In support of the text mining community, the organizers of BioCreative IV initiated the BioC project in an attempt to advance research in this area by providing a universal format for text mining tools. With a shared format, tools that perform distinct tasks can be integrated seamlessly and with less time and effort. As participants, we developed a semantic role labeling BioC module that provides semantic analysis of biomedical literature, in the hope of benefiting researchers with similar interests in the text mining field. The service is available at

Semantic role labeling (SRL) is an important technique in natural language processing, particularly for life scientists who want to uncover information about biological processes from the literature. SRL represents a sentence as one or more predicate argument structures (PAS) [1]. Each PAS consists of a predicate (e.g., a verb) and several arguments (e.g., noun phrases) that play different semantic roles, including main arguments such as the agent and the patient, as well as adjunct arguments such as time, manner, and location. For example, the sentence “IL4 and IL13 receptors activate STAT6, STAT3, and STAT5 proteins in the human B cells” describes a molecular activation process. It can be represented by a PAS in which “activate” is the predicate, the noun phrase “IL4 and IL13 receptors” is the agent, “STAT6, STAT3, and STAT5 proteins” is the patient, and “in the human B cells” indicates where the activation occurs. The agent, patient, and location are thus all arguments of the predicate. SRL therefore not only identifies the participants in such processes but also determines the direction of the interaction (which entity acts on which), along with supplementary details of manner, location, or time. Such knowledge is essential for understanding the signaling pathways underlying diverse biological mechanisms and phenomena.
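To make the PAS representation concrete, the sketch below encodes the example sentence as a small Python data structure. The class names `Argument` and `PAS` and the `role` helper are hypothetical and chosen only for illustration; the role names follow the prose (agent, patient, location) rather than PropBank-style labels such as Arg0, Arg1, and ArgM-LOC. This is not the module's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Argument:
    """One argument of a predicate: its semantic role and its text span."""
    role: str  # e.g. "agent", "patient", "location" (illustrative labels)
    text: str


@dataclass
class PAS:
    """A predicate argument structure: one predicate plus its arguments."""
    predicate: str
    arguments: list[Argument] = field(default_factory=list)

    def role(self, name: str) -> str | None:
        """Return the text filling the given role, or None if it is absent."""
        for arg in self.arguments:
            if arg.role == name:
                return arg.text
        return None


# The example sentence from the text, represented as a single PAS.
sentence = ("IL4 and IL13 receptors activate STAT6, STAT3, and STAT5 proteins "
            "in the human B cells")
pas = PAS(
    predicate="activate",
    arguments=[
        Argument("agent", "IL4 and IL13 receptors"),
        Argument("patient", "STAT6, STAT3, and STAT5 proteins"),
        Argument("location", "in the human B cells"),
    ],
)

# The agent/patient pair encodes the direction of the interaction.
print(f"{pas.role('agent')} --{pas.predicate}--> {pas.role('patient')}")
```

Keeping agent and patient as separate, named roles is what preserves the direction of the interaction; a plain co-occurrence of the two protein mentions would lose it.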

To contribute to the BioC repository of the BioCreative IV BioC track, we developed an SRL BioC module for biomedical literature. The module extends our previous SRL system, which was developed under the BioProp standard and corpus [2]. We have also used that system in our earlier web services, including BIOSMILE Web Search [3], PubMed-EX [4], and T-HOD [5]. The module supports 82 predicates and 32 argument types; the argument types were manually defined and include, for example, location, manner, and temporal arguments. Please refer to for further details. The developed module is available at .
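Because BioC modules exchange data as BioC XML, a module of this kind has to serialize each PAS into BioC elements. The following is a minimal sketch assuming one plausible encoding: each predicate and argument becomes an `<annotation>` with a `type` infon and a `<location>` (offset, length), and a `<relation>` links them through `<node>` elements carrying `refid` and `role`. The element names follow the BioC DTD, but the infon values, the PropBank-style role labels (Arg0, Arg1, ArgM-LOC), the IDs, and the helper names `add_infon` and `build_passage` are hypothetical and do not document the actual output of this module.

```python
import xml.etree.ElementTree as ET

# Hypothetical SRL output for one sentence as (role, text) pairs.
SENTENCE = ("IL4 and IL13 receptors activate STAT6, STAT3, and STAT5 proteins "
            "in the human B cells")
SPANS = [
    ("predicate", "activate"),
    ("Arg0", "IL4 and IL13 receptors"),
    ("Arg1", "STAT6, STAT3, and STAT5 proteins"),
    ("ArgM-LOC", "in the human B cells"),
]


def add_infon(parent, key, value):
    """Attach a BioC <infon key="..."> element with the given value."""
    infon = ET.SubElement(parent, "infon", key=key)
    infon.text = value


def build_passage(text, spans, passage_offset=0):
    """Build a BioC <passage> with one <annotation> per span and one
    <relation> linking the predicate to its arguments."""
    passage = ET.Element("passage")
    ET.SubElement(passage, "offset").text = str(passage_offset)
    ET.SubElement(passage, "text").text = text
    relation = ET.Element("relation", id="R0")
    add_infon(relation, "type", "PAS")  # hypothetical infon value
    for i, (role, span) in enumerate(spans):
        start = text.index(span)  # first occurrence; assumes each span is unique
        ann = ET.SubElement(passage, "annotation", id=f"T{i}")
        add_infon(ann, "type", role)
        # BioC offsets are document-level, so add the passage offset.
        ET.SubElement(ann, "location",
                      offset=str(passage_offset + start), length=str(len(span)))
        ET.SubElement(ann, "text").text = span
        ET.SubElement(relation, "node", refid=f"T{i}", role=role)
    passage.append(relation)  # relations follow annotations in the BioC DTD
    return passage


doc = ET.Element("document")
ET.SubElement(doc, "id").text = "example"  # hypothetical document id
doc.append(build_passage(SENTENCE, SPANS))
print(ET.tostring(doc, encoding="unicode"))
```

Anchoring each argument by character offset, rather than by its text alone, lets other BioC tools align the SRL output with their own annotations on the same passage.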




  1. P. Kingsbury and M. Palmer (2002) From Treebank to PropBank. Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), pp. 1989–1993.
  2. W.-C. Chou, R. T.-H. Tsai, Y.-S. Su, W. Ku, T.-Y. Sung, and W.-L. Hsu (2006) A Semi-Automatic Method for Annotating a Biomedical Proposition Bank. Proceedings of the ACL Workshop on Frontiers in Linguistically Annotated Corpora, Sydney, Australia.
  3. H.-J. Dai, C.-H. Huang, R. T. K. Lin, R. T.-H. Tsai, and W.-L. Hsu (2008) BIOSMILE web search: a web application for annotating biomedical entities and relations. Nucl. Acids Res., vol. 36, pp. W390–W398.
  4. R. T.-H. Tsai, H.-J. Dai, P.-T. Lai, and C.-H. Huang (2009) PubMed-EX: A web browser extension to enhance PubMed search with text mining features. Bioinformatics, vol. 25, pp. 3031–3032.
  5. H.-J. Dai, C.-Y. Wu, R. T.-H. Tsai, W.-H. Pan, and W.-L. Hsu (2013) T-HOD: A Literature-based Candidate Gene Database for Hypertension, Obesity, and Diabetes. Database: The Journal of Biological Databases and Curation.
