Clinical Named Entity Recognition (Clinical-NER), which aims at identifying and classifying clinical named entities into predefined categories, is a critical pre-processing task in health information
systems. Different machine learning approaches have been used to extract and classify clinical named entities. Each approach has its own strengths and weaknesses when considered individually.
Ensemble techniques use the strengths of one approach to offset the weaknesses of another by combining the
outputs of different classifiers into a single decision, thereby improving the results. Segment representation is a technique that assigns a tag to each token in a given text. In this paper, we propose an ensemble approach that combines the outputs of four different base classifiers in two different ways, namely majority voting and stacking. We use support vector machines to train the base classifiers with different segment representation models, namely IOB2, IOE2, IOBE and IOBES. The proposed approach is evaluated on a well-known clinical dataset, the i2b2 2010 corpus, and the results show that it outperforms each of the base classifiers.
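
To make the two central ideas concrete, the following is a minimal Python sketch of how the same entity spans could be tagged under the four segment representations, and how per-token labels from several classifiers could be combined by majority voting. It is an illustration only, not the implementation used in this work. The function names `encode` and `majority_vote`, the span format, and the example sentence are hypothetical. Two assumptions are made: single-token entities receive a `B` tag under IOBE (conventions differ), and ties in the vote go to the label that appears first among the classifiers. The vote also assumes the classifier outputs have already been mapped to a common label space.

```python
from collections import Counter

def encode(n_tokens, spans, scheme):
    """Tag n_tokens tokens given entity spans (start, end_exclusive, type).

    scheme is one of "IOB2", "IOE2", "IOBE", "IOBES".
    """
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        last = end - 1
        for i in range(start, end):
            if scheme == "IOBES" and start == last:
                prefix = "S"  # single-token entity
            elif scheme in ("IOB2", "IOBE", "IOBES") and i == start:
                # Assumption: under IOBE a single-token entity is tagged B.
                prefix = "B"
            elif scheme in ("IOE2", "IOBE", "IOBES") and i == last:
                prefix = "E"
            else:
                prefix = "I"
            tags[i] = f"{prefix}-{etype}"
    return tags

def majority_vote(predictions):
    """Combine per-token labels from several classifiers.

    predictions is a list of equal-length label sequences in a common
    label space. Assumption: ties go to the label seen first.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

# Hypothetical example: "chest pain and fever" with two "problem" entities.
spans = [(0, 2, "problem"), (3, 4, "problem")]
for scheme in ("IOB2", "IOE2", "IOBE", "IOBES"):
    print(scheme, encode(4, spans, scheme))
```

Stacking differs from this voting step in that the base classifier outputs would be fed as features to a further trained classifier instead of being counted directly.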