Authors
Zhuoran Wang, John Shawe-Taylor, Anoop Shah
Publication date
2010
Publisher
IEEE
Description
This paper focuses on automatically identifying clinical free-text records that contain useful information about a given disease (e.g., symptoms, modifiers, and diagnoses). We introduce a novel semi-supervised machine learning algorithm that addresses this problem by training the set covering machine in a bootstrapping procedure. The proposed technique has two advantages: it finds the documents of interest more accurately than a search based on diagnostic codes, and the features it learns can be used directly as a knowledge representation of the given topic, assisting either further machine learning algorithms or manual post-processing and analysis.