Authors
William Martin, John Shawe-Taylor
Publication date
2012
Description
A set of texts is often a poor representation of the language it is written in, and as a result topics can seem nonsensical to domain experts. This can happen for several reasons: misspellings or ‘accidental words’ can be given statistical significance when too many topics are learned; words can appear related or unrelated in the text even when the opposite is true in the language at large; or simply too few or too many topics are used. In this position paper we present a novel approach: applying biases derived from external sources during the training process. This improves topic coherence [Newman et al., 2009, 2010], irons out many of the issues that a sub-optimal number of topics can cause, and imbues the resulting models with real-world word relationships.
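The abstract does not specify how the external biases enter training, so the following is only a minimal illustrative sketch, not the authors' method: one plausible instantiation is a collapsed-Gibbs LDA sampler whose symmetric word prior is replaced with pseudo-counts derived from an external word-similarity matrix, so that words related in the language tend to share topics. The toy corpus, the `similarity` matrix, and `biased_beta` are all illustrative assumptions.

```python
# Hedged sketch: LDA Gibbs sampling with an externally biased word prior.
# Everything here (corpus, similarity source, prior scaling) is assumed
# for illustration; the paper's actual mechanism is not described above.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: documents as lists of word ids over a vocabulary of size V.
docs = [[0, 1, 2, 2], [2, 3, 4], [0, 0, 1, 4]]
V, K = 5, 2                      # vocabulary size, number of topics
alpha = 0.1                      # symmetric document-topic prior

# Hypothetical external knowledge: a V x V word-similarity matrix
# (e.g. distilled from WordNet or word embeddings).
similarity = np.eye(V) + 0.2 * rng.random((V, V))

# Bias the word-topic prior: each word's pseudo-count is boosted by its
# average similarity to the rest of the vocabulary.
biased_beta = 0.01 + 0.1 * similarity.mean(axis=1)   # shape (V,)

# Standard collapsed Gibbs sampling for LDA, with biased_beta in place
# of the usual symmetric beta.
z = [[rng.integers(K) for _ in d] for d in docs]      # topic assignments
ndk = np.zeros((len(docs), K))                        # doc-topic counts
nkw = np.zeros((K, V))                                # topic-word counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1
        nkw[z[d][i], w] += 1

for _ in range(200):                                  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1
            nkw[k, w] -= 1
            # Conditional for this word's topic, using the biased prior.
            p = (ndk[d] + alpha) * (nkw[:, w] + biased_beta[w]) \
                / (nkw.sum(axis=1) + biased_beta.sum())
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1
            nkw[k, w] += 1

print("topic-word counts:\n", nkw)
```

Under this (assumed) design, the external knowledge acts only through the prior, so the sampler itself is unchanged; stronger similarity-derived pseudo-counts pull related words toward common topics even when the corpus evidence is thin.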