David R. Hardoon, Sandor Szedmak, John Shawe-Taylor
Technical Report SOTON-TR-05-07, School of Electronics and Computer Science, Image, Speech and Intelligent Systems Research Group, University of Southampton
Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, Image, Speech and Intelligent Systems Group
Online images and their surrounding text present a particularly complex problem in image annotation. The surrounding text may contain only partial information about the image and most likely relates to the image in a general context. In this paper we propose an approach that learns the association between images and their surrounding text in order to automatically generate category-based documents for new image queries. The document generation requires no image-word annotation before or during training. We learn a semantic representation between the images and their associated documents using kernel Canonical Correlation Analysis (KCCA). The semantic space provides a common representation and enables comparison between documents and images. This representation is then used to generate a new document that best fits the image query. We use text frequency and Term Frequency Inverse Document Frequency (TF-IDF) as our word representations and compare our proposed method with a standard cross-representation retrieval technique known as the Generalised Vector Space Model (GVSM).
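To make the idea of a shared semantic space concrete, the following is a minimal sketch of one common regularised formulation of kernel CCA, in which the leading direction solves (Kx + κI)⁻¹ Ky (Ky + κI)⁻¹ Kx α = λ² α. It is not taken from the report: the function names `center_kernel` and `kcca_top_direction`, the regularisation value `kappa=0.1`, the linear kernels, and the synthetic "image" and "text" features are all assumptions chosen for illustration.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space (hypothetical helper)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca_top_direction(Kx, Ky, kappa=0.1):
    """Leading direction of an assumed regularised KCCA variant.

    Solves (Kx + kappa*I)^-1 Ky (Ky + kappa*I)^-1 Kx alpha = lam2 * alpha
    and returns (alpha, beta, lam2), with beta defined up to scale.
    """
    n = Kx.shape[0]
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    M = np.linalg.solve(Rx, Ky) @ np.linalg.solve(Ry, Kx)
    eigvals, eigvecs = np.linalg.eig(M)
    top = np.argmax(eigvals.real)
    alpha = eigvecs[:, top].real
    beta = np.linalg.solve(Ry, Kx @ alpha)  # paired text-side direction
    return alpha, beta, float(eigvals[top].real)


# Synthetic two-view data: both views carry the same latent signal z,
# in different coordinates and with opposite sign (illustrative only).
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n)
images = np.column_stack([z + 0.1 * rng.normal(size=n), rng.normal(size=n)])
texts = np.column_stack([rng.normal(size=n), -z + 0.1 * rng.normal(size=n)])

# Linear kernels stand in for whatever image and text kernels are used.
Kx = center_kernel(images @ images.T)
Ky = center_kernel(texts @ texts.T)

alpha, beta, lam2 = kcca_top_direction(Kx, Ky)

# Projections of the training items into the shared semantic space.
u = Kx @ alpha
v = Ky @ beta
corr = np.corrcoef(u, v)[0, 1]
```

In this sketch, `u` and `v` are one-dimensional semantic coordinates of the images and documents respectively; a retrieval or generation step would compare a projected image query against projected documents in that space. Keeping several leading eigenvectors instead of one would give a higher-dimensional semantic space.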