Effectiveness of Query Expansion using Flickr Related Tags

Information retrieval is an iterative and interactive process: searchers who are not satisfied with their initial results revise their queries by adding, deleting, or changing search terms. In an analysis of a 2001 Excite Web search log, the author found that 26% of initial image search queries were revised. One possible approach to enhancing image retrieval effectiveness, therefore, is to design an image retrieval system that guides users in reformulating their queries.

In the information retrieval field, query expansion (either automatic or end-user controlled) using semantically related terms drawn from a thesaurus has been explored as a way of improving search effectiveness (Efthimiadis, 2000; Greenberg, 2001). Although several thesauri exist for image documents, no study has examined their effectiveness for query expansion. In addition, because these thesauri are designed for specialized image collections (art images, historical images, architecture, and so on), their limitations for general image collections have been discussed.

Recently, user-supplied tags have received researchers' attention as a user-centered indexing mechanism, and many studies have investigated their potential and features. The author's previous study (Yoon, in press) examined the semantic features of Flickr tags and their related tags, and demonstrated that the semantic relations represented in Flickr related tags have potential for expanding search queries and representing users' unexpressed image needs. The following findings support this suggestion:

• Semantic relations represented in Flickr related tags showed that a concept is associated with semantically related concepts not only within its own image attribute but also across other attributes. For example, related tags for 'Happy' include 'girl,' 'smile,' 'fun,' and 'kids,' and related tags for 'birthday' include 'fun,' 'happy,' 'party,' and 'cake.'

• A comparison between Library of Congress Thesaurus for Graphic Materials (LCTGM) related terms (including NT, BT, and RT) and Flickr related tags showed that approximately 10% of LCTGM related terms matched Flickr related tags. Compared to Flickr related tags, LCTGM related terms exhibit more conceptually structured semantic relations and include more conceptual, abstract, and symbolic terms. The comparison also revealed the potential of Flickr related tags. For instance, for a user who queries "Lazy," which related terms would be more useful in finding images: "mental states," "deadly sins," "cat," or "sleepy"? (The first two are from LCTGM and the last two are from Flickr related tags.)
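The roughly 10% figure is a simple proportion: matched LCTGM related terms divided by all LCTGM related terms. The study does not state its matching rule or whether counts were pooled across concepts, so the Python sketch below assumes case-insensitive exact string matching and pools counts across concepts; the function names `normalize`, `match_rate`, and `pooled_match_rate` are hypothetical.

```python
def normalize(terms):
    """Lowercase and trim each term, returning a set (assumed matching rule)."""
    return {t.strip().lower() for t in terms}


def match_rate(lctgm_terms, flickr_tags):
    """Fraction of one concept's LCTGM related terms found among its Flickr related tags."""
    lctgm = normalize(lctgm_terms)
    if not lctgm:
        return 0.0
    return len(lctgm & normalize(flickr_tags)) / len(lctgm)


def pooled_match_rate(pairs):
    """Pool matches across concepts: total matched LCTGM terms / total LCTGM terms.

    `pairs` is an iterable of (lctgm_terms, flickr_tags) tuples, one per concept.
    """
    matched = total = 0
    for lctgm_terms, flickr_tags in pairs:
        lctgm = normalize(lctgm_terms)
        matched += len(lctgm & normalize(flickr_tags))
        total += len(lctgm)
    return matched / total if total else 0.0


# The "Lazy" example from the text (partial lists only): no LCTGM term
# appears among the quoted Flickr related tags.
print(match_rate(["mental states", "deadly sins"], ["cat", "sleepy"]))  # → 0.0
```

Pooling counts, rather than averaging per-concept rates, weights each LCTGM term equally; which of the two the original comparison used is not specified.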

Therefore, even though Flickr related tags may not be alternatives or successors to traditional indexing approaches, it is worthwhile to investigate how Flickr tags can help users search image collections and how they can be incorporated into traditional approaches.

Building on these previous results, the current study aims to demonstrate the effectiveness of Flickr related tags for query expansion by addressing three research questions:

• RQ1: To what extent are queries expanded with Flickr related tags useful in improving search effectiveness?

• RQ2: Which are more useful for query expansion: the BT, NT, and RT of LCTGM, or Flickr related tags?

• RQ3: What are the characteristics of useful expanded queries during the image search process?

To explore the three research questions, two surveys and a post-interview are administered to approximately 50 participants. First, participants are asked to provide image search queries that could be submitted to a search engine and to explain their search motivations. Then, adopting rules used in previous studies on query expansion (Greenberg, 2001; Kristensen, 1993), the researcher manually expands each query using three Flickr related tags. Initial queries that map to LCTGM terms are also expanded using their BT, NT, and RT. Each participant's initial query and its expanded queries are submitted to Imagery, an image search engine; a set of 20 images is selected for each query, and each set is arranged to fit on a single screen page. Second, participants are asked to respond to the following questions: (1) which set of images is the most appropriate to your search, (2) which image is the most appropriate to your search, and (3) which images are appropriate to your search (i.e., indicate the images you would use for your search request). Third, a post-interview may be conducted if further explanation is needed.
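In the study, expansion is performed manually following rules from Greenberg (2001) and Kristensen (1993), which are not detailed here. As a rough illustration only, the Python sketch below assumes that each expanded query is formed by appending a single related term to the initial query; the `LctgmEntry` container, the `expand_query` function, and the lookup dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LctgmEntry:
    """Hypothetical container for the LCTGM relations of one mapped term."""
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms


def expand_query(query, flickr_related, lctgm, n_flickr=3):
    """Build the initial query plus its expanded variants, keyed by source.

    Assumes each expanded query appends one related term to the initial query.
    `flickr_related` maps a lowercased query to its Flickr related tags;
    `lctgm` maps a lowercased query to an LctgmEntry when the query maps to LCTGM.
    """
    base = query.strip()
    key = base.lower()
    queries = {"initial": base}
    # Up to n_flickr Flickr related tags, in the order given.
    for i, tag in enumerate(flickr_related.get(key, [])[:n_flickr], start=1):
        queries[f"flickr_{i}"] = f"{base} {tag}"
    # LCTGM expansion applies only when the initial query maps to an LCTGM term.
    entry = lctgm.get(key)
    if entry is not None:
        for label, terms in (("BT", entry.bt), ("NT", entry.nt), ("RT", entry.rt)):
            for j, term in enumerate(terms, start=1):
                queries[f"lctgm_{label}_{j}"] = f"{base} {term}"
    return queries


# Toy data: the Flickr tags come from the "Lazy" example in the text; the
# BT/RT assignment of the two LCTGM terms is invented for illustration.
expanded = expand_query(
    "Lazy",
    {"lazy": ["cat", "sleepy"]},
    {"lazy": LctgmEntry(bt=["deadly sins"], rt=["mental states"])},
)
for label, q in expanded.items():
    print(f"{label}: {q}")
```

With this toy data the sketch yields five queries ("Lazy", "Lazy cat", "Lazy sleepy", "Lazy deadly sins", "Lazy mental states"), each of which would then be sent to the search engine to retrieve one 20-image set. Keying each query by its source makes it straightforward to attribute participants' choices in question (1) to Flickr or LCTGM expansions when addressing RQ2.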

2009 International Conference on Dublin Core, Proceedings (DCMI), pp. 133-134