Improving visual vocabularies: a more discriminative, representative and compact bag of visual words
Abstract
In this paper, we introduce three properties, and corresponding quantitative evaluation measures, to assess the ability of a visual word to represent and discriminate an object class in the context of the bag of visual words (BoW) approach. Based on these properties, we also propose a methodology for reducing the size of the visual vocabulary that retains the visual words that best describe an object class. Reducing the vocabulary provides a more reliable and compact image representation. Our proposal does not depend on the quantization method used to build the set of visual words, the feature descriptor, or the weighting scheme, which makes our approach suitable for any visual vocabulary. Our experiments show that using only the most discriminative and representative visual words obtained by our methodology improves classification performance; the best results obtained with our method are statistically superior to those obtained with the entire vocabularies. On the Caltech-101 dataset, the average best results outperformed the baseline by 4.6% and 4.8% in mean classification accuracy using SVM and KNN, respectively. On the Pascal VOC 2006 dataset, the improvement was 3.2% and 7% for SVM and KNN, respectively. Furthermore, these accuracy improvements were always obtained with more compact representations. Vocabularies 10 times smaller always obtained better accuracy than the baseline vocabularies on the Caltech-101 dataset, and did so in 78.1% of the experiments on the Pascal VOC dataset.
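The vocabulary-reduction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not define the three properties or their measures, so the scoring used here (a presence ratio multiplied by a class-purity ratio) is a hypothetical stand-in, and the function name `select_visual_words` and the parameter `keep_fraction` are assumptions introduced only for illustration. The sketch ranks visual words for one object class and keeps the top fraction, independently of how the vocabulary or the histograms were built.

```python
def select_visual_words(histograms, labels, target_class, keep_fraction=0.1):
    """Return sorted indices of the visual words kept for target_class.

    histograms: list of per-image BoW count vectors (all the same length).
    labels: list of class labels, one per image.
    NOTE: the score is an illustrative proxy, not the measures from the paper.
    """
    n_words = len(histograms[0])
    class_rows = [h for h, y in zip(histograms, labels) if y == target_class]
    if not class_rows:
        raise ValueError("no images found for the target class")

    scores = []
    for w in range(n_words):
        # Representativeness proxy: fraction of class images containing word w.
        presence = sum(1 for h in class_rows if h[w] > 0) / len(class_rows)
        # Discriminativeness proxy: share of word w's occurrences inside the class.
        class_mass = sum(h[w] for h in class_rows)
        total_mass = sum(h[w] for h in histograms)
        purity = class_mass / total_mass if total_mass > 0 else 0.0
        scores.append(presence * purity)

    # Keep at least one word; rank by descending score.
    n_keep = max(1, round(keep_fraction * n_words))
    ranked = sorted(range(n_words), key=lambda w: scores[w], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data: 4 images, 4 visual words, 2 classes.
histograms = [[5, 0, 1, 2], [4, 1, 0, 2], [0, 6, 1, 2], [0, 5, 0, 2]]
labels = ["car", "car", "cat", "cat"]

kept = select_visual_words(histograms, labels, "car", keep_fraction=0.5)
# Project every histogram onto the reduced vocabulary.
reduced = [[h[w] for w in kept] for h in histograms]
```

The reduced histograms can then be passed to any classifier (such as the SVM or KNN mentioned above) in place of the full-vocabulary representation.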
Copyright © Slovenian Society Informatika