Skeleton-aware Multi-scale Heatmap Regression for 2D Hand Pose Estimation
DOI: https://doi.org/10.31449/inf.v45i4.3470
Abstract
Hand pose estimation plays an essential role in sign language understanding and human-computer interaction. Existing RGB-based 2D hand pose estimation methods learn the joint locations from a single resolution, which is not suitable for different hand sizes. To tackle this problem, we propose a new deep learning-based framework that consists of two main modules. The first one presents a segmentation-based approach to detect the hand skeleton and localize the hand bounding box. The second module regresses the 2D joint locations through a multi-scale heatmap regression approach that exploits the predicted hand skeleton as a constraint to guide the model. Moreover, we construct a new dataset that is suitable for both hand detection and pose estimation tasks. It includes the hand bounding boxes, the 2D keypoints, the 3D poses and their corresponding RGB images. We conduct extensive experiments on two datasets to validate our method. Qualitative and quantitative results demonstrate that the proposed method outperforms the state-of-the-art and recovers the pose even in cluttered images and complex poses.
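To make the two ideas named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes Gaussian heatmap targets, square inputs, and three output resolutions; the function names, the `sigma` value, and the `scales` tuple are all hypothetical choices made for illustration. It shows how a bounding box could be read off a binary skeleton mask, and how a single joint could be encoded as heatmaps at several resolutions and decoded again. The skeleton constraint on the regression itself is not modeled here.

```python
import math


def mask_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the non-zero cells in a 2D mask."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, value in enumerate(line) if value]
    if not rows:
        return None  # empty mask: no hand skeleton detected
    return min(cols), min(rows), max(cols), max(rows)


def gaussian_heatmap(x, y, side, sigma=1.5):
    """Return a side x side grid with a Gaussian bump centred on (x, y)."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
         for col in range(side)]
        for row in range(side)
    ]


def multiscale_targets(joint_xy, image_size, scales=(64, 32, 16), sigma=1.5):
    """Build one heatmap per resolution for a joint given in image pixels.

    The scales are assumed values, not taken from the paper.
    """
    x, y = joint_xy
    targets = {}
    for side in scales:
        factor = side / image_size  # image pixels -> heatmap cells
        targets[side] = gaussian_heatmap(x * factor, y * factor, side, sigma)
    return targets


def decode_peak(heatmap, image_size):
    """Map the strongest heatmap cell back to image-space (x, y)."""
    side = len(heatmap)
    best_row, best_col = max(
        ((r, c) for r in range(side) for c in range(side)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    factor = image_size / side  # heatmap cells -> image pixels
    return best_col * factor, best_row * factor


if __name__ == "__main__":
    skeleton = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    print(mask_bounding_box(skeleton))
    targets = multiscale_targets((128, 64), image_size=256)
    for side, heatmap in targets.items():
        print(side, decode_peak(heatmap, image_size=256))
```

Coarser heatmaps cover more context per cell but quantize the joint location more heavily, which is one plausible reason to combine several resolutions when hand sizes vary.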
Copyright © Slovenian Society Informatika