Learning the pattern-based CRF for prediction of a protein local structure
DOI: https://doi.org/10.31449/inf.v46i6.3787

Abstract
We describe a pattern-based conditional random field (CRF) model for predicting the dihedral angles of an all-alpha protein from its primary structure. Such conditional random fields arise naturally in sequence-labeling problems in bioinformatics and can be viewed as relatives of Hidden Markov Models. The model parameters are learned with the structural SVM technique. The accuracy we achieved in predicting the dihedral angles φ and ψ is 22.8 and 48.3 degrees, respectively. The MDA score, defined as the percentage of residues found in correctly predicted eight-residue segments, reached 56.5%.
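To make the notion of a pattern-based CRF concrete, here is a minimal sketch of exact MAP inference over discrete labels. It rests on assumptions not stated in the abstract: labels are small integers (for example, hypothetical discretized regions of (φ, ψ) space), unary scores are given as a precomputed table, and each pattern is a tuple of labels with one position-independent weight added every time the pattern occurs in the labeling. The names `score` and `map_labeling` are hypothetical. The abstract does not describe the paper's inference procedure, so this sketch uses the simple reduction to a higher-order Markov chain whose state is the last K-1 labels, where K is the longest pattern length.

```python
def score(labels, unary, patterns):
    """Score of a labeling: sum of unary terms plus the weight of every pattern occurrence."""
    total = sum(unary[i][y] for i, y in enumerate(labels))
    for pat, w in patterns.items():
        k = len(pat)
        for end in range(k, len(labels) + 1):
            if tuple(labels[end - k:end]) == pat:
                total += w
    return total


def map_labeling(unary, patterns, n_labels):
    """Exact argmax of `score` by dynamic programming over the last K-1 labels.

    unary:    unary[i][y] is the score of label y at position i
    patterns: dict mapping a label tuple (the pattern) to its weight
    Returns (best_score, best_labels).
    """
    order = max((len(p) for p in patterns), default=1) - 1  # K - 1
    # Each DP state is the tuple of the most recent `order` labels
    # (shorter near the start of the sequence).
    best = {(): (0.0, [])}
    for i in range(len(unary)):
        new_best = {}
        for hist, (s, path) in best.items():
            for y in range(n_labels):
                window = hist + (y,)  # at most K labels ending at position i
                gain = unary[i][y]
                # Add the weight of every pattern that ends exactly at position i.
                for pat, w in patterns.items():
                    if len(window) >= len(pat) and window[-len(pat):] == pat:
                        gain += w
                state = window[-order:] if order > 0 else ()
                if state not in new_best or s + gain > new_best[state][0]:
                    new_best[state] = (s + gain, path + [y])
        best = new_best
    return max(best.values(), key=lambda t: t[0])
```

For example, `map_labeling([[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]], {(1, 0): 2.0, (0, 0, 1): -1.0}, n_labels=2)` returns the best score together with the best label sequence. The cost grows as (number of labels)^K, which is only practical for short patterns; dedicated pattern-based CRF algorithms exist precisely to avoid this blow-up when the pattern set is sparse.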
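The evaluation figures in the abstract can be illustrated with a small sketch. It assumes two things the abstract does not spell out: that the per-angle figures in degrees are mean absolute errors, with differences measured around the circle (so 170° and −170° differ by 20°), and that an eight-residue segment counts as correctly predicted when every φ and ψ in it lies within a deviation threshold of the true value. The cutoff `max_dev` (120° here) is an assumed parameter, and the function names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mean_abs_error(pred, true):
    """Mean absolute angular error in degrees (assumed meaning of 'accuracy')."""
    return sum(angle_diff(p, t) for p, t in zip(pred, true)) / len(true)


def mda_score(phi_pred, psi_pred, phi_true, psi_true, window=8, max_dev=120.0):
    """Percentage of residues lying in at least one correctly predicted window.

    A window of `window` consecutive residues is treated as correct when every
    phi and psi in it deviates from the true value by at most `max_dev` degrees
    (the cutoff is an assumption, not taken from the abstract).
    """
    n = len(phi_true)
    ok = [
        angle_diff(phi_pred[i], phi_true[i]) <= max_dev
        and angle_diff(psi_pred[i], psi_true[i]) <= max_dev
        for i in range(n)
    ]
    covered = [False] * n
    for start in range(n - window + 1):
        if all(ok[start:start + window]):
            for i in range(start, start + window):
                covered[i] = True
    return 100.0 * sum(covered) / n
```

With ten residues and a single badly predicted residue at the last position, the first two windows are still correct, so nine of ten residues are covered and `mda_score` gives 90.0; moving the bad residue to the middle makes every window fail and the score drops to 0.0. This shows why MDA rewards locally consistent stretches rather than isolated correct angles.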
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika