Application of LASSO Algorithm and GBDT Algorithm in Predicting Financial Distress of Companies
DOI: https://doi.org/10.31449/inf.v48i17.6493
Abstract
With the global economy in a downward cycle under the influence of the epidemic, companies face deteriorating business and financial conditions and are more likely to fall into financial distress. Concept drift degrades financial distress prediction in practice, and existing approaches handle only limited types of drift. Most existing research on financial distress prediction uses machine learning methods such as random forests, which have limitations under concept drift, including difficulty in updating the model and data imbalance. This study therefore proposes a model that combines the least absolute shrinkage and selection operator (LASSO) with the gradient boosting decision tree (GBDT) algorithm to address dynamic concept drift and accurately predict the financial distress of enterprises. The study uses financial datasets of Chinese A-share listed companies from 2019 to 2022, with selection criteria including but not limited to market value, industry representativeness, and financial information. To reduce potential sample bias caused by market structure changes, policy adjustments, and other factors, the study adopts time series and industry stratified sampling to ensure that the samples are representative. First, the two algorithms are analyzed in depth and applied to the dynamic selection of financial indicators from the financial samples. Second, a comprehensive prediction model is established using a sample similarity index. The experimental results show that in dynamic environments the model reaches accuracy rates of 92.47% and 92.31%, F values of 85.33% and 85.12%, and G values of 91.78%, 91.65%, and 91.92%. The prediction model thus offers high accuracy and dynamic stability in handling concept drift in financial distress prediction. By combining the two algorithms and using the sample similarity index, the study achieved effective processing of dynamic concept drift for the first time.
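The general idea of pairing LASSO-based indicator selection with a GBDT classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (generated with scikit-learn's `make_classification`), the value `alpha=0.02` and all other hyperparameters are arbitrary assumptions, and the sample similarity index and drift handling described in the abstract are not reproduced because the abstract does not specify them. The F value is taken here to be the F1 score and the G value to be the geometric mean of sensitivity and specificity, which is a common reading but also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for financial indicators (NOT the paper's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: LASSO keeps only indicators with non-zero coefficients.
# Step 2: GBDT classifies distressed (1) vs. healthy (0) on the kept indicators.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(Lasso(alpha=0.02))),  # alpha is an assumed value
    ("gbdt", GradientBoostingClassifier(random_state=0)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Number of indicators that survived the LASSO step.
n_selected = int(model.named_steps["select"].get_support().sum())

# Evaluation: accuracy, F1, and G-mean = sqrt(sensitivity * specificity).
accuracy = accuracy_score(y_test, y_pred)
f_value = f1_score(y_test, y_pred)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
g_mean = float(np.sqrt(sensitivity * specificity))

print(f"selected indicators: {n_selected} of {X.shape[1]}")
print(f"accuracy={accuracy:.4f}  F={f_value:.4f}  G={g_mean:.4f}")
```

Placing the selection step inside the pipeline means the LASSO coefficients are estimated on the training split only, so the test metrics are not contaminated by indicator selection. In a drifting setting, the same pipeline could be refit on each new time window, although how the paper schedules such updates is not stated in the abstract.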
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika