AI-DRIVEN CREDIT SCORING IN INDIAN FINANCIAL INSTITUTIONS: OPPORTUNITIES, RISKS, AND REGULATORY IMPLICATIONS
Abstract
This paper examines whether machine learning models can meaningfully improve upon logistic regression for credit default prediction in the Indian banking context, and whether such improvements come at the cost of interpretability required by regulators. The study uses a panel of approximately 1.24 million loan accounts from three scheduled commercial banks, covering Q1 2015 to Q4 2023, including the COVID-19 disruption period to test model robustness under distribution shift.
The findings show that Gradient Boosting Machines, specifically Boost, outperform logistic regression on discrimination and calibration, with AUROC improving from 0.739 to 0.926 and Brier Score improving from 0.081 to 0.053. When alternative digital data such as GST filing regularity, UPI transaction frequency, and utility payment histories sourced through the Account Aggregator framework are included, AUROC further rises to 0.934. For borrowers with thin credit files, alternative data variables account for over 40 percent of the model’s explanatory weight, highlighting important implications for financial inclusion.
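For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how AUROC (discrimination) and the Brier Score (calibration) are computed from predicted default probabilities. It is illustrative only and is not the authors' evaluation code; the function names and the four-account toy data are assumptions made for this example.

```python
def brier_score(y_true, p_hat):
    """Mean squared gap between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auroc(y_true, p_hat):
    """Probability that a randomly chosen defaulter is scored above a randomly
    chosen non-defaulter, counting ties as half (higher is better)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = default, 0 = no default.
y_true = [0, 0, 1, 1]
p_hat = [0.10, 0.40, 0.35, 0.80]

print("AUROC:", auroc(y_true, p_hat))
print("Brier:", brier_score(y_true, p_hat))
```

The pairwise AUROC loop is quadratic in the number of accounts, so on a panel of the size described here a rank-based implementation from a standard library would be the practical choice.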
The paper also addresses the black-box concern in machine learning-based credit decisioning by providing SHAP explanations at both the global and the instance level. Strong concordance between SHAP and LIME attributions on a validation subsample indicates that the explanations are robust to the choice of method. Fairness analysis finds no statistically detectable disparate impact along gender or rural-urban lines after conditioning on financial fundamentals. These results make the study relevant to the RBI’s emerging AI governance framework.
Authors
Praveen Soneja, Abhijeet Raj, Abhilasha Kumari, Kusum Lata, Mayank Raj
Institution
Noida Institute of Engineering & Technology (MCA Institute), Greater Noida
