PD back-testing methods are usually divided into two groups: tests at the rating scale level and tests at the portfolio level.
Sometimes, practitioners compare the p-value of a test at the rating scale level with the p-value of a test at the portfolio level. However, are they comparable?
What is the testing hypothesis for each group and test?
Test statistic:
\[HL = \sum_{g = 1}^{G}\frac{(N_gPD_g - d_g)^2}{N_gPD_g(1 - PD_g)}\]
where \(G\) is the number of rating grades, \(N_g\) the number of obligors in grade \(g\), \(PD_g\) the calibrated PD of grade \(g\), and \(d_g\) the observed number of defaults in grade \(g\).
Under the assumption that the \(HL\) test statistic follows the chi-square distribution with \(G\) degrees of freedom, the \(p\)-value is calculated accordingly.
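As a sketch of this calculation, the \(HL\) statistic and its \(p\)-value can be reproduced from the example portfolio shown in the tables that follow. The slide outputs appear to come from R; here stdlib-only Python is used, with the chi-square survival function built from the upper incomplete gamma function at half-integer arguments:

```python
import math

# Example portfolio from the slides: (grade, N_g, d_g, PD_g)
grades = [
    ("RG1", 47, 3, 0.0307),
    ("RG2", 95, 20, 0.1161),
    ("RG3", 68, 17, 0.2907),
    ("RG4", 53, 24, 0.5514),
    ("RG5", 37, 28, 0.7648),
]

# Hosmer-Lemeshow statistic: sum over grades of (N*PD - d)^2 / (N*PD*(1-PD))
hl = sum((n * pd - d) ** 2 / (n * pd * (1 - pd)) for _, n, d, pd in grades)

def chi2_sf_df5(x):
    """Survival function of chi-square with 5 df via the regularized
    upper incomplete gamma Q(5/2, x/2), using the recurrence
    Gamma(a+1, y) = a*Gamma(a, y) + y^a * exp(-y)."""
    y = x / 2
    g_half = math.sqrt(math.pi) * math.erfc(math.sqrt(y))  # Gamma(1/2, y)
    g_32 = 0.5 * g_half + math.sqrt(y) * math.exp(-y)      # Gamma(3/2, y)
    g_52 = 1.5 * g_32 + y ** 1.5 * math.exp(-y)            # Gamma(5/2, y)
    return g_52 / math.gamma(2.5)

p_value = chi2_sf_df5(hl)
print(f"HL = {hl:.3f}, p-value = {p_value:.2%}")  # roughly 2.7%
```

The small gap to the 2.70% shown in the slides comes from the PDs being rounded to four decimals here.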
Testing hypothesis: the calibrated PDs are true. Since the deviations are squared, departures in either direction increase the statistic, making this effectively a two-sided test.
Test statistic:
\[Z_{score} = \frac{ODR - PD}{\sqrt{\frac{PD(1-PD)}{n}}}\]
where \(ODR\) is the observed default rate at the portfolio level, \(PD\) the calibrated portfolio-level PD, and \(n\) the number of obligors in the portfolio.
Under the assumption that the \(Z_{score}\) test statistic follows the standard normal distribution, the \(p\)-value is calculated accordingly.
The most commonly used testing hypothesis: the calibrated PD is not underestimated (a one-sided test).
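Under this one-sided convention, the portfolio-level \(Z_{score}\) check can be sketched as follows (the slide outputs appear to come from R; this is a stdlib-only Python sketch that aggregates the grade-level figures from the example table that follows, assuming the portfolio PD is the obligor-weighted average of the grade PDs):

```python
import math

# Grade-level inputs from the example in the slides
N = [47, 95, 68, 53, 37]                 # obligors per grade
D = [3, 20, 17, 24, 28]                  # defaults per grade
PD = [0.0307, 0.1161, 0.2907, 0.5514, 0.7648]

n = sum(N)                               # portfolio size
odr = sum(D) / n                         # observed default rate
# Assumption: portfolio PD = obligor-weighted average of grade PDs
pd_port = sum(ni * pdi for ni, pdi in zip(N, PD)) / n

z = (odr - pd_port) / math.sqrt(pd_port * (1 - pd_port) / n)
# One-sided p-value for H0 "PD is not underestimated": P(Z > z)
p_value = 0.5 * math.erfc(z / math.sqrt(2))
print(f"Z = {z:.3f}, one-sided p-value = {p_value:.2%}")  # roughly 38.9%
```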
Are they comparable?
##   Rating # obs. # defaults    ODR     PD
## 1    RG1     47          3 0.0638 0.0307
## 2    RG2     95         20 0.2105 0.1161
## 3    RG3     68         17 0.2500 0.2907
## 4    RG4     53         24 0.4528 0.5514
## 5    RG5     37         28 0.7568 0.7648
p-values:
##   Hosmer-Lemeshow Z-score on portfolio level
## 1           2.70%                      38.89%
Can we adjust the inputs so that both tests account only for underestimation?
##   Rating # obs. # defaults    ODR     PD PD adjusted
## 1    RG1     47          3 0.0638 0.0307      0.0307
## 2    RG2     95         20 0.2105 0.1161      0.1161
## 3    RG3     68         17 0.2500 0.2907      0.2500
## 4    RG4     53         24 0.4528 0.5514      0.4528
## 5    RG5     37         28 0.7568 0.7648      0.7568
p-values:
##   Hosmer-Lemeshow Z-score on portfolio level
## 1           7.53%                       8.58%
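A minimal sketch of the adjustment, assuming it caps each grade's PD at its observed default rate (which matches the "PD adjusted" column above) so that overestimated grades no longer contribute, and then recomputes both tests (stdlib-only Python; the slide outputs appear to come from R):

```python
import math

N = [47, 95, 68, 53, 37]
D = [3, 20, 17, 24, 28]
PD = [0.0307, 0.1161, 0.2907, 0.5514, 0.7648]

# Assumption: adjusted PD = min(PD, ODR) per grade, so only
# underestimation (PD < ODR) still contributes to the statistics.
PD_adj = [min(pd, d / n) for pd, d, n in zip(PD, D, N)]

hl = sum((n * pd - d) ** 2 / (n * pd * (1 - pd))
         for n, d, pd in zip(N, D, PD_adj))

def chi2_sf_df5(x):
    # chi-square(5) survival function via Q(5/2, x/2), stdlib-only
    y = x / 2
    g_half = math.sqrt(math.pi) * math.erfc(math.sqrt(y))
    g_32 = 0.5 * g_half + math.sqrt(y) * math.exp(-y)
    g_52 = 1.5 * g_32 + y ** 1.5 * math.exp(-y)
    return g_52 / math.gamma(2.5)

n_tot = sum(N)
odr_port = sum(D) / n_tot
pd_port = sum(ni * pdi for ni, pdi in zip(N, PD_adj)) / n_tot
z = (odr_port - pd_port) / math.sqrt(pd_port * (1 - pd_port) / n_tot)

p_hl = chi2_sf_df5(hl)
p_z = 0.5 * math.erfc(z / math.sqrt(2))
print(f"HL p-value: {p_hl:.2%}")       # roughly 7.5%
print(f"Z-score p-value: {p_z:.2%}")   # roughly 8.6%
```

With the capped PDs, the two p-values move much closer together, which illustrates the point of the adjustment: after removing the overestimation component, both tests are penalising the same direction of error.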