6.1 Credit Feature Generation

Generate features related to credit scoring based on the anonymized data.

Principle:

Feature Selection: Select the features relevant to credit scoring from the anonymized data. Features that correlate strongly with credit outcomes reflect an individual’s credit status more faithfully.

Feature Scaling: Standardize or normalize the selected features so that they share a comparable scale. This prevents features with large numeric ranges from dominating the model and helps it learn the relationships between features, which can improve the accuracy of credit scoring.

Dimensionality Reduction: PCA (Principal Component Analysis) converts high-dimensional features into a lower-dimensional representation. This reduces the complexity and redundancy of the data and can improve the generalization performance of the model.

Credit Feature Simulation: Simulate or generate new features that are relevant to credit scoring, such as simulated repayment capacity or credit history stability, to describe an individual’s credit state more comprehensively.

Architecture:

Data Preprocessing: Perform preliminary processing on the anonymized data, including missing-value imputation and outlier handling, to ensure the quality and usability of the data.
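
As an illustration of this step, the sketch below imputes missing values and then clips outliers. The choice of median imputation and the 1.5 x IQR clipping rule are assumptions made for the example, as are the function names and the sample income values; the section does not prescribe specific methods.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical monthly incomes with one missing value and one extreme value.
income = [3200, 4100, None, 3900, 4500, 98000, 3700]
cleaned = clip_outliers(impute_missing(income))
```

Clipping keeps every record in the data set while limiting the influence of extreme values; dropping such records is an alternative when they are known to be errors.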

Feature Selection: Select features related to credit scoring from the preprocessed data. This can be done based on domain knowledge or by using feature selection algorithms for automatic selection.
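
One simple form of automatic selection is a correlation filter, sketched below. It assumes that a historical outcome label (here a hypothetical `defaulted` flag) is available to correlate against, and the threshold of 0.3, the feature names, and the sample values are all illustrative assumptions rather than part of the described system.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def select_features(features, target, threshold=0.3):
    """Keep the features whose absolute correlation with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical preprocessed columns and outcome labels (1 = defaulted).
features = {
    "debt_ratio": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "late_payments": [0, 1, 0, 4, 0, 5],
    "record_bucket": [5, 3, 7, 4, 5, 6],  # unrelated to the outcome
}
defaulted = [0, 0, 0, 1, 0, 1]

selected = select_features(features, defaulted)
```

A correlation filter only detects linear relationships, so in practice it would likely be combined with domain knowledge, as the step above suggests.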

Feature Scaling: Standardize or normalize the selected features to ensure that they have similar scales. Use Z-score for standardization and Min-Max for normalization.
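
The two scaling methods named above can be sketched as follows. Z-score standardization computes (x - mean) / standard deviation, and Min-Max normalization computes (x - min) / (max - min). The function names and the sample balances are hypothetical, and returning zeros for a constant column is an assumption made to avoid division by zero.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Normalize linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical account balances.
balances = [1200.0, 800.0, 1500.0, 500.0]
standardized = z_score(balances)
normalized = min_max(balances)
```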

Dimensionality Reduction: PCA (Principal Component Analysis) is used to reduce the dimensionality of the data. This reduces feature redundancy and improves the training efficiency of models.
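
To make the idea concrete, the sketch below reduces two correlated features to a single principal component. It is a dependency-free illustration that finds the leading eigenvector of the covariance matrix by power iteration; a production pipeline would more likely rely on a numerical library. The function names and sample rows are hypothetical.

```python
from math import sqrt

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d  # assumed start; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Coordinates of each row along the given component."""
    return [sum(x * c for x, c in zip(r, component)) for r in centered]

# Hypothetical scaled features: two strongly correlated columns.
rows = [[-1.2, -1.0], [-0.5, -0.6], [0.0, 0.1], [0.6, 0.5], [1.1, 1.0]]
centered = center(rows)
component = first_component(covariance(centered))
scores = project(centered, component)  # one value per record instead of two
```

Because the two columns move together, the single projected value retains most of their variance, which is the redundancy reduction described above.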

Credit Feature Generation: Simulate or generate new features from the selected features so that they are relevant to credit scoring. This can be done with business rules or by sampling from probability distributions.
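
The sketch below shows one rule-based feature and one distribution-based feature. Everything specific in it is an assumption for illustration: the repayment capacity formula, its cap of 10, the Beta distribution parameters, and the record fields are hypothetical and do not describe the actual business rules.

```python
import random

def repayment_capacity(income, expenses, debt_payment):
    """Rule-based feature: disposable income per unit of monthly debt payment.

    Hypothetical rule; the result is limited to the range [0, 10].
    """
    ratio = (income - expenses) / max(debt_payment, 1.0)
    return min(max(ratio, 0.0), 10.0)

def simulate_history_stability(late_payments, rng):
    """Distribution-based feature: a stability score in [0, 1] drawn from Beta(5, 1 + late_payments).

    More late payments shift the distribution toward 0 (assumed parameters).
    """
    return rng.betavariate(5.0, 1.0 + late_payments)

def generate_credit_features(records, seed=42):
    """Attach the two generated features to each record."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    generated = []
    for rec in records:
        generated.append({
            "repayment_capacity": repayment_capacity(
                rec["income"], rec["expenses"], rec["debt_payment"]),
            "history_stability": simulate_history_stability(
                rec["late_payments"], rng),
        })
    return generated

# Hypothetical selected features for two anonymized individuals.
records = [
    {"income": 4000.0, "expenses": 2500.0, "debt_payment": 500.0, "late_payments": 0},
    {"income": 3000.0, "expenses": 2900.0, "debt_payment": 0.0, "late_payments": 4},
]
generated = generate_credit_features(records)
```

Seeding the random generator keeps simulated features reproducible across runs, which matters if the scoring model is retrained or audited.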

Output:

Output the generated credit features for use in the credit scoring model.
