Abstract |
Fraud is considered the most common problem in insurance companies, and detecting it is difficult. This study applies statistical and data mining techniques to predict fraud in insurance data. The data were cleaned and pre-processed by removing duplicates, filling in missing values, label-encoding the categorical variables, and detecting outliers. The data were then split into training and test sets, and standardization was applied for feature scaling. Finally, several data mining models were evaluated on the data; the two best were Adaptive Boost (AdaBoost) and Gradient Boost. The AdaBoost model achieved the highest accuracy (95.556%), recall (92.308%), precision (87.805%), F1 score (90%), and Matthews Correlation Coefficient (MCC) (87.190%). The Gradient Boost model achieved the second-highest accuracy (92.778%), recall (76.923%), precision (88.235%), F1 score (82.192%), and MCC (77.976%). Based on these results, this research proposes a new model called GA, a hybrid classifier that combines Gradient Boost and Adaptive Boost. |
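
The five evaluation metrics reported above are all functions of a binary confusion matrix. The abstract does not state the underlying counts, so the following is only a minimal sketch: the helper name `binary_metrics` and the counts in `ASSUMED_COUNTS` (a 180-sample test set with 39 fraud cases) are assumptions introduced for illustration, chosen because they are consistent with the reported percentages, and are not figures taken from the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, F1 and MCC from confusion-matrix counts.

    All values are returned as fractions (multiply by 100 for percentages).
    Ratios with a zero denominator are returned as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # Matthews Correlation Coefficient: uses all four cells of the matrix,
    # which makes it informative when the classes are imbalanced.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# ASSUMPTION: hypothetical confusion-matrix counts, not reported in the abstract.
ASSUMED_COUNTS = {
    "AdaBoost": dict(tp=36, fp=5, fn=3, tn=136),
    "Gradient Boost": dict(tp=30, fp=4, fn=9, tn=137),
}

for model_name, counts in ASSUMED_COUNTS.items():
    metrics = binary_metrics(**counts)
    # Print each metric as a percentage rounded to three decimals.
    print(model_name, {key: round(100 * value, 3) for key, value in metrics.items()})
```

Under these assumed counts, the computed values round to the percentages quoted for each model. Fraud datasets are typically imbalanced, so recall and MCC tend to be more revealing than accuracy alone: in this illustration the two models differ by under three points in accuracy but by more than fifteen in recall.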