A MACHINE LEARNING-BASED FRAMEWORK FOR DETECTING FINANCIAL FRAUDULENT TRANSACTIONS
Abstract
The objective of this study was to develop and evaluate machine learning models for detecting fraudulent financial transactions, aiming to mitigate economic losses and support decision-making in banking institutions. The methodology used the public Credit Card Fraud Detection dataset, consisting of 284,807 transactions, of which only 492 (≈ 0.172%) were fraudulent. Two algorithms, Random Forest and XGBoost, were tested, with and without the SMOTE oversampling technique. Evaluation relied on precision, recall, F1-score, and the Matthews correlation coefficient (MCC), together with ROC and Precision–Recall curves. In addition, a qualitative validation was carried out through interviews with four financial-sector specialists to assess the practical applicability of the models. The results showed that all models achieved high overall performance, with areas under the ROC curve above 0.96. XGBoost with SMOTE was the most sensitive, reaching a recall of 85% with 15 false negatives, at the cost of more false positives (22). Random Forest without SMOTE obtained the best precision (0.94) and the highest F1-score (0.87), but missed 18 fraud cases; Random Forest with SMOTE performed between the two. The qualitative validation confirmed the relevance of the models: 75% of the specialists prioritized maximum fraud detection even at the cost of more false alarms, while 25% valued fewer false alarms even at the cost of lower sensitivity. It is concluded that the choice of model should balance recall and precision in line with institutional priorities, weighing reduced financial losses against operational overload. The study also notes limitations, such as the use of a single, time-specific dataset and the absence of advanced hyperparameter optimization.
Future work should explore hyperparameter tuning, incremental learning, and validation on contemporary datasets, aiming for greater robustness and practical applicability of the models.
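The pipeline described above (oversample the minority class, train a tree ensemble, report imbalance-aware metrics) can be sketched as follows. This is a minimal illustration, not the study's actual code: it uses a synthetic stand-in for the Credit Card Fraud dataset, a naive hand-rolled SMOTE-style oversampler (real work would use imbalanced-learn's SMOTE), and scikit-learn's RandomForestClassifier in place of XGBoost so the example needs only numpy and scikit-learn. All parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def smote_like(X, y, minority=1, k=5, rng=np.random.default_rng(0)):
    """Naive SMOTE: synthesize minority points by interpolating between
    each sampled minority instance and one of its k nearest minority
    neighbours, until both classes have the same count."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)
    neigh = X_min[idx[base, rng.integers(1, k + 1, n_new)]]  # skip self (col 0)
    gap = rng.random((n_new, 1))
    X_syn = X_min[base] + gap * (neigh - X_min[base])
    return np.vstack([X, X_syn]), np.hstack([y, np.full(n_new, minority)])

# Synthetic, highly imbalanced stand-in for the fraud data (~0.5% positives).
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.995], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
X_bal, y_bal = smote_like(X_tr, y_tr)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_bal, y_bal)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

# The metrics reported in the study (values here reflect synthetic data only).
print(f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f} "
      f"f1={f1_score(y_te, pred):.2f} "
      f"mcc={matthews_corrcoef(y_te, pred):.2f} "
      f"roc_auc={roc_auc_score(y_te, proba):.2f}")
```

In this setup the recall/precision trade-off discussed in the abstract surfaces directly: oversampling raises recall on the minority (fraud) class while typically admitting more false positives, which is the operational choice the interviewed specialists weighed in on.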
