Trustworthy Machine Learning Evaluation Framework for Robust and Interpretable Intelligent Systems

Ninda Lutfiani; Sutarto Wijono; Rifqa Nabila Muti; Yasir Mustafa Kareem

doi:10.33050/italic.v4i2.1067

Authors

Ninda Lutfiani Satya Wacana Christian University https://orcid.org/0000-0001-7019-0020
Sutarto Wijono Satya Wacana Christian University https://orcid.org/0000-0003-2154-6056
Rifqa Nabila Muti CAI Sejahtera Indonesia, Indonesia https://orcid.org/0009-0008-2980-3823
Yasir Mustafa Kareem EESP Group https://orcid.org/0009-0008-5096-2300

DOI:

https://doi.org/10.33050/italic.v4i2.1067

Keywords:

Machine Learning, Intelligent Algorithms, Interpretability, Robustness, Sustainable Development

Abstract

Deploying artificial intelligence (AI) in critical domains requires machine learning systems that are not only accurate but also robust, interpretable, fair, and aligned with responsible governance principles. Conventional machine learning evaluation approaches, however, often prioritize predictive performance and computational efficiency while giving limited attention to ethical accountability, transparency, regulatory compliance, and sustainability. This study develops a trustworthy machine learning evaluation framework for robust and interpretable AI systems. It focuses on the evaluation of intelligent systems in healthcare, finance, and transportation, where reliability and accountability are essential for real-world deployment. A qualitative case study approach was employed, using expert interviews, literature analysis, document review, and cross-domain case comparisons to identify key evaluation dimensions. The findings show that trustworthy evaluation should integrate technical indicators, including accuracy, robustness, and interpretability, with broader dimensions such as fairness, accountability, governance compliance, and social responsibility. The proposed framework provides a structured model for assessing intelligent systems beyond conventional performance metrics. It also supports more consistent interpretability assessment, stronger fairness evaluation, and closer alignment with international AI governance expectations. This study contributes to the development of responsible AI by offering a practical evaluation framework that can guide researchers, developers, and institutions in designing machine learning systems that are reliable, transparent, and socially accountable. The framework has implications for sustainable and compliant AI implementation in high-impact sectors.
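
As a rough illustration of what assessing a system "beyond conventional performance metrics" could look like in practice, the sketch below records one score per evaluation dimension and reports which dimensions have not yet been assessed. Only the dimension names and the three example domains come from the abstract. The class name, the method names, the 0 to 1 score scale, and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field

# Dimension names follow the abstract's wording; the split into two groups
# mirrors its "technical indicators" versus "broader dimensions".
TECHNICAL = ("accuracy", "robustness", "interpretability")
BROADER = ("fairness", "accountability", "governance_compliance", "social_responsibility")
ALL_DIMENSIONS = TECHNICAL + BROADER


@dataclass
class EvaluationRecord:
    """Hypothetical record of one system's assessment across all dimensions."""
    system: str
    domain: str  # e.g. "healthcare", "finance", "transportation"
    scores: dict = field(default_factory=dict)  # dimension name -> score in [0, 1]

    def add(self, dimension: str, score: float) -> None:
        # Reject dimensions outside the list above and scores outside the
        # assumed [0, 1] scale.
        if dimension not in ALL_DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must lie in [0, 1]")
        self.scores[dimension] = score

    def missing(self) -> list:
        """Dimensions not yet assessed, in declaration order."""
        return [d for d in ALL_DIMENSIONS if d not in self.scores]

    def is_complete(self) -> bool:
        return not self.missing()


# Placeholder values for illustration only; they are not results from the study.
record = EvaluationRecord(system="triage-model", domain="healthcare")
record.add("accuracy", 0.91)
record.add("robustness", 0.78)
print(record.missing())
```

The sketch deliberately stops at a completeness check and does not combine the scores into a single number, since the abstract does not state how, or whether, the dimensions are weighted against each other.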
