Decision Reliability Evaluation of AI Expert Systems in High Impact Domains

Zubair Ahmad; Adam Faturahman; Meriyana Sunengsih; Noah Rangi

doi:10.33050/italic.v4i2.1074

Authors

Zubair Ahmad University of Sannio https://orcid.org/0000-0003-3754-0396
Adam Faturahman Alfabet Inkubator Indonesia https://orcid.org/0000-0001-9727-9092
Meriyana Sunengsih Pandawan Sejahtera Indonesia https://orcid.org/0009-0002-6480-1571
Noah Rangi Pandawan Incorporation https://orcid.org/0009-0004-6616-956X

Keywords:

Decision Reliability, AI Expert Systems, Interpretability Consistency, Repeated Execution, AI Governance

Abstract

AI expert decision support systems are increasingly used in public administration, healthcare, and financial risk management, yet conventional accuracy-centered evaluations often fail to capture whether a system produces stable decisions across repeated executions. This study develops a reliability-oriented evaluation framework for assessing AI expert decision support systems beyond single-run predictive performance. It focuses on decision reliability in high-impact AI applications, where inconsistent outputs may reduce accountability, weaken institutional trust, and create governance risks. A repeated experimental evaluation was applied to recent datasets from 2022 to 2024 representing heterogeneous and imbalanced decision conditions. The proposed framework integrates decision stability measurement, interpretability consistency assessment, confidence interval analysis, and statistical significance testing to examine system behavior under realistic operational scenarios. The results show that models with comparable predictive accuracy can differ statistically significantly in decision reliability. Confidence interval analysis indicates meaningful variability in output consistency, while interpretability evaluation reveals uneven explanatory stability across model executions. These findings indicate that reliability-oriented evaluation provides a more comprehensive and policy-relevant assessment of AI expert systems than accuracy-based evaluation alone. The study contributes to responsible AI deployment by offering an evaluation perspective that strengthens technical assessment, governance accountability, and trustworthiness in high-impact decision environments.
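The abstract does not define how decision stability or its confidence interval is computed. As an illustration only, the following minimal Python sketch assumes one plausible definition: stability is the mean fraction of cases on which two repeated executions return the same decision, averaged over all pairs of runs, with a percentile bootstrap over cases for the interval. The function names `pairwise_agreement` and `bootstrap_ci` and the toy data are hypothetical and are not taken from the study.

```python
import random
from itertools import combinations
from statistics import mean


def pairwise_agreement(runs):
    """Assumed stability metric: for every pair of runs, the fraction of
    cases with identical decisions, averaged over all pairs."""
    n_cases = len(runs[0])
    scores = []
    for a, b in combinations(runs, 2):
        same = sum(x == y for x, y in zip(a, b))
        scores.append(same / n_cases)
    return mean(scores)


def bootstrap_ci(runs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the stability score.
    Cases are resampled with replacement; the same resampled
    indices are applied to every run so runs stay aligned."""
    rng = random.Random(seed)
    n_cases = len(runs[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_cases) for _ in range(n_cases)]
        resampled = [[run[i] for i in idx] for run in runs]
        stats.append(pairwise_agreement(resampled))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): three repeated executions over six cases.
stable_runs = [[1, 0, 1, 1, 0, 1]] * 3
unstable_runs = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
]

stable_score = pairwise_agreement(stable_runs)
unstable_score = pairwise_agreement(unstable_runs)
unstable_interval = bootstrap_ci(unstable_runs, n_boot=500)
```

Under this assumed metric, two models with the same single-run accuracy can still receive different stability scores, which is the kind of gap the abstract describes. A significance test between models (for example, a permutation test on the difference in scores) would be a separate step and is not shown here.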
