Secure Reinforcement Learning for AWS IAM Adaptive Policy Optimization

Talib, Abdualrahman; Mahmood, Ghassan; Ibrahim, Hazim

doi:10.25673/123683

Proceedings of International Conference on Applied Innovation in IT · 2026/03/31 · Vol. 14 · Issue 1 · pp. 1101–1113

Abdualrahman Mohammed Talib, Ghassan Sabeeh Mahmood, Hazim Noman Abed and Muhammed Ibrahim

Abstract

Public clouds such as AWS host mission-critical workloads, yet everyday Identity and Access Management (IAM) remains fragile. Small policy errors (wildcards, missing MFA, or overly permissive trust) can open attack paths as configurations evolve faster than audits. Continuous, scalable verification and adaptive policy optimization are therefore essential. This study presents a reinforcement learning framework based on Proximal Policy Optimization (PPO) for adaptive policy optimization in AWS IAM. The system integrates log-driven telemetry (CloudTrail, GuardDuty, Config, and VPC Flow Logs) with multi-feature analysis to classify policies into four risk levels, generate compliant remediations, and enforce them safely through a Policy Management Module. Trained on 53,104 IAM policies, the framework achieved ≈97% accuracy, 98.5% precision, 98% recall, and AUC = 0.97, processing about 100 policies per second on CPU. Error rates were low (FP ≈ 1.3%, FN ≈ 1.7%), minimizing disruption to legitimate access. Case studies confirmed automatic hardening of over-permissive policies. These results demonstrate that reinforcement learning enables autonomous, adaptive policy optimization, strengthening consistency, scalability, and compliance while reducing manual effort in dynamic AWS environments.
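
To make the policy errors named in the abstract concrete, the following minimal sketch flags wildcards, a missing MFA condition, and an open trust principal in an IAM-style JSON policy. It is an illustration only and not the authors' feature extractor or risk classifier: the function name `find_risky_patterns` and the finding labels are hypothetical, and the checks are simple static rules rather than the PPO-based framework the study describes.

```python
def _as_list(value):
    """Normalize a policy field that may be a string, a list, or absent."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_risky_patterns(policy: dict) -> list:
    """Return sorted labels (hypothetical names) for risky Allow statements."""
    findings = set()
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue  # only Allow statements grant access

        # Wildcard actions such as "*" or "s3:*".
        actions = _as_list(stmt.get("Action"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.add("wildcard_action")

        # A bare "*" resource applies the statement to every resource.
        if "*" in _as_list(stmt.get("Resource")):
            findings.add("wildcard_resource")

        # Trust open to any principal: "Principal": "*" or {"AWS": "*"}.
        principal = stmt.get("Principal")
        if principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
        ):
            findings.add("open_trust")

        # No condition block references the MFA context key.
        condition = stmt.get("Condition") or {}
        if not any("aws:MultiFactorAuthPresent" in keys for keys in condition.values()):
            findings.add("no_mfa_condition")
    return sorted(findings)


over_permissive = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
hardened = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::example-bucket/*"],
        "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
    }],
}
print(find_risky_patterns(over_permissive))
print(find_risky_patterns(hardened))
```

Static rules like these cover only the error types the abstract lists by name. The framework described in the study additionally draws on log telemetry and learns which remediation to apply.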

Keywords

Cloud Security, Reinforcement Learning, Proximal Policy Optimization (PPO), Adaptive Policy Management
