| dc.description.abstract | Safety is a critical concern in reinforcement learning (RL) and learning-based systems more broadly, as ensuring reliable and safe decision-making is essential for their deployment in real-world applications. Traditional approaches to safety often rely on techniques such as reward shaping, carefully curated training data, or explicit handcrafted rules to avoid unsafe actions. More recent approaches have adopted the Constrained Markov Decision Process (CMDP) framework, which trains agents while explicitly enforcing constraints on auxiliary measures such as safety or risk. However, these methods often suffer from significant constraint violations. This thesis traces the root cause of such violations to the pursuit of maximal task performance in every policy update. Given the inherent limitations of sample-based constraint estimates in RL, where data is limited and approximation errors are inevitable, these methods often fail near constraint boundaries, leading to excessive violations. To address this, we propose a novel constrained reinforcement learning algorithm that dynamically adjusts its conservativeness during policy updates. By incorporating the risk of constraint violation into the update process, our method shifts focus toward constraint satisfaction when violations are likely, while still striving to improve task performance whenever feasible. Our algorithm reduces constraint violations by up to 99% compared to state-of-the-art baselines while achieving comparable task performance. In the second part of this thesis, we extend CMDPs to address multi-goal, long-horizon problems. We augment the CMDP formulation to incorporate goals, enabling it to handle multiple goals while preserving the goal-independent constraint specification of the original CMDP. To tackle the complexity of long-horizon tasks with high-dimensional inputs (e.g., visual observations), we propose a method that integrates planning with safe reinforcement learning. By leveraging deep reinforcement learning, we acquire the essential components for planning, including a low-dimensional state-space representation and planning heuristics. The planning algorithm then decomposes long-horizon problems into a sequence of shorter, easier subgoal-reaching tasks, and the learned agents safely navigate toward these subgoals step by step, ultimately reaching the final goal. We evaluate our method on both single-agent and multi-agent tasks. In 2D navigation, our approach reduces risk by up to 74.2%, and in visual navigation by up to 49.3%, while achieving comparable or better success rates. | |