Options
Event-Triggered and Time-Triggered Duration Calculus for Model-Free Reinforcement Learning
Journal
Proceedings - Real-Time Systems Symposium
ISSN
10528725
Date Issued
2021-01-01
Author(s)
Dole, Kalyani
Gupta, Ashutosh
Komp, John
Krishna, Shankaranarayanan
Trivedi, Ashutosh
Abstract
Reinforcement Learning (RL) is a sampling based approach to optimization, where learning agents rely on scalar reward signals to discover optimal solutions. The specification of learning objectives as scalar rewards is tedious and error prone, and more so for real-time systems with complex time-critical requirements. This paper advocates the use of Duration Calculus (DC)—a highly expressive real-time logic with duration and length modalities—in expressing the learning objectives in model-free RL for stochastic real-time systems. On the other hand, to model stochastic real-time environments, we consider probabilistic timed automata (PTA)—Markov decision processes extended with clock variables—that provide an expressive yet computationally decidable formalism to capture real-time constraints over nondeterministic and probabilistic behaviors. The key hurdle in developing a convergent RL algorithm for DC specifications is the undecidability of the synthesis problem for PTA against general DC specifications. Inspired by the dichotomy between event-triggered and time-triggered approaches to the design of real-time systems, we present two variants of DC logic—that we dub event-triggered duration calculus (EDC) and time-triggered duration calculus (TDC)—and identify their subclasses with appealing theoretical properties. We study the decidability (and exact complexity) of the satisfiability of these calculi as well as the controller synthesis against PTA models. Based on these results, we propose a reward scheme for RL agents in such a way that guarantees that any RL algorithm maximizing rewards is guaranteed to maximize the probability of satisfaction for the given DC specification. The effectiveness of the proposed approach is demonstrated via grid-world benchmarks and a proof-of-concept case study for synthesizing control for simple cardiac pacemaker directly from a set of DC specifications.
Volume
2021-December
Subjects