Planning With Uncertain Specifications (PUnS)
attributed to: Ankit Shah, Shen Li, Julie Shah
posted by: KabirKumar
Reward engineering is crucial to high performance in reinforcement learning systems. Prior research into reward design has largely focused on Markovian functions representing the reward. While there has been research into expressing non-Markov rewards as linear temporal logic (LTL) formulas, this has focused on task specifications directly defined by the user. However, in many real-world applications, task specifications are ambiguous, and can only be expressed as a belief over LTL formulas. In this paper, we introduce planning with uncertain specifications (PUnS), a novel formulation that addresses the challenge posed by non-Markovian specifications expressed as beliefs over LTL formulas. We present four criteria that capture the semantics of satisfying a belief over specifications for different applications, and analyze the qualitative implications of these criteria within a synthetic domain. We demonstrate the existence of an equivalent Markov decision process (MDP) for any instance of PUnS. Finally, we demonstrate our approach on the real-world task of setting a dinner table automatically with a robot that inferred task specifications from human demonstrations.
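To make "a belief over LTL formulas" concrete, here is a minimal sketch rather than the authors' implementation. It assumes specifications are evaluated on finite traces, models each formula as a plain Python predicate instead of using a real LTL library, and uses made-up proposition names and weights. The two scoring functions are illustrative ways of judging a trace against a belief; they are not claimed to match the four criteria the paper defines.

```python
from typing import Callable, List, Set, Tuple

# A trace is a finite sequence of steps; each step is the set of
# propositions that hold at that step (finite-trace assumption).
Trace = List[Set[str]]
Spec = Callable[[Trace], bool]


def eventually(prop: str) -> Spec:
    """'Eventually prop': prop holds at some step of the trace."""
    return lambda trace: any(prop in step for step in trace)


def never(prop: str) -> Spec:
    """'Always not prop': prop holds at no step of the trace."""
    return lambda trace: all(prop not in step for step in trace)


def both(a: Spec, b: Spec) -> Spec:
    """Conjunction of two specifications."""
    return lambda trace: a(trace) and b(trace)


# Hypothetical belief: candidate specifications with probabilities summing to 1.
belief: List[Tuple[Spec, float]] = [
    (eventually("plate"), 0.5),
    (both(eventually("plate"), eventually("fork")), 0.3),
    (both(eventually("plate"), never("spill")), 0.2),
]


def satisfied_mass(trace: Trace, belief: List[Tuple[Spec, float]]) -> float:
    """Total probability of the candidate specifications the trace satisfies."""
    return sum(weight for spec, weight in belief if spec(trace))


def satisfies_most_likely(trace: Trace, belief: List[Tuple[Spec, float]]) -> bool:
    """Whether the trace satisfies the single highest-probability specification."""
    spec, _ = max(belief, key=lambda pair: pair[1])
    return spec(trace)


careful = [set(), {"plate"}, {"fork"}]
sloppy = [{"plate"}, {"spill"}]

print(satisfied_mass(careful, belief), satisfies_most_likely(careful, belief))
print(satisfied_mass(sloppy, belief), satisfies_most_likely(sloppy, belief))
```

Both traces satisfy the most likely specification, but only the first satisfies every candidate. This is the kind of difference that alternative satisfaction criteria over a belief are meant to expose.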