(R) RL – Large Action Space – Bootstrapping without argmax

https://preview.redd.it/6af2z7806fhd1.png?width=665&format=png&auto=webp&s=0fbc6df90a040e33d3938740c93955ee614d3b15

I have been running some experiments and am encountering suboptimal results. Can you provide insight into potential issues with my approach?

I use a state-action-input architecture: the network takes a (state, action) pair and outputs a single Q-value estimate, so evaluating several actions requires one forward pass per action. I chose this because a state-input architecture with one output per action is impractical given the large action space.
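Concretely, the architecture looks roughly like this (a minimal PyTorch sketch; the class name, layer sizes, and the assumption that actions have a vector representation are all placeholders, not my exact code):

```python
import torch
import torch.nn as nn

class StateActionDuelingNet(nn.Module):
    """Sketch: takes one (state, action) pair per forward pass and
    returns both Q(s, a) and the V head's state-value estimate."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.v_head = nn.Linear(hidden, 1)  # state-value estimate V(s)
        self.a_head = nn.Linear(hidden, 1)  # advantage of this action A(s, a)

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        h = self.trunk(torch.cat([state, action], dim=-1))
        v = self.v_head(h)
        q = v + self.a_head(h)  # dueling combination for a single action
        return q, v
```

Note that with only one action per forward pass there is no mean-advantage subtraction across actions, unlike the standard dueling setup.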

The rationale behind this method is to manage the large action space efficiently. Computing the argmax over all actions in state s' is computationally intensive, so instead I perform a forward pass for only a subset of the available actions and use the V output of the dueling network directly: to estimate V(s'), I average the V(s') values produced by this subset, and bootstrap from that average instead of from max_a Q(s', a).
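The target I bootstrap from looks like this (again a sketch reusing the network above; `bootstrap_target`, the terminal mask, and how `sampled_actions` are drawn are illustrative placeholders):

```python
import torch

def bootstrap_target(net, reward, next_state, sampled_actions, done, gamma=0.99):
    """Sketch: instead of max_a Q(s', a) over the full action space,
    average the V-head outputs over a sampled subset of next-state actions."""
    with torch.no_grad():
        # one forward pass per sampled action; net returns (q, v)
        vs = [net(next_state, a)[1] for a in sampled_actions]
        v_next = torch.stack(vs).mean(dim=0)  # averaged V(s') estimate
    # TD target: r + gamma * V(s'), masked at terminal states
    return reward + gamma * (1.0 - done) * v_next
```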

Any feedback on possible pitfalls or improvements is greatly appreciated.

submitted by /u/RjRdrG