(R) RL – Large Action Space – Bootstrapping without argmax

https://preview.redd.it/6af2z7806fhd1.png?width=665&format=png&auto=webp&s=0fbc6df90a040e33d3938740c93955ee614d3b15

I have been running some experiments and am encountering suboptimal results. Can you provide insight into potential issues with my approach?

I use a state-action-input architecture: the network takes a (state, action) pair and outputs a single Q-value estimate, so evaluating several actions requires one forward pass per action. I chose this because a state-input architecture with one output per action is impractical given the large action space.
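Concretely, the architecture looks roughly like this (a minimal PyTorch sketch; the class name, layer sizes, and the assumption that actions have a vector representation are all placeholders, not my exact code):

```python
import torch
import torch.nn as nn

class StateActionDuelingNet(nn.Module):
    """Sketch: takes one (state, action) pair per forward pass and
    returns both Q(s, a) and the V head's state-value estimate."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.v_head = nn.Linear(hidden, 1)  # state-value estimate V(s)
        self.a_head = nn.Linear(hidden, 1)  # advantage of this action A(s, a)

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        h = self.trunk(torch.cat([state, action], dim=-1))
        v = self.v_head(h)
        q = v + self.a_head(h)  # dueling combination for a single action
        return q, v
```

Note that with only one action per forward pass there is no mean-advantage subtraction across actions, unlike the standard dueling setup.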

The rationale behind this method is to manage the large action space efficiently. Computing the argmax over all actions in state s' is computationally intensive, so instead I perform a forward pass for only a subset of the available actions and use the V output of the dueling network directly: to estimate V(s'), I average the V(s') values produced by this subset, and bootstrap from that average instead of from max_a Q(s', a).
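The target I bootstrap from looks like this (again a sketch reusing the network above; `bootstrap_target`, the terminal mask, and how `sampled_actions` are drawn are illustrative placeholders):

```python
import torch

def bootstrap_target(net, reward, next_state, sampled_actions, done, gamma=0.99):
    """Sketch: instead of max_a Q(s', a) over the full action space,
    average the V-head outputs over a sampled subset of next-state actions."""
    with torch.no_grad():
        # one forward pass per sampled action; net returns (q, v)
        vs = [net(next_state, a)[1] for a in sampled_actions]
        v_next = torch.stack(vs).mean(dim=0)  # averaged V(s') estimate
    # TD target: r + gamma * V(s'), masked at terminal states
    return reward + gamma * (1.0 - done) * v_next
```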

Any feedback on possible pitfalls or improvements is greatly appreciated.

submitted by /u/RjRdrG