上QQ阅读APP看书，第一时间看更新

RL algorithm

The steps involved in typical RL algorithm are as follows:

First, the agent interacts with the environment by performing an action
The agent performs an action and moves from one state to another
And then the agent will receive a reward based on the action it performed
Based on the reward, the agent will understand whether the action was good or bad
If the action was good, that is, if the agent received a positive reward, then the agent will prefer performing that action or else the agent will try performing an other action which results in a positive reward. So it is basically a trial and error learning process

本周热推：