![Hands-On Intelligent Agents with OpenAI Gym](https://wfqqreader-1252317822.image.myqcloud.com/cover/567/36699567/b_36699567.jpg)
Model
A model is an agent's representation of the environment. It is similar to the mental models we have about people and things around us. An agent uses its model of the environment to predict what will happen next. There are two key pieces to it:
- $\mathcal{P}$: The state transition model/probability
- $\mathcal{R}$: The reward model
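Here is a minimal Python sketch that makes the two pieces concrete. It assumes a tiny, hand-written environment with two hypothetical states (`"cool"`, `"hot"`) and two hypothetical actions (`"work"`, `"rest"`). The dictionaries `P` and `R` stand in for the transition model and the reward model. The hypothetical helper `simulate_step` shows one way an agent could use them to imagine a single step ahead without touching the real environment. Every name and number here is made up for illustration.

```python
import random

# Hypothetical toy model, written by hand purely for illustration.
# P[s][a] maps each possible next state s' to its probability P(s' | s, a).
P = {
    "cool": {"work": {"cool": 0.8, "hot": 0.2}, "rest": {"cool": 1.0}},
    "hot": {"work": {"hot": 1.0}, "rest": {"cool": 0.6, "hot": 0.4}},
}

# R[s][a] is the expected immediate reward for taking action a in state s.
R = {
    "cool": {"work": 2.0, "rest": 0.0},
    "hot": {"work": -1.0, "rest": 0.0},
}


def simulate_step(s, a, rng=random):
    """Use the model (not the real environment) to imagine one step from (s, a)."""
    next_states = list(P[s][a].keys())
    weights = list(P[s][a].values())
    # Sample s' according to the transition model's probabilities.
    s_next = rng.choices(next_states, weights=weights, k=1)[0]
    # Predict the reward with the reward model.
    return s_next, R[s][a]


rng = random.Random(0)
print(simulate_step("cool", "work", rng))
```

Because the model is just two lookup tables here, the agent can "try out" actions in its head as often as it likes. Real environments usually need a learned or approximate model instead.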
The state transition model is a probability distribution or a function that predicts the probability of ending up in a state $s'$ in the next time step $t+1$, given the state $s$ and the action $a$ at time step $t$. Mathematically, it is expressed as follows:
![](https://epubservercos.yuewen.com/F87B1A/19470390008867106/epubprivate/OEBPS/Images/642ccea5-ab29-4ba7-b4eb-cb99735ec651.png?sign=1739125063-iPYxEGBy92EGnngNjKsALZ6JRUhwwNnn-0-76023bed8d59d7480e855a438565c1f7)
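An agent often does not know these transition probabilities in advance and has to estimate them from experience. For small, discrete state and action spaces, one simple approach is a count-based (maximum-likelihood) estimate. This is an assumption made for illustration, not a method given in the text. The idea is to count how often each next state $s'$ followed a given pair $(s, a)$ and then divide by the total number of times that pair was tried. `TabularTransitionModel` and its methods are hypothetical names.

```python
from collections import defaultdict


class TabularTransitionModel:
    """Count-based estimate of P[S_{t+1} = s' | S_t = s, A_t = a] (hypothetical helper)."""

    def __init__(self):
        # counts[(s, a)][s_next] = number of times (s, a) was followed by s_next
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1

    def prob(self, s, a, s_next):
        """Estimated probability of landing in s_next after taking a in s."""
        outcomes = self.counts.get((s, a))
        if not outcomes:
            return 0.0  # convention: (s, a) never observed, so no estimate yet
        total = sum(outcomes.values())
        return outcomes.get(s_next, 0) / total

    def distribution(self, s, a):
        """Estimated distribution over next states for (s, a)."""
        outcomes = self.counts.get((s, a), {})
        total = sum(outcomes.values())
        return {s2: n / total for s2, n in outcomes.items()} if total else {}


model = TabularTransitionModel()
for s_next in ["B", "B", "B", "C"]:
    model.update("A", "right", s_next)
print(model.prob("A", "right", "B"))     # 0.75
print(model.distribution("A", "right"))  # {'B': 0.75, 'C': 0.25}
```

As the agent collects more transitions, these relative frequencies approach the true transition probabilities, provided the environment itself does not change.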
The agent uses the reward model to predict the immediate next reward that it would get if it were to take action $a$ while in state $s$ at time step $t$. This expectation of the reward at the next time step $t+1$ can be mathematically expressed as follows:
![](https://epubservercos.yuewen.com/F87B1A/19470390008867106/epubprivate/OEBPS/Images/f8e79745-55ec-4f2a-9b2c-c9ae60570595.png?sign=1739125063-gFvx7DvtL5pTzKi4Z22aQSP1t0jOj3vn-0-c16ecfbd62c69a7bbba2c45bf66789e2)
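The reward model can be estimated from experience in a similar way. The sketch below is again an assumption for discrete states and actions, not the book's method. It keeps an incremental running mean of the rewards observed after each state-action pair, and that mean is a sample estimate of the expectation above. `TabularRewardModel` is a hypothetical name.

```python
from collections import defaultdict


class TabularRewardModel:
    """Running-mean estimate of E[R_{t+1} | S_t = s, A_t = a] (hypothetical helper)."""

    def __init__(self):
        self.n = defaultdict(int)        # number of rewards seen for each (s, a)
        self.mean = defaultdict(float)   # running mean of those rewards

    def update(self, s, a, reward):
        """Fold one observed reward into the running mean for (s, a)."""
        key = (s, a)
        self.n[key] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.mean[key] += (reward - self.mean[key]) / self.n[key]

    def expected_reward(self, s, a):
        """Predicted immediate reward for taking a in s (0.0 if never observed)."""
        return self.mean.get((s, a), 0.0)


model = TabularRewardModel()
for r in [1.0, 0.0, 0.5]:
    model.update("A", "right", r)
print(model.expected_reward("A", "right"))  # 0.5
```

The incremental update avoids storing every past reward, yet it produces the same value as averaging all n rewards observed so far.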