Convergence

Building the system was fun. Finding the factors that make the system go wrong is another story.

The model presented so far can be summed up as follows:

From lv to R, the process builds the reward matrix R (Chapter 2, Think Like a Machine) that the reinforcement learning program (Chapter 1, Become an Adaptive Thinker) requires. That program then runs from reading R to producing the results. Gamma is the learning parameter, Q is the Q-learning function, and the results are the states of Q described in the first chapter.
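As a reminder of how R, gamma, and Q fit together, here is a minimal sketch of the second half of that pipeline. It assumes a small, made-up 4-state reward matrix (not the one built in the earlier chapters), a gamma of 0.8, and the simple update Q(s, a) = R(s, a) + gamma * max Q(a, ·). The function name train and the convention that -1 marks a forbidden move are illustrative choices only.

```python
import numpy as np

# Hypothetical reward matrix: R[s, a] is the reward for moving from state s
# to state a. A value of -1 marks a move that is not allowed (an assumption
# of this sketch). State 3 is the goal.
R = np.array([
    [-1,  0, -1,  -1],
    [ 0, -1,  0, 100],
    [-1,  0, -1,  -1],
    [-1,  0, -1, 100],
], dtype=float)


def train(R, gamma=0.8, episodes=2000, seed=0):
    """Run a basic Q-learning loop over R and return the learned Q matrix."""
    rng = np.random.default_rng(seed)
    n_states = R.shape[0]
    Q = np.zeros_like(R)
    for _ in range(episodes):
        s = rng.integers(n_states)            # pick a random current state
        allowed = np.flatnonzero(R[s] >= 0)   # moves permitted from s
        a = rng.choice(allowed)               # pick one of them at random
        # Immediate reward plus the discounted best value of the next state.
        Q[s, a] = R[s, a] + gamma * Q[a].max()
    return Q


Q = train(R)
print(np.round(Q, 1))
```

Reading the result row by row, the largest value in each row points to the preferred next state from that row's state.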

The parameters to be measured are as follows:

The company's input data. Training sets found on the web, such as MNIST, are designed to be efficient. These ready-made datasets often contain some noise (unreliable data) to make them realistic. Raw company data must go through the same preparation. The difficulty is that a corporate dataset cannot be downloaded from anywhere; you have to build it yourself.
The weights and biases that will be applied.
The activation function (a logistic function or other).
The choices to make after the one-hot process.
The learning parameter.
Episode management through convergence.

The best way to start is to measure the quality of the system's convergence, the last step of the whole process.

If the system converges well, you avoid the headache of having to go back and check everything.
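One possible way to put a number on convergence is to track how much Q still changes from one episode to the next and stop once that change falls below a tolerance. The sketch below is an assumption-laden illustration, not the method this chapter goes on to describe: the reward matrix is made up, an "episode" is simplified to one full sweep over every allowed move, and the names train_until_converged, tol, and max_episodes are hypothetical.

```python
import numpy as np

# Hypothetical reward matrix; -1 marks a move that is not allowed.
R = np.array([
    [-1,  0, -1,  -1],
    [ 0, -1,  0, 100],
    [-1,  0, -1,  -1],
    [-1,  0, -1, 100],
], dtype=float)


def train_until_converged(R, gamma=0.8, tol=1e-6, max_episodes=1000):
    """Sweep over all allowed moves until Q stops changing.

    Returns (Q, episodes_used, history), where history[i] is the largest
    absolute change of any Q entry during sweep i + 1. If the tolerance is
    never reached, episodes_used equals max_episodes.
    """
    Q = np.zeros_like(R, dtype=float)
    history = []
    for episode in range(1, max_episodes + 1):
        Q_prev = Q.copy()
        # Visit every allowed (state, action) pair once per sweep.
        for s, a in zip(*np.nonzero(R >= 0)):
            Q[s, a] = R[s, a] + gamma * Q[a].max()
        delta = np.abs(Q - Q_prev).max()
        history.append(delta)
        if delta < tol:
            break
    return Q, episode, history


Q, episodes_used, history = train_until_converged(R)
print("episodes used:", episodes_used)
print("last change:", history[-1])
```

If the change shrinks steadily and the loop stops well before max_episodes, the learning stage is behaving. If it stalls or oscillates, that is the signal to go back and check the earlier parameters.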