Time for an update on the prototype. Since the last blog post I have been implementing the learning part of the AI agent. According to my project plan, the deliverable for MS2 is terminal output showing that the AI agent chooses actions and learns.
The AI agent uses a technique called Q-learning. Q-learning iteratively estimates a quality value (Q-value) for each action. The Q-values from two consecutive updates are compared to calculate an error value, and that error is used to train the neural network.
target Qvalue = reinforcement + discount * max(exploit Qvalue, leave Qvalue)
output Qvalue = ((1-learning rate) * previous output Qvalue) + (learning rate * target Qvalue)
error = 0.5 * (target Qvalue - previous output Qvalue)^2
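The three formulas above can be sketched as a single update step. This is a minimal illustration, not the prototype's actual code; the parameter names and the example values (reinforcement, discount, learning rate) are assumptions for demonstration.

```python
# Sketch of one Q-learning update step, following the three formulas
# in the post. All numeric values are illustrative assumptions.

def q_update(prev_output_q, exploit_q, leave_q,
             reinforcement, discount=0.9, learning_rate=0.1):
    """Return (new output Q-value, error) for one update step."""
    # target Qvalue = reinforcement + discount * max(exploit, leave)
    target_q = reinforcement + discount * max(exploit_q, leave_q)
    # output Qvalue = (1 - lr) * previous output Qvalue + lr * target Qvalue
    output_q = (1 - learning_rate) * prev_output_q + learning_rate * target_q
    # error = 0.5 * (target Qvalue - previous output Qvalue)^2
    error = 0.5 * (target_q - prev_output_q) ** 2
    return output_q, error

new_q, err = q_update(prev_output_q=0.5, exploit_q=0.8, leave_q=0.2,
                      reinforcement=1.0)
```

The error term is then what the neural network is trained to minimize.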
The ANN works with the following values:
- current suspicious value
- detection rate
- detected value
- quality value
The ANN has 2 hidden layers, and each layer consists of 4 nodes.
The ANN is used in the function that calculates the Q-values for the actions:
Qvalue = Qfunction(current, rate, max, action)
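A network of that shape can be sketched as a small feed-forward net: 4 inputs, two hidden layers of 4 nodes, and one Q-value output. The input encoding (one number per argument of Qfunction), the tanh activation, and the random weights are all assumptions here; the prototype's real network may differ.

```python
import numpy as np

# Hypothetical sketch of Qfunction as a feed-forward net with
# 2 hidden layers of 4 nodes each. Weights are random placeholders.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 4)) * 0.1   # input (4 values) -> hidden layer 1
W2 = rng.standard_normal((4, 4)) * 0.1   # hidden layer 1 -> hidden layer 2
W3 = rng.standard_normal((4, 1)) * 0.1   # hidden layer 2 -> Q-value

def qfunction(current, rate, max_value, action):
    """Assumed encoding: each argument becomes one input feature."""
    x = np.array([current, rate, max_value, action], dtype=float)
    h1 = np.tanh(x @ W1)       # hidden layer 1 (4 nodes)
    h2 = np.tanh(h1 @ W2)      # hidden layer 2 (4 nodes)
    return float(h2 @ W3)      # scalar Q-value estimate

q = qfunction(current=0.3, rate=0.5, max_value=0.7, action=1.0)
```

In training, the Q-value this function returns would be pushed toward the target Q-value by minimizing the error term from the formulas above.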
update: Added missing content to this post.