Week 3 – Prototype update

Time for an update on the prototype. Since the last blog post I have been implementing the learning part of the AI agent. In my project plan, the deliverable for MS2 is terminal output showing that the AI agent chooses actions and learns.


From left to right, the columns are: generation, current detection, detection rate, detected value, reinforcement reward, exploit Qvalue, leave Qvalue, action taken, target Qvalue, output Qvalue and error.

The AI agent uses a technique called Q-learning. Q-learning iteratively estimates a quality value (Qvalue) for each action. The Qvalues from two consecutive updates are compared to compute an error value, and that error value is used to train the neural network.
target Qvalue = reinforcement + discount * max(exploit Qvalue, leave Qvalue)
output Qvalue = ((1-learning rate) * previous output Qvalue) + (learning rate * target Qvalue)
error = 0.5 * (target Qvalue – previous Qvalue)^2
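
As a rough illustration, the three formulas above can be written as one small Python function. This is only a sketch: the function name `q_update`, the discount of 0.9, the learning rate of 0.1 and the sample numbers are assumptions made for the example, not values from the prototype.

```python
def q_update(reward, exploit_q, leave_q, previous_q,
             discount=0.9, learning_rate=0.1):
    """One Q-learning step following the three formulas in the post.

    discount and learning_rate are assumed example values.
    """
    # Target: reinforcement plus the discounted best of the two action Qvalues.
    target_q = reward + discount * max(exploit_q, leave_q)
    # Output: blend the previous Qvalue with the target using the learning rate.
    output_q = (1 - learning_rate) * previous_q + learning_rate * target_q
    # Error: half the squared difference between target and previous Qvalue.
    error = 0.5 * (target_q - previous_q) ** 2
    return target_q, output_q, error


# Hypothetical sample values, chosen only to show the arithmetic.
target_q, output_q, error = q_update(
    reward=1.0, exploit_q=0.5, leave_q=0.2, previous_q=0.4)
print(target_q, output_q, error)
```

With these sample numbers the target is 1.0 + 0.9 * 0.5 = 1.45, so the output Qvalue moves a tenth of the way from 0.4 toward 1.45.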

ANN design

Inputs:
  • current suspicious value
  • detection rate
  • detected value
  • action

Output:
  • Quality value

The ANN has 2 hidden layers, each consisting of 4 nodes.

The ANN is used as the function that calculates the Qvalues for the actions:
Qvalue = Qfunction(current, rate, max, action)
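
To make the shape of the network concrete, here is a minimal Python sketch of a forward pass with 4 inputs, 2 hidden layers of 4 nodes and 1 output. Several things are assumptions for the example and not taken from the prototype: the class name `QNetwork`, the tanh activation in the hidden layers, the linear output node, the random initial weights, the encoding of the action as 0.0 (exploit) or 1.0 (leave), and the greedy choice of the action with the higher Qvalue. The third argument is called `max_value` here only to avoid shadowing Python's built-in `max`. Training is left out.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Build a layer as a list of (weights, bias) pairs, one pair per node."""
    return [([rng.uniform(-0.5, 0.5) for _ in range(n_in)], 0.0)
            for _ in range(n_out)]


def forward(layer, inputs, activation):
    """Compute each node's activated weighted sum of the inputs."""
    return [activation(sum(w * x for w, x in zip(weights, inputs)) + bias)
            for weights, bias in layer]


class QNetwork:
    """4 inputs -> 4 hidden nodes -> 4 hidden nodes -> 1 Qvalue (untrained)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)  # seeded so the example is repeatable
        self.hidden1 = make_layer(4, 4, rng)
        self.hidden2 = make_layer(4, 4, rng)
        self.output = make_layer(4, 1, rng)

    def q_value(self, current, rate, max_value, action):
        x = [current, rate, max_value, action]
        h1 = forward(self.hidden1, x, math.tanh)   # assumed activation
        h2 = forward(self.hidden2, h1, math.tanh)  # assumed activation
        return forward(self.output, h2, lambda v: v)[0]  # linear output


EXPLOIT, LEAVE = 0.0, 1.0  # assumed numeric encoding of the two actions

net = QNetwork(seed=0)
exploit_q = net.q_value(0.3, 0.1, 1.0, EXPLOIT)
leave_q = net.q_value(0.3, 0.1, 1.0, LEAVE)
# Assumed greedy policy: take the action with the higher Qvalue.
action_taken = "exploit" if exploit_q >= leave_q else "leave"
print(exploit_q, leave_q, action_taken)
```

The network is called once per action with the same state values, which gives the exploit Qvalue and leave Qvalue that the target formula compares.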


update: Added missing content to this post.
