RAVE = Rapid Action Value Estimation
One of the main problems of UCT is that it needs some time at the beginning to collect enough information before it can concentrate on the promising moves. Until then, the algorithm behaves largely at random. RAVE is used to guide MCTS during this initial phase of UCT. RAVE is similar to UCT, but it rates a move a in a state s using every sequence beginning in s in which a was chosen, not only the sequences in which a was the first move.
For example: in most cases it is a good idea to capture the opponent's queen in a chess game. This can be learned relatively quickly, but the resulting heuristic is rougher than the UCT estimate.
The formulas are quite similar to UCT:
m(s,a) = Number of times the move a was chosen in a sequence beginning in s.
m(s) = sum of m(s,a) over all moves a
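The two counters above can be sketched in a few lines of Python. This is only an illustration under assumptions: the table names (m_sa, m_s, rave_sum), the function names, and the use of plain strings for states and moves are hypothetical, and the sketch ignores which player made each move, which a real Go program would have to respect.

```python
from collections import defaultdict

# Hypothetical tables (names are assumptions, not taken from the thesis).
m_sa = defaultdict(int)        # m(s,a): sequences beginning in s that contain move a
m_s = defaultdict(int)         # m(s): sum of m(s,a) over all moves a
rave_sum = defaultdict(float)  # summed outcomes of those sequences, per (s, a)


def update_rave(state, sequence, outcome):
    """Credit every distinct move of a sequence that begins in `state`."""
    for move in set(sequence):
        m_sa[(state, move)] += 1
        m_s[state] += 1
        rave_sum[(state, move)] += outcome


def rave_value(state, move):
    """Mean outcome of all sequences from `state` containing `move` (0.0 if none)."""
    n = m_sa.get((state, move), 0)
    return rave_sum.get((state, move), 0.0) / n if n else 0.0
```

Because a move is credited wherever it occurs in the sequence, each simulation updates many (s,a) pairs at once. This is why the estimate becomes usable quickly, and also why it is rougher than the UCT value.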
To weight the UCT and RAVE estimates against each other, the following formulas are used:
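A minimal sketch of such a weighting follows. It assumes a schedule that is common in the RAVE literature, beta(s) = sqrt(k / (3 n(s) + k)), where n(s) is the UCT visit count of s and k is a tuning constant. The exact form used in the thesis may differ, and the function names and the default k = 1000 are hypothetical.

```python
import math


def beta(n_s, k=1000):
    """Weight given to the RAVE estimate after n_s visits of s (assumed schedule)."""
    return math.sqrt(k / (3 * n_s + k))


def combined_value(q_uct, q_rave, n_s, k=1000):
    """Blend both estimates: mostly RAVE early on, mostly UCT as n_s grows."""
    b = beta(n_s, k)
    return b * q_rave + (1 - b) * q_uct
```

With this schedule the RAVE value fully determines the result at n(s) = 0, both estimates are weighted equally at n(s) = k, and the UCT value dominates once the state has been visited often.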
Literature: Sylvain Gelly, PhD thesis, "A Contribution to Reinforcement Learning; Application to Computer-Go"



