The fundamental idea behind reinforcement learning (RL) is that the learning processes of all organisms are based on interaction with their environment. RL aims to imitate this behaviour in artificial systems, so that they become fully self-organised and "intelligent". In other words, the goal of any RL algorithm is to create a function that maps perceived situations, or states, of the system to the actions to be taken in them. This mapping function is known as the policy. It is developed through the system's trial-and-error experience, without the need for any prior knowledge of the environment's input-output model.
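At its simplest, a policy can be pictured as a lookup table from states to actions. The sketch below is purely illustrative (the state and action names are hypothetical, not those of any real system), but it shows the kind of mapping an RL algorithm has to discover:

```python
# A policy maps each perceived state to an action. In RL, this table
# is not hand-written as below, but filled in through trial and error.
policy = {
    "pendulum_left":  "push_right",
    "pendulum_right": "push_left",
    "balanced":       "hold",
}

def act(state):
    """Return the action the current policy prescribes for a state."""
    return policy[state]
```

A learning algorithm then amounts to a rule for updating this mapping after each interaction, based on the reward the environment returns.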
The most challenging part of my Master's project was to implement a real-time RL algorithm that would enable a rotary inverted pendulum (RIP) system, the same one shown on the control demo page, to learn a balancing strategy from scratch. The result is demonstrated in the video below.
After several days of running the Q(λ) algorithm, the RIP system had learned a relatively robust strategy for balancing the pendulum for one minute, after which a swing-down controller was activated to return the pendulum to its original position in preparation for a new learning episode. This was a demonstration of pure artificial intelligence controlling a physical system: a computer program had figured out how to balance the pendulum without any human knowledge or intervention. But do not worry, it is not taking over the world any time soon! Assuming it is plugged in, its perception of the entire world is limited to digital readings from two sensors tracking the angular positions of the horizontally rotating arm and the vertically rotating pendulum, and its only way of communicating with the world is transmitting one of seven possible numbers, each of which happens to represent a voltage applied to the DC motor driving the RIP system's arm.
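For readers curious about the learning rule itself, the core of Watkins's Q(λ) is a tabular Q-learning update augmented with eligibility traces, which spread each reward backwards over recently visited state-action pairs. The sketch below is a minimal, generic version, not my actual implementation: the state discretisation, the learning parameters, and the function names are all assumptions. Only the action count (seven voltage levels) comes from the description above.

```python
import numpy as np

N_STATES = 500   # assumed: number of discretised (arm angle, pendulum angle) bins
N_ACTIONS = 7    # seven possible motor voltages, as described in the post
ALPHA, GAMMA, LAM, EPS = 0.1, 0.99, 0.9, 0.1  # assumed learning parameters

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))  # action-value table
E = np.zeros_like(Q)                 # eligibility traces

def choose_action(s):
    """Epsilon-greedy selection: mostly exploit, occasionally explore."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def qlambda_step(s, a, r, s_next):
    """One Watkins's Q(lambda) update after taking action a in state s,
    receiving reward r, and landing in s_next. Returns the next action."""
    a_star = int(np.argmax(Q[s_next]))            # greedy action in next state
    delta = r + GAMMA * Q[s_next, a_star] - Q[s, a]
    E[s, a] += 1.0                                # accumulating trace
    Q[...] += ALPHA * delta * E                   # update all traced pairs at once
    a_next = choose_action(s_next)
    if a_next == a_star:
        E[...] *= GAMMA * LAM                     # decay traces along the greedy path
    else:
        E[...] = 0.0                              # cut traces after an exploratory move
    return a_next
```

In practice the control loop would read the two angle sensors, map them to a state index `s`, call something like `qlambda_step` once per sampling period, and send the chosen voltage to the motor; the traces are what let a single balancing reward reinforce the whole sequence of moves that led to it.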