AI to master Ms. Pac-Man (NLP+customer targeting application) (English

Monday, June 26, 2017

Abridged. Formatting is mine.

..the team divided the large problem of mastering Ms. Pac-Man into small pieces, which they then distributed among AI agents...

The method, which the Maluuba team calls Hybrid Reward Architecture, used more than 150 agents, each of which worked in parallel with the other agents to master Ms. Pac-Man. For example, some agents got rewarded for successfully finding one specific pellet, while others were tasked with staying out of the way of ghosts.

Then, the researchers created a top agent – sort of like a senior manager at a company – who took suggestions from all the agents and used them to decide where to move Ms. Pac-Man.
The top agent took into account how many agents advocated for going in a certain direction, but it also looked at the intensity with which they wanted to make that move. For example, if 100 agents wanted to go right because that was the best path to their pellet, but three wanted to go left because there was a deadly ghost to the right, it would give more weight to the ones who had noticed the ghost and go left...
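
The weighing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how the top agent combines suggestions, so summing signed preference values is an assumed rule, and the function name `choose_action` and all numbers are hypothetical.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")

def choose_action(agent_preferences):
    """Sum every agent's signed preference per action; pick the highest total.

    Assumption: a plain sum stands in for the top agent's weighing.
    A large negative value lets a few agents outvote a mild majority.
    """
    totals = defaultdict(float)
    for prefs in agent_preferences:
        for action, value in prefs.items():
            totals[action] += value
    # Actions that no agent mentioned keep a total of 0.0.
    return max(ACTIONS, key=lambda a: totals[a])

# 100 pellet agents mildly prefer going right (illustrative values).
pellet_agents = [{"right": 1.0} for _ in range(100)]
# 3 ghost agents strongly object to going right and prefer left.
ghost_agents = [{"left": 50.0, "right": -100.0} for _ in range(3)]

decision = choose_action(pellet_agents + ghost_agents)
print(decision)
```

With these made-up values, "right" totals 100 - 300 = -200 and "left" totals 150, so the three ghost-avoiding agents outweigh the hundred pellet seekers, as in the article's example.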

“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem,” he said. “It benefits the whole.”

Continued below.

For example, Mehrotra said the method they developed to beat Ms. Pac-Man could be used to help a company’s sales organization make precise predictions about which potential customers to target at a particular time or on a particular day. The system could use multiple agents, each representing one client, with a top agent weighing factors such as which clients are up for contract renewal, which contracts are worth the most to the company and whether the potential customer is typically in the office that day or available at that time...
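
A rough sketch of how such a sales setup might look, with one small agent per client and a top-level function choosing among them. Everything here is hypothetical: the class `ClientAgent`, the scoring rule, and the sample figures are assumptions made for illustration, not something the article specifies.

```python
from dataclasses import dataclass

@dataclass
class ClientAgent:
    """One agent per client; all fields are illustrative assumptions."""
    name: str
    renewal_due: bool
    contract_value: float
    available_now: bool

    def score(self):
        # Assumed rule: an unreachable client is not worth contacting now.
        if not self.available_now:
            return 0.0
        # Assumed rule: a pending renewal doubles the contract's weight.
        urgency = 2.0 if self.renewal_due else 1.0
        return urgency * self.contract_value

def pick_client(agents):
    """Top agent: choose the client whose agent reports the highest score."""
    return max(agents, key=lambda a: a.score())

clients = [
    ClientAgent("A", renewal_due=True, contract_value=40_000, available_now=True),
    ClientAgent("B", renewal_due=False, contract_value=60_000, available_now=True),
    ClientAgent("C", renewal_due=True, contract_value=90_000, available_now=False),
]
target = pick_client(clients)
print(target.name)
```

In this toy data, client C has the largest contract but is unavailable, so the top agent picks A, whose pending renewal outweighs B's larger contract.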

...Van Seijen said he also could see this kind of divide-and-conquer approach being used to make advances in other promising areas of AI research, such as natural language processing...

“It really enables us to make further progress in solving these really complex problems,” he said.

...That unpredictability is especially valuable for researchers who are working in the evolving field of reinforcement learning. In AI research, reinforcement learning is the counterpart to supervised learning, a more commonly used method of artificial intelligence in which systems get better at doing something as they are fed more examples of good behavior.

With reinforcement learning, an agent gets positive or negative responses for each action it tries, and learns through trial and error to maximize the positive responses, or rewards.
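
The trial-and-error loop can be illustrated with a toy example. This is a minimal sketch of a generic epsilon-greedy learner, not the Maluuba system. It assumes two moves with fixed rewards (hypothetically, a pellet to the right worth +1 and a ghost to the left worth -1), and the function name and parameters are made up for the example.

```python
import random

def learn_by_trial_and_error(rewards, steps=500, epsilon=0.1, seed=0):
    """Estimate the value of each action from the rewards it yields.

    Assumption: `rewards` maps each action to a fixed reward, so the
    "environment" is a trivial stand-in for a real game.
    """
    rng = random.Random(seed)
    actions = list(rewards)
    value = {a: 0.0 for a in actions}   # running average reward per action
    count = {a: 0 for a in actions}     # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)           # explore: try something
        else:
            action = max(actions, key=value.get)   # exploit: best so far
        reward = rewards[action]
        count[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value

value = learn_by_trial_and_error({"left": -1.0, "right": 1.0})
best = max(value, key=value.get)
print(best)
```

The agent is never told which move is good. It only sees the reward after each attempt, and its estimates end up favoring the move that pays off.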

An AI-based system that uses supervised learning would learn how to come up with a proper response in a conversation by being fed examples of good and bad responses. A reinforcement learning system, on the other hand, would be expected to learn appropriate responses from only high-level feedback, such as a person saying she enjoyed the conversation, which is a much more difficult task.

AI experts believe reinforcement learning could be used to create AI agents that can make more decisions on their own, allowing them to do more complex work and freeing up people for even more high-value work...