Reinforcement Learning in Retail



Reinforcement Learning (RL) is a family of Artificial Intelligence algorithms that, immersed in an environment, make decisions in order to maximize a cumulative reward. To give a practical example, RL could show slot machine (or "one-armed bandit") players the best strategy for deciding how much to invest in trying different machines and how much to bet on the most promising ones. Indeed, a defining feature of these algorithms is the search for the optimal balance between exploring unknown situations and exploiting the knowledge accumulated through trial and error.
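The slot machine example can be sketched in a few lines of Python. This is only an illustrative sketch of one simple strategy, known as epsilon-greedy: the payout probabilities, the number of plays and the exploration rate epsilon are assumptions chosen for the example, and the function name is hypothetical.

```python
import random

def epsilon_greedy_bandit(payout_probs, steps=10_000, epsilon=0.1, seed=0):
    """Play simulated slot machines with an epsilon-greedy strategy.

    payout_probs: assumed win probability of each machine (hidden from the player).
    Returns (estimated value per machine, pulls per machine, total reward).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    estimates = [0.0] * n   # running average reward observed on each machine
    pulls = [0] * n         # how many times each machine has been played
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a machine at random.
            arm = rng.randrange(n)
        else:
            # Exploitation: play the machine that looks best so far.
            arm = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the average reward for this machine.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total += reward
    return estimates, pulls, total

if __name__ == "__main__":
    estimates, pulls, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("pulls per machine:", pulls)
    print("estimated payout rates:", [round(e, 2) for e in estimates])
```

With a small epsilon, most plays go to the machine that currently looks best, while a minority keep testing the others in case the early estimates were wrong.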

RL is inspired by human behaviour: babies grow up exploring the surrounding world, gradually developing greater skill in a bottom-up way, which is progressively joined, in a top-down process, by a mental model of the deep mechanisms of the world in which they are immersed. The still mysterious fusion of these two approaches is very powerful, and it is what RL tries to replicate.

RL is an area of Machine Learning (ML), alongside the better-known Supervised Learning and Unsupervised Learning approaches. In the first, humans label the correct result for many examples, which are used for the algorithm's learning phase and for the subsequent measurement of its effectiveness (for example, recognizing the sex and age of a person from his or her face).

In the second, the algorithms search for the best result on their own, without external guidance (for example, grouping customers into a few homogeneous classes based on their purchase data). RL algorithms, despite their substantial autonomy, must be able to observe the success of their choices, and some external knowledge can improve their learning speed.


Origins and state of the art

The Markov Decision Process (MDP) is among the mathematical foundations of Reinforcement Learning, but for the most complex problems Deep Neural Networks have come into play in recent years in several variants, including the Deep Q-Network (DQN) behind the successes of DeepMind, a young London company soon acquired by Google. The first applications of DQN exceeded human-level performance in many video games for the old Atari 2600. Some of these games, such as Montezuma's Revenge, which proved much harder for DQN, are often used by researchers as a reference point for comparing different lines of development.

Later, in 2016-2017, the striking victories of DeepMind's AlphaGo algorithm over several world champions of Go, an East Asian board game far more complex than chess, seem to have "woken up" even the Chinese government, which until then had not attached much economic and strategic importance to AI. A similar awakening happened in 1957, when the US learned that the USSR had sent Sputnik, the first artificial satellite, into space.

One of the problems that makes RL quite difficult to adopt in business is the high number of interactions with the environment needed to achieve good results. This has even led to the creation of algorithms that train other algorithms, such as Generative Adversarial Networks (GANs).

Another difference between RL and other forms of AI is that it can operate on the widest range of problems, on a path towards Artificial General Intelligence, which will be able to absorb huge quantities of digital data (books, treatises, images, videos, sensor logs ...) to automatically synthesize knowledge and understanding of our world.


Why Reinforcement Learning is important in Retail

In Retail chains it is useful to optimize assortment, stock levels and prices region by region or, even better, store by store. Above all, it is vital to adapt constantly to the evolution of lifestyles and to the effects of the commercial communications of producers and local competitors. While for many AI algorithm families the learning process must be repeated periodically, with the model remaining unchanged until the next retraining, RL naturally pursues continuous optimization in an evolving environment. In addition, in 2017 Google introduced the concept of Federated Learning, in which learning from daily activity is delegated to the edge: each edge node modifies its behaviour immediately and later shares its local knowledge with the center and the other edges.

To retain a customer and maximize long-term profit, it is sometimes necessary to sacrifice short-term profit, an approach that is unnatural for many algorithms but not for RL.
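The trade-off between short-term and long-term profit can be made concrete with the discounted cumulative reward that RL algorithms commonly maximize. The sketch below is only an illustration: the weekly profit figures and the discount factor gamma are invented for the example, and the function name is hypothetical.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards, where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total

# Hypothetical weekly profit from one customer under two strategies.
no_discount = [10, 0, 0, 0, 0, 0]     # full margin now, but the customer does not return
with_discount = [4, 8, 8, 8, 8, 8]    # lower margin now, and the customer keeps buying

if __name__ == "__main__":
    print("no discount:  ", round(discounted_return(no_discount), 2))
    print("with discount:", round(discounted_return(with_discount), 2))
```

Under these assumed figures the strategy that earns less in the first week has the higher cumulative value, which is the quantity an RL agent is trained to maximize.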

When a new promotion is introduced, no data is available to understand how it correlates with the different types of customer and their latest purchases. Luckily, RL begins to make decisions immediately, some of them "exploratory" in nature, and improves them day after day.
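One common way to start deciding without any historical data is Thompson sampling, shown here as a minimal sketch and not as the method of any specific retail system. The redemption rates of the promotions, the number of customers and the function name are assumptions made for the example; each promotion starts from the same uninformative Beta(1, 1) prior.

```python
import random

def thompson_promotions(true_rates, customers=5_000, seed=0):
    """Choose which promotion to show each customer using Thompson sampling.

    true_rates: assumed redemption probability of each promotion (unknown to the algorithm).
    Returns (times shown, successes + 1, failures + 1) per promotion.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [1] * n   # Beta prior parameter alpha, starting at 1 (no data yet)
    failures = [1] * n    # Beta prior parameter beta, starting at 1 (no data yet)
    shown = [0] * n
    for _ in range(customers):
        # Draw a plausible redemption rate for each promotion from its current belief.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        choice = samples.index(max(samples))
        shown[choice] += 1
        # Observe whether this customer redeemed the promotion and update the belief.
        if rng.random() < true_rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return shown, successes, failures

if __name__ == "__main__":
    shown, successes, failures = thompson_promotions([0.05, 0.10, 0.30])
    print("times each promotion was shown:", shown)
```

In the first days the choices are close to random, which is the "exploratory" phase; as evidence accumulates, the promotion with the best observed results is shown more and more often.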

Given the "general" nature of RL algorithms, the fields of application in the Retail sector are almost endless and can embrace the entire supply chain, not only to maximise efficiency but also with a Circular Economy view, and therefore towards a general reduction in resource consumption.

The power of this family of algorithms is evident, and so is the corresponding risk of breaches of consumer privacy, an important matter in which Europe is leading the way with the GDPR (General Data Protection Regulation).