
From Space Invaders to Go

Deep reinforcement learning is an exciting form of machine learning. The method has been made popular over the last few years by DeepMind, a UK company founded in 2010 and bought by Google in 2014 (for a reported $500m). DeepMind has been training its general-purpose AI algorithms via… video games, simply feeding its models raw pixels (no predefined rules) and a basic definition of the goal to achieve by the end of the game (usually, reaching the highest possible score).

DeepMind kickstarted its endeavour with 8-bit Atari games. For instance, its agent quickly learned to play Breakout in 600 sessions. Remember that the only inputs are the raw pixels of the game plus the goal definition, which is pretty basic here: break the whole wall.


After mastering the first generation of video games, DeepMind moved on to more advanced scenarios and hit a major AI milestone with AlphaGo, which beat Lee Sedol, one of the world's best (human) Go players, in March 2016.

How could we apply the same technique to another – serious – game, advertising?

Advertising is a three-player game

The goal of display advertising, whether via traditional IAB banners or native creatives, is to get clicks and ideally conversions (sign-ups, purchases, …). The publisher and the advertiser share the same mechanical objective (clicks mean revenue for the publisher and traffic for the advertiser), but the two commercial stakeholders are at odds when it comes to pricing that objective. The publisher wants to get as much as possible from its inventory, whereas the advertiser wants to spend as little as possible on – ideally quality – traffic. The exchange should balance both aspirations to determine the final price, but technically the bidding responsibility still lies with the advertiser.

As highlighted in the academic paper "Real-Time Bidding by Reinforcement Learning in Display Advertising", real-time bidding (RTB) is a recurring event that happens many times over a campaign's lifetime: "As such, each bid is strategically correlated by the constrained budget and the overall effectiveness of the campaign."
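
To make "each bid is correlated by the constrained budget" concrete, here is a toy dynamic-programming sketch. It is not the paper's algorithm: the uniform market-price distribution, the fixed click-through rate, and the rule that a winner pays the market price are all hypothetical assumptions. The value of a bid depends on how much budget it leaves for the remaining auctions, so the best bid now cannot be chosen in isolation.

```python
from functools import lru_cache

# Hypothetical toy market: the highest competing bid ("market price") is
# uniformly distributed over 0..MAX_PRICE budget units.
MAX_PRICE = 5
PRICE_PROB = [1.0 / (MAX_PRICE + 1)] * (MAX_PRICE + 1)
CTR = 0.01  # hypothetical click-through rate, identical for every impression


def bid_value(t: int, b: int, bid: int) -> float:
    """Expected clicks if we bid `bid` now, then bid optimally afterwards."""
    total = 0.0
    for price, p in enumerate(PRICE_PROB):
        if price <= bid:  # win: expected click now, pay the market price (assumption)
            total += p * (CTR + value(t - 1, b - price))
        else:  # lose: keep the whole budget for the remaining auctions
            total += p * value(t - 1, b)
    return total


@lru_cache(maxsize=None)
def value(t: int, b: int) -> float:
    """Best expected clicks from t remaining auctions with budget b."""
    if t == 0:
        return 0.0
    # The agent may never bid more than the budget it has left.
    return max(bid_value(t, b, a) for a in range(min(b, MAX_PRICE) + 1))


def best_bid(t: int, b: int) -> int:
    """Bid that maximises expected clicks over the rest of the campaign."""
    return max(range(min(b, MAX_PRICE) + 1), key=lambda a: bid_value(t, b, a))


print(best_bid(10, 50), best_bid(10, 5))
```

With a generous budget (50 units for 10 auctions) the toy bids the maximum, because winning never costs it a future opportunity; with a tight budget (5 units) it bids lower and saves money for later auctions.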

Overly static RTB algorithms determine the best price at a given moment but pay too little attention to the holistic return of the whole campaign for all stakeholders (we'll introduce a third one in a moment). Choices have to be made not only within a single auction but also in the wider context of publishers' and advertisers' long-term objectives. This also requires taking into account the impact of advertising on the end-user, whose response determines the success or failure of the whole process. Advertising is a trio, not a duet.

There should be a constant feedback loop that improves the bidding strategy based on real-time measurement of the campaign's effectiveness (which ultimately depends on end-users' behaviour), enriched by the results of similar past operations. The bidding process shouldn't be considered a single, isolated event.
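
One simple way to picture such a loop is a Beta-Bernoulli click-through-rate estimate: a prior seeded from similar past campaigns, updated after every impression, and fed back into the bid. This is a swapped-in textbook technique, not something the article or the paper prescribes; the class, the `weight` given to past data and the value-per-click bidding rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CtrEstimate:
    """Beta-Bernoulli estimate of a campaign's click-through rate (hypothetical helper)."""
    alpha: float  # pseudo-clicks from the prior + observed clicks
    beta: float   # pseudo-non-clicks from the prior + observed non-clicks

    @classmethod
    def from_past_campaigns(cls, clicks: int, impressions: int, weight: float = 0.1) -> "CtrEstimate":
        # Down-weight past data (assumption) so fresh observations can move the estimate.
        return cls(alpha=1 + weight * clicks, beta=1 + weight * (impressions - clicks))

    def observe(self, clicked: bool) -> None:
        # Feedback loop: every measured impression updates the estimate.
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def bid_price(estimate: CtrEstimate, value_per_click: float) -> float:
    # Hypothetical value-based rule: bid the expected value of the impression.
    return value_per_click * estimate.mean


est = CtrEstimate.from_past_campaigns(clicks=200, impressions=10_000)
for i in range(100):  # simulated live feedback: 5 clicks in 100 impressions
    est.observe(clicked=i % 20 == 0)
print(round(est.mean, 4), round(bid_price(est, value_per_click=2.0), 4))
```

The prior keeps early bids sensible before the campaign has data of its own; as impressions accumulate, the observed behaviour of end-users progressively dominates.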

In a (deep) reinforcement learning model, the advertiser's agent (the bidding platform) observes a state (here, the auction parameters) and optimises its actions (placing bids) based on the rewards it receives (the click or conversion AND the overall campaign effectiveness / ROI), with a feedback loop constantly refining the accuracy of its predictions.
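
The state / action / reward loop can be sketched with tabular Q-learning, a simpler relative of deep reinforcement learning in which a lookup table replaces the neural network. Everything in the toy environment is hypothetical: the state is reduced to (auctions left, budget left), actions are integer bids, the reward is a click, and market prices and clicks are simulated at random.

```python
import random

# Hypothetical toy campaign: N_AUCTIONS auctions and BUDGET units to spend.
# State = (auctions left, budget left); action = integer bid in 0..MAX_BID.
N_AUCTIONS, BUDGET, MAX_BID = 5, 6, 3
CTR = 0.3  # hypothetical click probability of a won impression
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1  # learning rate, discount, exploration rate


def legal_bids(budget: int) -> range:
    # The agent may never bid more than the budget it has left.
    return range(min(budget, MAX_BID) + 1)


def run_auction(rng: random.Random, bid: int) -> tuple[float, int]:
    """Simulate one auction and return (reward, cost)."""
    market_price = rng.randint(0, MAX_BID)  # hypothetical highest competing bid
    if bid >= market_price:  # win, pay the market price, maybe get a click
        return (1.0 if rng.random() < CTR else 0.0), market_price
    return 0.0, 0  # lose: no impression, no cost


def greedy_bid(q: dict, t: int, b: int) -> int:
    """Bid with the highest estimated value in state (t, b)."""
    return max(legal_bids(b), key=lambda a: q.get((t, b, a), 0.0))


def train(episodes: int = 20_000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    q: dict = {}  # (auctions_left, budget_left, bid) -> estimated future clicks
    for _ in range(episodes):
        t, b = N_AUCTIONS, BUDGET
        while t > 0:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(list(legal_bids(b)))
            else:
                a = greedy_bid(q, t, b)
            reward, cost = run_auction(rng, a)
            t_next, b_next = t - 1, b - cost
            # Q-learning target: observed reward plus best estimated future value.
            future = 0.0 if t_next == 0 else max(
                q.get((t_next, b_next, x), 0.0) for x in legal_bids(b_next))
            old = q.get((t, b, a), 0.0)
            q[(t, b, a)] = old + ALPHA * (reward + GAMMA * future - old)
            t, b = t_next, b_next
    return q


q = train()
print(greedy_bid(q, N_AUCTIONS, BUDGET))
```

Because clicks are random, the learned values stay noisy after a single run; a production system would replace the table with a function approximator (the "deep" part) so that it can generalise across far richer auction states.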

But it gets far more complex once you admit that there shouldn't be just one class of rewards but three: rewards for advertisers, publishers and end-users, each different by nature. All three are also two-dimensional: instant and long-term. That's a lot of data to consider.
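
A reinforcement learning agent ultimately optimises one scalar reward, so the 3 × 2 grid of rewards has to be collapsed somehow. A weighted sum is the simplest possible choice and is only an assumption here: the `Reward` type, the stakeholder weights and the `horizon_weight` blending instant and long-term rewards are hypothetical, and choosing them is exactly the hard, unsolved part.

```python
from dataclasses import dataclass

STAKEHOLDERS = ("advertiser", "publisher", "end_user")


@dataclass(frozen=True)
class Reward:
    instant: float    # e.g. a click or conversion on this impression
    long_term: float  # e.g. estimated contribution to campaign ROI or user satisfaction


def combined_reward(rewards: dict, weights: dict, horizon_weight: float = 0.5) -> float:
    """Collapse the 3 x 2 reward grid into one scalar with a (hypothetical) weighted sum."""
    if set(rewards) != set(STAKEHOLDERS) or set(weights) != set(STAKEHOLDERS):
        raise ValueError("expected one reward and one weight per stakeholder")
    total = 0.0
    for who in STAKEHOLDERS:
        r = rewards[who]
        # Blend the two time horizons, then weight the stakeholder.
        blended = (1 - horizon_weight) * r.instant + horizon_weight * r.long_term
        total += weights[who] * blended
    return total


rewards = {"advertiser": Reward(1.0, 0.2), "publisher": Reward(0.5, 0.5), "end_user": Reward(0.0, -0.3)}
weights = {"advertiser": 0.4, "publisher": 0.3, "end_user": 0.3}
print(combined_reward(rewards, weights))
```

Note how a negative long-term reward for the end-user (say, an intrusive ad) pulls the total down even when the advertiser's instant reward is high.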

[Figure: the triple win of deep reinforcement learning in advertising]

Programmatic advertising is indeed a very special game, where the moves of one player (here the advertiser placing bids) should ultimately result in a triple victory for all players: the publisher, the advertiser and the end-user. This is the ideal condition for a sustainable ecosystem.

We are still far from this win-win-win outcome, but early experiments with a more traditional, advertiser-centric scenario look promising. Tests carried out by the academic researchers on a commercial RTB platform resulted in a 44.7% improvement in click performance compared with one of the most widely used methods in the industry. This suggests that deep reinforcement learning, still in its infancy, could significantly increase the effectiveness of programmatic advertising.



Source of the academic research quoted in this article: "Real-Time Bidding by Reinforcement Learning in Display Advertising" (Shanghai Jiao Tong University & University College London).

