02 Sep

Topic-based matching of advertisements and customers

Vuk Batanović Tech

To ensure that relevant advertisements are displayed to customers, ad serving systems typically rely on some form of keyword matching. In such systems, each ad is associated with a set of keywords. Whenever a customer visits a webpage containing any of the keywords predefined for a particular ad, that ad becomes eligible to be shown to the customer. The main strength of this approach is that it is simple and easy to control: advertisers can observe the effects and make optimization decisions for each individual keyword, and can manually expand or reduce the keyword set to achieve the desired targeting scale.
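As a rough illustration of the eligibility rule described above, here is a minimal Python sketch. The ad IDs, keyword sets, and the eligible_ads function are hypothetical names invented for this example; a production ad server would involve far more than a whole-word text search.

```python
import re

# Hypothetical ads and keyword sets, for illustration only.
AD_KEYWORDS = {
    "mortgage_ad": {"mortgage", "home loan"},
    "travel_ad": {"flight", "hotel"},
}


def eligible_ads(page_text, ad_keywords):
    """Return the IDs of ads with at least one keyword present in the page text."""
    text = page_text.lower()
    eligible = []
    for ad_id, keywords in ad_keywords.items():
        # A keyword counts as present if it appears as a whole word or phrase.
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            eligible.append(ad_id)
    return sorted(eligible)


print(eligible_ads("Compare home loan rates today", AD_KEYWORDS))
```

Because eligibility is decided keyword by keyword, each keyword's contribution can be tracked and tuned separately, which is the controllability the paragraph refers to.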

Limitations of keyword-based matching

However, a keyword-based system doesn’t understand the meanings of the words with which it operates, which prevents it from matching the specified keywords to their synonyms or other semantically similar expressions. This limitation produces two kinds of negative effects:

First, it makes it very hard to significantly increase the targeting scale without diminishing targeting coherence. If advertisers want to scale up, they have to increase the number of keywords to unreasonable levels (e.g., thousands or tens of thousands of keywords) in order to cover as many possibilities as they can, making the whole campaign hard to manage. Alternatively, advertisers can opt for using very general words (words that are very common but not very specific) as keywords, which dilutes the targeting focus and typically results in higher costs per acquisition (CPA).
Second, it causes some portions of the inventory to remain unused. Even when large sets of keywords are used for each advertisement, there will always be a sizeable number of relevant webpages not being matched to any of them, since it is almost impossible to include all relevant terms and expressions in a keyword list.
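The second effect can be made concrete with a small sketch. The keyword set, page texts, and the unmatched_pages helper below are all made up for illustration, and simple substring matching stands in for whatever matching a real system uses.

```python
# Hypothetical keyword list for a single home-financing ad.
KEYWORDS = {"mortgage", "home loan"}

# Hypothetical inventory: both pages are relevant to the ad,
# but only the first one uses any of the listed keywords.
PAGES = {
    "page_a": "How to refinance your mortgage",
    "page_b": "Tips for financing a house purchase",
}


def unmatched_pages(pages, keywords):
    """Return the IDs of pages whose text contains none of the keywords."""
    return sorted(
        page_id
        for page_id, text in pages.items()
        if not any(kw in text.lower() for kw in keywords)
    )


print(unmatched_pages(PAGES, KEYWORDS))
```

Here page_b stays unused even though it is on-topic, because it expresses the same idea with different words.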

Topic-based matching

To overcome these limitations, the NLP (Natural Language Processing) team at Bravo Systems has been working on extending the existing keyword-based ad serving system with topic-based targeting. The main idea is to enable content matching based on the topics of advertisements (as selected by the advertiser) and customer webpages (as detected by a specialized multilabel classifier). Topic-based matching allows our advertisers to increase the targeting scope of their campaigns by utilizing previously unexploited parts of the inventory.
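The matching idea can be sketched as follows. This is only an assumption-laden outline, not the system described here: the classifier is replaced by a hand-written dictionary of per-topic scores, and the 0.5 threshold, topic names, ad IDs, and function names are all hypothetical.

```python
# Hypothetical topics selected by advertisers for each ad.
AD_TOPICS = {
    "bank_ad": {"Personal Loans", "Banking"},
    "clinic_ad": {"Thyroid Disorders"},
}


def page_topics(topic_scores, threshold=0.5):
    """Keep every topic whose score reaches the threshold (multilabel output)."""
    return {topic for topic, score in topic_scores.items() if score >= threshold}


def topic_eligible_ads(topic_scores, ad_topics, threshold=0.5):
    """Return the IDs of ads that share at least one topic with the page."""
    detected = page_topics(topic_scores, threshold)
    return sorted(ad_id for ad_id, topics in ad_topics.items() if topics & detected)


# Stand-in for the output of a multilabel classifier on one webpage.
scores = {"Banking": 0.81, "Personal Loans": 0.64, "Travel": 0.07}
print(topic_eligible_ads(scores, AD_TOPICS))
```

Unlike keyword matching, the page text never has to contain any advertiser-supplied term; the only requirement is that the detected topics overlap with the topics chosen for the ad.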

Topic taxonomies

Topic-based matching relies on a topic taxonomy, in which topics are hierarchically organized according to their level of granularity, ranging from the most general ones, such as “Medical Health”, to the most specific ones, such as “Thyroid Disorders”. Our company currently uses the IAB (Interactive Advertising Bureau) content taxonomy, an industry open standard.
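One simple way to represent such a hierarchy is a child-to-parent map, sketched below. The intermediate topic "Diseases and Conditions" is a placeholder added for this example, and the rule that a general ad topic covers more specific page topics is an assumption made here for illustration, not something the taxonomy itself prescribes.

```python
# Child -> parent map for a tiny taxonomy fragment.
# The intermediate level is a placeholder used for illustration.
PARENT = {
    "Thyroid Disorders": "Diseases and Conditions",
    "Diseases and Conditions": "Medical Health",
    "Medical Health": None,
}


def ancestors(topic, parent):
    """Return the topics above `topic`, from nearest parent up to the root."""
    path = []
    current = parent[topic]
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def covers(ad_topic, page_topic, parent):
    """Assumed rule: an ad topic covers a page topic if it equals it or is one of its ancestors."""
    return ad_topic == page_topic or ad_topic in ancestors(page_topic, parent)


print(ancestors("Thyroid Disorders", PARENT))
print(covers("Medical Health", "Thyroid Disorders", PARENT))
```

Under this rule, an ad targeting a general topic would reach pages labeled with any of its descendants, while an ad targeting a specific topic would not reach pages labeled only with the general one.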

However, all publicly available topic taxonomies, including the IAB one, have numerous systemic and structural deficiencies. One is that they do not provide any topic definitions, which makes it hard to determine the appropriate scope of some topics. This can be particularly problematic for topics whose names are unclear (e.g., “Information Services Industry”). The existing taxonomies also often contain redundant topics: for instance, the IAB content taxonomy has two restaurant-related topics, “Bars & Restaurants” and “Dining Out”, belonging to different topic groups (“Attractions” and “Food & Drink”, respectively). Distinct topics also sometimes cover the same products or services. On the other hand, taxonomies often lack topics for some common products or services (e.g., cryptocurrencies), forcing users to either mislabel content or invent ad-hoc solutions. Finally, the positions of topics within the taxonomy can be inconsistent: for example, the IAB taxonomy places “Musicals” in the “Attractions” topic group, while the closely related “Theater” topic belongs to the “Fine Art” topic group. All of these issues reduce topic labeling consistency for both automated models and humans.

Resolving these issues was one of the key goals of our NLP team. We addressed them by developing a new topic taxonomy, using the IAB standard as a starting point. The new taxonomy contains around 550 topics arranged in five levels and 27 topic groups, and was designed to support multilabel classification. The topic-based matching experiments we performed were based on this new topic taxonomy.

Preliminary test results

In our preliminary tests, we supplemented the existing keyword logic with topic-driven traffic on a group of Banking and Finance ads and achieved very promising results. Over two-thirds of the test campaigns reached the desired performance goals, with successful campaigns showing an average increase of around 140% in conversion numbers and revenue while keeping CPA values below the target level. We are currently preparing a second round of tests, with additional Banking and Finance test campaigns, as well as campaigns belonging to different topic groups. We will also evaluate the joint use of topic targeting and the CPA optimization algorithm.

Written by

Vuk Batanović

Engineering Lead