Analysis of aggregated preference data. Ranking items as separate, static elements. Evaluation of the algorithms, their assumptions, weighting, and shifting implicit preferences. Ranking items as a function of their attributes.
Abstract of the work
NATIONAL RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS
Faculty of Business and Management
School of Business Informatics

MASTER THESIS
Learning to rank through user preference data

Author: David Cortes
Supervisor: Attila Kertesz-Farkas
Moscow 2016

Abstract

This work examined approaches for ranking items in electronic catalogs under the intuitive idea of respecting user preferences. Since it is difficult to ask users to state their preferences, these were deduced from click data instead, in the form of “Item A is preferred to Item B”, following ideas from search engines. For example, a clicked item in a ranked list was assumed to be preferred to unclicked items ranked above it. Preferences deduced from these methods have been shown to agree with explicitly stated preferences, but are biased and not always accurate. It was proposed to treat items either as elements on their own or as defined by their attributes. For the first case, algorithms based on iteratively improving an order by swapping one pair at a time were proposed, among these “Metropolis-Hastings swapping”. For the second case, an algorithm similar to a one-class SVM, which devises utility functions used to rank the items, was proposed. These were found to generate rankings that respect the preferences to a greater extent than sorting by clicks.

ranking algorithm aggregated preference

1. Introduction

Traditionally, retail companies have tried different ideas for placing items in a physical store in a manner that leads to more purchases or increased user satisfaction, and many online retailers have tried to do the same in their electronic “stores”. This has resulted in developments such as recommender systems and advanced search systems that help users search very wide catalogs such as Amazon's. However, the problem of ordering the items in the catalogs of more ordinary businesses, such as clothing stores, has not received much attention.
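The click-to-preference rule stated in the abstract (a clicked item is assumed to be preferred to unclicked items ranked above it) can be illustrated with a minimal sketch. The function name `preferences_from_clicks`, the item labels, and the data layout are hypothetical choices made for illustration, not taken from the thesis.

```python
from typing import List, Set, Tuple


def preferences_from_clicks(ranked_items: List[str],
                            clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs deduced from one ranked list.

    Rule: a clicked item is taken to be preferred to every unclicked item
    that was displayed above it in the list.
    """
    pairs = []
    for position, item in enumerate(ranked_items):
        if item not in clicked:
            continue
        # Only items shown above the clicked one can be "skipped".
        for item_above in ranked_items[:position]:
            if item_above not in clicked:
                pairs.append((item, item_above))
    return pairs


# Hypothetical session: five items shown in this order, two of them clicked.
shown = ["A", "B", "C", "D", "E"]
clicks = {"C", "E"}
pairs = preferences_from_clicks(shown, clicks)
for preferred, other in pairs:
    print(f"Item {preferred} is preferred to Item {other}")
```

Note that the rule says nothing about items ranked below a clicked item, nor about the relative preference between two clicked items; this sketch therefore emits no pairs for those cases.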
When people browse the online catalog of a store, they typically do not inspect all the items in the list. Many visitors form their first impression of a retailer based on the items that they see on their first visit. Visitors may abandon a site if nothing of interest to them is found among the first items shown, and potential purchases are lost along with them. It is reasonable to think that there is something to gain by displaying the items of any given section in an order where the items more likely to interest users appear at the top.

Some of the most common ways of arranging items in online catalogs are to sort them by clicks, purchases, date of introduction, alphabetical order, or price. Apart from sorting by clicks or sales, these methods do not seem to favor more popular or more attractive items over less popular or less attractive ones. Sorting by clicks or by purchases seems to achieve the goal of putting more popular items first, but intuitively, one would think that items that are displayed first are more likely to be seen and clicked or purchased than items displayed last. This might create a loop where initially high-ranked items receive more clicks or purchases and are deemed even more attractive, while initially lower-ranked items do not get the chance to be clicked or purchased, so the initial ordering is maintained regardless of the items’ attractiveness. Research in the context of search engine results has shown that this is indeed the case (Craswell et al., 2008). Thus, there is some logic in thinking that the rank (ordinal position in the list) at which an item is clicked is also a useful indicator, but this is not accounted for when sorting items by the raw number of clicks or purchases.
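The self-reinforcing loop described above can be illustrated with a small deterministic sketch. Everything in it is an assumption made for illustration: the attractiveness values are invented, the probability of examining position p is taken to be 1/(p+1), and expected clicks are used in place of random draws. It is not the model used in the thesis or in Craswell et al.

```python
from typing import List, Tuple


def simulate_click_sorting(attractiveness: List[float],
                           initial_order: List[int],
                           rounds: int = 10) -> Tuple[List[int], List[float]]:
    """Repeatedly re-sort items by accumulated expected clicks.

    Assumed model: an item at 0-based position p is examined with
    probability 1 / (p + 1), and an examined item is clicked with
    probability equal to its attractiveness.
    """
    clicks = [0.0] * len(attractiveness)
    order = list(initial_order)
    for _ in range(rounds):
        for position, item in enumerate(order):
            examine_prob = 1.0 / (position + 1)
            clicks[item] += examine_prob * attractiveness[item]
        # "Sort by clicks": most-clicked item goes to the top.
        order = sorted(range(len(clicks)), key=lambda i: clicks[i], reverse=True)
    return order, clicks


# Hypothetical catalog: item 4 is the most attractive but starts in last place.
attractiveness = [0.30, 0.35, 0.40, 0.45, 0.50]
initial_order = [0, 1, 2, 3, 4]

final_order, expected_clicks = simulate_click_sorting(attractiveness, initial_order)
by_attractiveness = sorted(range(len(attractiveness)),
                           key=lambda i: attractiveness[i], reverse=True)

print("order after sorting by clicks:", final_order)
print("order by true attractiveness: ", by_attractiveness)
```

Under these assumed numbers, the position effect outweighs the differences in attractiveness, so the initial order never changes even though it is the exact reverse of the order by attractiveness. With a weaker position effect or larger gaps in attractiveness, the order could eventually correct itself.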
Collecting explicit preference data (e.g. 10-star ratings of movies or ordered preference lists) is generally difficult, expensive, and might be burdensome to some users. However, so-called implicit feedback taken from logs of user activity (that is, customer behavior data collected without asking the user anything, such as hours spent watching a TV series) has been successfully used to make product recommendations (Hu, Koren and Volinsky, 2008) and to rank results in search engines (Joachims, 2002). Implicit user data has the advantage of being easy to collect in large quantities without associated costs other than data storage, unlike explicit data, which usually requires questionnaires, stratified sampling, etc., and it does not pose any hassle to users. Research has found implicit feedback to be highly correlated with preference in different contexts (Oard and Kim, 2001; Konstan et al., 1997; Joachims et al., 2007; Parra and Amatriain, 2011). Implicit feedback can be interpreted or processed in different ways. One paradigm that has been shown to be useful in some contexts is to interpret certain observed behaviors as stemming from user preferences of the form “Item A is preferred to Item B” (Holland, Ester and Kießling, 2003; Joachims, 2002), and the same approach of taking pairwise preferences has also been shown to work for product recomme