Impact

Amazon






Leading Amazon's Alexa Shopping Research and Science team from product search to purchase




Alexa Shopping

Alexa is Amazon’s cloud-based voice service, and annotating its billions of daily customer interactions is critical to training and continually improving its voice shopping experience. Because voice shopping is still nascent, it remains an open area of research, and an OptimalAI data scientist therefore leads Amazon's Alexa Shopping Research and Science team. The team is responsible for the end-to-end Alexa shopping lifecycle, from product search to purchase, and for continually improving the overall shopping experience.


NLP and Machine Learning Research

Focused on customer impact and working backwards, the team's research analyses user actions such as entering search queries, reading snippets, and scrolling through a search engine results page. The task is further complicated by an ever-increasing array of smart speakers, smartphones and other Alexa devices: asking an Echo device to add an item to a shopping list is quite different from tapping and zooming in on an image on a tablet. The team applies state-of-the-art natural language processing and machine learning models to predict customer satisfaction across all of these experiences. The models use implicit criteria to evaluate whether Alexa has helped customers meet their goals, including search query reformulations, how much time customers spend interacting with search results, and whether they zoomed in to study a product image in greater detail. Studying the patterns in these interactions drives improvements to the Alexa voice shopping experience at scale.
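
To make the idea concrete, here is a minimal sketch of how implicit behavioural signals like these could feed a satisfaction classifier. The features, synthetic data and model choice are illustrative assumptions, not Amazon's production system.

```python
# Illustrative only: a toy satisfaction classifier over implicit
# behavioural signals (query reformulations, dwell time, image zooms).
# Feature names and data are hypothetical, not Amazon's production schema.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical per-session features: reformulation count, seconds spent
# on the results page, and whether the customer zoomed into a product image.
X = np.column_stack([
    rng.poisson(1.5, n),          # query reformulations
    rng.exponential(30.0, n),     # dwell time on results (seconds)
    rng.integers(0, 2, n),        # zoomed into a product image
])
# Synthetic label: sessions with few reformulations and some engagement
# are treated as "satisfied", purely for demonstration.
y = ((X[:, 0] < 2) & (X[:, 1] > 10)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```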


Foundational Research

As part of a select group of scientists working on Amazon's large-scale technical challenges, OptimalAI's scientist has helped develop multiple novel models to improve the Alexa shopping experience.

Schema-Guided User Satisfaction Modeling

Schema-Guided User Satisfaction Modeling (SG-USM) was developed to better evaluate user satisfaction in task-oriented dialogue systems. Unlike previous approaches, it explicitly models how well the user's task goals, defined by task attributes, are fulfilled by the system, using representations from a pre-trained language model. SG-USM comprises a layer that learns how many task attributes have been fulfilled during the dialogue and a component that estimates the importance of each attribute, and it predicts user satisfaction from these factors. Tests on benchmark datasets showed SG-USM to be more effective and more interpretable than existing methods, and it scales to low-resource settings by exploiting unlabeled data.
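
The shape of the architecture can be sketched in a few lines of PyTorch. This is a minimal illustration of the summary above; the layer choices, dimensions and the bilinear fulfilment scorer are assumptions, not the published implementation.

```python
# Hedged sketch of the SG-USM idea: per-attribute fulfilment scores,
# weighted by learned attribute importance, drive a satisfaction head.
# Layer names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class SGUSMSketch(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        # Scores how fully the dialogue fulfils one task attribute.
        self.fulfilment = nn.Bilinear(hidden, hidden, 1)
        # Scores how important each attribute is, from its schema embedding.
        self.importance = nn.Linear(hidden, 1)
        self.head = nn.Linear(1, 2)  # satisfied / dissatisfied logits

    def forward(self, dialogue_emb, attribute_embs):
        # dialogue_emb: (hidden,) from a pre-trained LM (e.g. a [CLS] vector)
        # attribute_embs: (num_attributes, hidden), one per schema attribute
        d = dialogue_emb.expand_as(attribute_embs)
        fulfil = torch.sigmoid(self.fulfilment(d, attribute_embs))  # (A, 1)
        weight = torch.softmax(self.importance(attribute_embs), 0)  # (A, 1)
        score = (weight * fulfil).sum(dim=0)                        # (1,)
        return self.head(score)

model = SGUSMSketch()
logits = model(torch.randn(768), torch.randn(5, 768))
```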


Publication
GitHub

Unsupervised NLP for Sentence-Pair Modelling

Trans-encoder is a new unsupervised NLP model for tasks that require comparing two sequences, such as sentence similarity and paraphrase identification. The model combines the benefits of bi-encoders, which are computationally efficient, and cross-encoders, which perform better but are more resource-intensive. Training starts from a bi-encoder and then alternates between the bi- and cross-encoder formulations, with each one creating "pseudo-labels" that help the other learn. The approach also extends to multiple pre-trained language models in parallel for mutual distillation. As a result, Trans-encoder delivers the first completely unsupervised cross-encoder and a top-performing unsupervised bi-encoder for sentence similarity, outperforming recent unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on sentence-similarity benchmarks.
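
The alternating distillation loop might look like the following sketch, written with the classic sentence-transformers training API. The model checkpoints, the two-pair dataset and the hyper-parameters are placeholders, not the paper's configuration.

```python
# Hedged sketch of the trans-encoder self-distillation loop:
# bi-encoder cosine scores become pseudo-labels for the cross-encoder,
# then cross-encoder scores are distilled back into the bi-encoder.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util
from sentence_transformers.cross_encoder import CrossEncoder

pairs = [("A man is playing a guitar.", "Someone plays an instrument."),
         ("The cat sleeps on the mat.", "Stock markets fell sharply today.")]

bi = SentenceTransformer("all-MiniLM-L6-v2")           # starting bi-encoder
cross = CrossEncoder("cross-encoder/stsb-roberta-base")

for _ in range(2):  # alternate bi- -> cross- -> bi- distillation
    # 1) Bi-encoder cosine similarities as pseudo-labels for the cross-encoder.
    emb1 = bi.encode([a for a, _ in pairs], convert_to_tensor=True)
    emb2 = bi.encode([b for _, b in pairs], convert_to_tensor=True)
    bi_scores = util.cos_sim(emb1, emb2).diagonal().tolist()
    cross.fit(train_dataloader=DataLoader(
        [InputExample(texts=list(p), label=s) for p, s in zip(pairs, bi_scores)],
        batch_size=2), epochs=1)

    # 2) Cross-encoder scores distilled back into the bi-encoder.
    cross_scores = cross.predict(pairs)
    bi.fit(train_objectives=[(DataLoader(
        [InputExample(texts=list(p), label=float(s))
         for p, s in zip(pairs, cross_scores)], batch_size=2),
        losses.CosineSimilarityLoss(bi))], epochs=1)
```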


Publication
GitHub

Mixing Samples with Implicit Group Distribution

Machine learning models often rely too heavily on simple patterns within the data, which can lead to poor performance when the data changes. This is particularly problematic when dealing with different subgroups within the data. Although robust optimization tools can help, they require these subgroups to be explicitly labelled, which is time-consuming. Just Mix Once (JM1) is a new method that uses implicit group distribution, self-supervision, and oversampling to improve performance on underrepresented groups. The JM1 technique is a variation of MixUp that aims to improve performance on the worst-performing subgroups by blending the training distribution with a continuous group distribution. It works across different domains, is efficient to run, and is competitive with or better than current leading methods at improving performance on poorly performing subgroups.
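
Since JM1 builds on MixUp, a minimal sketch of the underlying mixing step helps make the idea concrete. The Beta parameter and the oversampling note are illustrative assumptions, not JM1's exact recipe.

```python
# Hedged sketch: the core MixUp operation that JM1 builds on, blending
# pairs of training samples with a weight drawn from a continuous
# (Beta) distribution. The alpha value is illustrative.
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    """Blend a batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    # Labels are blended with the same weight (soft targets).
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

# JM1's twist (sketched): oversample under-represented groups before
# mixing, so the blended distribution up-weights the worst subgroups.
x = torch.randn(8, 3, 32, 32)  # e.g. a small image batch
y = torch.nn.functional.one_hot(torch.randint(0, 10, (8,)), 10).float()
x_mix, y_mix = mixup(x, y)
```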


Publication

Uncertainty and Traffic-Aware Active Learning

The traditional method of training a semantic parser - a tool used to understand natural language - involves collecting and annotating large amounts of data, which is expensive and time-consuming, and raises privacy concerns because customer data must be handled by human annotators. Uncertainty and Traffic-Aware Active Learning is a new method that selects utterances (spoken phrases) for annotation based on how often they appear in customer interactions and how confident the model is in understanding them. The technique proved significantly better than previous methods when tested on both an internal customer dataset and the Facebook Task Oriented Parsing (TOP) dataset. Remarkably, it achieved the same accuracy as the traditional random-sampling baseline while using 2,000 fewer annotations.
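
A toy sketch of such a selection rule: score each unlabeled utterance by its traffic frequency combined with the parser's uncertainty, and send only the top-scoring ones for annotation. The multiplicative combination, function names and data below are assumptions for illustration, not the published method.

```python
# Hedged sketch of uncertainty- and traffic-aware sample selection:
# score each utterance by occurrence count (traffic) times predictive
# entropy (uncertainty), then annotate the top-k.
import math
from collections import Counter

def entropy(probs):
    """Predictive entropy of the parser's distribution over parses."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(utterances, parse_probs, k=2):
    traffic = Counter(utterances)                # occurrence counts
    scored = {
        u: traffic[u] * entropy(parse_probs[u])  # traffic x uncertainty
        for u in set(utterances)
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical data: repeated utterances with the parser's confidence
# over its candidate parses for each.
utterances = ["add milk", "add milk", "play jazz", "reorder paper towels"]
parse_probs = {
    "add milk": [0.6, 0.4],              # frequent and ambiguous
    "play jazz": [0.95, 0.05],           # confident
    "reorder paper towels": [0.5, 0.5],  # uncertain but rare
}
print(select_for_annotation(utterances, parse_probs))
```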


Publication