Part of my series of notes from NAACL-HLT 2019 in Minneapolis.
Locale-agnostic Universal Domain Classification
- ambiguities in large-scale NLU in Alexa
- multiple domains can properly handle an utterance
- e.g. “I want a pepperoni pizza” – order takeout, recipe, restaurant reservation?
- Shortlister (Kim et al., NAACL 2018) model to solve this problem
- expanding to other locales
- challenges
- maintaining a separate model for each locale
- lack of training data (no usage history)
- locale / domain maturity – how long since model deployed, how much data collected
- locale specificity – overlapping domains between locales, but can have locale-specific slots, etc.
- sharing data is tricky
- both shared and locale-specific encoders for utterances (BiLSTM)
- shared encoder tuned to be locale-invariant – adversarial locale prediction loss
- route the utterance to a locale-specific encoder via attention, without knowing the domain
- then concat the representations and predict the domain (rough sketch after these notes)
- I don’t really understand how this addresses the challenges?
- maintaining different models – still have to do this I guess?
- lack of training data – presumably can rely on the shared encoder
- experiments with different locales, different amounts of data, different kinds of domains, different encoder architectures
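
Here's a minimal PyTorch sketch of the architecture as I understood it. The mean-pooling, attention scoring, gradient-reversal trick for the adversarial loss, and all the layer sizes and names are my guesses at the details, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class UniversalDomainClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid, n_locales, n_domains):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # one shared BiLSTM encoder plus one BiLSTM per locale
        self.shared = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
        self.locale_encoders = nn.ModuleList(
            nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
            for _ in range(n_locales))
        self.attn = nn.Linear(2 * hid, 1)                # scores each locale encoder
        self.domain_clf = nn.Linear(4 * hid, n_domains)
        self.locale_clf = nn.Linear(2 * hid, n_locales)  # adversary on shared repr

    def forward(self, tokens):
        x = self.embed(tokens)                     # (B, T, E)
        h_shared = self.shared(x)[0].mean(dim=1)   # (B, 2H), mean-pooled over time
        # encode with every locale-specific BiLSTM, then attend over the results
        h_locales = torch.stack(
            [enc(x)[0].mean(dim=1) for enc in self.locale_encoders], dim=1)
        weights = torch.softmax(self.attn(h_locales).squeeze(-1), dim=1)  # (B, L)
        h_routed = (weights.unsqueeze(-1) * h_locales).sum(dim=1)         # (B, 2H)
        domain_logits = self.domain_clf(torch.cat([h_shared, h_routed], dim=-1))
        # the locale adversary sees the shared repr through gradient reversal,
        # pushing the shared encoder toward locale-invariant features
        locale_logits = self.locale_clf(GradReverse.apply(h_shared))
        return domain_logits, locale_logits
```

Training would then minimise cross-entropy on `domain_logits` plus cross-entropy on `locale_logits`; the gradient reversal is what turns the second term into adversarial, locale-invariance pressure on the shared encoder.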
Practical Semantic Parsing for Spoken Language Understanding
- paper here
- executable semantic parsing – sentences -> logical forms
- unify Alexa’s primary problems of SLU and Q&A
- SLU Alexa data – annotated for intent/slot tagging
- think this is proprietary, sadly
- Q&A data – Overnight, NLmaps
- much more low-resource than SLU datasets
- convert this data to trees
- allows more complex requests / multiple intents
- transition-based parser [Cheng et al., 2017] to parse into trees
- add character embeddings and a mechanism to copy directly from the input (toy sketch of the transition system after these notes)
- evaluate on Q&A and SLU tasks, including ablation studies
- this is training on each domain independently
- transfer learning from high- to low-resource domains
- tried fine-tuning and multi-task learning, both seem to help
- transfer from SLU to Q&A also seems to work (preliminary)
- avoided more data-hungry architectures, e.g. seq2seq
- different annotation schemes among datasets
- full match metric is quite harsh
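
To make the transition-system idea concrete, here's a toy, non-neural executor of a parser's actions. In the actual model [Cheng et al., 2017] a neural network predicts the action sequence; the action names, intent labels, and example below are all invented for illustration.

```python
# NT(label) opens a subtree, COPY(i) copies input token i as a leaf
# (the copy mechanism), REDUCE closes the current subtree
def execute(tokens, actions):
    stack = [("ROOT", [])]
    for act, arg in actions:
        if act == "NT":                   # open a nonterminal, e.g. an intent
            node = (arg, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif act == "COPY":               # leaf copied straight from the input
            stack[-1][1].append(tokens[arg])
        elif act == "REDUCE":             # close the current subtree
            stack.pop()
    return stack[0][1][0]

tokens = "play madonna and dim the lights".split()
actions = [("NT", "multi_intent"),
           ("NT", "PlayMusic"), ("COPY", 1), ("REDUCE", None),
           ("NT", "SetLights"), ("COPY", 3), ("COPY", 4), ("COPY", 5),
           ("REDUCE", None), ("REDUCE", None)]
print(execute(tokens, actions))
# ('multi_intent', [('PlayMusic', ['madonna']),
#                   ('SetLights', ['dim', 'the', 'lights'])])
```

One tree can hold several intents, which is exactly the "more complex requests / multiple intents" advantage over flat intent/slot tagging.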
Fast Prototyping Dialogue Comprehension System for Nurse-Patient Conversations
- paper here
- this one was really interesting and really relevant to my work, so I may or may not have taken photos of like every single slide
- nurses do follow-up calls after discharge
- want to extract clinical info from unstructured spoken interactions
- attributes of these types of dialogues
- co-references across turns & speakers (zero anaphora)
- thinking aloud (private self-talk, backchanneling)
- self-contradiction – within a single utterance or across multiple turns
- topic drift
- dialect / discourse particles
- dialogue comprehension task
- existing datasets – reading comprehension, very limited data on open-domain dialogues
- need for data augmentation
- seed with real-world data
- annotate with inquiry types & response types
- from analysis of the seed data, create templates to generate new data (toy sketch after these notes)
- pair inquiry and response types
- entity value pool
- enrich verbal expressions and get them linguistically / clinically validated
- model – standard neural architecture thing
- error analysis
- nurses and patients do chit-chat, even within task-oriented dialogue
- patients speculate on potential causal relations
- proof-of-concept that even without a lot of data you can get a reasonable prototype
- lots of work to be done!
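
A toy sketch of the augmentation recipe as I understood it: templates keyed by (inquiry type, response type) pairs, with slots filled from an entity value pool. The types, templates, and entity values below are all made up; the real ones came out of the seed-data analysis and were linguistically/clinically validated.

```python
import random

ENTITIES = {"symptom": ["dizziness", "chest pain", "swelling"],
            "frequency": ["every morning", "twice a day", "rarely"]}

TEMPLATES = {
    ("check_symptom", "affirm"): (
        "Nurse: Have you had any {symptom} since discharge?",
        "Patient: Yes, I get it {frequency}."),
    ("check_symptom", "deny"): (
        "Nurse: Any {symptom} this week?",
        "Patient: No, nothing like that."),
}

def generate(n, seed=0):
    rng = random.Random(seed)
    dialogues = []
    for _ in range(n):
        # pair an inquiry type with a response type, then fill the slots
        (inq_type, resp_type), (inq, resp) = rng.choice(sorted(TEMPLATES.items()))
        slots = {k: rng.choice(v) for k, v in ENTITIES.items()}
        dialogues.append({"inquiry_type": inq_type,
                          "response_type": resp_type,
                          "turns": [inq.format(**slots), resp.format(**slots)]})
    return dialogues

for d in generate(2):
    print(d["inquiry_type"], d["response_type"], d["turns"])
```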
Graph Convolution for Multimodal Information Extraction
- IE in the real world isn’t just text – can be visually rich
- for these documents visual attributes (font size, color, etc.) are very relevant
- need multimodal IE
- diverse tasks with low development efficiency – techniques built for one might not transfer well
- contributions: visual text embeddings and unified model
- documents are graphs of text segments
- combine graph convolution with a BiLSTM+CRF tagger (sketch below)
- text embeddings for node features, edge features from relative positions
- self-attention for neighbourhoods
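
A rough sketch of the graph-convolution step as I understood it, with text-embedding node features, relative-position edge features, and self-attention over each node's neighbourhood. The message/score MLPs and all dimensions are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class SegmentGraphConv(nn.Module):
    """One graph-convolution layer over the text segments of a document.

    node_feats: (N, D) text embeddings of the N segments
    edge_feats: (N, N, E) features from the segments' relative positions
    """
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.msg = nn.Linear(2 * node_dim + edge_dim, node_dim)  # message MLP
        self.score = nn.Linear(2 * node_dim + edge_dim, 1)       # attention score

    def forward(self, node_feats, edge_feats):
        n = node_feats.size(0)
        src = node_feats.unsqueeze(1).expand(n, n, -1)    # sender of each pair
        dst = node_feats.unsqueeze(0).expand(n, n, -1)    # receiver of each pair
        pair = torch.cat([src, dst, edge_feats], dim=-1)  # (N, N, 2D+E)
        # self-attention over each receiver's neighbourhood (here: all senders)
        attn = torch.softmax(self.score(pair).squeeze(-1), dim=0)
        messages = torch.relu(self.msg(pair))              # (N, N, D)
        return (attn.unsqueeze(-1) * messages).sum(dim=0)  # (N, D) updated nodes

nodes = torch.randn(5, 64)    # 5 text segments, 64-dim text embeddings
edges = torch.randn(5, 5, 8)  # 8-dim relative-position edge features
print(SegmentGraphConv(64, 8)(nodes, edges).shape)  # torch.Size([5, 64])
```

The updated segment embeddings would then be combined with word-level features and fed to the BiLSTM+CRF tagger for the actual extraction.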