Part of my series of notes from NAACL-HLT 2019 in Minneapolis.
Locale-agnostic Universal Domain Classification
- ambiguities in large-scale NLU in Alexa
- multiple domains can properly handle an utterance
- e.g. “I want a pepperoni pizza” – order takeout, recipe, restaurant reservation?
- Shortlister (Kim et al., NAACL 2018) model to solve this problem
- expanding to other locales
- challenges
- maintaining a separate model for each locale
- lack of training data (no usage history)
- locale / domain maturity – how long since model deployed, how much data collected
- locale specificity – overlapping domains between locales, but can have locale-specific slots, etc.
- sharing data is tricky
- both shared and locale-specific encoders for utterances (BiLSTM)
- shared encoder tuned to be locale-invariant – adversarial locale prediction loss
- route the utterance to a locale-specific encoder via attention, without knowing the domain
- then concat the representations and predict the domain (rough sketch after these notes)
- I don’t really understand how this addresses the challenges?
- maintaining different models – still have to do this I guess?
- lack of training data – presumably can rely on the shared encoder
- experiments with different locales, different amounts of data, different kinds of domains, different encoder architectures
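
Here's a minimal PyTorch sketch of the architecture as I understood it. The mean-pooling, attention scoring, gradient-reversal trick for the adversarial loss, and all the layer sizes and names are my guesses at the details, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class UniversalDomainClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid, n_locales, n_domains):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # one shared BiLSTM encoder plus one BiLSTM per locale
        self.shared = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
        self.locale_encoders = nn.ModuleList(
            nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
            for _ in range(n_locales))
        self.attn = nn.Linear(2 * hid, 1)                # scores each locale encoder
        self.domain_clf = nn.Linear(4 * hid, n_domains)
        self.locale_clf = nn.Linear(2 * hid, n_locales)  # adversary on shared repr

    def forward(self, tokens):
        x = self.embed(tokens)                     # (B, T, E)
        h_shared = self.shared(x)[0].mean(dim=1)   # (B, 2H), mean-pooled over time
        # encode with every locale-specific BiLSTM, then attend over the results
        h_locales = torch.stack(
            [enc(x)[0].mean(dim=1) for enc in self.locale_encoders], dim=1)
        weights = torch.softmax(self.attn(h_locales).squeeze(-1), dim=1)  # (B, L)
        h_routed = (weights.unsqueeze(-1) * h_locales).sum(dim=1)         # (B, 2H)
        domain_logits = self.domain_clf(torch.cat([h_shared, h_routed], dim=-1))
        # the locale adversary sees the shared repr through gradient reversal,
        # pushing the shared encoder toward locale-invariant features
        locale_logits = self.locale_clf(GradReverse.apply(h_shared))
        return domain_logits, locale_logits
```

Training would then minimise cross-entropy on `domain_logits` plus cross-entropy on `locale_logits`; the gradient reversal is what turns the second term into adversarial, locale-invariance pressure on the shared encoder.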
Practical Semantic Parsing for Spoken Language Understanding
- paper here
- executable semantic parsing – sentences -> logical forms
- unify Alexa’s primary problems of SLU and Q&A
- SLU Alexa data – annotated for intent/slot tagging
- think this is proprietary, sadly
- Q&A data – Overnight, NLmaps
- much more low-resource than SLU datasets
- convert this data to trees
- allows more complex requests / multiple intents
- transition-based parser [Cheng et al., 2017] to parse into trees
- add character embeddings and a mechanism to copy directly from the input (toy sketch of the transition system after these notes)
- evaluate on Q&A and SLU tasks, including ablation studies
- this is training on each domain independently
- transfer learning from high- to low-resource domains
- tried fine-tuning and multi-task learning, both seem to help
- transfer from SLU to Q&A also seems to work (preliminary)
- avoided more data-hungry architectures, e.g. seq2seq
- different annotation schemes among datasets
- full match metric is quite harsh
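
To make the transition-system idea concrete, here's a toy, non-neural executor of a parser's actions. In the actual model [Cheng et al., 2017] a neural network predicts the action sequence; the action names, intent labels, and example below are all invented for illustration.

```python
# NT(label) opens a subtree, COPY(i) copies input token i as a leaf
# (the copy mechanism), REDUCE closes the current subtree
def execute(tokens, actions):
    stack = [("ROOT", [])]
    for act, arg in actions:
        if act == "NT":                   # open a nonterminal, e.g. an intent
            node = (arg, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif act == "COPY":               # leaf copied straight from the input
            stack[-1][1].append(tokens[arg])
        elif act == "REDUCE":             # close the current subtree
            stack.pop()
    return stack[0][1][0]

tokens = "play madonna and dim the lights".split()
actions = [("NT", "multi_intent"),
           ("NT", "PlayMusic"), ("COPY", 1), ("REDUCE", None),
           ("NT", "SetLights"), ("COPY", 3), ("COPY", 4), ("COPY", 5),
           ("REDUCE", None), ("REDUCE", None)]
print(execute(tokens, actions))
# ('multi_intent', [('PlayMusic', ['madonna']),
#                   ('SetLights', ['dim', 'the', 'lights'])])
```

One tree can hold several intents, which is exactly the "more complex requests / multiple intents" advantage over flat intent/slot tagging.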
Fast Prototyping Dialogue Comprehension System for Nurse-Patient Conversations
- paper here
- this one was really interesting and really relevant to my work, so I may or may not have taken photos of like every single slide
- nurses do follow-up calls after discharge
- want to extract clinical info from unstructured spoken interactions
- attributes of these types of dialogues
- co-references across turns & speakers (zero anaphora)
- thinking aloud (private self-talk, backchanneling)
- self-contradiction – within a single utterance or across multiple turns
- topic drift
- dialect / discourse particles
- dialogue comprehension task
- existing datasets – reading comprehension, very limited data on open-domain dialogues
- need for data augmentation
- seed with real-world data
- annotate with inquiry types & response types
- from analysis of the seed data, create templates to generate new data (toy sketch after these notes)
- pair inquiry and response types
- entity value pool
- enrich verbal expressions and get them linguistically / clinically validated
- model – standard neural architecture thing
- error analysis
- nurses and patients do chit-chat, even within task-oriented dialogue
- patients speculate on potential causal relations
- proof-of-concept that even without a lot of data you can get a reasonable prototype
- lots of work to be done!
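
A toy sketch of the augmentation recipe as I understood it: templates keyed by (inquiry type, response type) pairs, with slots filled from an entity value pool. The types, templates, and entity values below are all made up; the real ones came out of the seed-data analysis and were linguistically/clinically validated.

```python
import random

ENTITIES = {"symptom": ["dizziness", "chest pain", "swelling"],
            "frequency": ["every morning", "twice a day", "rarely"]}

TEMPLATES = {
    ("check_symptom", "affirm"): (
        "Nurse: Have you had any {symptom} since discharge?",
        "Patient: Yes, I get it {frequency}."),
    ("check_symptom", "deny"): (
        "Nurse: Any {symptom} this week?",
        "Patient: No, nothing like that."),
}

def generate(n, seed=0):
    rng = random.Random(seed)
    dialogues = []
    for _ in range(n):
        # pair an inquiry type with a response type, then fill the slots
        (inq_type, resp_type), (inq, resp) = rng.choice(sorted(TEMPLATES.items()))
        slots = {k: rng.choice(v) for k, v in ENTITIES.items()}
        dialogues.append({"inquiry_type": inq_type,
                          "response_type": resp_type,
                          "turns": [inq.format(**slots), resp.format(**slots)]})
    return dialogues

for d in generate(2):
    print(d["inquiry_type"], d["response_type"], d["turns"])
```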
Graph Convolution for Multimodal Information Extraction
- IE in the real world isn’t just text – can be visually rich
- for these documents visual attributes (font size, color, etc.) are very relevant
- need multimodal IE
- diverse tasks with low development efficiency – techniques built for one might not transfer well
- contributions: visual text embeddings and unified model
- documents are graphs of text segments
- combine graph convolution with a BiLSTM+CRF tagger (sketch below)
- text embeddings for node features, edge features from relative positions
- self-attention for neighbourhoods
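
A rough sketch of the graph-convolution step as I understood it, with text-embedding node features, relative-position edge features, and self-attention over each node's neighbourhood. The message/score MLPs and all dimensions are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class SegmentGraphConv(nn.Module):
    """One graph-convolution layer over the text segments of a document.

    node_feats: (N, D) text embeddings of the N segments
    edge_feats: (N, N, E) features from the segments' relative positions
    """
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.msg = nn.Linear(2 * node_dim + edge_dim, node_dim)  # message MLP
        self.score = nn.Linear(2 * node_dim + edge_dim, 1)       # attention score

    def forward(self, node_feats, edge_feats):
        n = node_feats.size(0)
        src = node_feats.unsqueeze(1).expand(n, n, -1)    # sender of each pair
        dst = node_feats.unsqueeze(0).expand(n, n, -1)    # receiver of each pair
        pair = torch.cat([src, dst, edge_feats], dim=-1)  # (N, N, 2D+E)
        # self-attention over each receiver's neighbourhood (here: all senders)
        attn = torch.softmax(self.score(pair).squeeze(-1), dim=0)
        messages = torch.relu(self.msg(pair))              # (N, N, D)
        return (attn.unsqueeze(-1) * messages).sum(dim=0)  # (N, D) updated nodes

nodes = torch.randn(5, 64)    # 5 text segments, 64-dim text embeddings
edges = torch.randn(5, 5, 8)  # 8-dim relative-position edge features
print(SegmentGraphConv(64, 8)(nodes, edges).shape)  # torch.Size([5, 64])
```

The updated segment embeddings would then be combined with word-level features and fed to the BiLSTM+CRF tagger for the actual extraction.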