Safe ML: Specification, Robustness and Assurance

06 May 2019

Part of my series of notes from ICLR 2019 in New Orleans.

These are my notes on the talks (and panels) from the Safe ML workshop.

Introduction

[Slide: ML safety issues]

Cynthia Rudin: Interpretability for Important Problems

[Slide: 2HELPS2B]

[Image: letter]

Dylan Hadfield-Menell: Formalizing the Value Alignment Problem in AI

[Slide: inverse reward design]

David Krueger: Misleading meta-objectives and hidden incentives for distributional shift

[Slide: coffee]

[Slide: myopia]

Panel Number One

Beomsu Kim: Bridging Adversarial Robustness and Gradient Interpretability

[Slide: adversarial gradient]

Avraham Ruderman: Uncovering Surprising Behaviors in Reinforcement Learning via Worst-Case Analysis

[Slide: RL fail]

Ian Goodfellow: The case for dynamic defenses against adversarial examples

[Slide: overfitting]

[Slide: threat]

Panel Number Two

[Photo: Ian Goodfellow]

Above: Goodfellow, sick and tired of crappy adversarial example papers (I’m just kidding, he actually always looks like this).