Part of my series of notes from NAACL-HLT 2019 in Minneapolis.
What is a product?
- and how do you know when a product is done?
- the most academic answer: it’s never done
- you keep working on it forever and sometimes you fork it to show people
- the most corporate answer: it’s done when people will pay for it
- a product is a thing that people pay for
- a product has to keep a promise
Chapter 1: Academia
- very linguistics-y – “as far as possible from anything I’m doing now”
- the seeds of data ethics principles
- people have to know you’re collecting data, and for whom
- you have to use it for their benefit
- in many ways, a dissertation is kind of like a product
- it’s good when it’s done
- it’s done when you’ve published
Chapter 2: Industry
- computational linguist at Microsoft
- worked on sorting algorithms for non-Latin languages
- this suddenly became very relevant and very urgent after the 2004 tsunami
- made decisions that became official without necessarily having any contact with the community or the appropriate linguistic qualifications
- learned about prioritising user experience, making tradeoffs
- e.g. spellchecking – what if you could catch 80% of the errors in 20% of the time?
- how do you behave ethically and humanely for a billion people?
- can’t be based on personal relationships anymore
- I think this is really important to think about, and not addressed by enough people
Chapter 3: Textio
- language is fascinating, complicated, and constantly changing
- learning loops
- used to be data creating product
- now it’s product creating data (creating product creating data…)
- Textio live demo
- mainly targeted at recruitment for now
- not just editing but generation (quite impressive!)
- learning loop involves predicting, suggesting, and creating
- “you can’t do this with an unexplainable model”
- but also “just use what works”, the product doesn’t care about your theoretical biases
- I don’t get this
- helping companies learn about themselves and embody the values they want
- fill roles faster and with less bias
- these really do go together, that’s just math
- rule-driven vs. data-driven – sometimes make an editorial choice for the former because of the world we want to create
Three principles of data ethics for the learning loop era
- No surprises – make it clear you’re collecting data
- Use data the way you say you are – don’t sell or share
- Use data for the benefit of those who provided it