Maybe it is just easier for me to write simple schemas and services. To date, my only experiences have been with Google Assistant, Alexa, and Nuance Mix.
That might be a way to learn. I recommend checking out our GitHub repository with the skills and then following our tutorials.
I am going through the SKIM paper. I don’t know how things like anaphora and coreference fit into the picture. I also need to read up on Transformers.
Are you familiar with neural dialogue state tracking? Essentially, the model is asked to predict all the slots that have been mentioned in the dialogue, whether they are mentioned explicitly or by coreference. Neural DST is quite different from the usual NLU (sentence-at-a-time) plus rule-based state tracking offered by commercial assistant SDKs. Neural models have no problem learning this task, given enough training data.
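To make that concrete, here is a tiny made-up example of the task; the dialogue, slot names, and values are invented for illustration and don’t follow any particular dataset’s ontology:

```python
# Tiny illustrative example of the neural DST task (all names invented).
dialogue = [
    ("user",  "Find me an Italian restaurant in Palo Alto."),
    ("agent", "How about Il Fornaio?"),
    ("user",  "Book a table there for 4 people tonight."),
]

# The model reads the full history (typically concatenated into a single
# input sequence) and predicts the complete dialogue state, including
# slots that are only referred to by coreference ("there" -> Il Fornaio).
expected_state = {
    "restaurant-cuisine": "italian",
    "restaurant-area":    "palo alto",
    "restaurant-name":    "il fornaio",  # resolved from "there"
    "booking-people":     "4",
    "booking-time":       "tonight",
}
```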
SKIM is similar to neural DST, with the added twist that it predicts a directly executable query in ThingTalk, rather than a plain list of slots.
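So for the dialogue above, instead of the flat slot dictionary, a SKIM-style model would output a single query. The line below is illustrative pseudo-ThingTalk written from memory, not the exact syntax; see the paper for the real language:

```python
# Same dialogue as above, but the target is one executable query rather
# than a flat slot list. Pseudo-ThingTalk, for illustration only.
skim_target = (
    'now => @com.yelp.restaurant(), cuisine =~ "italian"'
    ' && geo == "palo alto" => notify;'
)
```

The point is that the prediction can be executed directly against the service, with no separate rule-based layer needed to turn slots into an API call.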
Again, these days I work with Google Assistant. I find myself having to write my own crude dialogue tracking. It feels more like an exercise in interpreter design, so I can see where type information comes in. One of my biggest challenges is a lack of utterances, so I’m really interested in synthetic data.
Right, writing a rule-based state tracker or dialogue tree is quite tedious, and it’s easy to miss a case. That’s why even commercial assistants are moving toward neural DST; check out the Alexa Conversations paper at NAACL, for example. Once you go for some sort of neural DST, you’re left with the data acquisition problem, for which synthesis is a good bootstrapping approach.
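For a sense of why the rule-based approach gets tedious, here is a hypothetical hand-rolled tracker in the style I mean; all names are invented, and this is not any SDK’s actual API:

```python
# Hand-written state tracking: every slot, synonym, and phrasing needs
# its own rule, and coreference has to be special-cased by hand.
def update_state(state: dict, utterance: str) -> dict:
    text = utterance.lower()
    if "italian" in text:
        state["cuisine"] = "italian"
    if "palo alto" in text:
        state["area"] = "palo alto"
    # "book a table there" -> carry over the last offered restaurant
    if "there" in text and "last_offer" in state:
        state["name"] = state["last_offer"]
    # ...and so on, for every slot and every way of phrasing it;
    # it is very easy to miss a case.
    return state
```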
With some effort, you can use Genie to synthesize any kind of dialogue data, but these days it is highly optimized for dialogues that use ThingTalk as the state representation.
I see that LLMs can be used to generate text. I am not clear on how to constrain them to generate useful utterances for a given domain.
You start with rule-based synthesis: essentially, grammar-based templates. Then you apply the language model to paraphrase the output of the rule-based synthesis. Because the input is a correct sentence (maybe a bit clunky, maybe a bit repetitive, but correct), the output is also very likely to be correct. Finally, there is a filtering step where a semantic parsing model, trained purely on the rule-based synthetic data, is used to throw out bad paraphrases.
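Here is a rough sketch of that pipeline in Python. The domain, templates, and logical-form notation are invented, and `paraphrase` and `parse` are placeholder stand-ins for the paraphrasing LM and the trained semantic parser; nothing here is the actual Genie/AutoQA API:

```python
import itertools

# Step 1: rule-based synthesis from grammar-like templates.
cuisines = ["italian", "chinese"]
areas = ["palo alto", "menlo park"]
templates = [
    "find a {cuisine} restaurant in {area}",
    "show me {cuisine} restaurants located in {area}",
]

synthetic = []
for template, cuisine, area in itertools.product(templates, cuisines, areas):
    utterance = template.format(cuisine=cuisine, area=area)
    logical_form = f'cuisine == "{cuisine}" && area == "{area}"'
    synthetic.append((utterance, logical_form))

# Step 2: paraphrase with a language model. The input is already a
# correct (if clunky) sentence, so the output is very likely correct too.
def paraphrase(utterance: str) -> list[str]:
    # Placeholder: a real pipeline calls a paraphrasing LM here.
    return [utterance.replace("find a", "i am looking for a")]

# Step 3: filter with a semantic parser trained only on the synthetic
# data; keep a paraphrase only if it still parses to the same form.
def parse(utterance: str) -> str:
    # Placeholder heuristic standing in for the trained parser.
    for cuisine, area in itertools.product(cuisines, areas):
        if cuisine in utterance and area in utterance:
            return f'cuisine == "{cuisine}" && area == "{area}"'
    return ""

filtered = [
    (p, lf)
    for utterance, lf in synthetic
    for p in paraphrase(utterance)
    if parse(p) == lf
]
```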
The whole flow is described in the AutoQA paper if you’re interested.