Genie Community Forum

About Program Synthesis

Hi Folks:

I am new to Almond. I have written voice apps with Alexa and Google Assistant. I’m trying to understand the bigger picture about Almond’s approach. I recently glanced through the DIY Assistant paper. I’ve also recently seen Microsoft papers on PROSE - which is about program synthesis - and its use with GPT-3 and FX. The language the papers use and the approach seem similar. Is this the way voice apps will be built in the near future?

I’ll continue to try to get a developer version of Almond running on my local machine. However, I am really curious about how I can use these technologies (i.e., large language models, schemas/ontologies) today to improve my Dialogflow and Alexa apps.

Cheers,
Andrew

Hi @andrewfr,

You raise some very good questions!

The work in program synthesis is not quite related to voice assistants (or, more generally, semantic parsing). The goal of program synthesis is to find a program that satisfies a set of input/output examples. Typical examples are SQL queries by example, or transformations on log files or spreadsheets. In semantic parsing, the user’s goal is instead expressed in natural language.
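To make the contrast concrete, here’s a toy sketch of the two settings. It’s purely illustrative - not how PROSE or Genie actually work - and the candidate programs and utterances are invented:

```python
# Toy contrast between program synthesis and semantic parsing (illustrative only).

# Program synthesis: the specification is a set of input/output examples,
# and we search a space of programs for one that satisfies all of them.
CANDIDATE_PROGRAMS = {
    "upper":   str.upper,
    "lower":   str.lower,
    "reverse": lambda s: s[::-1],
}

def synthesize(examples):
    """Return the name of a candidate program consistent with every (input, output) pair."""
    for name, prog in CANDIDATE_PROGRAMS.items():
        if all(prog(inp) == out for inp, out in examples):
            return name
    return None

print(synthesize([("abc", "ABC"), ("hi there", "HI THERE")]))  # -> upper

# Semantic parsing: the specification is a natural-language utterance.
# A real system learns this mapping from data; it is hard-coded here only
# to show the shape of the problem (utterance in, formal program out).
def parse(utterance):
    if "uppercase" in utterance:
        return "upper"
    if "reverse" in utterance or "backwards" in utterance:
        return "reverse"
    return None

print(parse("please uppercase this string"))  # -> upper
```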

Nevertheless, large language models definitely play a role in our work - just like in pretty much all of NLP these days. Our latest model uses BART, which is a very popular pretrained LM with a sequence-to-sequence architecture. It’s quite a bit smaller than GPT-3, which I think is an advantage: GPT-3 is so large it cannot be fine-tuned for a specific task, on a specific dataset. All you can do is tweak the prompt you provide in the hope that it naturally has the right behavior. I think this is less reliable than fine-tuning, especially in settings like voice assistants where you really need control of what the agent is doing.

The question of fine-tuning vs prompting, and the role of large LMs, is far from settled though: it’s definitely an active area of research, with a lot of new papers every few months.
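To make the fine-tuning approach concrete, here is a minimal sketch of fine-tuning a seq2seq LM like BART on utterance-to-program pairs with the HuggingFace transformers library. This is not our actual training code: the two toy examples, the ThingTalk-ish target strings, and the hyperparameters are all made up for illustration.

```python
# Minimal sketch: fine-tuning BART as a sequence-to-sequence semantic parser.
# Illustrative only - the toy dataset and target strings are made up.
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# (utterance, target program) pairs; a real dataset would be much larger.
pairs = [
    ("show me italian restaurants in palo alto",
     "@restaurant() filter cuisine == 'italian' && city == 'palo alto'"),
    ("what is the weather tomorrow",
     "@weather() filter date == 'tomorrow'"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for epoch in range(3):
    for utterance, program in pairs:
        inputs = tokenizer(utterance, return_tensors="pt")
        labels = tokenizer(program, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss   # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# At inference time, the parser is just generation of the program string.
model.eval()
test = tokenizer("find thai restaurants in seattle", return_tensors="pt")
out = model.generate(**test, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The GPT-3 alternative, by contrast, would be writing a prompt containing a few utterance-to-program examples and hoping the completion continues the pattern, with no gradient updates at all.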

Hi Giovanni:

Thanks for the response. It seems to me that the output of both is a program.

All you can do is tweak the prompt you provide in the hope that it naturally has the right behavior.

I am reading the papers. I am still having a hard time piecing things together. Perhaps the best thing for me to do is to get Almond running on my machine, write a really simple schema and Almond service, and see what happens in the debugger?

Cheers,
Andrew

I would suggest reading the Genie paper and the more recent SKIM paper to understand more about the NLP methodology in Almond.

You can also run Almond and try piecing together how it works, but there are quite a few components - many of which run in our cloud by default - so it might not be as instructive.

Hi Giovanni:

You can also run Almond and try piecing together how it works, but there are quite a few components - many of which run in our cloud by default - so it might not be as instructive

Maybe it is just easier for me to write simple schemas and services. To date, my only experiences have been with Assistant, Alexa and Nuance MIX.

I am going through the SKIM paper. I don’t know how things like anaphora/coreference fit into the picture. I also need to read up on Transformers.

Again, these days I work with Google Assistant. I find myself having to write my own crude dialogue tracking. It feels more like an exercise in interpreter design - so I sense where type information comes in. One of my biggest challenges is a lack of utterances. So I’m really interested in synthetic data.

I see that LLMs can be used to generate text. I am not clear on how to constrain them to generate useful utterances for a given domain.

Cheers,
Andrew

Maybe it is just easier for me to write simple schemas and services. To date, my only experiences have been with Assistant, Alexa and Nuance MIX.

That might be a way to learn. I recommend checking our github repository with the skills and then following our tutorials.

I am going through the SKIM paper. I don’t know how things like anaphora/coreference fit into the picture. I also need to read up on Transformers.

Are you familiar with neural dialogue state tracking? Essentially, the model is asked to predict all the slots that have been mentioned in the dialogue, regardless of whether they are mentioned explicitly or by coreference. Neural DST is quite different from the NLU (sentence-at-a-time) plus rule-based state tracking typically offered by commercial assistant SDKs. Neural models for DST have no problem learning this task, given enough training data.
SKIM is similar to neural DST, with the added twist that it predicts a directly executable query in ThingTalk, rather than a plain list of slots.
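To give a concrete feel for the framing, here’s a small, entirely made-up example of how one DST turn can be serialized as a seq2seq pair (this is not SKIM’s actual format):

```python
# Sketch of a neural DST training example framed as sequence-to-sequence.
# The serialization is made up for illustration; SKIM's actual target is an
# executable ThingTalk state rather than a flat slot list.

dialogue_history = [
    ("user",  "I'd like to book a table for two."),
    ("agent", "Sure, which restaurant?"),
    ("user",  "The italian place on Main Street, tomorrow at 7."),
]

# Model input: the whole history flattened into one sequence, so references
# like "the italian place" or "tomorrow" are resolved by the model itself,
# not by hand-written coreference rules.
source = " ".join(f"<{speaker}> {turn}" for speaker, turn in dialogue_history)

# Model output: every slot mentioned so far, whether stated explicitly or by
# reference; SKIM would instead predict a directly executable ThingTalk query.
target = "restaurant = italian place on Main Street ; people = 2 ; date = tomorrow ; time = 19:00"

print(source)
print(target)
```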

Again, these days I work with Google Assistant. I find myself having to write my own crude dialogue tracking. It feels more like an exercise in interpreter design - so I sense where type information comes in. One of my biggest challenges is a lack of utterances. So I’m really interested in synthetic data.

Right, writing a rule-based state tracker or dialogue tree is quite tedious, and it’s easy to miss cases. I see this as the reason even commercial assistants are moving towards neural DST; check out the Alexa Conversations paper at NAACL, for example. Once you go for some sort of neural DST, you’re left with the data acquisition problem, for which synthesis is a good bootstrapping approach.
With some effort, you can use Genie to synthesize any kind of dialogue data, but these days it is highly optimized for dialogues that use ThingTalk as the state representation.

I see that LLMs can be used to generate text. I am not clear on how to constrain them to generate useful utterances for a given domain.

You start with rule-based synthesis - essentially, grammar-based templates. Then you apply the LM to paraphrase the output of the rule-based synthesis. Because the input is a correct sentence (maybe a bit clunky, maybe a bit repetitive, but correct), the output is also very likely to be correct. There is an additional step of filtering where you can use a semantic parsing model trained on purely rule-based synthetic data to throw out bad paraphrases.
The whole flow is described in the AutoQA paper if you’re interested.
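If it helps, here’s a rough sketch of that loop. `paraphrase_with_lm` and `trained_parser` are hypothetical placeholders rather than Genie or AutoQA APIs, and the template and programs are invented for illustration:

```python
# Rough sketch of synthesize -> paraphrase -> filter (illustrative only).
# paraphrase_with_lm and trained_parser are placeholders, not real APIs.
import itertools

# 1. Rule-based synthesis: grammar-like templates paired with programs.
TEMPLATES = [
    ("show me {cuisine} restaurants in {city}",
     "@restaurant() filter cuisine == '{cuisine}' && city == '{city}'"),
]
VALUES = {"cuisine": ["italian", "thai"], "city": ["palo alto", "seattle"]}

def synthesize():
    for utt_tpl, prog_tpl in TEMPLATES:
        for cuisine, city in itertools.product(VALUES["cuisine"], VALUES["city"]):
            fill = {"cuisine": cuisine, "city": city}
            yield utt_tpl.format(**fill), prog_tpl.format(**fill)

# 2. Paraphrase the clunky-but-correct synthetic sentence with a language model.
def paraphrase_with_lm(utterance):
    # placeholder: a real implementation would call a BART/T5-style paraphraser
    return ["any " + utterance.replace("show me ", "") + "?"]

# 3. Filter: keep a paraphrase only if a parser trained on the purely synthetic
#    data still maps it to the same program as the original sentence.
def trained_parser(utterance):
    # placeholder: a real implementation would run the semantic parser here
    return None

dataset = []
for utterance, program in synthesize():
    dataset.append((utterance, program))           # keep the synthetic original
    for paraphrase in paraphrase_with_lm(utterance):
        if trained_parser(paraphrase) == program:  # filtering step
            dataset.append((paraphrase, program))

print(len(dataset), "examples")
```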

Hi Giovanni:

Once again, thank you for the detailed response!

That might be a way to learn. I recommend checking our github repository with the skills and then following our tutorials.

I have started the tutorials and I’m looking through the github repository. I’m also going through the ThingTalk documentation. Really, I need to write a few skills.

Are you familiar with neural dialogue state tracking?

No. I had to look up the term and read papers. In general, I have a layman’s knowledge of neural networks. I need to change that.

The whole flow is described in the AutoQA paper if you’re interested

I’m going through the AutoQA paper. I am unfamiliar with so much. I’m also reading the referenced paper “Building a Semantic Parser Overnight” - the illustration showing the flow from lexicon to semantic parser was handy! Still, there is a lot I need to read just to get an inkling of what’s happening.

Again thanks!

Cheers,
Andrew