Genie Community Forum

Newbie to Almond and Genie

Hi Folks:

I am new to Almond. I am very interested in the Genie Toolkit and its synthetic data approach. I tend to program in Python and I have experience with Dialogflow and Alexa. I was hoping to use Genie to help me develop new interaction models (agents) for Alexa and Google Assistant (Dialogflow ES). I’m slowly reading the documentation. Since I’m new, I am not sure what the almond-cloud/router/bridges do.

My question is: are there pointers to developing a very simple Almond skill using Genie? I am looking at the tutorials, but I am not sure how Genie fits into the picture. Are the sample utterances provided by users, or are additional utterances generated by the Genie Toolkit under the hood?

Cheers,
Andrew

Hello @andrewfr, welcome to the Almond community!

Genie is the core conversational technology behind Almond. Think of Genie as Dialogflow, and Almond as Google Assistant. Users interact with Almond and developers deal with Genie. Genie produces agents that can be integrated into any app or website (it’s a nodejs package you can import).
The almond-cloud package, as you noted, also contains API bridges to expose Almond on Alexa and Google Assistant. Those API bridges might be useful to deploy a custom Genie-based agent as well, but they don’t do that yet, and in fact they are a bit buggy at the moment, especially on the Alexa side.

Genie is synthesis based: you give it a few inputs (on the order of 3-5 per function/field) and it generates a lot more sentences (easily 500k - 1M when doing dialogues) that it uses for training. The tutorials explain how to prepare the inputs to Genie (aka the Thingpedia skill manifest). Genie is run automatically when you upload a skill to Thingpedia, or you can run it manually on your own machine if you wish to train your own custom semantic parsing model.
I suggest reading our papers introducing Genie in general and for question answering if you would like to know more about how the synthesis works.

Hi Giovanni:

Thanks for the welcome!

and in fact they are a bit buggy at the moment, especially on the Alexa side.
I see the last commit was in July. I’m assuming the intent_parser.js is a Lambda? When I used Alexa, I programmed in Python and used a web-based endpoint. That said, how does Genie generate an Alexa interaction model? Also, what makes the code buggy? I’m looking at the history.

I suggest reading our papers introducing Genie in general

Will do. Thanks for the references.

Cheers,
Andrew

The Alexa intent_parser is part of the almond-cloud web frontend. almond-cloud is deployed as a regular old stateful web server.
You could perhaps rip that code out and make it serverless, but it currently isn’t.

The existing code has two options to generate an interaction model:

  • one is to map every command you specify in Thingpedia (every function and every primitive template in dataset.tt) to an Alexa intent - this is the “dumb” way that uses Alexa’s NLU and bypasses Genie almost entirely: Genie is only used to generate the reply, not to interpret the user
  • the other is to have one single Alexa intent with a SearchQuery slot that collects the whole input from the user, and then use Genie’s semantic parser to interpret that; this tends to be better from a UX perspective because Alexa intents are fairly limited, but it can also fail in surprising ways due to the need for a carrier phrase in the Alexa intent (see the sketch below)
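For concreteness, here is a rough sketch of what the interaction model for the second option could look like (the invocation name, intent name and sample phrases are made up, not what the bridge actually generates); note how every sample needs a carrier phrase around the AMAZON.SearchQuery slot, which is where the surprising failures come from:

```typescript
// Hypothetical Alexa interaction model for the "single SearchQuery intent" option.
// The invocation name, intent name and samples are illustrative only.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: "my genie agent",
      intents: [
        // One catch-all intent: Alexa only does ASR and carrier-phrase matching;
        // the actual understanding is delegated to Genie's semantic parser.
        {
          name: "GenieCommandIntent",
          slots: [{ name: "command", type: "AMAZON.SearchQuery" }],
          // AMAZON.SearchQuery slots must appear with a carrier phrase around them,
          // so bare commands ("play the radio") may not be routed as you expect.
          samples: ["ask {command}", "to {command}", "i want to {command}"],
        },
        // Built-in intents Alexa expects every skill to handle.
        { name: "AMAZON.HelpIntent", samples: [] },
        { name: "AMAZON.StopIntent", samples: [] },
      ],
    },
  },
};

export default interactionModel;
```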

Hi Giovanni:

Again thanks for the answer. I am still going through the papers. I have to get stuff running to see what is happening.

the other is to have one single Alexa intent with a SearchQuery slot that collects the whole input from the user, and then use Genie’s semantic parser to interpret that; this tends to be better from a UX perspective because Alexa intents are fairly limited, but it can also fail in surprising ways due to the need for a carrier phrase in the Alexa intent

I am not sure if Alexa would work well, if at all, using this approach. How are you giving Alexa an interaction model? Maybe I don’t understand because I’m a newbie?

There really isn’t a constraining limitation on the number of custom intents one can make, so I don’t think that is an issue. I am assuming a generated Alexa interaction model would mirror what is in the dataset.tt. The stuff in the manifest.tt would be accessed by Alexa through a webhook. Again, what interests me is Genie’s ability to synthesize a better Alexa interaction model than one crafted purely by hand (or with some 3rd party tool).

Humour me for the moment. While I learn more, is it possible to see some intermediate format Genie produces, say for the “Hello World” and “Cat API” examples? Looking at that, I’ll be better able to see if I can generate an interaction model.

Thanks!
Andrew

Genie’s intermediate representation is ThingTalk, the programming language we purposefully designed to represent virtual assistant commands. Which is to say, Genie produces datasets containing pairs of sentences and programs (hundreds of thousands of such pairs), and produces a semantic parser (a machine learning model) that takes natural language sentences and produces programs. There are no intents anywhere, and there is no Alexa-style interaction model based on dialogue trees. It’s all directly translated from natural language to an executable form.

This is all useful because Genie knows how to compose things. So you specify how to query a restaurant by price, and specify how to query a restaurant by area, and you get the ability to query by price and area. But that is done at the level of ThingTalk programs, not at the level of intents, because intents don’t compose.
You could, if you really wanted, make an intent out of every distinct ThingTalk program ignoring parameters - not just every program in dataset.tt but also every way in which a program is composed with additional filters and clauses during synthesis. You would end up with a lot of intents (potentially thousands) and I think the accuracy would suffer because you’re not sharing knowledge across very similar intents.

Hi Giovanni:

There is much for me to digest here. I appreciate the explanation. The thing confusing me is whether intent and skill are being used interchangeably. Also, I need to visit the papers to understand what a programme means. I’ll accept that Alexa skills aren’t composable either (as opposed to, say, Bixby Capsules, which seem to be composable).

You could, if you really wanted, make an intent out of every distinct ThingTalk program ignoring parameters - not just every program in dataset.tt but also every way in which a program is composed with additional filters and clauses during synthesis. You would end up with a lot of intents (potentially thousands) and I think the accuracy would suffer because you’re not sharing knowledge across very similar intents.

Maybe what I have in mind is impractical. However, I would be aiming at generating a skill. Intents within that would-be Genie-synthesized skill (custom ones at that) ought to be sharing knowledge. Mind you, there still would be an Almond programme. The reason I care about generating an interaction model is so Alexa’s ASR/NLU works reasonably well with the Almond app.

However, Genie is using things like ontologies from Schema.org and generating sentences, thousands of them. Wow! These by-products alone make Genie a potentially powerful component for building a tool for constructing Alexa skills and Dialogflow agents.

Cheers,
Andrew

Yeah, so the terminology I understand in Alexa is:

  • “skill” is just the user-visible concept of the whole interaction model, the “Foo” in the user phrase “Ask Foo for …”
  • an “intent” is a family of commands with similar implementation and similar semantics, associated with a label

So for example, you might have a “Cool Restaurants” skill that has intents “FindRestaurantByName”, “FindRestaurantsAround”, “BookRestaurant”, “CancelBooking”, etc.

So the command: “find me Mc Donalds”
would map to FindRestaurantByName(name="McDonalds")
the command: “find me a cheap restaurant nearby”
would map to FindRestaurantsAround(price="cheap")
the command: “book a table at the french laundry”
would map to BookRestaurant(name="the french laundry")
etc.

Genie’s approach is a bit different. We have:

  • “device”: a specific backend service that the assistant relies on (which could be your own custom backend)
  • “function”: a specific query of information, or a specific API action that the backend provides
  • “program”: a sequence of function calls with control constructs that implements the semantics of a specific command

So for example, whoever implements the OpenTable service in Almond would declare a query function called restaurant with a bunch of output parameters: the name, the price, the location, etc.
And now all the commands that search for restaurants are mapped to a database lookup in that table:

“find me Mc Donalds” maps to @opentable.restaurant(), id =~ "mc donalds";
“find me a cheap restaurant nearby” maps to @opentable.restaurant(), price == enum cheap && geo == $location.here;
“book a table at the french laundry” maps to @opentable.restaurant(), id =~ "the french laundry" => @opentable.make_reservation(restaurant=id);

Note how each program, in addition to referring to the function with parameters, has a whole lot of logic in it, because the command requires that logic. Also note how the last command splits into two API calls.
This approach allows you to compose phrases, and it’s how you get to complex queries quickly.

Genie’s NLU can go directly from sentence to program - no intent needed.
Alexa’s NLU cannot deal with programs though: all it knows is intents.
So the best way to integrate the two is to delegate to Genie’s NLU. Alexa would still be in charge of ASR, of course.
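To make the delegation concrete, here is a minimal sketch of the skill-side handler (the request shape follows Alexa’s standard IntentRequest JSON; parseWithGenie is a placeholder for whichever Genie or almond-cloud parsing API you end up calling, not an existing function):

```typescript
// Sketch of an Alexa webhook handler that delegates NLU to Genie.
// `parseWithGenie` is a placeholder for a call into Genie's semantic parser
// (e.g. a local genie-toolkit instance or an almond-cloud NLP endpoint).
interface AlexaIntentRequest {
  request: {
    type: "IntentRequest";
    intent: {
      name: string;
      slots?: Record<string, { name: string; value?: string }>;
    };
  };
}

async function parseWithGenie(utterance: string): Promise<string> {
  // Placeholder: should return a ThingTalk program for the utterance.
  throw new Error("wire this up to Genie's NLU");
}

export async function handleAlexaRequest(req: AlexaIntentRequest): Promise<string> {
  // Alexa already did ASR; the whole command sits in the SearchQuery slot.
  const utterance = req.request.intent.slots?.command?.value ?? "";
  // Genie's semantic parser turns the sentence into a ThingTalk program.
  const program = await parseWithGenie(utterance);
  // Executing the program and building the spoken reply is not shown here.
  return `Parsed to: ${program}`;
}
```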

If you really want to use Alexa’s NLU, you need a way to map each program to an intent. The easiest way is to take all possible programs (or a reasonably large sample of all possible programs, because programs compose and form an exponential space), and compress them down by unifying the parameters. So for example Genie generates:
“find me cheap restaurants here” -> @opentable.restaurant(), price == enum cheap && geo == $location.here
“find me expensive restaurants here” -> @opentable.restaurant(), price == enum expensive && geo == $location.here
“find me cheap restaurants in Palo Alto” -> @opentable.restaurant(), price == enum cheap && geo == new Location("palo alto")
etc.
you can compress all those programs to a single intent FindRestaurantByPriceAndLocation with those slots, and then later reconstruct the full program from the Alexa intent, to pass to the rest of the Genie runtime.
The code you have found does the second step, but we don’t really have tools to do the first step (compiling the list of intents) because you get a lot of intents and that usually doesn’t work well with Alexa.
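To illustrate that second step (just a sketch, not the actual intent_parser.js code), reconstructing the program is mostly templating once you know which ThingTalk skeleton each intent stands for; the intent and slot names below follow the FindRestaurantByPriceAndLocation example and the ThingTalk syntax mirrors the programs above:

```typescript
// Sketch of step 2: map a recognized Alexa intent + slots back to a full
// ThingTalk program, which can then be handed to the Genie runtime.
// Intent names, slot names and values are illustrative.
type Slots = Record<string, string | undefined>;

function reconstructProgram(intentName: string, slots: Slots): string {
  switch (intentName) {
    case "FindRestaurantByPriceAndLocation": {
      const filters: string[] = [];
      if (slots.price)
        filters.push(`price == enum ${slots.price}`);
      if (slots.location)
        filters.push(
          slots.location === "here"
            ? "geo == $location.here"
            : `geo == new Location("${slots.location}")`
        );
      return filters.length
        ? `@opentable.restaurant(), ${filters.join(" && ")};`
        : "@opentable.restaurant();";
    }
    default:
      throw new Error(`no program skeleton for intent ${intentName}`);
  }
}

// e.g. reconstructProgram("FindRestaurantByPriceAndLocation",
//        { price: "cheap", location: "palo alto" })
//   -> '@opentable.restaurant(), price == enum cheap && geo == new Location("palo alto");'
```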

Hi Giovanni:

Once again, thanks for the response. I really have to read the papers in depth and set up Almond so I understand what is going on. Some more comments:

Genie’s NLU can go directly from sentence to program - no intent needed.
Alexa’s NLU cannot deal with programs though: all it knows is intents.

Clarification: by NLU, we mean the part that is figuring out the intent and slot values? That is, the data that is being sent to a backend or Lambda function for processing.

you can compress all those programs to a single intent FindRestaurantByPriceAndLocation with those slots,

The code you have found does the second step, but we don’t really have tools to do the first step (compiling the list of intents) because you get a lot of intents and that usually doesn’t work well with Alexa.

So we want to get to a point in step 1, where we have an intent FindRestaurant with three slot values: {name, price, location}

Step 2 (where the tools exist) would take something coming from the Alexa NLU (or Dialogflow NLU), like FindRestaurants(price=cheap, location=“Palo Alto”, name=“any”), and feed it into the Genie runtime.

I could guess what would be needed to write a tool to complete step one. Would such a tool be useful? Maybe new insights can be found? Again, I would be happy to start with a very simple example and see what is needed.

Cheers,
Andrew

Clarification: by NLU, we mean the part that is figuring out the intent and slot values? That is, the data that is being sent to a backend or Lambda function for processing.

Yes, NLU is the process of translating a natural language command to an unambiguous executable representation that the backend can process.

So we want to get to a point in step 1, where we have an intent FindRestaurant with three slot values: {name, price, location}

Step 2 (where the tools exist) would take something coming from the Alexa NLU (or Dialogflow NLU), like FindRestaurants(price=cheap, location=“Palo Alto”, name=“any”), and feed it into the Genie runtime.

Yes, that’s pretty much it. I think such a tool would be very useful, and we would be happy to integrate it in the Genie toolkit if you would like to contribute it.
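To give you a starting point, here is one way such a tool could be structured (a rough sketch under simplified assumptions, not existing Genie code): take the synthesized (sentence, program) pairs, replace the concrete parameter values in each program to get a “skeleton”, and emit one Alexa intent per distinct skeleton, with the corresponding sentences as sample utterances:

```typescript
// Sketch of step 1: derive Alexa intents from Genie's synthesized dataset.
// The input format (a plain array of sentence/program pairs) and the
// skeletonization rules are simplifying assumptions for illustration.
interface Example {
  sentence: string; // e.g. "find me cheap restaurants in palo alto"
  program: string;  // e.g. '@opentable.restaurant(), price == enum cheap && geo == new Location("palo alto")'
}

interface AlexaIntent {
  name: string;
  slots: { name: string; type: string }[];
  samples: string[];
}

function skeletonize(program: string): string {
  // Unify parameters: strip out concrete values so programs that differ only
  // in their parameters collapse onto the same skeleton.
  return program
    .replace(/"[^"]*"/g, '""')        // quoted strings -> empty placeholder
    .replace(/enum \w+/g, "enum _")   // enum values    -> wildcard
    .replace(/\d+/g, "0");            // numbers        -> 0
}

function deriveIntents(dataset: Example[]): AlexaIntent[] {
  const bySkeleton = new Map<string, AlexaIntent>();
  for (const ex of dataset) {
    const skeleton = skeletonize(ex.program);
    let intent = bySkeleton.get(skeleton);
    if (!intent) {
      intent = { name: `GenieIntent${bySkeleton.size}`, slots: [], samples: [] };
      bySkeleton.set(skeleton, intent);
    }
    // A real tool would also replace the parameter values inside the sentence
    // with {slot} references and declare matching slots; omitted here.
    intent.samples.push(ex.sentence);
  }
  return Array.from(bySkeleton.values());
}
```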

Yes, that’s pretty much it. I think such a tool would be very useful, and we would be happy to integrate it in the Genie toolkit if you would like to contribute it.

I’ve looked through the tutorials. Now I ought to write an Almond device from scratch so I can get a feel for it. Maybe just stepping through all the stages involved in synthesizing a device would be necessary. I’ll use a debugger if need be. I would like to see what is generated at each important stage.

I came across a tutorial on LLVM. There is a flag to clang that emits the internal representation. I guess I’m looking for something like that …

The best way to learn the low-level details of Genie is to use the starter packages at https://github.com/stanford-oval/genie-toolkit/tree/master/starter

Each one works as a sort of tutorial and guides you through building a dataset for neural semantic parsing for various Thingpedia skills.
The starter code assumes you installed Genie from source, using the instructions at https://github.com/stanford-oval/genie-toolkit/blob/master/doc/install.md

There is also a step-by-step tutorial at https://github.com/stanford-oval/genie-toolkit/blob/master/doc/tutorial-basic.md
Both the tutorial and the starter code will generate various intermediate files that you can analyze to see the ThingTalk representation of each command.

Hi Giovanni:

Thanks for the information!

I downloaded almond-server. I’m having problems installing podman, so I’m running almond-server in its Docker container. I am not a Docker expert, but I’ll assume Genie has to have access to the stanfordoval/almond-server volume? Also, looking at Genie’s install.md, I see pip install genienlp. I guess I ought to set up a virtual environment?

Cheers,
Andrew

almond-server comes with a bundled version of Genie.
If you want to talk to a local version of Genie, which is recommended for development, you’ll want to install almond-server from git, and use yarn link genie-toolkit from the almond-server directory to link to your local clone of Genie.

As for the Python components, you can install in a virtual env or in your user folder (with pip3 install --user). The latter is recommended because it’s a bit easier. If you choose the virtual env, make sure you run all Genie commands while the virtual env is active, so PATH is set correctly.

Hi Giovanni and Almond Developers

Some questions:

I have installed thingpedia. The instructions recommend a developer key. What does this look like? A UUID, for instance?

~

My local install of almond-server is failing. It is because this machine is using a rather old Linux kernel (long story). I am trying to install the local almond-server instead of the Dockerized version, so I can use my local copy of Genie-toolkit. Is there a way to make the Dockerized version use my local Genie-toolkit copy?

Cheers,
Andrew

What do you mean by “have installed thingpedia”? You mean genie-toolkit?
The Thingpedia developer key is a long hexadecimal string that you get from a Thingpedia instance, usually thingpedia.stanford.edu, after applying for a developer account. Guide

As for almond-server, I don’t think you can get a custom genie-toolkit inside the Docker container. Even if you could, I would not recommend it unless you’re very familiar with Docker and with how npm packages work.
What error do you get when you run almond-server? Chances are it’s just a dependency error, I don’t think we require a very recent Linux kernel.

What do you mean by “have installed thingpedia”? You mean genie-toolkit?

I should have been more specific. The Thingpedia Command Line Tools.

What error do you get when you run almond-server? Chances are it’s just a dependency error, I don’t think we require a very recent Linux kernel.

I was getting errors when invoking yarn (I haven’t worked with Node for a while). Okay, I did a yarn start and almond-server is running. Later in the week, I’ll get back to the tutorials. Thanks for the help!

Cheers,
Andrew

Hi Giovanni and Almond Developer Team:

I’ve used a few voice platforms now. However it is always a thrill to hear a new platform for the first time! It was really great to hear Almond!

Cheers,
Andrew

I’m glad you got it working! And very nice that sound worked too.

Did you have to do any additional steps to get it working? I’ve started a troubleshooting page and I’ll add your feedback if you have any. Thanks!
