Deploying Almond with custom UI and Natural Language Support

Anggrio · August 6, 2021, 4:22pm

Hi, my name is Anggrio and I’m a student from the Australian National University. I’m currently working on a project that uses Almond as the base for a text-based virtual assistant that can filter news articles. A few months ago, I posted in the forums here about our team’s progress: we managed to deploy our own Web Almond that can access a new news filter skill using the developer key, but it did not have natural language processing capabilities.
This semester, my team wants to improve on our previous results by creating our own custom UI / webpages for Almond and implementing natural language capabilities for our news filter skill. We have some questions about how to proceed with the project:

First, if we want to create our own custom UI, so that we can use our own design for the homepage, login page, and so forth, how should we connect it to Almond?
Last semester, we used the Web Almond Only installation option for our project. Is there a file that we can tweak in the almond-cloud package to manipulate the UI of the webpages?
Or is it possible to design our own UI/website first and then connect it to Almond using the Web Almond APIs?
I’m not sure which option is better, or more specifically what the differences between the two methods are, most importantly when it comes to connecting to a developer key or our own custom natural language model.
Second, I believe that a few months ago automatic natural language training was dropped for new skills, which led our project to use the ThingTalk language to test the news filter. Is this still unavailable in the newest Almond release?
If yes, then how can we train our own natural language model for our news filter skill? I believe that using Genie was mentioned as a suggestion before but we didn’t have time to implement that last semester.
Finally, we’ve tried to follow our previously working steps to deploy Almond-cloud to try and get Almond running again for some preliminary testing. However, we ran into some errors which didn’t happen last semester.
Using the base git address, the following error is produced when running npm install:

[Screenshot of the npm install error]

We’re not sure what’s causing this error, but we managed to get past it by using git checkout v2.0.0 to go to the Almond 2.0 branch. However, running npm install there reports some vulnerabilities, as shown below:

[Screenshot of the npm install vulnerability warnings]

Using npm audit, we found that the issues are: axios, a dependency of actions-on-google; underscore, a dependency of genie-toolkit; sanitize-html, a dependency of foodoc; json-bigint, a dependency of actions-on-google; and tar, a dependency of genie-toolkit. We tried using npm install to install the latest versions of those packages, and we also tried running npm audit fix, but neither resolved the vulnerabilities.
We then tried to just continue running it as a test and it seems to get stuck in either step 5 or step 6 since the step 5 output is as follows:

[Screenshot of the step 5 output]

While it does allow us to access a deployment of Almond Cloud, it doesn’t go to our assigned URL variable (which worked last semester), and while we can log in and browse the website, Almond itself is not connected (the error is “failed to load resource, status 404 not found”).

Any help with these three problems would be greatly appreciated.

Thank you

gcampax · August 7, 2021, 5:08pm

Hey

If you want to tweak the Almond UI, that lives in views/conversation_mixin.pug, browser/conversation.js (browserify bundle, rebuild with npx make) and public/stylesheets/conversation.css. Alternatively, you can build your own custom UI independent of the existing codebase, and communicate with Almond over the API
You should follow the instructions in the Thingpedia Guide to set up a local dev environment. The local dev environment will have a README explaining how to generate a dataset and train. You can read the Makefile to find out all the tunable hyperparameters as well.
Don’t run npm as root. It unfortunately does stupid, broken things. If you’re developing locally, you can set up your node/npm in a way that it will install without root. If you’re building a container, switch to a non-root user before the npm install step.

gcampax · August 7, 2021, 5:18pm

By the way, we’re moving towards requiring Kubernetes as the recommended installation strategy for Almond Cloud, both for development and for deployment. We’ll have updated example configuration files and instructions soon (probably end of next week).

If you have access to a Kubernetes cluster or can set one up, I recommend you look into that. Outside of managed cloud offerings (EKS, GKE), easy-to-install options for Kubernetes are kind with docker, minikube in a VM, and k0s on bare metal.

Anggrio · August 9, 2021, 5:22am

Hi Giovanni,
I’ve read through the documentation that you’ve linked and I still have some questions regarding the methods you provided.

Our team isn’t experienced with web development that much. Does it matter what web framework we use to create the UI if we want to start from scratch? What frameworks did the Stanford team use when developing the web Almond (if any)?
From the local testing and genie docs you linked, it seems that genie init-device will provide some template files similar to what needs to be submitted when creating a device through the website. Can I reuse my code from the version I uploaded to the website and simply update the related files (manifest.tt, index.js, dataset.tt)?
For testing, are we supposed to create the NLP model first? I see that the scenario test in particular would replace the method of testing it in the web Almond and that it can accept natural language/programming language similar to how the web Almond can input both as well.
For generating the dataset and training the model, the hardware requirements seem to be quite heavy. Would it not be possible to run it on a normal laptop/home computer? Also, would the generated dataset be a new dataset.tt, replacing the one we made last semester?
When we are satisfied with the locally made skill, how do we connect it to the Almond so that our custom UI can use it through the API (if this is even possible)? Is it by using the genie upload-device command?

gcampax · August 9, 2021, 3:41pm

The frontend framework doesn’t matter! We used nodejs+express for the server (+ the usual express middleware), pug for templating, bootstrap+jquery for the frontend.
Yes, genie init-device will give you a skeleton that you can fill up with your files. The meaning of the files is the same locally and in the web editor.
You can test in the scenario tests with \t while the model is training. Scenario tests are a way to run automated testing that you would do in the web UI, and have the same features.
The requirements stated are to build a full-size dataset (about 750k synthetic turns), which one needs to train the whole Thingpedia at once. With a single skill, you can probably lower the hyperparameters; something like make subdatasets=2 target_pruning_size=150 should do nicely. I recommend at least 16 GB of RAM though. The requirements for training are unfortunately fairly strict: you need a good GPU with a good amount of VRAM, and you need patience.
The dataset you get consists of synthetic.txt containing dialogues, and user/train.tsv containing the turns extracted from the dialogues. It is constructed from the phrases in manifest.tt and dataset.tt; it does not replace either.
You can upload the skill to Thingpedia as before (make will give you a zip file to upload), or you can use the genie upload-device command - the latter does the same upload in a scripted fashion. Then you can run an NLP server to serve the model (using almond-cloud’s nlp component, or using genie server which is a bit more lightweight) and point your custom Web Almond to that NLP server.

Anggrio · August 12, 2021, 3:02pm

Hi Giovanni, our team has just tried to create the device locally and we’ve run into some problems that we aren’t sure about.

Since we will create a new device, would it have any issues if we used the same name as the one that exists right now in dev Almond using my account? I used my dev Almond account’s developer key for the developer key parameter when running genie init-project.
When copying the files, we found that the make lint command reported errors for some lines that were originally accepted by the web version. The changes it asks for include adding a semicolon and removing a curly bracket section in favor of using one line (for if and else), as well as changing the end-of-line to use the Linux format. This shouldn’t affect how the device works, right?
Finally, we couldn’t test it using the genie-assistant command because we couldn’t package it using npx make. The output is shown below:

[Screenshot of the npx make output]

We’re going to try making the scenario tests later this week, but we found the genie-assistant command in the documentation. It seems to enable us to use Almond directly in the command line (chat with the assistant like the normal website). Is this command also suitable for testing? Also, is it connected to the normal Almond website or the dev website? And how can I change the developer key if it’s using the normal website?

gcampax · August 13, 2021, 6:54pm

If you have both a local device and a device in Thingpedia with the same ID accessible with the configured dev key, the two will be merged. In practice this only affects secrets (API keys) though.
No it doesn’t, but we encourage everyone to follow a consistent coding style.
I think you installed the wrong make. It looks like you installed make from npm. You should install make from the package repository (apt-get or brew). In fact, if you’re on a Mac you should probably use gmake instead of make.

Anyway, yes, genie assistant is designed for testing. You can point it to the local directory containing your skills, so you don’t have to upload to Thingpedia. It’s typically connected to the normal Almond website but you can change that from the command line. Use --help to learn all the relevant flags.

Anggrio · August 18, 2021, 4:29pm

Hi Giovanni, it seems that we managed to get it working without warnings by reinstalling the entire thing, and we could test the previous semester’s code by pointing genie assistant to the local directory. However, we ran into some problems with uploading, the scenario test, and generating training data.

For the upload, we tried the same command as before with npx followed by genie-upload. The npx works fine and it created the zip file in the build directory. However, when we tried to run the genie-upload command, the following happens:

[Screenshot of the genie-upload output]

I have included my developer key when setting up the project using genie init-project --developer-key, but it seems that the access token is something else?
For the scenario test, we tried to use a simple get news article command for the sports topic and the test itself ran without issue. However, it seems that our expected output is not in the correct format in the scenario.txt file, as it failed the test even though the output is what we expected. The following is the output of the test:

[Screenshot of the scenario test output]

Our scenario.txt file is as follows:

# 1. Request sports topic news articles
U: \t @org.itspersonal.newsfilter.news_article(topic = enum sports); 
A: Here are the top 5 news for today.
# 2. Request tech topic news articles
U: \t @org.itspersonal.newsfilter.news_article(topic = enum tech); 
A: Here are the top 5 news for today.

How should we rewrite the expected answer?

Finally, when generating the training data, it unexpectedly stops as follows:

[Screenshot: Generate error]

We tried to use your suggested command, make subdatasets=2 target_pruning_size=150 and we’re not sure what is causing it to stop suddenly. It also didn’t generate a synthetic.txt file or a user/train.tsv file. Is there something we’re doing wrong here?

gcampax · August 23, 2021, 8:29am

You can get an access token from the Settings page of Web Almond. Scroll down to “Authorized OAuth apps” and click “Issue an access token”. Pass the access token on the command-line, or use git config thingpedia.access-token $token to set it permanently.
You wrote down that you expected “Here are the top 5 news for today” but the agent replies with “Here are the top news for today” (without the number). I’m not sure what’s the correct expectation here. You can make the number optional with something like Here are the top ([0-9]+ )?news for today if either answer is acceptable.
“Killed” means the process was killed by the Linux OOM killer. Increase the available RAM and/or reduce the maximum heap size of the node process (set with memsize on the command line, defaults to about 9G).
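
For the scenario test answer, the optional-number pattern above can be checked on its own before editing scenario.txt. This is a minimal sketch that only exercises the regular expression itself; how the scenario test runner applies the pattern is not shown here, and the third reply is a made-up counterexample.

```typescript
// Pattern suggested above: the count ("5 ") is optional.
const expected = /Here are the top ([0-9]+ )?news for today/;

const replies = [
    "Here are the top 5 news for today.",    // reply written in scenario.txt
    "Here are the top news for today.",      // reply the agent actually gave
    "Here are the top five news for today.", // made-up reply with a spelled-out count
];

// true where the reply satisfies the pattern, false otherwise
const results = replies.map((reply) => expected.test(reply));
console.log(results);
```

The first two replies match; the third does not, because only digits are allowed in the optional group.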

Hope this helps!

Anggrio · August 26, 2021, 12:24pm

Hi Giovanni,
We tried to add the access token to the command but now it seems to cause a different error as follows:

We’re not sure what #_[name] is, but our device name is found in the json files.

For the training, we switched to a more powerful computer because it couldn’t run using the memsize limit. So now we tried to use the same command you gave us, make subdatasets=2 target_pruning_size=150 datadir and found the following error:

It seems to ask for slots of some kind? The device itself is working fine as we tested using the fixed scenario tests and manually using genie assistant

gcampax · August 26, 2021, 4:10pm

Yeah you need to add the device name to the #_[name] annotation on the class. Same as the description in #_[description], and some more metadata annotations as well. Look at the various examples in thingpedia-common-devices
This error is unexpected, and I’m not sure exactly how it occurs. I don’t understand why it’s trying to execute a program that is not ready to execute, and there might be a bug in Genie. Can you upload the manifest somewhere so I can take a look?

Anggrio · August 26, 2021, 5:20pm

Is it the one called #_[thingpedia_name] like this:

If so, which ones are required?

I have uploaded the file to GitHub: Personal-Virtual-Assistant-for-News-Filtering/manifest.tt at master · TechLauncher-its-personal/Personal-Virtual-Assistant-for-News-Filtering · GitHub

Anggrio · August 28, 2021, 12:57pm

Update:
We managed to get upload-device working by adding the annotations as I posted previously, and after checking on almond.stanford.edu it seems to work fine.

For the training problem, I think the problem comes from it trying to call the mark_training_news_article command directly instead of through training_news_article. If you remember from last semester, that function is only meant to receive the information obtained from running the training_news_article function, specifically a random article returned by the service. Is there a way to disable calling the function directly? Or at least remove it from the functions that the training will try to run?

gcampax · September 4, 2021, 5:57pm

Hi sorry for the delay. I think actually the issue is the call to training_news_article().

Specifically, the synthesis generates the phrase “mark a training news article” which is mapped to mark_training_news_article(). As you noted, mark_training_news_article needs an article to mark, so the synthesis injects a call to training_news_article() first (so the agent will retrieve a list of articles and let the user choose which one to mark). The issue is that training_news_article() itself needs a parameter (the type of article). The parameter is unspecified, so it should be $? in thingtalk. Instead, it’s empty, and this causes issues because the parameter is required.

The faulty code is this line: https://github.com/stanford-oval/genie-toolkit/blob/master/lib/templates/dialogue_acts/initial-request.ts#L145
Where it creates an Ast.Invocation with an empty list of Ast.InputParam.
Instead, for every required input Ast.ArgumentDef defined in the query it’s trying to call (which you can retrieve with query.iterateArguments()), it should create an Ast.InputParam with an Ast.UndefinedValue as the value.

Can you try making the code changes above, and seeing if that helps the synthesis?

Anggrio · September 4, 2021, 6:34pm

Hi Giovanni,

So from what you are saying, we need to change the initial-request.ts file in that line you mentioned?
Can you check my understanding of the problem?

So the problem is this line right:

newTable = new Ast.InvocationExpression(null,
                    new Ast.Invocation(null,
                        new Ast.DeviceSelector(null, query.class!.name, null, null),
                        query.name,
                        [],
                        query),
                    query);

and the Ast.InputParam that is empty is the [] that is passed to the Ast.Invocation constructor. Would the following be the correct solution:

invoInputParam = [];
for (ArgumentDef i in query.iterateArguments()) {
      invoInputParam.append(new Ast.InputParam(null, "topic", new Ast.UndefinedValue));
};
newTable = new Ast.InvocationExpression(null,
                    new Ast.Invocation(null,
                        new Ast.DeviceSelector(null, query.class!.name, null, null),
                        query.name,
                        invoInputParam,
                        query),
                    query);

Note that topic is the name of the required arguments in manifest.tt

gcampax · September 4, 2021, 6:37pm

“topic” should be the name of the argument you get during iteration, and also you need to check that the argument is both input and required (iterateArguments will also give you optional inputs and output parameters), but yeah that’s the idea.
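
A minimal sketch of that filtering step, written with small stand-in types so it runs on its own. The classes below are hypothetical simplifications, not the real Ast.ArgumentDef, Ast.InputParam, or Ast.UndefinedValue, and of the example arguments only "topic" comes from this thread.

```typescript
// Hypothetical stand-ins for the AST classes named in this thread.
interface ArgumentDef {
    name: string;
    is_input: boolean;
    required: boolean;
}

class UndefinedValue {
    readonly isUndefined = true;
}

class InputParam {
    name: string;
    value: UndefinedValue;
    constructor(name: string, value: UndefinedValue) {
        this.name = name;
        this.value = value;
    }
}

// Build one placeholder parameter per required input argument.
// Optional inputs and output parameters are skipped.
// (An array is used here for simplicity; the real iterateArguments() is iterated the same way.)
function requiredInputPlaceholders(args: ArgumentDef[]): InputParam[] {
    const params: InputParam[] = [];
    for (const arg of args) {
        if (arg.is_input && arg.required)
            params.push(new InputParam(arg.name, new UndefinedValue()));
    }
    return params;
}

// "count" and "title" are made-up arguments for illustration.
const exampleArgs: ArgumentDef[] = [
    { name: "topic", is_input: true, required: true },
    { name: "count", is_input: true, required: false },
    { name: "title", is_input: false, required: false },
];
const placeholders = requiredInputPlaceholders(exampleArgs);
console.log(placeholders.map((p) => p.name));
```

Only the required input survives the filter, so the invocation would carry a single undefined-valued parameter for topic.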

Anggrio · September 4, 2021, 6:45pm

so maybe it should be

invoInputParam = [];
for (Ast.ArgumentDef i in query.iterateArguments()) {
      if (i.is_input && i.required)
         invoInputParam.append(new Ast.InputParam(null, i.name, new Ast.UndefinedValue));
}
newTable = new Ast.InvocationExpression(null,
                    new Ast.Invocation(null,
                        new Ast.DeviceSelector(null, query.class!.name, null, null),
                        query.name,
                        invoInputParam,
                        query),
                    query);

Does the append command exist for the list? Or is there another method to add entries to the end of a list?
Also does the new Ast.UndefinedValue need to be new Ast.UndefinedValue()?

gcampax · September 4, 2021, 7:27pm

append should be push. Having () or not does not make a difference for a new expression in JS.
So yeah this looks good. Try it and see if it fixes your issue. If it does, would you mind sending a pull request?
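
Both points can be checked in isolation with a small sketch. The class here is a hypothetical placeholder with a no-argument constructor, not the real Ast.UndefinedValue.

```typescript
// Hypothetical placeholder class with a no-argument constructor.
class UndefinedValue {
    kind = "undefined";
}

// With no constructor arguments, both forms create an instance the same way.
const withoutParens = new UndefinedValue;
const withParens = new UndefinedValue();

// Arrays grow with push; there is no append method on Array.
const params: UndefinedValue[] = [];
params.push(withoutParens);
params.push(withParens);
console.log(params.length);
```

Many lint configurations still prefer the explicit parentheses for readability.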

Anggrio · September 5, 2021, 6:54am

Hi Giovanni,

Unfortunately it doesn’t seem to work, we received the same error below as before:

Here is what I ended up writing for the fix:

const invoInputParam = [];
for (const i of query.iterateArguments()) {
      if (i.is_input && i.required)
         invoInputParam.push(new Ast.InputParam(null, i.name, new Ast.UndefinedValue()));
}
newTable = new Ast.InvocationExpression(null,
                 new Ast.Invocation(null,
                       new Ast.DeviceSelector(null, query.class!.name, null, null),
                       query.name,
                       invoInputParam,
                       query),
                 query);

Note, we needed to pull and reinstall the genie-toolkit because we didn’t take the master branch previously. The command we used to run the generate training data is still the same, make subdatasets=2 target_pruning_size=150 datadir

gcampax · September 7, 2021, 3:35pm

That’s quite surprising, because the symptoms you have match exactly what that fix is supposed to address. Just to be sure, did you rebuild genie-toolkit (npx make is the quickest way) after making those changes?