Genie Community Forum

Deploying Almond with custom UI and Natural Language Support

Yes, the laptop has an RTX 3080 GPU, so it should be quite powerful. I’m not sure what you meant by configuring CUDA for PyTorch, is that an NVIDIA-specific thing? If so, how do we configure it for PyTorch?
The laptop is running Windows and we’re doing the genie installation in an Ubuntu virtual machine that’s available from the Microsoft Store, so maybe that’s causing the slower training time?

Edit:
After doing some research, I found that CUDA and PyTorch are libraries that can be used with certain NVIDIA GPUs. The laptop we are using seems to be capable of using CUDA, so do we need to install both of these libraries in the VM?
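(For reference, if we understand the PyTorch installation instructions correctly, something along these lines should install a CUDA 11.1 build of PyTorch inside the VM; the exact version numbers and wheel index may differ:)

pip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html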

Edit 2:
We managed to install CUDA 11.1 and PyTorch in the VM. However, since we reinstalled the entire VM, we also redid the entire local genie installation. For some reason, we are now getting this error when running the make subdatasets=2 target_pruning_size=150 datadir command:


I don’t think it has ever asked for a template before; did something change in the genie repository? We copied everything needed and redid the installation steps according to our previously recorded steps.

I think you have a versioning problem in your reinstallation attempt. The latest version of genie does not have a --template flag for the generate-dialogs command.

Ok, we’ll try the installation again. What about CUDA and PyTorch? Do we just need to install them in the VM? Also, does genienlp already have a certain PyTorch version it wants to use? I think our 3080 machine can only use CUDA version 11.1 and above, so would that be a problem?

Yeah you will need to install CUDA inside the VM.

genienlp declares a dependency on PyTorch 1.9.* (i.e. the latest). If you just pip install genienlp, it will install PyTorch the normal way (the default CUDA 10 build), but if you install PyTorch beforehand it will use the installed copy.
You can run:

import torch
print(torch.cuda.is_available())

to see if CUDA works on your machine.
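You can also check which CUDA version your PyTorch build was compiled against, and that it sees the GPU:

import torch
print(torch.version.cuda)              # should print 11.1 if the CUDA 11.1 wheel is installed
print(torch.cuda.get_device_name(0))   # should name the RTX 3080 if CUDA is working

(These are standard PyTorch APIs; the second line assumes torch.cuda.is_available() returned True.)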

Yes, we’ve run that command before and verified that we have both the latest PyTorch and CUDA 11.1 installed. Is there anything else that we need to do, or can we just run the training command now?

You should be able to run the training now, and it should be several orders of magnitude faster!

Hi Giovanni, is there some new bug that the latest version of genie has introduced? We are installing using:

git clone https://github.com/stanford-oval/genie-toolkit
cd genie-toolkit
npm install
npm link

We are now getting the following error when running genie assistant:


and for the make datadir command:

Is there a fix coming soon, or can we switch to a release branch that doesn’t have this issue? We’re certain this is a very recent issue with the master branch, because just last week we managed to reinstall it using the same commands with no problems.

Apologies for the breakage with genie assistant; it’s due to a recent change to handle timezones better. I just pushed a fix to the master branch.

As for the make datadir error, make sure that inside the its-personal-devices directory it loads your genie-toolkit git clone. Use npm link genie-toolkit there to ensure that.
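Roughly, assuming the two repositories are cloned side by side (adjust the paths to your setup):

cd genie-toolkit
git pull                    # pick up the timezone fix from master
npm install
npm link
cd ../its-personal-devices
npm link genie-toolkit      # use the local clone instead of the published package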

It seems that we are somehow running out of CUDA memory, as follows:

Is there a way to fix this? I found some suggestions online about reducing the batch size for the job, but I’m not sure how to do that here. I tried to reduce the save and log frequency, but that doesn’t seem to affect the problem in any way. I also tried running torch.cuda.empty_cache(), but it doesn’t seem to do anything. It’s really strange that the bytes free is exactly 0.

Edit:
We tried lowering the values for --val_batch_size and --train_batch_tokens, but it doesn’t seem to solve the issue. While the amount of memory it asks for is lowered from the initial 198.00 MiB to 20.00 MiB, it still gives us the CUDA out of memory error.

Edit2:
We tried lowering --train_batch_tokens by itself to 10, and it causes a different error, shown below:



So even though it doesn’t go above the CUDA memory limit, the batch size seems to be too small for the training example it uses.

Yeah, you need to lower the batch size if you get CUDA out-of-memory errors, and also make sure you don’t have other processes using the GPU, because those will consume memory. nvidia-smi will tell you how much GPU memory is currently in use.
The batch size is in tokens, and it prints the input length right above the line that says “Begin training”. You need a batch size that can fit at least one full example, so at least 150?
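For example (a sketch; only the batch-related flags are shown here, and the … stands for the rest of your existing training command, which you should keep as-is):

nvidia-smi                                                  # check how much GPU memory is already in use
genienlp train … --train_batch_tokens 400 --val_batch_size 100

Lower the two batch flags until a batch (at least one full example) fits in memory.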

By batch size, do you mean --train_batch_tokens? I think we tried lowering that to around 100 to 200, but both times it triggered the out of memory error. What we’re most confused about is that the error states that we have 8 GiB of total GPU memory and around 5 GiB allocated, and that it only needs a couple more MiB, but the bytes free is always 0. When lowering the batch tokens, the amount it tries to allocate gets smaller as well, to around 20 MiB, and the allocated GiB increases, but the bytes free remains 0.
So it seems that even though lowering the batch size decreases the MiB it tries to allocate, and increases the GiB it has already allocated, we still don’t have enough free bytes.

Yeah that’s what I’m referring to.
It’s very surprising that you don’t have enough memory to even run one example. Are you sure nothing else is running on that GPU?
PyTorch memory allocation will be confused if there are multiple processes using the GPU, and the error message will be wrong.

Also, let me ask someone else who might know the answer.
@Mehrad @s-jse would you know why a low batch size doesn’t seem to work?

Yes, we’ve tried closing every other application and running only the training in the terminal, but it still gets the out of memory error.

Edit:
Here is the output when running with --train_batch_tokens=150:

Edit2:
Here is the output when running nvidia-smi -q:



I’m afraid 8 GB of GPU memory is not enough for training (IIRC for bert-large, which has ~330M params, Hugging Face recommended at least 12 GB of memory; bart-large has ~400M params, so…). You can run inference with bart-large on 8 GB, but training takes up almost twice the memory, since you need to keep the gradients too.
I suggest trying “facebook/bart-base” instead of bart-large.
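Back-of-the-envelope, assuming fp32 weights and an Adam-style optimizer: ~400M params × 4 bytes ≈ 1.6 GB just for the weights, roughly the same again for the gradients, plus two extra buffers per parameter for the optimizer state and the activations of each batch, so an 8 GB card fills up very quickly. bart-base has roughly a third of the parameters, so all of those numbers shrink accordingly.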

Hi Mehrad,
Can you explain a bit more about how to choose facebook/bart-base instead of bart-large when running the training command?

The argument to change is --pretrained_model for training.
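For example (a sketch; the … stands for the flags you are already passing to the training command):

genienlp train … --pretrained_model facebook/bart-base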

It seems to run for now:


Thanks for your help, Mehrad.

Edit:
How powerful does the machine that runs genie server need to be? Does it need to have the same GPU as the one running the training?

As I mentioned, GPU memory usage for training is almost double that of inference. So you can run the server on a GPU with ~half the memory (or run it with a higher batch size on the same GPU you used for training).

We’ve completed the training successfully and tested it using genie assistant and the scenario tests. Both seem to work fine, so it should be good enough. Next, we want to know the correct genie server command to run. So far we have it as follows:


So other than --nlu-model and --thingpedia, what else do we need to set for it to be accessible by the custom Web Almond?

That should be enough, yeah. You can try it by making requests to the server when it comes up. The port defaults to 8400.
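For a quick smoke test once it’s up (a sketch; the exact query route and parameters are an assumption here and may differ in your genie version, so check the server’s routes if this returns a 404):

curl 'http://localhost:8400/en-US/query?q=hello'

Getting any JSON or HTTP error back, rather than a connection refused, at least confirms the server is listening on port 8400.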