Genie Community Forum

Deploying Almond with custom UI and Natural Language Support

Yes, the laptop has an RTX 3080 GPU, so it should be quite powerful. I’m not sure what you meant by configuring CUDA for PyTorch, is that an NVIDIA-specific thing? If so, how do we configure it for PyTorch?
The laptop is running Windows and we’re doing the genie installation in an Ubuntu virtual machine that’s available from the Microsoft Store, so maybe that’s causing the slower training time?

Edit:
After doing some research, I found that CUDA and PyTorch are libraries that can be used with certain NVIDIA GPUs. The laptop we are using seems to be capable of using CUDA, so do we need to install both of these libraries in the VM?
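(For reference, if we understand the PyTorch installation instructions correctly, something along these lines should install a CUDA 11.1 build of PyTorch inside the VM; the exact version numbers and wheel index may differ:)

pip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html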

Edit 2:
We managed to install CUDA 11.1 and PyTorch in the VM. However, since we reinstalled the entire VM, we also redid the entire local genie installation. For some reason, we are now getting this error when running the make subdatasets=2 target_pruning_size=150 datadir command:


I don’t think it has ever asked for a template before; did something change in the genie repository? We copied everything needed and redid the installation steps according to our previously recorded steps.

I think you have a versioning problem in your reinstallation attempt. The latest version of genie does not have a --template flag for the generate-dialogs command.

Ok, we’ll try the installation again. What about CUDA and PyTorch? Do we just need to install them in the VM? Also, does genienlp already have a certain PyTorch version it wants to use? I think our 3080 machine can only use CUDA version 11.1 and above, so would that be a problem?

Yeah you will need to install CUDA inside the VM.

genienlp declares a dependency on PyTorch 1.9.* (i.e. the latest). If you just pip install genienlp, it will install PyTorch the normal way (the default CUDA 10 build), but if you install PyTorch beforehand it will use the installed copy.
You can run:

import torch
print(torch.cuda.is_available())

to see if CUDA works on your machine.
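You can also check which CUDA version your PyTorch build was compiled against, and that it sees the GPU:

import torch
print(torch.version.cuda)              # should print 11.1 if the CUDA 11.1 wheel is installed
print(torch.cuda.get_device_name(0))   # should name the RTX 3080 if CUDA is working

(These are standard PyTorch APIs; the second line assumes torch.cuda.is_available() returned True.)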

Yes, we’ve run that command before and verified that we have both the latest PyTorch and CUDA 11.1 installed. Is there anything else that we need to do, or can we just run the training command now?

You should be able to run the training now, and it should be several orders of magnitude faster!

Hi Giovanni, is there some new bug that the latest version of genie has introduced? We are installing using:

git clone https://github.com/stanford-oval/genie-toolkit
cd genie-toolkit
npm install
npm link

We are now getting the following error when running genie assistant:


and for the make datadir command:

Is there a fix coming soon, or can we switch to a release branch that doesn’t have this issue? We’re certain this is a very recent issue with the master branch, because just last week we managed to reinstall it using the same commands with no problems.

Apologies for the breakage with genie assistant; it’s due to a recent change to handle timezones better. I just pushed a fix to the master branch.

As for the make datadir error, make sure that inside the its-personal-devices directory it loads your genie-toolkit git clone. Use npm link genie-toolkit there to ensure that.
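Roughly, assuming the two repositories are cloned side by side (adjust the paths to your setup):

cd genie-toolkit
git pull                    # pick up the timezone fix from master
npm install
npm link
cd ../its-personal-devices
npm link genie-toolkit      # use the local clone instead of the published package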

It seems that we are somehow running out of CUDA memory, as follows:

Is there a way to fix this? I found some suggestions online about reducing the batch size for the job, but I’m not sure how to do that here. I tried to reduce the save and log frequency, but that doesn’t seem to affect the problem in any way. I also tried running torch.cuda.empty_cache(), but it doesn’t seem to do anything. It’s really strange that the bytes free is exactly 0.

Edit:
We tried lowering the values for --val_batch_size and --train_batch_tokens, but it doesn’t seem to solve the issue. While the amount of memory it asks for is lowered from the initial 198.00 MiB to 20.00 MiB, it still gives us the CUDA out of memory error.

Edit2:
We tried lowering --train_batch_tokens by itself to 10, and it causes a different error, shown below:



So even though it doesn’t go above the CUDA memory limit, the batch size seems to be too small for the training example it uses.

Yeah, you need to lower the batch size if you get CUDA out-of-memory errors, and also make sure you don’t have other processes using the GPU, because those will consume memory. nvidia-smi will tell you how much GPU memory is currently in use.
The batch size is in tokens, and it prints the input length right above the line that says “Begin training”. You need a batch size that can fit at least one full example, so at least 150?
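For example (a sketch; only the batch-related flags are shown here, and the … stands for the rest of your existing training command, which you should keep as-is):

nvidia-smi                                                  # check how much GPU memory is already in use
genienlp train … --train_batch_tokens 400 --val_batch_size 100

Lower the two batch flags until a batch (at least one full example) fits in memory.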

By batch size, do you mean --train_batch_tokens? I think we tried lowering that to around 100 to 200, but both times it triggered the out of memory error. What we’re most confused about is that the error states that we have 8 GiB of total GPU memory and around 5 GiB allocated, and that it only needs a couple more MiB, but the bytes free is always 0. When lowering the batch tokens, the amount it tries to allocate gets smaller as well, to around 20 MiB, and the allocated GiB increases, but the bytes free remains 0.
So it seems that even though lowering the batch size decreases the MiB it tries to allocate, and increases the GiB it has already allocated, we still don’t have enough free bytes.

Yeah that’s what I’m referring to.
It’s very surprising that you don’t have enough memory to even run one example. Are you sure nothing else is running on that GPU?
PyTorch memory allocation will be confused if there are multiple processes using the GPU, and the error message will be wrong.

Also, let me ask someone else who might know the answer.
@Mehrad @s-jse would you know why a low batch size doesn’t seem to work?

Yes, we’ve tried closing every other application and running only the training in the terminal, but it still gets the out of memory error.

Edit:
Here is the output when running with --train_batch_tokens=150:

Edit2:
Here is the output when running nvidia-smi -q:



I’m afraid 8 GB of GPU memory is not enough for training (IIRC for bert-large, which has ~330M params, Hugging Face recommended at least 12 GB of memory; bart-large has ~400M params, so…). You can run inference with bart-large on 8 GB, but training takes up almost twice the memory, since you need to keep the gradients too.
I suggest trying “facebook/bart-base” instead of bart-large.
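Back-of-the-envelope, assuming fp32 weights and an Adam-style optimizer: ~400M params × 4 bytes ≈ 1.6 GB just for the weights, roughly the same again for the gradients, plus two extra buffers per parameter for the optimizer state and the activations of each batch, so an 8 GB card fills up very quickly. bart-base has roughly a third of the parameters, so all of those numbers shrink accordingly.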

Hi Mehrad,
Can you explain a bit more about how to choose facebook/bart-base instead of bart-large when running the training command?

The argument to change is --pretrained_model for training.
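For example (a sketch; the … stands for the flags you are already passing to the training command):

genienlp train … --pretrained_model facebook/bart-base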

It seems to run for now:


Thanks for your help, Mehrad.

Edit:
How powerful does the machine that runs genie server need to be? Does it need to have the same GPU as the one running the training?

As I mentioned, GPU memory usage for training is almost double that of inference. So you can run the server on a GPU with ~half the memory (or run it with a higher batch size on the same GPU you used for training).

We’ve completed the training successfully and tested it using genie assistant and the scenario tests. Both seem to work fine, so it should be good enough. Next, we want to know the correct genie server command to run. So far we have it as follows:


So other than --nlu-model and --thingpedia, what else do we need to set for it to be accessible by the custom Web Almond?

That should be enough, yeah. You can try it by making requests to the server when it comes up. The port defaults to 8400.
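For a quick smoke test once it’s up (a sketch; the exact query route and parameters are an assumption here and may differ in your genie version, so check the server’s routes if this returns a 404):

curl 'http://localhost:8400/en-US/query?q=hello'

Getting any JSON or HTTP error back, rather than a connection refused, at least confirms the server is listening on port 8400.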