Almond Tokenizer is at https://github.com/stanford-oval/almond-tokenizer
Its purpose is to preprocess user input, identifying numbers, dates, times, and other entities; it uses Stanford CoreNLP for this.
The easiest way to run it is through Docker:

```bash
docker run -p 8888:8888 stanfordoval/almond-tokenizer
```
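Once the container is running, clients talk to the tokenizer over a plain TCP socket on port 8888. The sketch below shows one way to query it from Node.js; the newline-delimited JSON format and the field names used here are assumptions, so check the almond-tokenizer README for the exact protocol.

```js
// Minimal sketch of a tokenizer client. The protocol details (one JSON
// request per line, and the req/utterance/languageTag fields) are
// assumptions; verify against the almond-tokenizer README.
const net = require('net');

const socket = net.createConnection(8888, 'localhost', () => {
    socket.write(JSON.stringify({
        req: 1,                          // request id, echoed in the response
        utterance: 'wake me up at 7am',  // raw user input to preprocess
        languageTag: 'en'                // language of the utterance
    }) + '\n');
});

socket.on('data', (data) => {
    // The response should contain the tokens and recognized entities
    // (numbers, dates, times, etc.).
    console.log(JSON.parse(data.toString()));
    socket.end();
});
```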
The NLP server is part of Almond Cloud. You run it as:

```bash
almond-cloud run-nlp --port ...
```

or, if running inside a git checkout:

```bash
node ./main.js run-nlp --port ...
```
As with other parts of Almond Cloud, we offer systemd .service files and example Kubernetes manifests to help you get started, but site-specific customization is often needed.
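For reference, a minimal systemd unit for the NLP server might look like the sketch below; the binary path, user, and port are placeholders to adapt to your installation.

```ini
[Unit]
Description=Almond Cloud NLP server
After=network.target

[Service]
Type=simple
User=almond-cloud
# Path and port are placeholders; adjust to your installation.
ExecStart=/usr/bin/almond-cloud run-nlp --port 8400
Restart=on-failure

[Install]
WantedBy=multi-user.target
```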
In particular, for the NLP server you need to specify the URL (`NL_SERVER_URL` in the config) and the path to the directory where the models are stored (`NL_MODEL_DIR`). The latter can be a local path or an Amazon S3 URL.
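For example, assuming your configuration lives in a JavaScript config module (the host name, path, and bucket below are placeholders):

```js
// Placeholders only: substitute your own host name, path, or bucket.
module.exports.NL_SERVER_URL = 'https://nlp.example.com';

// Either a local path...
module.exports.NL_MODEL_DIR = '/srv/almond/models';
// ...or an Amazon S3 URL:
// module.exports.NL_MODEL_DIR = 's3://my-almond-bucket/models';
```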
If you plan to train your own model, the model directory needs to be accessible from the machine running the training server. This can be accomplished using S3, using a network file system like NFS or SMB, or by setting up password-less SSH/rsync and using a `file://` URL with the hostname of the NLP machine as the model directory.
The training server also needs access to the `FILE_STORAGE_DIR` path (or S3 URL), which is shared with the frontend server. This one cannot use rsync; only S3 or NFS/SMB will work.
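Putting the two together, a training machine's configuration might include something like the following sketch (host names, paths, and bucket names are placeholders):

```js
// With password-less SSH/rsync set up, the model directory can point at
// the NLP machine by hostname:
module.exports.NL_MODEL_DIR = 'file://nlp.example.com/srv/almond/models';

// FILE_STORAGE_DIR cannot use rsync; it must be an S3 URL or a path on
// a shared NFS/SMB mount:
module.exports.FILE_STORAGE_DIR = 's3://my-almond-bucket/storage';
```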
(If you’re on AWS, S3 is by far the easiest option; NFS (Amazon EFS) is also easy but quite a bit more expensive. On Azure, SMB (Azure Files) is the best option. Everywhere else, NFS is easiest if you have used it before.)
If you don’t train your own model, you can get by with a pre-trained model downloaded from https://almond.stanford.edu/luinet/models (log in to see the “Download” button).
This works only if your Thingpedia does not diverge too much from the public one; otherwise, accuracy will be very poor, as the NLP server will discard most of the neural network’s predictions.