Genie Community Forum

Genie toolkit in docker

Hi,
I am running Genie in a Docker container and have set up all the Genie dependencies.
I can call the genie command in the terminal. However, when I train using the pldi19-dataset, I get the error below. Can you please help me? I really don't know how to start debugging this.

root@bca96f0e5f78:/projects/downloads/pldi19-dataset# genie train --datadir /projects/downloads/pldi19-dataset/ --outputdir /projects/downloads/pldi19-dataset/output --workdir /projects/wd --debug

'genienlp' 'train' '--train_tasks' 'almond' '--save' '/projects/wd/model' '--cache' '/projects/wd/cache' '--data' '/projects/wd' '--preserve_case' '--no_commit' '--embeddings' '/projects/wd/embeddings' '--train_iterations' '100000' '--save_every' '2000' '--log_every' '500' '--val_every' '1000'
Traceback (most recent call last):
  File "/usr/local/bin/genienlp", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/genienlp/__main__.py", line 79, in main
    subcommands[argv.subcommand][2](argv)
  File "/usr/local/lib/python3.8/dist-packages/genienlp/train.py", line 576, in main
    args = arguments.post_parse_train_specific(args)
  File "/usr/local/lib/python3.8/dist-packages/genienlp/arguments.py", line 372, in post_parse_train_specific
    if 'mbart' in args.pretrained_model:
TypeError: argument of type 'NoneType' is not iterable
/repos/genie-toolkit/dist/tool/genie.js:40
process.on('unhandledRejection', (up) => { throw up; });


Dockerfile:
FROM ubuntu:groovy

MAINTAINER allmin

RUN mkdir /projects

ENV TZ=Europe/Amsterdam
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN cat /etc/resolv.conf
RUN apt-get install gpgv
RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get update
RUN apt-get -y install curl dirmngr
RUN apt-get -y install apt-transport-https lsb-release ca-certificates build-essential make g++ gcc graphicsmagick gettext zip unzip git-all --fix-missing
RUN curl -sL https://deb.nodesource.com/setup_15.x | bash -
RUN apt -y install nodejs
RUN node -v
RUN npm -v

WORKDIR /projects
RUN cd /projects
RUN npm init -y
COPY ./ /projects
RUN apt-get update \
 && apt-get install -y python3-pip python-dev python3-dev \
 && cd /usr/local/bin \
 && ln -s /usr/bin/python2 python \
 && pip3 install --upgrade pip
ENV PATH $PATH:/root/.local/bin

WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip3 install -r requirements.txt --user
COPY . /usr/src/app

RUN mkdir /repos
WORKDIR /repos
RUN echo "installing repos"
RUN cd /repos/
RUN git clone --single-branch --branch v2.0.0-beta.1 https://github.com/stanford-oval/thingtalk.git \
 && cd thingtalk \
 && git reset --hard \
 && npm init -y \
 && npm ci \
 && npm link
RUN cd /repos/
RUN git clone --single-branch --branch v2.9.0-beta.1 https://github.com/stanford-oval/thingpedia-api \
 && cd thingpedia-api \
 && git reset --hard \
 && npm init -y \
 && npm ci \
 && npm link
RUN cd /repos/
RUN git clone --single-branch --branch v0.2.1 https://github.com/stanford-oval/thingtalk-units.git
RUN cd /repos/
RUN git clone --single-branch --branch v0.8.0-beta.1 https://github.com/stanford-oval/genie-toolkit \
 && cd genie-toolkit \
 && git reset --hard \
 && EXPO_DEBUG=true \
 && npm config set package-lock false \
 && npm install --no-fund \
 && npm link \
 && genie -h

RUN mkdir /projects/downloads
RUN cd /projects/downloads


requirements.txt:
numpy==1.19.4
scipy==1.5.4
scikit-learn==0.23.2
joblib==0.17.0
matplotlib
genienlp>=0.6.0a4
transformers==4.1.1
pyats

Hi!

A couple of things to note:

  • if you're trying to reproduce the PLDI19 results, you'll probably want to use the same versions of all repositories as indicated in the artifact. With the latest version you won't get the same results as before, because the models changed a lot and the model used in the PLDI paper was removed entirely. Even if you only care about the dataset, I think the Thingpedia snapshot included in the dataset, which is necessary to typecheck the ThingTalk code and evaluate correctly, needs to be upgraded to the latest format.
  • I'm not sure why you use such a complicated script to install the genie-toolkit dependencies: npm ci is sufficient, since all dependencies are already specified in package-lock.json (see the sketch after this list).
  • Node 15.* is known to cause issues at the moment; you should use Node 12 or 14.
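
For example, here is a minimal sketch of the simpler install (my assumptions: the NodeSource setup_14.x script for Node 14, and the stanford-oval/genie-toolkit GitHub repository; the version tag is the one from your Dockerfile):

# Sketch: pin Node 14, then let npm ci install everything from package-lock.json
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
 && apt-get install -y nodejs
RUN git clone --single-branch --branch v0.8.0-beta.1 https://github.com/stanford-oval/genie-toolkit /repos/genie-toolkit \
 && cd /repos/genie-toolkit \
 && npm ci \
 && npm link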

In any case, we don't typically use the genie train command in our workflow, and it's possible that the default command-line flags it passes to the underlying genienlp train are not up to date. We usually call genienlp train directly with the right arguments; see the genienlp README for the right flags to use for training.
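
For illustration, here is a sketch of a direct invocation, assembled from the flags genie printed in your log above; the --pretrained_model value is a placeholder (that is the argument whose absence triggered your TypeError), and the right choice is documented in the genienlp README:

# Sketch only: flags copied from the genie debug output above,
# plus an explicit pretrained model, which genie did not pass.
genienlp train \
  --train_tasks almond \
  --data /projects/wd \
  --save /projects/wd/model \
  --embeddings /projects/wd/embeddings \
  --cache /projects/wd/cache \
  --preserve_case \
  --no_commit \
  --train_iterations 100000 \
  --save_every 2000 \
  --log_every 500 \
  --val_every 1000 \
  --pretrained_model <model-name-from-the-genienlp-README>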

We also have dockerfiles and Kubeflow pipelines we use in the genie-k8s repository. Maybe that will be helpful to you!


Thank you,
I reran a new container with node 14 and an install of genie-toolkit v0.8.0-beta.1 using npm ci.
I am following the documentation at genie-toolkit/tutorial-basic.md at v0.8.0-beta.1 · stanford-oval/genie-toolkit · GitHub. The generate command terminates unexpectedly. Can you please suggest what is going wrong?

root@760a5b584cc3:/projects/downloads/experiment_dir# node --max_old_space_size=40000 `which genie` generate --locale en-US --template /repos/genie-toolkit/languages-dist/thingtalk/en/thingtalk.genie --thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt -o synthesized.tsv --debug
Loaded 52 devices
Loaded 519 templates
--- DEPTH 0
depth 0 took 0.14 seconds

--- DEPTH 1
depth 1 took 95.93 seconds

--- DEPTH 2
expand NT[thingpedia_complete_query] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 43.87 seconds using sampling
expand NT[thingpedia_complete_query] -> images from bing matching ${p_query} larger than ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 50.54 seconds using sampling
expand NT[thingpedia_complete_query] -> images from bing matching ${p_query} larger than ${p_width} x ${p_height} in either dimension (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 53.69 seconds using sampling
expand NT[thingpedia_complete_query] -> images from bing matching ${p_query} smaller than ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 55.53 seconds using sampling
expand NT[thingpedia_complete_action] -> set the temperature between ${p_low:const} and ${p_high:const} on my ${p_name:const} (NT[p_low : constant_Measure_C], NT[p_high : constant_Measure_C], NT[p_name : constant_String]) : took 25.92 seconds using enumeration
expand NT[thingpedia_complete_action] -> set the low temperature to ${p_low} on my ${p_name:const} , and the high to ${p_high} (NT[p_low : constant_or_undefined], NT[p_name : constant_String], NT[p_high : constant_or_undefined]) : took 36.89 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : the_out_param_Any], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 18.98 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : the_base_table], NT[p_height : constant_or_undefined]) : took 52.42 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : the_out_param_Any], NT[p_height : constant_or_undefined]) : took 27.21 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : the_base_table]) : took 57.09 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : the_out_param_Any]) : took 15.10 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} larger than ${p_width} x ${p_height} (NT[p_query : the_out_param_Any], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 23.46 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} larger than ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : the_base_table], NT[p_height : constant_or_undefined]) : took 61.61 seconds using sampling
expand NT[query_coref_same_sentence] -> images from bing matching ${p_query} larger than ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : the_out_param_Any], NT[p_height : constant_or_undefined]) : took 16.05 seconds using sampling
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)


Here is the synthesized.tsv that was produced:
root@760a5b584cc3:/projects/downloads/experiment_dir# cat synthesized.tsv
S1000000000 check if the indoor illuminance is high . @org.thingpedia.iot.illuminance . illuminance ( ) ;
root@760a5b584cc3:/projects/downloads/experiment_dir#

This is the error message: it says that you ran out of memory. You're passing a memory limit of 40G on the command line, which is probably too high. You should set the memory limit depending on the available memory on your machine (see the sketch below).
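
For example (illustrative numbers only; --max-old-space-size is a V8 flag that Node also accepts through the NODE_OPTIONS environment variable):

# Check how much memory is actually free, then size the V8 heap below it.
free -m
# With e.g. ~16 GB free, leave headroom for the rest of the process:
export NODE_OPTIONS="--max-old-space-size=12000"
# Any genie invocation in this shell now inherits the 12 GB heap limit.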


I have a 60GB machine.

I tried reducing that parameter to 20GB; then I get a bad_alloc at the beginning of DEPTH 3.

Then I tried reducing it to 8000MB and 16000MB, and I get a "JavaScript heap out of memory" error.

root@760a5b584cc3:/projects/downloads/experiment_dir# node --max_old_space_size=8000 `which genie` generate --locale en-US --template /repos/genie-toolkit/languages-dist/thingtalk/en/thingtalk.genie --thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt -o synthesized.tsv --debug
Loaded 52 devices
Loaded 519 templates
--- DEPTH 0
depth 0 took 0.14 seconds

--- DEPTH 1
depth 1 took 95.31 seconds

--- DEPTH 2
expand NT[thingpedia_complete_query] -> images from bing matching ${p_query} with size ${p_width} x ${p_height} (NT[p_query : constant_or_undefined], NT[p_width : constant_or_undefined], NT[p_height : constant_or_undefined]) : took 46.51 seconds using sampling

<--- Last few GCs --->

[812:0x5a96c40]   845185 ms: Mark-sweep (reduce) 7941.3 (8010.0) -> 7940.5 (8011.0) MB, 5675.0 / 0.0 ms  (average mu = 0.122, current mu = 0.001) allocation failure scavenge might not succeed
[812:0x5a96c40]   850816 ms: Mark-sweep (reduce) 7941.6 (8013.0) -> 7940.9 (8013.2) MB, 5628.6 / 0.0 ms  (average mu = 0.066, current mu = 0.001) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa222f0 node::Abort() [node]
 2: 0x96411f node::FatalError(char const*, char const*) [node]
 3: 0xb97f1e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb98297 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd52fd5  [node]
 6: 0xd53b5f  [node]
 7: 0xd61beb v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0xd657ac v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xd2af4d v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
10: 0xd271a9 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
11: 0xd27264 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
12: 0xd30ded v8::internal::Factory::NewJSArrayStorage(v8::internal::ElementsKind, int, v8::internal::ArrayStorageAllocationMode) [node]
13: 0xd3536e v8::internal::Factory::NewJSArray(v8::internal::ElementsKind, int, int, v8::internal::ArrayStorageAllocationMode, v8::internal::AllocationType) [node]
14: 0xec29e0 v8::internal::ElementsAccessor::Concat(v8::internal::Isolate*, v8::internal::BuiltinArguments*, unsigned int, unsigned int) [node]
15: 0xc06558  [node]
16: 0xc0cfdf  [node]
17: 0xc0e846 v8::internal::Builtin_ArrayConcat(int, unsigned long*, v8::internal::Isolate*) [node]
18: 0x1423359  [node]
Aborted (core dumped)

I think the default pruning size might be off. Try adding --target-pruning-size 500 to the end of the command; that should work with about 10G of RAM and should give you a full-sized dataset.


Amazing! That fixed the issue 🙂 Thank you! This is the command that worked, based on your inputs:
node --max_old_space_size=10000 `which genie` generate --locale en-US --template /repos/genie-toolkit/languages-dist/thingtalk/en/thingtalk.genie --thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt -o synthesized.tsv --target-pruning-size 500 --debug
