Device that lets user choose if an article is relevant or not

Anggrio · March 30, 2021, 9:25pm

Hi,

I want to create a device that lets users train a machine learning algorithm to find news articles on a certain topic. I have created a device in Thingpedia that does a simplified version of this. Specifically, the current implementation is already connected to another service that gets news articles from various RSS feeds (the get part of the device).

I want to create a different command that will allow the user to choose if an article is relevant to a topic or not. Specifically, I want the program flow to go as follows:
1. User asks to train a certain topic
2. System returns a random news article from a remote service
3. System asks the user to determine if the article is relevant to the topic or not by choosing yes or no (preferably by pressing a button)
4. User chooses yes or no
5. System sends the article alongside the user’s response to the remote service

How should I structure the do or action command to do this (specifically the index.js and manifest.tt)? Is it even possible for the device to obtain the article first before asking the user to choose?

gcampax · March 31, 2021, 6:09pm

This is a great idea for an app, but I am afraid it is not possible to implement it exactly in the current version of Almond.

The closest you can get, I think, is along the lines of:

U: find me a sports article
A: I found XXXX, it is a sports article by YYYY published on ZZZZ
U: mark this as relevant / mark this as not relevant

or perhaps:

U: find me a sports article
A: I found XXXX, it is a sports article by YYYY published on ZZZZ. Would you like to mark it as relevant?
U: yes
A: is it relevant?
U: yes / no

(I don’t remember under what conditions the agent offers an action on the results)

In the manifest, this would be:

class @com.foo {
   entity com.foo:article;

   query article(in req topic : Enum(sports,politics,international ...),
                 out id : Entity(com.foo:article),
                 out author : String,
                 ...);

   action mark_relevant(in req article: Entity(com.foo:article),
                        in req relevant : Boolean);
}

Unfortunately, the current agent has no mechanism to indicate that after the result is shown to the user, a new action must be initiated.
(This is something we’re actively thinking about for a later release. If you’re interested, a very rough draft of our design is at A Unified Control Language For Genie Dialogue Agents | Open Virtual Assistant Lab and I would love to hear feedback on that.)

In any case, you need to use the latest Almond/Genie models to achieve this, which means you need to train a custom dialogue model and deploy it on your server. You will probably want to use the skeleton code at GitHub - stanford-oval/thingpedia-common-devices: Thingpedia interface code for commonly used devices to build the skill and train the model.

Anggrio · March 31, 2021, 7:57pm

I’ve already built a device in Thingpedia with the id org.itspersonal.newsfilter (named simple news filter) that connects to a remote service that handles the news filter part, so the relationship with Almond itself is just passing articles. I checked that it can return articles similar to how the other news devices do it. I think your second example of the program flow is good enough. Though the initial command would be something like do training for "tech" news articles, and the Almond/service will return a random article (it doesn’t guess if it’s relevant/irrelevant), so the query would be Is this article relevant?

I have already figured out how to pass the user’s input as the topic to the remote service, using the “” marker during input. The service currently returns a list of articles with their title, description, link, and published date. The dataset that I use to train the filter only needs the article’s title, so it is enough to send the title and the relevant/irrelevant label back to the remote service after the user’s response. We currently only plan to implement filters for two topics, technology and sports (because of time constraints on when the project is due). The filter is currently based on spaCy’s textcat project.

gcampax · March 31, 2021, 8:26pm

Yeah if you want either the first or the second flow, you need to write the manifest as I gave it to you. You need to use IDs of entity types to link the query and the action, and you need to make “get article” a query and “mark relevant” an action.
Do not look at existing news devices at thingpedia.stanford.edu because those are in Almond 1.0 format and do not support any multi-turn interaction. Look at SmartNews in dev.almond.stanford.edu instead for the query part.

Anggrio · March 31, 2021, 8:48pm

I’m looking through the index.js file now, and the main thing I see is that the ID is set by the remote service, and that the article format includes:
id, title, date, source, author, and url

So the mark relevant action will pass the id back to the service, where the article is kept with the id as its key/index? Is my understanding correct?

gcampax · March 31, 2021, 8:51pm

Yes pretty much. The ID will be an entity, so it will have both a value which is an opaque string meaningful to your service, and a display string shown as name to the user.
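As a rough illustration of that value/display split, here is a minimal sketch. The `Entity` class is a stand-in written for this example (the SDK is assumed to provide an equivalent value type taking a value and a display string), and `toQueryResult`, the record fields, and the sample data are all hypothetical.

```javascript
// Stand-in for the SDK's entity value type. Assumption: the real type also
// takes (value, display); this local class only mirrors that shape.
class Entity {
    constructor(value, display) {
        this.value = value;     // opaque string, meaningful only to the service
        this.display = display; // human-readable name shown to the user
    }
}

// Hypothetical helper: turn one record from the remote service into one
// result row of the query, with the id output built as an entity.
function toQueryResult(record) {
    return {
        id: new Entity(record.key, record.title),
        title: record.title,
        link: record.link
    };
}

// Made-up sample record, only to show the resulting shape.
const result = toQueryResult({
    key: 'a-17',
    title: 'Example headline',
    link: 'https://example.com/a-17'
});
console.log(result.id.value, '|', result.id.display);
```

The agent would show the display string to the user, while the action later receives the value.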

Anggrio · March 31, 2021, 8:54pm

So the remote service would have to store the article’s information with perhaps a key in a dictionary then use that key as the id to return it initially to Almond and also to access it again after the user’s response?

gcampax · March 31, 2021, 8:55pm

Well, the remote service will have some database of articles, right? So yeah you should assign a unique ID to each article.
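A minimal sketch of what that could look like on the service side, written in JavaScript only to match index.js (the real service can be in any language). `ArticleStore`, its method names, and the `article-N` ID scheme are hypothetical choices for this example, not anything Almond requires.

```javascript
// Hypothetical in-memory article store keyed by a unique ID.
class ArticleStore {
    constructor() {
        this._articles = new Map(); // id -> { id, title }
        this._labels = [];          // collected { title, relevant } pairs
        this._nextId = 1;
    }

    // Register an article and return the ID that is handed to Almond.
    add(title) {
        const id = `article-${this._nextId++}`;
        this._articles.set(id, { id, title });
        return id;
    }

    // Called when the mark-relevant action sends the ID back with the answer.
    label(id, relevant) {
        const article = this._articles.get(id);
        if (!article)
            throw new Error(`unknown article id: ${id}`);
        this._labels.push({ title: article.title, relevant });
    }

    // Return a copy of the labels collected so far.
    labels() {
        return this._labels.slice();
    }
}

const store = new ArticleStore();
const id = store.add('Local team wins championship');
store.label(id, true);
console.log(store.labels());
```

Because the lookup goes through the ID, two articles with the same title stay distinct.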

Anggrio · March 31, 2021, 9:03pm

Hmm, since we are storing a large dataset with around 4000 entries, maybe a different database would work best? We only need to store the article once anyway since it’s added to the training dataset no matter what the user’s response is (relevant/irrelevant).

Can you explain a bit more about why/how an entity can be passed from the first query to the action? I think it would also work fine if we just send back the article’s title alongside a flag marking whether it’s relevant/irrelevant, and I don’t see why we can’t do that if we use the article’s title+flag as the id.

gcampax · March 31, 2021, 9:05pm

The requirement to use an entity is just a limitation of the current dialogue model (see the issue at Add support for non-ID parameter passing · Issue #303 · stanford-oval/genie-toolkit · GitHub)

The ID of the entity can be anything, and can even be the same as the title. I’m just assuming though that you’d want some unique ID for the article regardless, because titles need not be unique.

Anggrio · March 31, 2021, 9:10pm

No, our dataset only contains the article titles, which we use as the input for the filter. Is there a limit to the id size? I’m concerned about articles with long titles.

gcampax · March 31, 2021, 9:11pm

There is no limit to ID size, no.

Anggrio · March 31, 2021, 9:16pm

Cool, then we’ll just use the title as the id, then send it back to the service alongside a flag for relevant or not.

I noticed that the article format is a little different from what I already have. However, the one I’m using (title, description, link, updated) is showing up fine in Almond. Should I change the format to follow the SmartNews one? I would only use the four elements here though, so no author/source.

gcampax · March 31, 2021, 9:43pm

Apart from ID, use whatever other output parameters are meaningful for your article database. The SmartNews one is just an example.

Anggrio · March 31, 2021, 9:47pm

Ok, I’ll just stick with what’s already working then. I don’t see in the index.js file how the ID can be passed between query and action. Can you explain a bit more about the syntax?

gcampax · March 31, 2021, 9:51pm

You declare in the manifest a query whose id output parameter has a certain entity type, and you declare in the manifest an action which has a parameter of that type. The agent matches those two by type.
You then handle the input parameter in the action.

An implementation example is Spotify’s play action.
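To make the query/action pairing concrete, here is a rough sketch of how the index.js side could look for the `@com.foo` manifest above. It is not taken from the Spotify or SmartNews code. It assumes the usual convention that a query named `article` is backed by a method `get_article` and an action named `mark_relevant` by `do_mark_relevant`, and that a real device would extend the SDK's base device class. `Entity`, `FakeService`, and `ArticleDevice` are hypothetical stand-ins so the sketch runs on its own.

```javascript
// Stand-in for the SDK entity value (assumed shape: value + display).
class Entity {
    constructor(value, display) {
        this.value = value;
        this.display = display;
    }
}

// Hypothetical in-memory replacement for the remote news-filter service.
class FakeService {
    constructor() {
        this.labels = [];
    }
    async randomArticle(topic) {
        return { key: `${topic}-1`, title: `Sample ${topic} headline`, author: 'Jane Doe' };
    }
    async submitLabel(key, relevant) {
        this.labels.push({ key, relevant });
    }
}

// A real device would extend the SDK's base device class and be exported
// from index.js; this is a plain class so the sketch is self-contained.
class ArticleDevice {
    constructor(service) {
        this._service = service;
    }

    // Backs `query article(in req topic, out id, out author, ...)`.
    async get_article({ topic }) {
        const a = await this._service.randomArticle(topic);
        return [{ id: new Entity(a.key, a.title), author: a.author }];
    }

    // Backs `action mark_relevant(in req article, in req relevant)`.
    async do_mark_relevant({ article, relevant }) {
        // Accept either an entity object or a bare string for the ID.
        const key = typeof article === 'string' ? article : article.value;
        await this._service.submitLabel(key, relevant);
    }
}

// Simulates what the agent does: take the id from the query result and
// pass it as the `article` parameter of the action.
async function demo() {
    const service = new FakeService();
    const device = new ArticleDevice(service);
    const [first] = await device.get_article({ topic: 'sports' });
    await device.do_mark_relevant({ article: first.id, relevant: true });
    return service.labels;
}

demo().then((labels) => console.log(labels));
```

Nothing in the device code links the two methods directly. The link exists only through the shared entity type in the manifest, and `demo` plays the role of the agent.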

Anggrio · March 31, 2021, 10:00pm

Ah, in your first example earlier in the thread, it’s the query with out id : Entity(com.foo:article) that is paired with the action with in req article: Entity(com.foo:article), right?

gcampax · March 31, 2021, 10:00pm

Yes, that is correct.

Anggrio · March 31, 2021, 10:06pm

Ok, I’ll have to try this out later then. I’ll post again if I run into any issues.

Anggrio · April 13, 2021, 1:52am

Hi Giovanni,

I am unable to request a developer account on the dev.almond.stanford.edu website. The provided link https://dev.almond.stanford.edu/user/request-developer shows the following page:

I tried using entity on the normal website, but it seems that it doesn’t work there. The manifest.tt input field throws an error when it finds the entity declaration: