new datasets
parent 411ccf9bcd
commit 2b5d923f63
566 changed files with 142473 additions and 0 deletions
24
Datasets/evaluation_examples/README.md
Normal file
@@ -0,0 +1,24 @@
# Evaluation examples

Here we put the data examples used to benchmark the ability of agents to interact with a GUI.
The examples are stored in `./examples`, where each data item is formatted as:

```
{
    "id": "uid", # unique id
    "snapshot": "snapshot_id", # the snapshot id of the environment, with some data already there and apps already opened, or just the desktop
    "instruction": "natural_language_instruction", # the natural language instruction of the task, i.e. what we want the agent to do
    "source": "website_url", # where this example comes from: a forum, a website, or a paper
    "config": {xxx}, # the scripts that set up the download and open-file actions, i.e. the initial state of a task
    # (coming in next project) "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots and the recording video
    "related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
    "evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
    …
}
```
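
As a rough illustration of how the format above might be consumed, here is a minimal Python sketch that loads data items and checks that the listed fields are present. It rests on several assumptions that are not stated in this README: that each item is stored as one plain JSON file (without the `#` comments used above for explanation) with a `.json` extension somewhere under `./examples`, and that all listed fields are mandatory. The names `REQUIRED_FIELDS`, `missing_fields` and `load_examples` are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path

# Field names are taken from the format description above.
# Treating all of them as mandatory is an assumption of this sketch.
REQUIRED_FIELDS = (
    "id",
    "snapshot",
    "instruction",
    "source",
    "config",
    "related_apps",
    "evaluator",
)


def missing_fields(item: dict) -> list[str]:
    """Return the listed fields that are absent from one data item."""
    return [name for name in REQUIRED_FIELDS if name not in item]


def load_examples(examples_dir: str) -> dict[str, dict]:
    """Load every *.json file under examples_dir, keyed by its "id" field.

    Assumes one JSON object per file; raises ValueError on an incomplete item.
    """
    items = {}
    for path in sorted(Path(examples_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            item = json.load(f)
        missing = missing_fields(item)
        if missing:
            raise ValueError(f"{path}: missing fields {missing}")
        items[item["id"]] = item
    return items
```

Under those assumptions, `load_examples("./examples")` would return a dictionary mapping each `id` to its data item, and fail early on any file that lacks one of the listed fields.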

`./trajectories` contains the annotated trajectories for finishing the task of each data item in `./examples`.

This is still under construction and has only been tested on Windows 10. Please:
- Modify the paths to match your setup before running the evaluation;
- Let us know if any part is overfitted to our environment.