Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#10
2026-04-29 07:49:50 +00:00
Rerun the tests using the new LLM-as-a-judge. Deadline: 3rd May, 11:59 PM.
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#9
2026-04-29 07:00:19 +00:00
Get access to the VPS and deploy the entire test pipeline. Deadline: 3rd May, 11:59 PM.
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#8
2026-04-29 06:13:48 +00:00
Test the already deployed Browser-Use system. Deadline: 7th April, 2026, 11:59 PM.
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-04-23 20:31:08 +00:00
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-04-22 22:21:52 +00:00
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-04-22 21:05:06 +00:00
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#8
2026-04-03 12:42:40 +00:00
Test the deployed Browser-Use system. Deadline: 7th April, 11:59 PM.
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#7
2026-04-03 12:40:16 +00:00
Launch the Browser-Use system. Deadline: already completed by the time of task creation.
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#7
2026-04-03 12:40:08 +00:00
Launch the Browser-Use system. Deadline: already completed by the time of task creation.
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#4
2026-04-03 12:38:57 +00:00
Study some existing datasets which test B2B-assistant systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#3
2026-04-03 12:38:57 +00:00
Study some existing datasets which test Media Skill systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-03-27 14:13:25 +00:00
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#2
2026-03-27 10:28:48 +00:00
Study some existing datasets which test Browser Use/Computer Use systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
closed issue Alex_Dubchak_comi.../Quality_evaluation#5
2026-03-27 10:28:48 +00:00
Study some existing datasets which test LLM-agent systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-03-25 12:28:26 +00:00
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-03-25 12:28:11 +00:00
Alex_Dubchak_comi...
pushed to main at Alex_Dubchak_comi.../Quality_evaluation
2026-03-25 12:03:19 +00:00
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#5
2026-03-25 09:52:42 +00:00
Study some existing datasets which test LLM-agent systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#4
2026-03-25 08:21:37 +00:00
Study some existing datasets which test B2B-assistant systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi...
opened issue Alex_Dubchak_comi.../Quality_evaluation#3
2026-03-25 08:19:50 +00:00
Study some existing datasets which test Media Skill systems. Deadline: 27th March, 2026, 12:00 PM.