Alex_Dubchak_coming_up
  • Joined on 2026-03-20
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#10 2026-04-29 07:49:50 +00:00
Rerun the tests using the new LLM-as-a-judge. Deadline: 3rd May, 11:59 PM.
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#9 2026-04-29 07:00:19 +00:00
Get access to the VPS and deploy the entire test pipeline. Deadline: 3rd May, 11:59 PM.
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#8 2026-04-29 06:13:48 +00:00
Test the already deployed Browser-Use system. Deadline: 7th April, 2026, 11:59 PM.
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-04-23 20:31:08 +00:00
e4acc1c84d 252 tests
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-04-22 22:21:52 +00:00
1dd92ab887 45 tests
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-04-22 21:05:06 +00:00
98d5e90894 mind2web
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#8 2026-04-03 12:42:40 +00:00
Test the deployed Browser-Use system. Deadline: 7th April, 11:59 PM.
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#7 2026-04-03 12:40:16 +00:00
Launch the browser-use system. Deadline: already completed by the time the task was created.
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#7 2026-04-03 12:40:08 +00:00
Launch the browser-use system. Deadline: already completed by the time the task was created.
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#4 2026-04-03 12:38:57 +00:00
Study some existing datasets which test B2B-assistant systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#3 2026-04-03 12:38:57 +00:00
Study some existing datasets which test Media Skill systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-03-27 14:13:25 +00:00
2b5d923f63 new datasets
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#2 2026-03-27 10:28:48 +00:00
Study some existing datasets which test Browser Use/Computer Use systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... closed issue Alex_Dubchak_comi.../Quality_evaluation#5 2026-03-27 10:28:48 +00:00
Study some existing datasets which test LLM-agent systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-03-25 12:28:26 +00:00
411ccf9bcd Update README.md
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-03-25 12:28:11 +00:00
252167ff46 Update README.md
Alex_Dubchak_comi... pushed to main at Alex_Dubchak_comi.../Quality_evaluation 2026-03-25 12:03:19 +00:00
19917c69cb Update README.md
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#5 2026-03-25 09:52:42 +00:00
Study some existing datasets which test LLM-agent systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#4 2026-03-25 08:21:37 +00:00
Study some existing datasets which test B2B-assistant systems. Deadline: 27th March, 2026, 12:00 PM.
Alex_Dubchak_comi... opened issue Alex_Dubchak_comi.../Quality_evaluation#3 2026-03-25 08:19:50 +00:00
Study some existing datasets which test Media Skill systems. Deadline: 27th March, 2026, 12:00 PM.