Tech 22 Feb, 2025

A test environment has been created to evaluate the work of AI agents

An international group of scientists has developed the Computer Agent Arena, a platform designed to test and improve computer agents driven by artificial intelligence. The project involves experts from the University of Waterloo, the University of Hong Kong, Salesforce Research and Carnegie Mellon University.

Computer assistants are programs that perform tasks without human intervention. One example is the Siri voice assistant, which can send messages and schedule meetings. However, modern AI assistants struggle with complex tasks that require interaction with different applications. For example, compiling a report may be difficult because the data has to be found in emails, statements and spreadsheets.

The Computer Agent Arena is the first platform for testing AI assistants in a real computer environment. It builds on the earlier OSWorld project, the first scalable environment for running multimodal agents.

According to one of the developers, Victor Zhong, a professor at the University of Waterloo, the new environment makes it possible to compare different AI models based on language and vision technologies. Users choose the operating system and applications (for example, Google Chrome or Excel), then set a task for the assistant, and the system compares in real time how two different models perform that task.