
File:Measuring AI Ability to Complete Long Tasks.png

This is a file from the Wikimedia Commons
From Wikipedia, the free encyclopedia
Original file (1,300 × 776 pixels, file size: 62 KB, MIME type: image/png)

Summary

Description
English: The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months over the last 6 years. The shaded region shows the 95% confidence interval (CI), calculated by hierarchical bootstrap over task families, tasks, and task attempts.
Date
Source https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Author METR
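
The doubling claim in the description can be turned into simple arithmetic. The following is a minimal sketch, assuming a constant 7-month doubling time over the whole period; the function names are hypothetical, and this is not METR's fitting code, only the exponential relationship the description implies.

```python
import math


def growth_factor(elapsed_months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth in task length after `elapsed_months`,
    assuming a fixed doubling time (an idealisation of the reported trend)."""
    return 2.0 ** (elapsed_months / doubling_months)


def months_to_reach(start_length: float, target_length: float,
                    doubling_months: float = 7.0) -> float:
    """Months needed for the task length to grow from `start_length` to
    `target_length` (same units for both), under the same assumption."""
    return doubling_months * math.log2(target_length / start_length)


if __name__ == "__main__":
    # Six years is 72 months, i.e. a little over ten doublings.
    print(f"Growth over 6 years: about {growth_factor(6 * 12):.0f}x")
    # Three doublings (an 8x increase) take 3 * 7 = 21 months.
    print(f"Months for an 8x increase: {months_to_reach(1.0, 8.0):.0f}")
```

Under this assumption, six years of growth corresponds to an increase on the order of a thousandfold, since 72 / 7 is slightly more than 10 doublings.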

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Captions

The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability, from 2019 to 2026.

Items portrayed in this file

depicts

19 March 2025

File history

Date/Time | Dimensions | User | Comment
current, 11:37, 15 June 2025 | 1,300 × 776 (62 KB) | Nmzzzd | Uploaded a work by METR from https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ with UploadWizard
