Talk:List of large language models
Release date sorting is incorrect.
I'm encountering an issue with the sorting order of the Release Date column. How can it be fixed? Acdadd (talk) 05:50, 1 January 2025 (UTC)
- We can put the dates into yyyy-mm-dd format, which sorts correctly as plain text. DemCam (talk) 16:14, 24 January 2025 (UTC)
- I attempted to fix it; I think it works now. DemCam (talk) 16:40, 24 January 2025 (UTC)
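As an aside on why the fix above works: ISO 8601 (yyyy-mm-dd) dates sort correctly even under a naive string sort, whereas "Month dd, yyyy" strings sort alphabetically. A minimal illustration (plain Python, not wiki markup):

```python
# ISO 8601 dates are zero-padded, most-significant-field-first,
# so lexicographic string order equals chronological order.
iso_dates = ["2023-03-14", "2022-11-30", "2024-01-05"]
assert sorted(iso_dates) == ["2022-11-30", "2023-03-14", "2024-01-05"]

# By contrast, "Month dd, yyyy" strings sort by month name alphabetically,
# which is the kind of wrong ordering the Release Date column showed.
us_dates = ["March 14, 2023", "November 30, 2022", "January 5, 2024"]
print(sorted(us_dates))  # alphabetical by month name, not chronological
```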
Notes on how some of the compute figures are derived
The DeepSeek compute budget is hard to figure out.
In the DeepSeek LLM paper, a plot shows that DeepSeek-LLM-67B cost 1e24 FLOPs, and the paper says it was trained on 2T tokens.
In the V2 paper, they said "During our practical training on the H800 cluster, for training on each trillion tokens, DeepSeek 67B requires 300.6K GPU hours, while DeepSeek-V2 needs only 172.8K GPU hours, i.e., sparse DeepSeek-V2 can save 42.5% training costs compared with dense DeepSeek 67B." and "We construct a high-quality and multi-source pre-training corpus consisting of 8.1T tokens."
In the V3 paper, they said it cost 2.788M H800-hours. From these figures, we can calculate that they used a ratio of 0.02 petaFLOP-days per H800-hour. pony in a strange land (talk) 03:54, 28 January 2025 (UTC)
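- Spelling out the arithmetic behind that ratio, using only the figures quoted above (1e24 FLOPs and 2T tokens from the LLM paper, 300.6K H800-hours per trillion tokens from the V2 paper):

```python
# Figures quoted from the DeepSeek LLM and V2 papers.
total_flops = 1e24                  # DeepSeek-LLM-67B training compute (LLM paper plot)
tokens_trillions = 2                # trained on 2T tokens
gpu_hours_per_trillion = 300.6e3    # H800 GPU-hours per trillion tokens (V2 paper)

total_gpu_hours = gpu_hours_per_trillion * tokens_trillions  # 601,200 H800-hours
flops_per_hour = total_flops / total_gpu_hours               # ~1.66e18 FLOP per H800-hour

petaflop_day = 1e15 * 86400         # one petaFLOP-day expressed in FLOPs
ratio = flops_per_hour / petaflop_day
print(round(ratio, 3))              # ~0.019 petaFLOP-days per H800-hour, i.e. roughly 0.02
```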
Where is Neuro-sama?
I saw her being mentioned in the table when I last visited, but now she is no longer present. I checked the revisions, and someone removed her for not being an LLM. However, she is one, so why change it? 109.81.174.249 (talk) 05:00, 18 February 2025 (UTC)
- Neuro-sama is not an LLM, but a chatbot. Alenoach (talk) 19:36, 19 February 2025 (UTC)
- She is a composite system with many parts, only one (or more?) of which is probably an LLM. However, since there has never been a technical report about how Neuro-sama is made, or what the LLM is, I'm not going to add her in.
- I may if there is a technical report about the LLM behind Neuro-sama. pony in a strange land (talk) 10:44, 22 February 2025 (UTC)