Wikipedia talk:WikiProject AI Cleanup
This is the talk page for discussing WikiProject AI Cleanup and anything related to its purposes and tasks.
Archives: 1, 2 (auto-archiving period: 3 months)
This project page does not require a rating on Wikipedia's content assessment scale. It is of interest to several WikiProjects.
To help centralize discussions and keep related topics together, all non-archive subpages of this talk page redirect here.
This page has been mentioned by multiple media organizations.
The pre-ChatGPT era
[edit]We may want to be more explicit that text from before ChatGPT was publicly released is almost certainly not the product of an LLM. For example, an IP editor had tagged Hockey Rules Board as being potentially AI-generated when nearly all the same text was there in 2007. (The content was crap, but it was good ol' human-written crap!) Maybe add a bullet in the "Editing advice" section along the lines of "Text that was present in an article before December 2022 is very unlikely to be AI-generated." Apocheir (talk) 00:57, 25 October 2024 (UTC)
- This is probably a good idea. I'm sure they were around before then, but definitely not publicly. Symphony Regalia (talk) 01:42, 25 October 2024 (UTC)
- Definitely a good idea, also agree with this. Just added a slightly edited version of it to "Editing advice", feel free to adjust it if you wish! Chaotic Enby (talk · contribs) 01:59, 25 October 2024 (UTC)
- So far, I haven’t seen anything that I thought could be GPT-2 or older. But I did run into a few articles that seem to make many of the same mistakes as ChatGPT, except a decade earlier.
- If old pages like that could be mistaken for AI because they make the mistakes we look for in AI text, that still means it's a problematic find; maybe we should recommend other cleanup tags for these cases. 3df (talk) 22:53, 25 October 2024 (UTC)
- I think that's very likely an instance of "bad writing". Human brains have very often produced analogous surface-level results! Remsense ‥ 论 23:05, 25 October 2024 (UTC)
- Yes, I have to say, ChatGPT's output is a lot like how a lot of first- or second-year undergraduate students write when they're not really sure if they have any ideas. Arrange some words into a nice order and hope. Stick an "in conclusion" on the end that doesn't say much. A lot of early content on Wikipedia was generated by exactly this kind of person. (Those people grew out of it; LLMs won't.) -- asilvering (talk) 00:31, 26 October 2024 (UTC)
- I ran this text from the 2017 version through GPTZero, which said 1% chance of AI.
FIH was founded on 7 January 1924 in Paris by Paul Léautey, who became the first president, in response to field hockey's omission from the programme of the 1924 Summer Olympics. First members complete to join the seven founding members were Austria, Belgium, Czechoslovakia, France, Hungary, Spain and Switzerland. In 1982, the FIH merged with the International Federation of Women's Hockey Associations (IFWHA), which had been founded in 1927 by Australia, Denmark, England, Ireland, Scotland, South Africa, the United States and Wales. The organisation is based in Lausanne, Switzerland since 2005, having moved from Brussels, Belgium. Map of the World with the five confederations. In total, there are 138 member associations within the five confederations recognised by FIH. This includes Great Britain which is recognised as an adherent member of FIH, the team was represented at the Olympics and the Champions Trophy. England, Scotland and Wales are also represented by separate teams in FIH sanctioned tournaments.
Graywalls (talk) 00:03, 6 November 2024 (UTC)
- There's probably more bad than good writing on the Internet, and all LLMs have been extensively trained on that bad writing, which is why they're prone to reproduce it. 5.178.188.143 (talk) 14:23, 17 January 2025 (UTC)
How can I help?
[edit]Hi all- As a website owner who has been using ChatGPT for years, I believe I can spot signs of AI-generated content pretty quickly. I have a full-time job but would love to assist (to ensure the truth remains true, and for my own personal development).
Thanks! Chris Aisavestheworld (talk) 21:09, 2 January 2025 (UTC)
- Hello! A good start would be to install Wikipedia:Twinkle, which allows you to tag articles (including, in this case, with the {{AI-generated}} tag). You can tag pages that you encounter, or look for new additions in Special:RecentChanges! If you see users adding AI-generated content with clear issues (which for now is the vast majority of visible AI-generated content), you can warn them with {{uw-ai1}}. Chaotic Enby (talk · contribs) 21:23, 2 January 2025 (UTC)
- Thanks very much! I'll do that. Aisavestheworld (talk) 16:15, 6 January 2025 (UTC)
- @Aisavestheworld: Also have a go at servicing Category:Articles containing suspected AI-generated texts, where these articles end up, to clean them up and remove the offending content. Be bold and remove the stuff if you see it. This is the greatest literary/encyclopaedic project since the Library of Alexandria, so it's worth the time. If you're in the NPP/AFC group, post anything troublesome back on the NPP queue, for example if an autopatrolled editor is using AI. If it's within the 90-day limit, draftify it and give a clear reason why it has been draftified. Speak to the editor and tell them why it is not acceptable to post AI slop. Explain it clearly so they realise it's not what's wanted, and tell them there is stormy weather ahead if they continue. Be soft, considerate, kind, responsive and helpful. But if you warn them and they don't comply after the four warnings, e.g. for disruptive editing, send them to WP:ANI, or raise it here (or at e.g. COIN) where we can have a group discussion; if that doesn't work, then it's ANI. The wide consensus seems to be that it is far too early to use AI effectively, although I think it's probably going to be good for diagrams, for example medical diagrams and physical illustrations, but not BLP portraits or anything BLP-related. Hope that helps. scope_creepTalk 16:48, 6 January 2025 (UTC)
- Thank you @Scope creep - Can you help me get started here? I think I just need to know where to go and I can get started: Category:Articles containing suspected AI-generated texts. Aisavestheworld (talk) 18:29, 6 January 2025 (UTC)
- @Aisavestheworld: I never realised you've only been on Wikipedia for a very short time. I would ignore the advice I gave you for at least a year or two until you're well established. scope_creepTalk 18:36, 6 January 2025 (UTC)
- Understood. Thanks again! Aisavestheworld (talk) 18:40, 6 January 2025 (UTC)
I learned in this thread that there are AI bias checkers. My knee-jerk reaction is, for WP-purposes, kill with fire. Gråbergs Gråa Sång (talk) 21:29, 6 January 2025 (UTC)
AI-touched-up images?
[edit]Sofronio Vasquez currently uses the image File:Sofronio P. Vasquez III in 2025 (Enhanced) (3).png, which has the rubbery, weirdly lit appearance of AI-generated images, but was extracted from this youtube video and then "digitally enhanced". (I verified that the scene actually appears in the video.) I asked User:HurricaneEdgar, who touched it up, what "digitally enhanced" meant but he didn't respond. Are AI-touching-up tools available, and do they have the same issues as other AI generation? Apocheir (talk) 23:28, 16 January 2025 (UTC)
- Yes, AI-enhancing/upscaling tools definitely exist. In this case, the article should be tagged with {{Upscaled images}}, and the file should be flagged on Commons with {{AI upscaled}}. On the English Wikipedia, it is preferable to use the original picture rather than any AI-upscaled version. @HurricaneEdgar, if you still have the original (non-enhanced) image, it could be helpful to upload it so it can be used instead. Chaotic Enby (talk · contribs) 00:21, 17 January 2025 (UTC)
Bot request discussion
[edit]I've opened a thread at Wikipedia:Bot requests#Bot to track usage of AI images in articles to suggest a bot that detects when AI and AI-upscaled images are being used in articles (not in any clever deductive way, just using the Commons categories), outputting a list in the style of the currently hand-crafted Wikipedia:WikiProject AI Cleanup/AI images in non-AI contexts.
If anybody has any thoughts on that or expertise to share, please drop by. Belbury (talk) 15:57, 22 January 2025 (UTC)
- That could be great indeed! If the bot can directly add them to the page, it could be even more practical! Chaotic Enby (talk · contribs) 20:38, 22 January 2025 (UTC)
User:Vanderwaalforces has now kindly set up User:DreamRimmer's script to run as a bot update every Sunday, adding a list of AI-affected files to Wikipedia:WikiProject AI Cleanup/VWF bot log. I'll check in occasionally and see whether anything on there needs an {{upscaled images}} template, or adding to Wikipedia:WikiProject_AI_Cleanup/AI_images_in_non-AI_contexts. --Belbury (talk) 09:46, 3 February 2025 (UTC)
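For readers curious about the mechanics, the core check such a bot performs is simple to sketch with Pywikibot: enumerate files in the relevant Commons categories, then ask where each file is used on the English Wikipedia. The category names below are placeholders for illustration; this is not the actual bot's or script's code.

```python
# Illustrative sketch only (not the actual bot): list enwiki articles that
# use files from Commons AI-related categories. Category names are placeholders.
import pywikibot

commons = pywikibot.Site("commons", "commons")
enwiki = pywikibot.Site("en", "wikipedia")

# Placeholder category names; the real bot would use whatever Commons
# categories the project has agreed to track.
CATEGORIES = ["Category:AI-generated images", "Category:AI upscaled images"]

hits = set()
for cat_name in CATEGORIES:
    cat = pywikibot.Category(commons, cat_name)
    for commons_file in cat.articles(namespaces=6):  # namespace 6 = File:
        # Look up usage of the (shared) file on the English Wikipedia.
        local_file = pywikibot.FilePage(enwiki, commons_file.title())
        for article in local_file.using_pages(namespaces=0):  # mainspace only
            hits.add((commons_file.title(), article.title()))

for file_title, article_title in sorted(hits):
    print(f"* [[:{file_title}]] used in [[{article_title}]]")
```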
Idea: one of four categories added to images automatically during upload
[edit]Hi everyone! I have been thinking for a while about a solution regarding separating AI-generated content and human-made content and came up with my idea about the four categories. I would like to ask for your opinions and especially ideas and solutions on commons:Commons talk:AI-generated media#Four categories for images! Kind regards, S. Perquin (talk) – 21:31, 24 January 2025 (UTC)
Provisional AI-generated analysis of documentation (January 2025)
[edit]On the Dutch-language Wikipedia, we are currently also having a discussion about the use of AI in Wikipedia articles, including the use of AI-generated images. I have the impression that this topic is being considered more actively and thoroughly on the English-language Wikipedia than on the Dutch-language Wikipedia. If you don't mind, I would like to join the discussion here as well, to gather inspiration for the policy on the Dutch-language Wikipedia.
In any case, it's a great idea to keep documentation on which AI-generated images are removed and which are not. Because I was curious about what conclusion ChatGPT would draw from these findings, I gave it the following prompt: "Analyze why AI-generated images were removed from the English-language Wikipedia based on this documentation and write only a conclusion in up to ten sentences". This was the outcome:
“AI-generated images were removed from English-language Wikipedia articles due to concerns about their accuracy, quality, and relevance in non-AI contexts. Such images often fail to represent historical or real individuals authentically, risking misinformation. They can introduce errors or unrealistic portrayals that conflict with Wikipedia's goal of providing reliable, evidence-based content. Quality concerns also stem from inconsistencies in detail and fidelity, particularly in facial depictions or historical recreations. Where authentic photographs or verified illustrations exist, they are preferred, making AI-generated visuals unnecessary. The lack of proper sourcing for AI-generated content further complicates their verification, undermining Wikipedia's strict citation and credibility standards. Ethical considerations, such as misrepresentation of subjects or contexts, also played a role in their removal. The community has emphasized prioritizing verified, human-curated content over AI-generated media to maintain trust. Additionally, concerns about potential copyright ambiguities surrounding AI-generated works influenced this decision. These measures align with Wikipedia's principles of neutrality, reliability, and transparency in public knowledge dissemination.”
[Image: palaeoart reconstruction of a hadrosaur, used at the time in the Secernosaurus article]
One of the questions that comes to my mind is: aren't human-made images sometimes also just inaccurate, irrelevant and misleading? Take the picture on the right as an example. The article about Secernosaurus provides this picture. However, according to a user on Wikimedia Commons, it may not be accurate because the hindquarters lack musculature and fat. If the same image were made by AI, and it were many times better, could and should it be in the article? Doesn't this apply to all dinosaurs in principle? They might have been bigger and skinnier than previously thought. The same likely applies to many other artistic impressions: exoplanets and stars might also look different than we think. I'm curious how you think about, for example, artistic impressions on the English-language Wikipedia. Kind regards, S. Perquin (talk) – 09:16, 25 January 2025 (UTC)
- If human-made images are inaccurate, they should also be removed. We do have WP:PALEOART and WP:DINOART for reviewing reconstructions of extinct animals. If you believe that this image of Gryposaurus (not Secernosaurus, despite it being used there) is inaccurate, it should be submitted there for review and removed from the article. I haven't seen any AI-generated reconstructions of dinosaurs that are "many times better" than this slightly skinny hadrosaur and don't introduce blatant inaccuracies, but yes, on principle, we don't have any guidelines specifically excluding AI images for paleoart reconstructions (or anywhere beyond BLPs). However, we also shouldn't give more latitude to errors in AI-generated images, even if the process is often more error-prone and less consistent with the paleontological data than human reconstructions. Chaotic Enby (talk · contribs) 14:17, 25 January 2025 (UTC)
- Apparently, this image has already been reviewed (thus the tag on Commons), with the consensus being that it's too slim but not terribly inaccurate. Still, I've replaced it with a more plump reconstruction. Chaotic Enby (talk · contribs) 14:29, 25 January 2025 (UTC)
- I handle extinct buildings rather than extinct animals, but similar discussions arise as to whether we should use a photo or a drawing, with one side saying the photo should always be preferred, and my side saying such prejudice has little value. My example is the extinct Bronx Borough Hall for which we have good drawings, and poor contemporary photos, and my own photos of the remnants. I had no trouble pushing my opinion that the best drawing we had was the best illustration, and it seems to me every time, it will be a judgement call. There are general arguments for preferring plain photos over retouched photos, over paintings and drawings by people, over AI renderings, but when it comes down to cases, we have to decide as best we can among what's actually available. A good AI will surely beat a bad illustration from another source, if those are what are available. Jim.henderson (talk) 16:34, 29 January 2025 (UTC)
Discussion at Wikipedia talk:Large language models § LLM-generated content
[edit] You are invited to join the discussion at Wikipedia talk:Large language models § LLM-generated content, which is within the scope of this WikiProject. Chaotic Enby (talk · contribs) 11:24, 31 January 2025 (UTC)
I'm thinking about having that page's title changed to something along the lines of [Signs or Indicators] of (likely) [AI or ChatGPT] authorship, but I can't decide which words should be used.
- Signs or Indicators?
- AI or ChatGPT?
- Should likely be included?
If you have any better title ideas, feel free to share your alternative proposals. – MrPersonHumanGuy (talk) 14:40, 3 February 2025 (UTC)
- AI (or LLM) should be better than ChatGPT, as we should also have catchphrases indicating other large language models. Best to also add "likely". Not sure about "Signs" vs "Indicators", both are good although "Signs" might be more concise. Chaotic Enby (talk · contribs) 12:39, 20 February 2025 (UTC)
- "Signs", "AI" and "likely" are all good ideas.
- I've just added a section on markup (the "turn0search0" issue noted below, plus a "?utm_source=chatgpt.com" one I just encountered for the first time), which seem worth tracking but definitely aren't "catchphrases". Belbury (talk) 17:27, 27 February 2025 (UTC)
- Great job! Regarding "?utm_source=chatgpt.com", there was a discussion at Wikipedia talk:Large language models#LLM-generated content regarding making an edit filter for that purpose, although it hasn't led to a concrete implementation yet. Chaotic Enby (talk · contribs) 17:35, 27 February 2025 (UTC)
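For anyone sketching that filter, the strings are regular enough that simple pattern matching suffices. A rough illustration in Python (actual edit filters use their own rule syntax; this only demonstrates the matching logic):

```python
# Rough sketch of matching logic for ChatGPT clipboard artefacts in added
# text; an actual edit filter would express this in its own rule syntax.
import re

AI_MARKERS = re.compile(
    r"""(?:cite)?turn\d+(?:search|image)\d+    # copy-paste placeholders, e.g. citeturn0search0
      | [?&]utm_source=chatgpt\.com            # ChatGPT referral parameter on URLs
    """,
    re.VERBOSE,
)

def added_text_is_suspect(added_text: str) -> bool:
    """Return True if the added text contains a known ChatGPT artefact."""
    return AI_MARKERS.search(added_text) is not None

print(added_text_is_suspect("The city was founded in 1850. citeturn0search0"))    # True
print(added_text_is_suspect("See https://example.org/p?utm_source=chatgpt.com"))  # True
print(added_text_is_suspect("An ordinary sentence with a normal citation.[1]"))   # False
```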
citeturn0search0
[edit]I deleted a couple of spam pages, likely AI-generated, and noticed that in both cases, each section of text ended in citeturn0search0 – anyone know where that comes from? I'm guessing some sort of AI tool, but don't know. When I tried googling it (didn't find anything particularly useful, BTW), that square symbol turned into a 'hamburger' stack; no idea what character it's actually meant to be. -- DoubleGrazing (talk) 08:55, 20 February 2025 (UTC)
- Definitely an artefact of ChatGPT, and maybe other models. If I get an answer with grey button external links at the ends of sentences, those become "turn0search0" when I click the "Copy" button to put the response into my clipboard. I've also found that if ChatGPT returns an answer with some example images at the top, those images become "iturn0image0turn0image1turn0image4turn0image5". I'm not seeing a huge amount of this out there on the web, so maybe it's just a recent bug in how ChatGPT's interface renders markup to the clipboard. Belbury (talk) 10:06, 20 February 2025 (UTC)
- Thanks, good to know. -- DoubleGrazing (talk) 10:10, 20 February 2025 (UTC)
Is there a way to state that only the latest version is AI?
[edit]I think the latest edit on Quantum Markov chain is AI-made, based on how unusually long it is for one edit, the fact that none of the new references are normal cites, and the fact that "citeturn0search0" (an AI artifact) is at the end. Skeletons are the axiom (talk) 16:34, 26 February 2025 (UTC)
- In that case, the best thing to do is to revert to the previous version. However, if someone has time and is knowledgeable in that domain, it could be helpful to take a look at the references (especially the third and fourth ones which are linked) to see if there's any material in the article that they support. Chaotic Enby (talk · contribs) 17:35, 26 February 2025 (UTC)
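On the narrower question in the heading (confirming that only the latest revision is AI-affected), the revision history can be checked mechanically. A minimal sketch against the MediaWiki API, using this thread's article and artefact string as the example:

```python
# Minimal sketch: walk an article's recent revisions via the MediaWiki API
# to find where a suspect marker string was introduced.
import requests

API = "https://en.wikipedia.org/w/api.php"
TITLE = "Quantum Markov chain"   # the article discussed above
MARKER = "citeturn0search0"      # the ChatGPT copy-paste artefact

resp = requests.get(API, params={
    "action": "query",
    "prop": "revisions",
    "titles": TITLE,
    "rvprop": "ids|user|timestamp|content",
    "rvslots": "main",
    "rvlimit": 20,               # newest 20 revisions, newest first
    "format": "json",
    "formatversion": 2,
}).json()

revisions = resp["query"]["pages"][0]["revisions"]
for rev in revisions:
    text = rev["slots"]["main"]["content"]
    if MARKER not in text:
        print(f"Marker absent as of revision {rev['revid']} "
              f"({rev['timestamp']} by {rev['user']}); only newer revisions are suspect.")
        break
else:
    print(f"Marker present in all {len(revisions)} revisions checked.")
```

If the loop stops at the second-newest revision, only the latest edit is affected and a plain revert suffices.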
User rapidly creating long bios that GPTZero says are 100% probability AI-generated
[edit]Please see Special:Contributions/HRShami. I tested the first paragraph of Calin Belta § Career and the first paragraph of David L. Woodruff § Career and got a 100% AI-generated score from GPTZero in both cases, but the likelihood of AI generation is also suggested by the speed at which these articles are being generated. Sourcing quality is poor: many opinions about what the subjects have accomplished, mostly sourced to the publications of the subjects themselves; spot-checking the references in the Woodruff article found that they backed up maybe 1/3 of the claims in the text they purported to be references for. —David Eppstein (talk) 07:34, 27 February 2025 (UTC)
- I have been writing articles pretty much the same way since the pre-GPT era. It's a very standard Wikipedia way. The thought of checking my writing against GPTZero did not even occur to me, because I absolutely despise AI-generated writing. After your message I checked three articles on GPTZero and it declared "moderately confident that writing is human" and "certainly human writing" on all three. In any writing, if you pick a very small part of it, no machine can tell correctly whether it is AI or human. You must check the whole writing. Even checking single paragraphs of my writing returned "human content" on GPTZero for most of the paragraphs. If just one paragraph in an article with 8 or 9 paragraphs returns "AI-generated", with the rest of the paragraphs returning "human content", I think we should accept the writing as human content. I don't know what you mean by speed. I have written a total of 10 articles in February and edited one article completely. If I used AI, I could easily generate 10 articles a day. I might have misplaced references in the Woodruff article, which is a human error. Sometimes other editors point out that a reference is not correct for the preceding information, and I fix it with the correct reference. I asked ChatGPT to generate the same Woodruff article. I suggest you do the same. Even after multiple prompts, the article generated by ChatGPT was nowhere near my writing. HRShami (talk) 10:05, 27 February 2025 (UTC)
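A side note on the methodological disagreement above: detector verdicts really do swing with the amount of text submitted, which is easy to demonstrate by scoring a document whole and then paragraph by paragraph. A rough sketch, assuming GPTZero's v2 text-prediction endpoint and an API key (the endpoint and header name are assumptions based on GPTZero's public documentation and may have changed; the response schema is not assumed, so the raw JSON is printed rather than specific fields):

```python
# Rough sketch: compare a detector's verdict on a whole text versus its
# individual paragraphs. Endpoint and header are assumptions based on
# GPTZero's public docs; verify against current documentation before use.
import json
import requests

API_URL = "https://api.gptzero.me/v2/predict/text"  # assumed from public docs
API_KEY = "your-api-key-here"                        # placeholder

def score(text: str) -> dict:
    resp = requests.post(
        API_URL,
        headers={"x-api-key": API_KEY},
        json={"document": text},
    )
    resp.raise_for_status()
    return resp.json()

article_text = open("career_section.txt").read()  # hypothetical saved excerpt
paragraphs = [p for p in article_text.split("\n\n") if p.strip()]

# Response fields vary between API versions, so just print the raw JSON.
print("Whole text:", json.dumps(score(article_text))[:200])
for i, paragraph in enumerate(paragraphs, 1):
    print(f"Paragraph {i}:", json.dumps(score(paragraph))[:200])
```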
Possible AI article?
[edit]A friend of mine notified me of this article, 1 nm process, which they suspect might have been written using an LLM. I am personally not good at figuring out this kind of stuff, so I'm passing it on here so that people can check. ―Howard • 🌽33 00:23, 3 March 2025 (UTC)
- Indeed. Nuked the parts that looked AI-generated (and were unsourced, anyway). Diverging Diamond (is Queen of Hearts's alt; talk) 00:27, 3 March 2025 (UTC)
This was recently relisted with the broader scope. JoelleJay (talk) 22:15, 4 March 2025 (UTC)
Wikipedia:Computer-generated content listed at Requested moves
[edit]A requested move discussion has been initiated for Wikipedia:Computer-generated content to be moved to Wikipedia:AI-generated content. This page is of interest to this WikiProject, and interested members may want to participate in the discussion here. —RMCD bot 19:41, 5 March 2025 (UTC)
- To opt out of RM notifications on this page, transclude {{bots|deny=RMCD bot}}, or set up Article alerts for this WikiProject.
Old Gods of Appalachia
[edit]I believe the episode summaries in Old Gods of Appalachia are AI generated. It looks like a large number of summaries were added in a single edit by an editor who has previously been warned for using AI generated content. It looks like someone else has also questioned whether it's AI generated content on the talk page. I'm looking for a second opinion, guidance on what to do, or assistance in cleaning it up. TipsyElephant (talk) 00:17, 16 March 2025 (UTC)
- Some of them definitely sound like AI to me. In the first one alone: "The narrative delves into", "The prologue highlights the interconnectedness"... Chaotic Enby (talk · contribs) 00:58, 16 March 2025 (UTC)
Likely AI content scraping, but also likely public relations editing
[edit]This may be of interest for members here: Wikipedia:Conflict_of_interest/Noticeboard#User_Hifisamurai and https://commons.wikimedia.org/wiki/Special:Log/Hifisamurai Graywalls (talk) 09:24, 16 March 2025 (UTC)
Chatbot additions to VG (nerve agent)
[edit]This is being discussed by members of the chemistry project at WT:WikiProject Chemicals#Use of chatbot in VG (nerve_agent) but may be of wider interest. Please comment there, not here. Mike Turnbull (talk) 15:32, 16 March 2025 (UTC)
Passive or active cleanup?
[edit]I'm interested and excited to help with this effort. I'm curious how folks here practice AI cleanup. Do you actively look for AI slop or are you passively aware of it while doing other tasks?
I spent some time this AM reviewing Special:RecentChanges, expecting to find more instances of potentially AI-generated content given the lengthy policy discussions on Village pump. I'm in tune with some of the quirks and language tendencies of popular chat models in other contexts, so I guess I was surprised not to find anything obvious. I'm not an experienced editor by any means... Does anyone have tips on visual cues in edit histories or edit summaries that merit a closer look? Zentavious (talk) 14:44, 20 March 2025 (UTC)
- I would say I'm doing a mix of passive cleanup (cleaning it up while doing other tasks such as new page patrolling), semi-active cleanup (cleaning articles reported by other users as potentially AI-generated), and behind-the-scenes technical work. Regarding history and edit summary alone, there's often less to work with, but two clues are long, structured edit summaries (often generated by LLMs, although humans can also take care of writing good edit summaries!), and repeated long additions by the same user in a short time, especially on different articles. That last one is particularly telling: if the same editor makes 5000 bytes additions every five minutes, they likely haven't written everything by themselves. Chaotic Enby (talk · contribs) 17:37, 20 March 2025 (UTC)
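That last clue can also be checked mechanically rather than by eyeballing the feed. A minimal sketch against the MediaWiki recent-changes API; the byte and count thresholds are arbitrary illustrations:

```python
# Minimal sketch: flag users making repeated large mainspace additions,
# the telltale pattern described above. Thresholds are arbitrary examples.
from collections import defaultdict
import requests

API = "https://en.wikipedia.org/w/api.php"

resp = requests.get(API, params={
    "action": "query",
    "list": "recentchanges",
    "rcnamespace": 0,                     # mainspace only
    "rcprop": "title|user|timestamp|sizes",
    "rctype": "edit|new",
    "rclimit": 500,
    "format": "json",
    "formatversion": 2,
}).json()

large_additions = defaultdict(list)
for rc in resp["query"]["recentchanges"]:
    delta = rc["newlen"] - rc["oldlen"]   # bytes added by this edit
    if delta >= 5000:                     # the "5000 bytes" heuristic above
        large_additions[rc.get("user", "(hidden)")].append((rc["title"], delta))

for user, edits in large_additions.items():
    if len(edits) >= 3:                   # several large additions in one window
        print(f"{user}: {len(edits)} large additions, e.g. {edits[:3]}")
```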
Elkmont, Alabama
[edit]I'm not sure where the threshold is for the outright removal of AI-generated text. At Elkmont, Alabama, an editor has stated, when asked if they are using AI, "I am using something to help me edit the text". I reverted their edit twice, because the tone was extremely formal and out of line with Wikipedia's voice. The input of others would be appreciated! Thanks. Magnolia677 (talk) 15:26, 23 March 2025 (UTC)
- In this case, I would say that WP:NOTEVERYTHING and WP:INDISCRIMINATE apply, and that it is reasonable to revert the edits. I mean, these are all delightful:
- "Farmers were diligently planting corn, with hopes for a bountiful harvest if conditions remained favorable, while wheat and oat crops showed promise. The cotton market was active, and concerns arose over potential losses in the peach crop due to recent frosts"
- "T. O. Bridgforth celebrated his 55th birthday with a large family reunion and dinner, which was described as one of the most sumptuous meals enjoyed since the end of a severe drought"
- "The article closed with lighthearted local anecdotes, including a humorous mix-up involving a wheelbarrow and an umbrella"
- but none of it is remotely encyclopedic. There are also some instances of external URLs in the content body, which violates WP:NOELBODY. You might politely point them in the direction of WP:LLM too, and if they must continue to use an LLM assistant, ask them to add well-cited encyclopedic content in smaller chunks, so that each addition can be considered on its own merits, rather than as one huge swathe of text. Cheers, SunloungerFrog (talk) 16:08, 23 March 2025 (UTC)