User:Ukiriz/sandbox/Computational creativity 2
Overview: I added the Text-to-image Model subsection under the Machine learning for Computational creativity section. The added text is bolded and underlined.
Machine learning for Computational creativity
[Image: photograph of an astronaut riding a horse on the moon]
While traditional computational approaches to creativity rely on the explicit formulation of prescriptions by developers and a certain degree of randomness in computer programs, machine learning methods allow computer programs to learn heuristics from input data, enabling creative capacities within the programs.[1] In particular, deep artificial neural networks can learn patterns from input data that allow for the non-linear generation of creative artefacts.

Before 1989, artificial neural networks had been used to model certain aspects of creativity. Peter Todd (1989) first trained a neural network to reproduce musical melodies from a training set of musical pieces, then used a change algorithm to modify the network's input parameters, allowing the network to randomly generate new music in a highly uncontrolled manner.[2][3][4] In 1992, Todd[5] extended this work using the so-called distal teacher approach that had been developed by Paul Munro,[6] Paul Werbos,[7] D. Nguyen and Bernard Widrow,[8] and Michael I. Jordan and David Rumelhart.[9] In this approach, there are two neural networks, one of which supplies training patterns to the other. In later efforts by Todd, a composer would select a set of melodies that define the melody space, position them on a 2-d plane with a mouse-based graphic interface, train a connectionist network to produce those melodies, and then listen to the new "interpolated" melodies that the network generates for intermediate points on the plane.
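The 2-d melody-space idea can be sketched in a few lines of code. The following is a minimal illustrative example, not Todd's original system: a small network (here built with PyTorch) is trained to map hand-chosen plane coordinates to example melodies, then queried at an intermediate point to yield an "interpolated" melody. The melodies, coordinates, and network sizes are arbitrary placeholders.

```python
# Minimal sketch of a 2-d "melody space": a decoder maps plane coordinates to
# melodies, and intermediate coordinates yield interpolated melodies.
import torch
import torch.nn as nn

NUM_PITCHES = 16   # toy pitch alphabet
MELODY_LEN = 8     # fixed-length melodies for simplicity

# Two example melodies (pitch indices) placed at opposite corners of the plane.
melodies = torch.tensor([[0, 2, 4, 5, 7, 9, 11, 12],
                         [12, 11, 9, 7, 5, 4, 2, 0]])
coords = torch.tensor([[0.0, 0.0],
                       [1.0, 1.0]])

# Decoder network: 2-d coordinate -> per-step pitch logits.
decoder = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, MELODY_LEN * NUM_PITCHES),
)
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Train the decoder to reproduce the two training melodies at their coordinates.
for _ in range(500):
    logits = decoder(coords).view(-1, MELODY_LEN, NUM_PITCHES)
    loss = loss_fn(logits.reshape(-1, NUM_PITCHES), melodies.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Query an intermediate point: the network emits an "interpolated" melody.
midpoint = torch.tensor([[0.5, 0.5]])
interpolated = decoder(midpoint).view(MELODY_LEN, NUM_PITCHES).argmax(dim=-1)
print(interpolated.tolist())
```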
Language Models and Hallucination
Language models such as GPT and LSTM-based models are used to generate texts for creative purposes, such as novels and scripts. These models exhibit hallucination from time to time, presenting erroneous material as factual. Creators make use of this hallucinatory tendency to capture unintended results. Ross Goodwin's 1 the Road, for example, uses an LSTM model trained on literary corpora to generate a novel that refers to Jack Kerouac's On the Road, based on multimodal input captured by a camera, a microphone, a laptop's internal clock, and a GPS throughout a road trip.[10][11] Brian Merchant described the novel as "pixelated poetry in its ragtag assemblage of modern American imagery".[11] Oscar Sharp and Ross Goodwin created the experimental sci-fi short film Sunspring in 2016, written with an LSTM model trained on the scripts of 1980s and 1990s sci-fi films.[10][12] Rodica Gotca critiqued such works for their overall lack of focus on narrative and of an intention to create grounded in the background of human culture.[10]
Nevertheless, researchers highlight a positive side of language models' hallucination for generating novel solutions, provided that the correctness and consistency of the responses can be controlled. Jiang et al. propose a divergence-convergence flow model for harnessing these hallucinatory effects. They group the types of such effects in current research into factuality hallucinations and faithfulness hallucinations, which can be divided into smaller classes such as factual fabrication and instruction inconsistency. While the divergence stage involves generating potentially hallucinatory content, the convergence stage focuses on filtering for the hallucinations that are useful to the user, using intent recognition and evaluation metrics.[13]
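Such a divergence-convergence loop can be outlined as follows. This is a minimal illustrative sketch, not Jiang et al.'s implementation: generate(), matches_user_intent(), and usefulness_score() are placeholders standing in for a language-model call, an intent-recognition step, and an evaluation metric.

```python
# Minimal sketch of a divergence-convergence loop: sample many high-temperature
# candidates (divergence), then filter and rank them (convergence).
import random

def generate(prompt: str, temperature: float) -> str:
    """Placeholder for a language-model call sampled at the given temperature."""
    return f"candidate idea #{random.randint(0, 9999)} for: {prompt}"

def matches_user_intent(candidate: str, prompt: str) -> bool:
    """Placeholder intent-recognition check (e.g. a classifier or judge model)."""
    return True  # accept everything in this stub

def usefulness_score(candidate: str) -> float:
    """Placeholder evaluation metric (e.g. novelty and consistency scoring)."""
    return random.random()

def divergence_convergence(prompt: str, n_candidates: int = 8, keep: int = 2):
    # Divergence: sample many candidates at high temperature, accepting that
    # some of them will be hallucinatory.
    candidates = [generate(prompt, temperature=1.2) for _ in range(n_candidates)]
    # Convergence: keep only candidates matching the user's intent, then rank
    # them with the evaluation metric and return the best few.
    filtered = [c for c in candidates if matches_user_intent(c, prompt)]
    return sorted(filtered, key=usefulness_score, reverse=True)[:keep]

print(divergence_convergence("propose unusual plot devices for a road-trip novel"))
```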
Text-to-Image Model
Recent advancements in computational creativity in the image-text medium are represented by text-to-image models, which enable the generation of digital artworks from natural language descriptions. These systems have evolved rapidly since OpenAI introduced CLIP (Contrastive Language-Image Pre-training) in January 2021, which learned from web-scale training data to associate visual concepts with textual descriptions.[14] While early models like VQGAN-CLIP and BigSleep combined CLIP with a GAN (Generative Adversarial Network), later ones like DALL-E, Midjourney, and Stable Diffusion incorporate diffusion models instead of GANs, which allows the production of more photorealistic digital images from textual prompts.[15][16] Nevertheless, Hannah Johnston points to the "surreal quality that has since been largely lost" due to this transition.[17]
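CLIP's role in these early systems can be illustrated with its image-text scoring, the signal that VQGAN-CLIP-style approaches use to steer a generator toward a prompt. The minimal sketch below uses the Hugging Face transformers implementation of CLIP; the image file and prompts are placeholders, and the snippet only computes similarity scores rather than performing generation.

```python
# Minimal sketch of CLIP image-text scoring with Hugging Face `transformers`;
# VQGAN-CLIP-style systems optimize a generator so that the score for the
# target prompt increases.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate.png")  # placeholder: e.g. an intermediate generator output
prompts = [
    "photograph of an astronaut riding a horse on the moon",
    "a watercolor painting of a city at night",
]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher probability means CLIP judges the image to better match that prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```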
The creative potential of these systems lies not merely in their computational output but in the human-AI collaborative process they facilitate. Zhou and Lee's research, analyzing over 4 million artworks from more than 50,000 users, shows that text-to-image AI significantly enhances human creative productivity, by 25%, and increases the value of works, as measured by the likelihood of receiving a favorite per view, by 50%. However, this productivity increase comes with nuanced effects on creativity itself. While the Content Novelty of top-quality works (the novelty of a piece's central theme and its connections) increases over time, mean Content Novelty decreases, a sign that the resulting idea space is larger but wasteful. This suggests that some creators push the boundaries of creativity further than others, but that, in general, outputs coalesce into visually uniform and similarly themed images.[18]
The process of artistic text-to-image creation relies on "prompt engineering" – the iterative task of crafting effective textual inputs to condition the AI model. It depends on understanding the model's training data, experimenting with style modifiers, and engaging in what researchers have called "generative synesthesia" – the seamless blending of human discovery and AI leveraging to uncover new creative pipelines. Jonas Oppenlaender emphasizes the entire process of human-AI interaction, including image-level and portfolio-level selection, over the final output of the text-to-image model.[19][20] Referring to Mel Rhodes' four-P model (person, process, press, product) to reframe computational creativity, Oppenlaender highlights the entire creative pipeline – the practitioner, the iterative process, the community environment, and the chosen output – when evaluating AI-assisted art creation.[21][20]
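As a rough illustration of the iterative, image-level-selection side of prompt engineering, the sketch below composes prompt variants from a subject and style modifiers and hands each to a generation call; the subject, modifier lists, and generate_image() function are hypothetical placeholders and do not correspond to any particular model's API.

```python
# Minimal sketch of iterative prompt refinement with style modifiers.
from itertools import product

subject = "an astronaut riding a horse on the moon"
media = ["photograph", "oil painting", "pixel art"]
modifiers = ["high detail", "soft lighting", "wide-angle shot"]

def generate_image(prompt: str) -> str:
    """Placeholder: call a text-to-image model here and return an image path."""
    return f"render of: {prompt}"

# Image-level selection: try prompt variants, then keep whichever candidates the
# practitioner (or a scoring model) judges most promising.
candidates = []
for medium, modifier in product(media, modifiers):
    prompt = f"{medium} of {subject}, {modifier}"
    candidates.append((prompt, generate_image(prompt)))

for prompt, image in candidates[:3]:
    print(prompt, "->", image)
```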
Online communities are now at the core of this practice, serving as learning platforms where practitioners share techniques and prompts. These communities make high-level art-production tools available to everyone, allowing creators who lack formal artistic education to produce high-quality visual material. Artists boosted by AI, able to execute more innovative ideas regardless of their prior originality, may produce works that are rated more favorably by peers.[20]
The technology raises profound questions of authorship and creativity in the contemporary age. There are legal implications, such as AI-generated content winning art competitions or artists seeking copyright for AI-assisted works. Such developments echo earlier debates over the validity of photography as art, suggesting a paradigm shift in how society conceptualizes and perceives creative work.[22]
- ^ Mateja, Deborah; Heinzl, Armin (December 2021). "Towards Machine Learning as an Enabler of Computational Creativity". IEEE Transactions on Artificial Intelligence. 2 (6): 460–475. doi:10.1109/TAI.2021.3100456. ISSN 2691-4581. S2CID 238941032.
- ^ Todd, P.M. (1989). "A connectionist approach to algorithmic composition". Computer Music Journal. 13 (4): 27–43. doi:10.2307/3679551. JSTOR 3679551. S2CID 36726968.
- ^ Bharucha, J.J.; Todd, P.M. (1989). "Modeling the perception of tonal structure with neural nets". Computer Music Journal. 13 (4): 44–53. doi:10.2307/3679552. JSTOR 3679552. S2CID 19286486.
- ^ Todd, P.M., and Loy, D.G. (Eds.) (1991). Music and connectionism. Cambridge, MA: MIT Press.
- ^ Todd, P.M. (1992). A connectionist system for exploring melody space. In Proceedings of the 1992 International Computer Music Conference (pp. 65–68). San Francisco: International Computer Music Association.
- ^ Munro, P. (1987), "A dual backpropagation scheme for scalar-reward learning", Ninth Annual Conference of the Cognitive Science Society
- ^ Werbos, P.J. (1989), "Neural networks for control and system identification", Decision and Control
- ^ Nguyen, D.; Widrow, B. (1989). "The truck backer-upper: An example of self-learning in neural networks" (PDF). IJCNN'89.
- ^ Jordan, M.I.; Rumelhart, D.E. (1992), "Forward models: Supervised learning with a distal teacher", Cognitive Science
- ^ a b c Gotca, Rodica. "Computational literature – creation under the auspices of AI and GPT models". Dialogica (in Romanian). doi:10.59295/dia.2023.1.04. Retrieved 2025-05-10.
- ^ a b Merchant, Brian (2018-10-01). "When an AI Goes Full Jack Kerouac". The Atlantic. Retrieved 2025-05-10.
- ^ Newitz, Annalee (2021-05-30). "Movie written by algorithm turns out to be hilarious and intense". Ars Technica. Retrieved 2025-05-10.
- ^ Jiang, Xuhui; Tian, Yuxing; Hua, Fengrui; Xu, Chengjin; Wang, Yuanzhuo; Guo, Jian (2024-02-02). "A Survey on Large Language Model Hallucination via a Creativity Perspective". arXiv:2402.06647 [cs.AI].
- ^ Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela (2021-02-26). "Learning Transferable Visual Models From Natural Language Supervision". arXiv:2103.00020. Retrieved 2025-05-28.
- ^ "What DALL-E Reveals About Human Creativity | Stanford HAI". hai.stanford.edu. Retrieved 2025-05-28.
- ^ Mukherjee, Amritangshu (2022-12-13). "Exploring the World of AI-Generated Art with CLIP and VQGAN". Medium. Retrieved 2025-05-28.
- ^ Johnston, Hannah (2025-04-30). "The Art of VQGAN+CLIP: An Ode to Early AI Image Generation". Medium. Retrieved 2025-05-28.
- ^ Zhou, Eric; Lee, Dokyun (2024-03-01). "Generative artificial intelligence, human creativity, and art". PNAS Nexus. 3 (3): pgae052. doi:10.1093/pnasnexus/pgae052. ISSN 2752-6542.
- ^ Zhou, Eric; Lee, Dokyun (2024-03-01). "Generative artificial intelligence, human creativity, and art". PNAS Nexus. 3 (3): pgae052. doi:10.1093/pnasnexus/pgae052. ISSN 2752-6542.
- ^ a b c Oppenlaender, Jonas (2022-05-13). "The Creativity of Text-to-Image Generation". arXiv.org. Retrieved 2025-05-28.
- ^ Rhodes, Mel (1961). "An Analysis of Creativity". The Phi Delta Kappan. 42 (7): 305–310. ISSN 0031-7217.
- ^ Belanger, Ashley (2024-10-07). "Artist appeals copyright denial for prize-winning AI-generated work". Ars Technica. Retrieved 2025-05-28.