The way I see this explosion of generative AI art (imagery, sound, and text alike) is as follows:
First, it's not at all surprising to me that these products are being pushed out in the blink of an eye atm -- the deep learning model, being based on the neural network, has been around for a very long time, but the infrastructure to support any efficient use of it had been lacking. To expand: the neural network is, to my latest knowledge, the most accurate ML model at scale, but it's also a very cost-prohibitive one. It requires large numbers of servers pooling their computational resources together, in numbers impractical to host in one single location (energy consumption, heat, etc. all prove difficult for one locality). Cloud networks resolve that: the technology has been available since web 2.0 and has now matured (see AWS, Azure, etc.), which allows mass industrial adoption of neural-network-based AIs.
Neural networks, as one can imagine, depend on massive clusters of servers to process input data and compute learning results. In a process roughly similar to a biological brain's, input data are processed in one neuron, and the result (like a signal) is passed to the next neurons in the chain to be further refined. Thus, broadly speaking, the more neurons there are and the more layers they are arranged in to refine the data, the better the results. That means more servers produce better results, and thus the richest player in town holds the ace in the business. We see this in crypto too -- when computation power = better profit, resources conglomerate into syndicates. Take OpenAI for example: the aforementioned ChatGPT and DALL-E are both based upon this tech.
OpenAI is, according to Microsoft (Bing):
OpenAI is an American artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership. OpenAI conducts AI research to promote and develop friendly AI in a way that benefits all humanity. The organization was founded in San Francisco in 2015 by Sam Altman, Reid Hoffman, Jessica Livingston, Elon Musk, Ilya Sutskever, Peter Thiel and others, who collectively pledged US$1 billion. Musk resigned from the board in 2018 but remained a donor. Microsoft provided OpenAI LP a $1 billion investment in 2019 and a second multi-year investment in January 2023, reported to be $10 billion.
What we are seeing now is massive corporations competing to push out the most ground-breaking, and therefore most attractive, deep-learning product, as a result of the market having accumulated the cloud infrastructure required to do so.
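To make the earlier picture of neurons passing signals down a chain a bit more concrete, here's a minimal sketch in plain Python. Everything in it is my own illustration: the function names, the tiny two-layer layout, and the weight values are all made up, and a real model would have vastly more neurons and learned (not hand-typed) weights.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs, squashed into (0, 1) by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs and emits one signal."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Pass the signal through each layer in turn; each layer refines the previous output."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal

# Hypothetical weights, chosen by hand purely for illustration.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 neurons
    ([[0.7, -0.5, 0.2]], [0.05]),                                # 3 signals -> 1 neuron
]

output = forward([1.0, 2.0], network)
print(output)  # a single value between 0 and 1
```

Adding more entries to `network` is the "more layers" from the paragraph above; each extra layer means more multiplications, which is where the server bill comes from.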
The final piece of the puzzle is open-sourcing -- neural networks work by learning from data inputs with corresponding responses. Think about teaching a toddler how to say words. It's a relatively simple process given time -- the toddler is constantly bombarded with phrases spoken around them every day, from their parents, their relatives, friends of the family, passers-by, and even the smartphones/computers/televisions/radios etc. They learn the sounds passively most of the time, connecting the previous phrase to the next and trying to figure out meaning; there's visual data accompanying the sounds and vice versa, and the parents can provide timely feedback with rewards or punishments. The neural network is made to imitate that process, so there needs to be a way to source a massive amount of data as well as its corresponding meanings.
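As a rough sketch of what "learning from inputs with corresponding responses" means in practice, here is about the smallest example I can think of: a single-weight model that is shown (input, response) pairs and nudged after each pass according to how wrong it was. The data, the `train` function, and the learning rate are all assumptions of mine for illustration; real networks do the same kind of feedback-driven adjustment over billions of parameters.

```python
def train(pairs, steps=2000, lr=0.05):
    """Fit y = w * x + b to (input, response) pairs by gradient descent."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        # The "feedback": how far off each guess is, averaged over all pairs.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        # Nudge the parameters in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training data: responses follow y = 2x + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(pairs)
print(round(w, 3), round(b, 3))
```

The model is never told the rule; it only ever sees examples and corrections, which is why the supply of labelled data matters so much.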
The other day, I used an (at the time still free-to-try) AI image generator to give me pictures of straight bananas (don't wanna get into Midjourney rn due to relative cost). This model is based on an adversarial network, with one AI discriminating the generator's results. That is, the discriminator first learns from a preset of image-to-text data to tell what is real and what is generated from visual clues, and is then fed the output of the text-to-image generator to judge. The text-to-image generator adjusts its parameters according to whether its images pass as "correct" or not (its goal is to fool the discriminator), while the discriminator's goal is to tell the generator's results from real images. But the problem is, you still need real data to further refine the discriminator. So when I chose not to regenerate the images of straight bananas, I told the discriminator that these looked like good straight bananas to me.
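For anyone curious what that generator-versus-discriminator tug of war looks like in code, below is a toy sketch. To be clear about the assumptions: this is not the image generator I used, and it has nothing to do with images. The "real data" are just numbers drawn around 4.0, the generator is a single learnable shift `b`, and the discriminator is a two-parameter logistic model; every name and setting here is made up for illustration.

```python
import math
import random

def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(real_mean=4.0, steps=8000, batch=32, lr=0.03, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c), "probability x is real"
    b = 0.0          # generator: fake sample = noise + b
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
        grad_w = grad_c = 0.0
        for x in real:
            p = sigmoid(w * x + c)
            grad_w -= (1.0 - p) * x
            grad_c -= (1.0 - p)
        for x in fake:
            p = sigmoid(w * x + c)
            grad_w += p * x
            grad_c += p
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: move b so the fakes look more "real" to D
        # (gradient of -log D(fake) with respect to b).
        grad_b = 0.0
        for x in fake:
            p = sigmoid(w * x + c)
            grad_b -= (1.0 - p) * w
        b -= lr * grad_b / batch
    return b, w, c

b, w, c = train_toy_gan()
print(b)  # the generator's learned shift
```

The generator never sees the real numbers directly; it only gets the discriminator's verdicts, and in this toy setup that alone tends to drag `b` towards where the real data sit. The discriminator, on the other hand, needs a steady stream of real samples, which is the part user feedback can help supply.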
With OpenAI offering openly accessible APIs, it is pooling developers' data from across the globe; everyone is trying to jump on the train for generative AIs, and the feedback loop keeps going until the OpenAI project reaches its goal of creating results indistinguishable from man-made sources. IMO this is still web 2.0 -- the cloud-based business model. AI is certainly transforming the marketplace, but that's not the point of AIs. The web simply made possible the acquisition of data in an amount and quality that was not possible earlier.
The end goal of generative AI is to remove labour from creation; of this I've no doubt. You can generate a movie from nothing more than a script, a script from a sentence, and a sentence from a few words -- all this requires is assembling different models into one, with each result fed to the next: easily accomplishable, streamlinable, automatable. They have the tools to make it happen, and with the web, they are sitting there getting the missing pieces they need to accomplish it.
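The "one result fed to the next" idea is simple enough to sketch. The three stage functions below are hypothetical stand-ins that only wrap strings; in a real pipeline each would be a call to a separate generative model, and none of these names correspond to any actual product or API.

```python
def words_to_sentence(words):
    # Stand-in for a text model that expands a few words into a sentence.
    return f"<sentence from: {words}>"

def sentence_to_script(sentence):
    # Stand-in for a text model that expands a sentence into a script.
    return f"<script from: {sentence}>"

def script_to_movie(script):
    # Stand-in for a video model that renders a script.
    return f"<movie from: {script}>"

def chain(*stages):
    """Build one callable that feeds each stage's output into the next."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run

pipeline = chain(words_to_sentence, sentence_to_script, script_to_movie)
result = pipeline("straight bananas")
print(result)
```

The glue code is trivial; all the difficulty (and cost) lives inside the stages.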