Column Until recently, personal computer hardware seemed to have outpaced any demands that software might place on it. Even high-end games – traditionally the leading edge of user performance demands – barely taxed the massively overclocked, high-end silicon available. Then came AI art.
Apple’s M1 Ultra microprocessor has a transistor count north of 100 billion. Nvidia has just released its flagship GPU, the RTX 4090, with 76 billion transistors – nearly triple the previous generation – the product of the latest process node and a devil-may-care attitude to power consumption. A 450W TDP? Fire it up and heat your home this winter.
But to what end? Fortnite at 300fps? In April I wrote: “These monsters must be tamed, trained and worked.” Technology abhors a vacuum – four decades in the field have taught me that. Where there is capacity, something will come along to use it.
That other shoe dropped in late August, when Stability AI – a private company building software tools that leverage cutting-edge artificial intelligence techniques – released Stable Diffusion.
Similar to systems such as DALL·E and Midjourney, Stable Diffusion distils billions of images into symbolically weighted tokens that can be called back into view with an appropriately crafted text prompt. The whole thing sits just this side of witchcraft – yet it works remarkably well.
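The trick at the heart of these systems can be sketched in a few lines: drown a signal in Gaussian noise over many small steps, then train a model to reverse the process. The toy below shows only the forward (noising) half on a stand-in 1D "image" – the schedule values and variable names are illustrative, not Stable Diffusion's actual code:

```python
import numpy as np

# Toy forward diffusion: blend a signal with Gaussian noise over T steps.
# After enough steps almost nothing of the original survives - training a
# network to *reverse* this destruction is the core idea behind diffusion
# models. (Illustrative sketch only; schedule values are made up.)

rng = np.random.default_rng(0)
T = 1000                                      # number of noising steps
betas = np.linspace(1e-4, 0.02, T)            # per-step noise schedule
alpha_bar = np.cumprod(1.0 - betas)           # cumulative signal retention

x0 = np.sin(np.linspace(0, 8 * np.pi, 256))   # stand-in for an image

def noised(x, t):
    """Sample x_t in one shot: sqrt(a)*x0 + sqrt(1-a)*noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)

early, late = noised(x0, 10), noised(x0, T - 1)
# Correlation with the original collapses as t grows:
print(abs(np.corrcoef(x0, early)[0, 1]))      # still high
print(abs(np.corrcoef(x0, late)[0, 1]))       # near zero
```

The generator then runs this film backwards: starting from pure noise, a trained network repeatedly subtracts its estimate of the noise, steered at each step by the text prompt.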
Unlike DALL·E or Midjourney, Stable Diffusion is both completely self-contained – capable of running on any sufficiently powerful machine – and pure FOSS. This meant that although the initial release required some of Nvidia’s highest-end GPUs, within a week project contributors had refactored its code and reduced its hardware requirements. The current version runs quite comfortably on the beefy PC I bought six years ago to explore the recently reborn world of virtual reality – as well as on almost any M1-based Mac. Many gaming PCs and laptops can run Stable Diffusion well enough to use it for project-based creative needs – or just for fun.
Then a group of researchers published a paper about something they called DreamFusion – able to conjure up an endless series of fully realized 3D models from textual prompts. Type in “pineapple” and the computer will have a think, then generate its best approximation of what that model should look like. Although that group has not yet published its code, the paper provided enough of a blueprint for an ambitious coder to adapt the Stable Diffusion codebase to create Stable Dreamfusion – which, again, requires fairly powerful hardware.
Image produced by Stable Diffusion from the text prompt ‘Robot painting a picture while running on a treadmill’
Not to be outdone, another group, at Tel Aviv University, wowed the world with its Human Motion Diffusion Model. Their paper showed how diffusion-based AI techniques could convert a prompt like “the person walks forward two steps and does a cartwheel” into humanoid animation. A week later, the researchers published their code as FOSS.
It’s still too early in this exponential growth of AI capabilities to know where any of it will lead. Already, both Canva and Microsoft have integrated prompt-based image generators into their creative tools. Meta, Google and others have demonstrated proprietary prompt-to-video generators. Going by current trends, we won’t have to wait long for FOSS equivalents to play with.
The visual arts now have powerful new tools that are not the exclusive domain of giants like Google or OpenAI – the latter a company that promised to democratize AI at its founding, yet perversely seems focused on building its own proprietary empire, with Microsoft as its unofficial owner.
In one of my first columns for The Register I signaled the end of the endless upgrade cycle for PCs. No more treadmill: machines had become good enough, and would only be replaced when they wore out. With the exception of a round of upgrades to accommodate pandemic video conferencing, that prediction has proved correct.
But the personal computer has shed its skin, revealing a sleek new form: a creative supercomputer, pervasive and capable in ways the old machine couldn’t begin to approach. Rather than offering just another stylus or brush, these qualitatively different tools forge a new kind of creative partnership.
In June I bought a high-spec PC laptop – and immediately felt guilty about it, fearing I’d never really put it to work. Today, that machine does both the everyday and the incredible. In retrospect, the purchase looks like a bargain – a harbinger of a true renaissance – as the PC, reborn, gets to work. ®