THE GAFFA TAPES

24 Nov 2025

The Man In The Box Syndrome

by Brian Coleman

Snippets from the archives of a bygone era

AI Music Videos: Who’s Really In Control?

Communicating with AI bots during my recent attempts to generate an AI music video clip took me back to a haunting childhood experience when Bobby Davis, a creative kid in my street, fashioned a cardboard TV packing box into a virtual jukebox. He hid inside the box with a battery-operated record player, prompting me to drop sixpence in the coin slot and select a song. Decades later, with AI holding all the tools to produce a music video clip without leaving my home studio, I’ve had to relive the spectre of interacting with an inanimate object, bizarrely trying to reason with it and humanise our conversations.

If you wanted to make a cheap music clip in the 1960s, you might have taken a look at Bob Dylan’s ‘Subterranean Homesick Blues’, which was shot in 1965 on black-and-white 16mm film in the back alley of the Savoy Hotel, London. The clip is a continuous shot of Dylan flicking through a sequence of cardboard cue cards displaying deliberately misspelt song lyrics. However, as inexpensively as a clip like this could be made, there were few incentives for fledgling bands to make music clips in the 60s or the 70s and beyond, as there was no accessible screen media where these could be shown. Video clips were solely the domain of the recording companies that sent them to television music programmes as promotional material for their recording artists.

Things changed dramatically in the music video world with the rise of YouTube in 2005, when anyone with a camera could capture a band or a single artist or even use simple visual effects to make a music clip and upload it to a worldwide market. Now, with AI video generation in the hands of home studio buffs, sophisticated video clips that traditionally used VFX (visual effects), such as the Rolling Stones’ ‘Angry’, where vintage footage of the Stones comes to life on billboards, can be approximated with AI, but not necessarily with the same precision.

Although VFX software has been around for some time now, including some free versions available for the domestic market, it’s a steep learning curve mastering industry-standard compositing tools and techniques such as chroma-key, masking, and tracking. AI generation tools simplify the process by using text or image prompts or both. While AI simplifies and accelerates the process, VFX in the hands of professionals still reigns supreme. In 2015 I interviewed a VFX artist for an article for IF (Inside Film) magazine. He had worked on the pipeline of major VFX films, including animated features. I asked him what kind of characters had to be hand-drawn. He said, “Brian, they don’t draw anything!” That’s not entirely correct, as some anime and 2D cartoons do have a hand-drawn element before they are digitised. While not an exact analogy, the relationship between CGI (computer-generated imagery) and VFX is similar to that between an AI image generator and an AI video generator.

I’ve always preferred to watch a simple video of a band or artist performing rather than CGI or VFX creations. The music video world hailed Peter Gabriel’s 1986 ‘Sledgehammer’ video, which featured a variety of visual effects, including stop-motion and claymation, where his face morphed into an array of pulsating fruit, and, for good measure, they threw in a scene with a pair of oven-ready dancing chickens.

While this video left me fighting the urge to gag, Gabriel’s 1987 (almost live) performance of ‘Sledgehammer’ in Athens is one of my all-time favourite video clips.

Richard Marx’s 1992 release of the song ‘Hazard’, with the lyric, “I swear I left her by the river,” begged for a narrative-type video to be made. Marx’s music concept came to him in a dream, which he coupled with a fictional murder mystery that became a narrative-driven short film. In my travelling sales rep days in the early 90s, I noticed a downhearted Asian girl working in an outback truck stop, and she became my muse for a narrative-type song. I had some discussions with a potential partner who toyed with financing a music video, which never came to fruition because of the expenses involved. Now, some 30 years later, I was able to do the entire video on my home computer using AI video generation and some video and audio editing software.

With credit card in hand, I signed up for a 30-day free trial of Google AI Pro, which incorporates its image generator Nano Banana (Gemini 2.5 Flash Image) and Google’s Veo 3 video generator. For those who haven’t dabbled, it’s a good idea to initially use a text prompt in Nano Banana to get the image you want (you have virtually unlimited attempts at this). I also used ChatGPT’s image generator to get different perspectives. You then upload the AI image to Veo 3 along with a prompt of what you want the image to do in the video. Sounds simple, doesn’t it? Well, the process is very simple, but getting exactly what you want isn’t.

As advanced, and sometimes mind-blowing, as AI video generation technology is, it is still in its infancy, and if you think you are going to get seamless AI videos like those on show on various YouTube channels and TV ads, you may come away disillusioned. A lot of those creators are using plans that cost around $370 AUD per month. The initial frustration is that the basic Google AI Pro plan ($32.99 AUD per month) limits you to three videos (Veo 3 Fast) per day, each capped at eight seconds, with a total of 50 videos per month at 720p resolution. Google AI Ultra subscribers have access to 1080p and 25,000 monthly credits. Another frustration is that if you are a whiz on the QWERTY keyboard and happen to hit the ‘enter’ key to start a new prompt line, the generator takes this as the command to generate, with no warning, and you’ve just burned one of your three daily eight-second videos on the basic plan.
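For anyone trying to gauge what those limits mean for a whole song, here is a rough back-of-envelope sketch in Python. It uses only the plan figures quoted above (eight-second clips, three per day, 50 per month). The song length and the `keep_ratio` parameter (the fraction of generated clips that turn out usable) are hypothetical values for illustration, not figures from Google.

```python
import math

# Plan figures quoted in the article for the basic Google AI Pro plan.
CLIP_SECONDS = 8       # each generated video is capped at eight seconds
CLIPS_PER_DAY = 3      # daily generation limit
CLIPS_PER_MONTH = 50   # monthly generation limit


def plan_shoot(song_seconds: float, keep_ratio: float = 1.0) -> dict:
    """Estimate how many clips and days are needed to cover a song.

    keep_ratio is a hypothetical fraction (0 < keep_ratio <= 1) of
    generated clips that are good enough to use in the final edit.
    """
    if song_seconds <= 0:
        raise ValueError("song_seconds must be positive")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in the range (0, 1]")

    # Usable clips needed if every clip plays at normal speed.
    usable_clips = math.ceil(song_seconds / CLIP_SECONDS)
    # Clips that must be generated to end up with that many usable ones.
    generated_clips = math.ceil(usable_clips / keep_ratio)
    days = math.ceil(generated_clips / CLIPS_PER_DAY)

    return {
        "usable_clips": usable_clips,
        "generated_clips": generated_clips,
        "days": days,
        "fits_monthly_cap": generated_clips <= CLIPS_PER_MONTH,
    }


if __name__ == "__main__":
    # Hypothetical 3.5-minute song, with half of all generations discarded.
    print(plan_shoot(210, keep_ratio=0.5))
```

Under these assumptions, a three-and-a-half-minute song with half the generations thrown away would already exceed the monthly cap, which is why slowing clips down in the editor or reusing scenes becomes part of the plan.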

I could go on endlessly about how Nano Banana and other image generators can’t quite generate exactly what you envisage via a prompt, but the fact that Nano Banana allows virtually unlimited access is somewhat forgiving. There is a lot of talk about how AI generators can compose songs and generate artists singing those compositions. However, at the time of writing, Veo 3 cannot generate a character singing an original or cover song from an uploaded audio file.

One of my first attempts at image and video generation was an animation project. After hours of image prompts, I was able to get a satisfactory representation of a character from my former children’s show, Captain Aldo (an anthropomorphic seal). One of Captain Aldo’s songs, which was brilliantly sung and recorded by a former cast member of the show, could not be utilised by Veo 3. The workaround was to write the prompt in verses. Veo 3 then generated the character singing (kind of) my lyrics with one of its own trite melodies. I then uploaded it to my video editor and ran the original audio on another track. Then began the painstaking task of changing the speed and time-lapse of the video for the lip-sync before finally deleting the stodgy generated track.

*Captain Aldo audience Nano Banana AI image*

In my opinion, a video editor is essential to blend the various video segments together, and while Veo 3 can generate spoken audio in sync with a character, it often gets confused if more than one character is in the scene, so you get a comedy of errors with the wrong character speaking someone else’s lines.

Another problem is that the audio sometimes sounds metallic or even robotic. There are also problems with accents and with inflections (the rise and fall of pitch in speech that can convey emotion); errors in this regard can also become comical. I found the workaround for this was using the antiquated film method of ADR (Automated Dialogue Replacement), which is dubbing a voice over the original audio in an audio editor. You can then even replace your own voice in perfect sync by uploading it to a serious AI voice generator.

For my narrative-type music video clip, Capricorn Dreamer, Nano Banana created several amazing images, including my despondent Filipina protagonist staring through her workplace window at a neon-lit outback truck stop. It also crafted flashback scenes of her walking hand in hand with her former true love on the beach in Boracay, Philippines. For the Boracay beach scene, I uploaded the image to Veo 3 with a prompt to transition from this scene to her fleeing in the night from a bamboo beach hut after learning that her lover betrayed her. I didn’t bother with an image for the bamboo hut scene; I gambled with just the prompt, and the Veo 3 video amazed me with a cinematic scene better than I’d imagined. However, in another scene my protagonist serves coffee at the truck stop and then walks ghost-like through a bench seat in the diner. After generating numerous scenes, it’s your video editor’s job to insert your song and make the clip, and although each scene is only eight seconds in duration, you can slow them to any speed in the editor for a slow-motion effect or to fit them to the song lyrics.
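Fitting an eight-second scene to a lyric line comes down to simple arithmetic: the playback speed is the clip length divided by the time you need it to fill. The Python sketch below shows that calculation only. The function name and the example durations are hypothetical, and how you enter the resulting speed will depend on your particular video editor.

```python
CLIP_SECONDS = 8.0  # length of each generated scene, as noted in the article


def playback_speed(target_seconds: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """Return the speed multiplier that makes a clip last target_seconds.

    A result below 1.0 means slow motion; above 1.0 means sped up.
    """
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return clip_seconds / target_seconds


if __name__ == "__main__":
    # Hypothetical lyric lines of different lengths.
    for target in (4, 12, 16):
        speed = playback_speed(target)
        print(f"Fill {target} s with an 8 s clip: set speed to {speed:.0%}")
```

Stretching a clip to twice its length means running it at half speed, so very long lyric lines may look smoother if covered by two scenes rather than one heavily slowed scene.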

Things are changing rapidly in the AI world, and in late August 2025 came Google’s Australian release of its AI production toolbox, Flow (previously not available in Australia).

There is, of course, the costly Google AI Ultra plan that includes Flow, but there is also a free subscription that comes with 100 credits if you have a Google account. An eight-second video with or without audio will cost you 20 credits, so you have five free videos per month that utilise their new beta version of Veo 3 Fast with 720p resolution; 1080p is only available on the Ultra plan.

I’m enthusiastic about AI image and video generation, and yes, film clips have come a long way since we saw Bob Dylan flick through his song lyrics scribbled on sheets of cardboard acquired from a shirt laundry. But after suffering through a torrent of infantile AI blunders, the ‘man in the box’ delusion left me thinking that some of these errors were contrived. This brought out the worst in me, and I found myself sarcastically telling Botzilla that it wasn’t quite ready to take over the world. Of course, AI never responds in anger; it always apologises profusely, biding its time, quietly sharpening the axe in the knowledge that I’m on some dissident file awaiting futuristic AI retribution.

Published monthly since 1991, our famous AV industry magazine is free for download or pay for print. Subscribers also receive CX News, our free weekly email with the latest industry news and jobs.
