Using AI to create a meme video with almost no human input.

I joined the Discord today to chat about AI and memes, and we got to talking about removing as much of the human element as possible. I had nothing better to do, so I figured I'd give it a shot.

I started with a "seed" phrase from Google's "I'm feeling curious" feature and fed the resulting question into Cleverbot. I then had it argue with a second Cleverbot session I had open in an Incognito tab (taking the output of one chat and using it as the input for the other) until I had a decent prompt, which I put into a text completion app.
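The bot-vs-bot step is just a ping-pong loop. Here's a minimal Python sketch of it; Cleverbot has no official public API I know of, so `bot_a` and `bot_b` are hypothetical stand-ins for the two browser sessions (normal and Incognito), and the prompt-picking heuristic at the end is mine, not a fixed rule:

```python
def bot_a(message: str) -> str:
    # Hypothetical stand-in for the first Cleverbot session.
    return f"Why do you say '{message}'?"

def bot_b(message: str) -> str:
    # Hypothetical stand-in for the second (Incognito) session.
    return f"I disagree that {message}"

def argue(seed: str, rounds: int = 3) -> list:
    """Feed each bot's output into the other, logging the exchange."""
    transcript = [seed]
    message = seed
    for i in range(rounds):
        bot = bot_a if i % 2 == 0 else bot_b
        message = bot(message)
        transcript.append(message)
    return transcript

transcript = argue("Why is the sky blue?")
# Pick one line of the exchange as the completion-app prompt;
# I picked by hand, but any heuristic (e.g. longest line) works.
prompt = max(transcript, key=len)
```

In the real run the human only supplies the seed and chooses which line of the transcript becomes the prompt.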

I fed that generated text into three different programs:

Melobytes, a text-to-music generator. I used default settings.
DALL-E mini, the popular text-to-image generator.
NaturalReaders, a text-to-speech website, using Charles (UK) at 0.72x speed for the voiceover.

Put it all together, and the result is an incomprehensible mess of a video. The only real creative input I had was rerolling the seed phrase and choosing the prompt from the Cleverbot conversation.
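For the "put it all together" step, one way to combine a still image, a music track, and a voiceover into a video is a single ffmpeg pass. This is a sketch with placeholder filenames, not necessarily how the original video was assembled; it just builds the command as an argv list:

```python
# Placeholder filenames for the three generated assets.
image, music, voice, out = "frame.png", "music.mp3", "voice.mp3", "meme.mp4"

cmd = [
    "ffmpeg",
    "-loop", "1", "-i", image,   # loop the still image as the video track
    "-i", music,                 # Melobytes music
    "-i", voice,                 # NaturalReaders voiceover
    # mix the two audio inputs into one stream
    "-filter_complex", "[1:a][2:a]amix=inputs=2[a]",
    "-map", "0:v", "-map", "[a]",
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-shortest",                 # stop when the shortest input ends
    out,
]
# subprocess.run(cmd) would render the final video.
```

Passing the command as a list (rather than a shell string) avoids quoting problems with the filter expression.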

While I don't think we've reached a comedy singularity where all humor is generative, I think we've hit a tipping point: these tools are now accessible and popular enough to have an increasingly significant impact on meme culture in the coming years.



"The only real creative input I had was rerolling the seed phrase and choosing the prompt from the Cleverbot conversation."

I disagree: you also decided you were going to create a meme video using AI, and chose the particular pipeline to achieve it. That in itself is a creative decision, and I'd argue it's much less trivial than rerolling the seed phrase. As it stands, this pipeline still seems like a lot of work for someone who isn't sufficiently motivated, though it could definitely be automated.

It's also certainly possible to build a single neural network that generates meme videos automatically, though I don't think I've seen anyone do it. With a good dataset of meme videos that can be processed frame by frame, it probably wouldn't be that hard using a blend of a GAN for the initial frame and an RNN for how it unfolds. Audio is a bit trickier but could be handled in a similar way.
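The shape of that GAN-plus-RNN idea can be sketched in a few lines of NumPy. Everything here is a toy with random, untrained weights (the sizes, the tanh nonlinearities, and the flattened-frame representation are all my own assumptions); it only shows how a generator would produce the first frame and an RNN cell would roll out the rest:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16          # tiny frame size for the sketch
Z, HID = 32, 64     # latent and RNN hidden sizes
F = H * W           # flattened frame length

# Stand-in for a *trained* GAN generator: latent noise -> first frame.
Wg = rng.normal(0, 0.1, (Z, F))
def generator(z):
    return np.tanh(z @ Wg)

# Stand-in for a *trained* RNN cell: (frame, hidden) -> (next frame, hidden).
Wx = rng.normal(0, 0.1, (F, HID))
Wh = rng.normal(0, 0.1, (HID, HID))
Wo = rng.normal(0, 0.1, (HID, F))
def rnn_step(frame, h):
    h = np.tanh(frame @ Wx + h @ Wh)
    return np.tanh(h @ Wo), h

def generate_video(n_frames=8):
    frame = generator(rng.normal(size=Z))   # GAN seeds the first frame
    h = np.zeros(HID)
    frames = [frame]
    for _ in range(n_frames - 1):           # RNN unrolls the rest
        frame, h = rnn_step(frame, h)
        frames.append(frame)
    return np.stack(frames).reshape(n_frames, H, W)

video = generate_video()   # shape (8, 16, 16)
```

A real version would use convolutional layers and be trained adversarially on frame sequences, but the control flow (one generator call, then a recurrent rollout) would look the same.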