Creating viral videos is no easy feat. Social media algorithms shift daily, making it tricky to crack the code for massive reach and engagement. But after countless experiments, I’ve found a sweet spot with Google’s Veo 3, a powerful AI video generation tool that can transform your ideas into cinematic, scroll-stopping content. At Avocado AI, we’re all about empowering teams to harness generative media AI to amplify their brand’s voice, and this guide is your starting point to do just that.
I’m sharing my master prompt structure for free to help you kickstart your journey. Before we dive in, let’s see the impact well-crafted AI-generated videos can have. Below are real results from Avocado AI’s experiments.
(P.S. We use Avocado AI to generate Veo 3 videos for Avocado AI itself, so no BS, just results. And yes, it’s a crazy inception, I know.)

Before using Veo 3: Our brand’s reach was modest, with limited views and engagement.

After using Veo 3: In just two weeks, we saw a significant spike in views and reach from videos created with avocadoai.co, which shows what strategic AI video creation can do.
A prompt is a text-based instruction you give to an AI model like Veo 3 to generate a specific output—in this case, a video with synchronized visuals and audio. Think of it as a detailed recipe: the clearer and more specific your instructions, the better the final dish. A well-crafted prompt can turn a simple idea into a viral-worthy video that captures attention and sparks engagement.
Veo 3, developed by Google DeepMind, is a game-changer in AI video generation. Unlike many other tools, it creates high-quality, 8-second videos with native audio (dialogue, sound effects, and music) directly from text or image prompts. Its advanced physics simulation, realistic visuals, and character consistency make it ideal for crafting professional-grade content that feels authentic. At Avocado AI, we integrate tools like Veo 3 into our generative media suite to help teams create effortlessly and scale their creative output.
**Anatomy**
A third-person filmed street interview in [LOCATION].
[CHARACTER] is standing on a tropical street in [SPECIFIC_AREA], holding a microphone. [CHARACTER_DESCRIPTION] with [APPEARANCE_DETAILS]. Calm, deadpan tone.
He interviews a stereotypical [LOCATION] expat — highly visual and exaggerated (e.g., party girl, crypto bro, spiritual coach, club promoter, fitness influencer). They are styled according to their cliché, and deliver their line with confidence, irony, or delusion.
Background: [BACKGROUND_ELEMENTS] — based on the character type.
[CHARACTER] (mic out, to expat):
“Why’d you move to [LOCATION]?”
Expat (delivering punchline, in character):
[INSERT_BRUTALLY_HONEST_OR_IRONIC_ONE_LINER_HERE]
🎤 [CHARACTER] (deadpan, to camera):
[INSERT_SAVAGE_MEME_WORTHY_REACTION]
Style: Third-person vlog. No subtitles. 8 seconds. Ensure mic is clearly visible and near the speaker each time. Background must reflect [LOCATION].
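If you reuse this template a lot, you can fill the bracketed placeholders with a few lines of code instead of by hand. Here is a minimal Python sketch of that idea. The `fill_prompt` helper, the shortened `TEMPLATE` excerpt, and the `values` dictionary are all illustrative names I made up for this example, not part of Veo 3 or Avocado AI. It only builds the prompt text and does not call any video API.

```python
import re

# Matches placeholders written like [LOCATION] or [SPECIFIC_AREA].
PLACEHOLDER = re.compile(r"\[([A-Z_]+)\]")

# Shortened excerpt of the template above, for illustration only.
TEMPLATE = (
    "A third-person filmed street interview in [LOCATION].\n"
    "[CHARACTER] is standing on a tropical street in [SPECIFIC_AREA], "
    "holding a microphone. [CHARACTER_DESCRIPTION] with [APPEARANCE_DETAILS]. "
    "Calm, deadpan tone.\n"
    "Background: [BACKGROUND_ELEMENTS].\n"
    "Style: Third-person vlog. No subtitles. 8 seconds."
)


def fill_prompt(template: str, values: dict) -> str:
    """Replace every [PLACEHOLDER]; fail loudly if one has no value."""
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Values taken from the Bali sample in this guide.
values = {
    "LOCATION": "Bali",
    "CHARACTER": "Bigfoot",
    "SPECIFIC_AREA": "Canggu",
    "CHARACTER_DESCRIPTION": "He's a large gorilla-like figure",
    "APPEARANCE_DETAILS": "brown fur, wearing a sleeveless green tank top",
    "BACKGROUND_ELEMENTS": "scooters, palm trees, trendy shops",
}

prompt = fill_prompt(TEMPLATE, values)
print(prompt)
```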
----------------------------------------------------------------------------
**Sample**
A third-person filmed street interview in Bali.
Bigfoot is standing on a tropical street in Canggu, holding a microphone. He’s a large gorilla-like figure with brown fur, wearing a sleeveless green tank top. Calm, deadpan tone.
He interviews a stereotypical Bali expat — highly visual and exaggerated (e.g., party girl, crypto bro, spiritual coach, club promoter, fitness influencer). They are styled according to their cliché, and deliver their line with confidence, irony, or delusion.
Background: scooters, palm trees, trendy shops, beach clubs, smoothie bars, rice fields — based on the character type.
Bigfoot (mic out, to expat):
“Why’d you move to Bali?”
Expat (delivering punchline, in character):
[Insert brutally honest or ironic one-liner here.]
🎤 Bigfoot (deadpan, to camera):
[Insert savage, meme-worthy reaction.]
Style: Third-person vlog. No subtitles. 8 seconds. Ensure mic is clearly visible and near the speaker each time. Background must reflect tropical Bali.
Chill, don’t be scared: it’s not that deep and not that complex! This is just a detailed breakdown of the prompt structure. Keep in mind that Veo 3 only generates 8-second videos, so the prompt above is tailored and optimised for that limit. The “No subtitles” instruction in the Style line is also there to avoid those weird, unwanted gibberish subtitles you might see in some video results.
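One practical way to respect the 8-second limit is to sanity-check how long the dialogue would take to say before you generate anything. The sketch below is a rough rule of thumb, not anything Veo 3 enforces: the speaking pace of 2.5 words per second is my own assumption, and the punchline and reaction lines are hypothetical stand-ins for the placeholders in the template.

```python
CLIP_SECONDS = 8  # clip length stated in this guide


def estimated_seconds(lines, words_per_second=2.5):
    """Rough speaking time for the dialogue (the pace is an assumption)."""
    words = sum(len(line.split()) for line in lines)
    return words / words_per_second


def fits_in_clip(lines, clip_seconds=CLIP_SECONDS):
    """True if the estimated speaking time fits inside one clip."""
    return estimated_seconds(lines) <= clip_seconds


dialogue = [
    "Why'd you move to Bali?",                        # from the template
    "For the smoothie bowls and the tax situation.",  # hypothetical punchline
    "Respect.",                                       # hypothetical reaction
]

print(f"{estimated_seconds(dialogue):.1f} seconds, fits: {fits_in_clip(dialogue)}")
```

If the estimate lands close to 8 seconds, trimming the one-liner is probably safer than hoping the model squeezes it in.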
| Detail | Description |
|---|---|
| Species | Large gorilla-like creature with brown fur |
| Clothing | Sleeveless green tank top |
| Personality | Calm, dry, deadpan, meme-level sarcasm |
| Behavior | Holds mic clearly, doesn’t overreact |
| Camera Style | Always third-person, Bigfoot fully visible |
| Presence | Mic visible, clean framing with interviewee |
| Tone | Straight-faced and chill, rarely emotional |

| Setting Type | Details |
|---|---|
| Gym Street | Scooters, palm trees, protein shake ads, gym branding |
| Café Zone | Trendy cafés, laptops, digital nomad signs |
| Beach Club | Neon signs, champagne buckets, DJs, scooters |
| Spiritual Market | Incense stalls, crystals, barefoot tourists |
| Yoga Retreat Area | Open pavilions, mats, flowing robes, nature |
| Rice Field Road | Green paddies, huts, barefoot influencers |
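The settings table above maps neatly onto a lookup you can use to build the `Background:` line of the prompt. Here is a small Python sketch of that idea. The `BACKGROUNDS` dictionary and the `background_line` helper are hypothetical names for this example, and the values are copied straight from the table.

```python
# Setting type -> background elements, copied from the table above.
BACKGROUNDS = {
    "Gym Street": ["scooters", "palm trees", "protein shake ads", "gym branding"],
    "Café Zone": ["trendy cafés", "laptops", "digital nomad signs"],
    "Beach Club": ["neon signs", "champagne buckets", "DJs", "scooters"],
    "Spiritual Market": ["incense stalls", "crystals", "barefoot tourists"],
    "Yoga Retreat Area": ["open pavilions", "mats", "flowing robes", "nature"],
    "Rice Field Road": ["green paddies", "huts", "barefoot influencers"],
}


def background_line(setting: str) -> str:
    """Build the 'Background:' line of the prompt for one setting type."""
    elements = BACKGROUNDS[setting]  # raises KeyError for unknown settings
    return "Background: " + ", ".join(elements) + "."


print(background_line("Beach Club"))
```

Pick the setting that matches your expat character (for example, a fitness influencer on Gym Street) so the background reinforces the joke.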