How To Script A YouTube Video In 20 Minutes Using AI
This post is part of the Complete AI YouTube Workflow series - a step-by-step breakdown of the exact system I use to run my entire YouTube channel with AI. You are on Article 9. This is one of the most important articles in the entire series - the scripting workflow is where your unique experience gets turned into a video that is genuinely worth watching.
- How I Run My Entire YouTube Channel With One AI Tool
- The 20 AI Agents That Do My YouTube Workflow For Me
- How To Find Proven Video Ideas In 5 Minutes Using AI
- How To Steal Your Competitor's Best Ideas (Ethically)
- How To Know If A Video Idea Will Work Before You Film It
- How I Write YouTube Titles That Get 10x More Clicks
- How To Generate Thumbnail Concepts Without A Designer
- How To Make Professional AI Thumbnails In Under 10 Minutes
- How To Script A YouTube Video In 20 Minutes Using AI (You Are Here)
- What's Actually Wrong With Your YouTube Channel (And How AI Finds It)
- How To Respond To 1,000 YouTube Comments Without Losing Your Mind
- How To Make Money On YouTube Using AI
Here is the mistake that kills most AI-assisted scripts before they start.
The creator opens a chat, types "write me a script about [topic]," reads what comes back, and posts it. The video gets views for about a week and then flatlines. The comments are polite but thin. Nobody shares it. The algorithm quietly stops pushing it.
The script was not bad. It was just empty. Technically correct, well structured, perfectly formatted. And completely devoid of anything a viewer could not have found anywhere else.
That is the low information gain problem. And it is the reason I built the scripting workflow the way I did.
The One Rule That Makes This Work
AI is only as good as the input you give it.
A script written entirely by AI contains zero information gain. Nothing the viewer could not have found by asking the same question themselves. No stories only you could tell. No opinions only you have formed. No results only you have achieved.
A script built around your voice notes, your real experience, and your genuine perspective has extremely high information gain because the unique human element is baked in from the very beginning.
This workflow starts with you. The voice note is where the magic lives. The AI organises, refines, and polishes. But it never replaces the human insight that makes a video worth watching.
The five sources of information gain are: personal experience you have lived through, client results with real numbers and specific details, original opinions that challenge conventional wisdom, proprietary frameworks you have developed yourself, and specific stories that only you could tell. The scripting agents are trained to find these moments throughout your script and label them [INFORMATION GAIN] so you always know exactly where your unique value sits.
The Full Scripting Workflow
There are five agents in this section of the system. Together they take you from a raw brain dump all the way to a filming-ready script. Here is exactly how they work.
Step 1: Record Your First Voice Note (Before Task 6)
Before you open Cowork, record a voice note. Talk through everything you know about the video topic. Every point. Every story. Every opinion. Do not structure it. Do not plan it. Do not try to make it good. Just talk.
The messier the better. This is your brain on the topic and it is the raw material everything else is built from.
Transcribe it using any tool you like. Otter.ai, Descript, or your phone's built-in transcription all work fine.
Task 6 — Brain Dump Organiser
Paste the transcript into Cowork and type:
Run Task 6 — here is my brain dump: [paste your voice note transcript]
Task 6 reads through the whole thing and finds the structure hidden inside it. It identifies your core argument and key supporting points. It flags your information gain moments with [INFORMATION GAIN] so they are never accidentally cut. It marks thin or underdeveloped sections with [NEEDS MORE] so you know exactly where to go deeper. And it organises everything into a logical flow: hook moment, context, key points, payoff.
One important rule: Task 6 never adds anything that was not already in your voice note. It only organises what is already there. Your ideas stay your ideas.
If the output has a lot of [NEEDS MORE] flags, that is actually a good sign. It means you have a solid structure but need more depth in specific places. Record your second voice note focusing exactly on those sections.
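If you want a quick tally of the flags before recording again, a few lines of Python can do it. This is only a sketch: the two tag strings come from the workflow above, but the function name, the return shape, and the sample text are made up for illustration, and it assumes each flag sits on the same line as the point it refers to.

```python
# Tag strings are the ones Task 6 uses; everything else here is an
# illustrative assumption, not part of the workflow itself.
TAGS = ("[INFORMATION GAIN]", "[NEEDS MORE]")


def summarise_flags(task6_output: str) -> dict:
    """Count each tag and collect the lines flagged [NEEDS MORE]."""
    counts = {tag: task6_output.count(tag) for tag in TAGS}
    needs_more = [
        line.strip()
        for line in task6_output.splitlines()
        if "[NEEDS MORE]" in line
    ]
    return {"counts": counts, "needs_more": needs_more}


# Hypothetical Task 6 output, invented for this example.
sample = """Hook: why most AI scripts flatline
Point 1: my first scripted video [INFORMATION GAIN]
Point 2: how the scoring works [NEEDS MORE]
Payoff: pick up the camera"""

summary = summarise_flags(sample)
print(summary["counts"])
for line in summary["needs_more"]:
    print("Go deeper on:", line)
```

The lines it prints under "Go deeper on" are the ones to talk through in your second voice note.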
Step 2: Record Your Second Voice Note (Before Task 7)
Now that you have the logical flow from Task 6, record a second voice note, presenting from that structure. Talk through the video as if you were speaking to camera. Use the flow as your guide but do not read from it. Just talk.
This second voice note becomes your 80% script. Transcribe it the same way as the first.
Task 7 — Script Refiner
Paste the second transcript into Cowork and type:
Run Task 7 — here is my second voice note: [paste transcript]
This is the task that does most of the heavy lifting. Task 7 tightens rambling sections and removes repetition without touching your natural voice. It ensures the script follows the logical flow from Task 6. It strengthens weak transitions between sections. It labels every [INFORMATION GAIN] moment. And at the end it gives you an honest information gain score out of 10.
The scoring works like this:
- A score of 1 to 4 means the script reads like it could have been written by anyone. Go back and add significantly more personal experience before filming.
- A score of 5 to 7 means there are good unique moments but the agent will flag exactly which sections need more depth.
- A score of 8 to 10 means your voice and experience are clearly present throughout and this video will genuinely stand out.
Do not skip this score. It is the most honest feedback you will get on whether your video is worth making.
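The three score bands can be written down as a small lookup, which is handy if you log scores across videos. This is a sketch under assumptions: the band boundaries are the ones described above, while the function name and the short "rework / deepen / film" labels are my own shorthand, not something Task 7 outputs.

```python
def information_gain_band(score: int) -> str:
    """Map a Task 7 information gain score (1 to 10) to the next action."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 4:
        # Reads like anyone could have written it.
        return "rework: add significantly more personal experience before filming"
    if score <= 7:
        # Good unique moments, but some sections are flagged as thin.
        return "deepen: add depth to the sections the agent flagged"
    # 8 to 10: your voice and experience are present throughout.
    return "film: your voice and experience are present throughout"


for score in (3, 6, 9):
    print(score, "->", information_gain_band(score))
```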
Task 8 — Hook Writer
Your opening 30 seconds determine whether people watch the rest. Task 8 writes a complete opening sequence built on four parts.
- The hook question: the exact question your ideal viewer is already asking themselves, written so they feel like you read their mind.
- The credibility statement: one to two sentences establishing why you have earned the right to talk about this topic, with specific numbers, results, and timeframes.
- The video structure: two to three sentences telling the viewer exactly what they are going to get and why it matters to them specifically.
- The open loop: a "but first" bridge that teases something surprising coming later in the video, so that clicking away feels like missing out.
Run it by typing:
Run Task 8 — write my hook and intro for this title: [paste your title]
Task 9 — Fluff Reducer
Once your script is written, Task 9 cleans it up. It removes repetition, filler sentences, overly complex structures, and any section where you are clearly stalling before getting to the actual point.
Critically, it never touches your natural voice, your conversational phrases, your [INFORMATION GAIN] moments, or any story or analogy that makes a complex idea easier to understand. It only removes the cognitive load, not the character.
Run it by typing:
Run Task 9 — here is my script: [paste full script]
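If you want to double-check that a Task 9 pass kept every [INFORMATION GAIN] moment, you can compare the script before and after. The sketch below is an assumption-heavy helper, not part of the workflow: the function names and sample scripts are invented, and it only does an exact line match, so a tagged line that was reworded will show up as dropped. Treat that as a prompt to check the line by hand.

```python
TAG = "[INFORMATION GAIN]"


def tagged_lines(script: str) -> list[str]:
    """Return every line that carries the information gain tag."""
    return [line.strip() for line in script.splitlines() if TAG in line]


def dropped_moments(before: str, after: str) -> list[str]:
    """Tagged lines present before the cleanup but missing afterwards."""
    kept = set(tagged_lines(after))
    return [line for line in tagged_lines(before) if line not in kept]


# Hypothetical scripts, invented for this example.
before = """So, basically, as I said earlier
How I scripted my first video [INFORMATION GAIN]
A client result with real numbers [INFORMATION GAIN]"""

after = """How I scripted my first video [INFORMATION GAIN]"""

for line in dropped_moments(before, after):
    print("Missing after cleanup:", line)
```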
Task 10 — CTA Inserter
The final step. Task 10 drops exactly two calls to action into your finished script in exactly the right places.
The first CTA sits after your first main point of value, usually two to three minutes in. This is the moment the viewer has just received their first genuinely useful insight and their trust is at its peak. The CTA connects directly to that moment, referencing what was just said and explaining how what you are offering goes deeper on exactly that point.
The second CTA sits at the 70% mark. Shorter and more direct than the first. The trust has already been built. This is just a prompt for anyone who missed the first one or needs a second nudge.
Run it by typing:
Run Task 10 — insert CTAs into my script. My CTA is: [describe exactly what you want people to do and where you are sending them]
The output is your filming-ready script with both CTAs clearly labelled [CTA 1] and [CTA 2].
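To sanity-check where the two CTAs landed, you can measure how far through the script each label appears. This is a rough sketch: the [CTA 1] and [CTA 2] labels are the ones Task 10 produces, but the function names and the sample script are invented, and it treats the share of words as a stand-in for the share of running time, which assumes a steady speaking pace.

```python
LABELS = ("[CTA 1]", "[CTA 2]")


def _word_count(text: str) -> int:
    """Count words, ignoring the CTA labels themselves."""
    for label in LABELS:
        text = text.replace(label, " ")
    return len(text.split())


def cta_positions(script: str) -> dict[str, float]:
    """Return each label's position as a fraction of the script's words."""
    total = _word_count(script)
    if total == 0:
        raise ValueError("script is empty")
    positions = {}
    for label in LABELS:
        index = script.find(label)
        if index == -1:
            raise ValueError(f"{label} not found in script")
        positions[label] = _word_count(script[:index]) / total
    return positions


# Hypothetical 100-word script with the labels at 20 and 70 words in.
demo = " ".join(
    ["word"] * 20 + ["[CTA 1]"] + ["word"] * 50 + ["[CTA 2]"] + ["word"] * 30
)
for label, fraction in cta_positions(demo).items():
    print(f"{label} sits {fraction:.0%} of the way through")
```

If [CTA 2] comes out a long way from 70%, read that part of the script again before filming. The first CTA depends on where your first main point ends, so there is no fixed percentage to check it against.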
You Are Ready to Film
Your script has gone from rough voice note to filming-ready in under two hours. It sounds like you because it started with you. It is built around your unique experience because the agents were trained to protect that from the very beginning.
Pick up the camera.