You get:
- videos that don’t work with sound off (50%+ of viewers)
- text that’s illegible or too fast
- no retention hooks in the text (viewers drop off)
- missed opportunities to emphasize key points
- lower retention and completion rates
But on-screen text is not optional.
It is how you retain viewers watching without sound.
- Hook text (0-3 sec): grab attention
- Key point text (3-30 sec): reinforce message
- Retention text (every 5-10 sec): reset attention
- CTA text (last 5 sec): tell them what to do
Without text planning, your video loses viewers who watch without sound.
This framework forces the AI to plan on-screen text that retains viewers.
Assume the role of a TikTok video editor who plans on-screen text for retention. Your task is to create an on-screen text timeline.

Generate:
1. HOOK TEXT (0-3 seconds)
   - What text appears
   - Visual placement (center, top, bottom)
2. KEY POINT TEXTS (3-30 seconds)
   - Timestamps for each text
   - What each text says
   - Visual placement
3. RETENTION TEXTS (every 5-10 seconds)
   - Short, punchy words or phrases
   - Resets viewer attention
4. CTA TEXT (last 5 seconds)
   - What text appears
   - Visual placement
5. STYLE RECOMMENDATIONS
   - Font, size, color, animation
6. TEXT TIMING SUMMARY
   - Full timeline with timestamps

INPUTS:
Video Topic: [WHAT IS THE VIDEO ABOUT?]
Video Length: [30 SEC / 45 SEC / 60 SEC]
Key Messages (2-4 things to emphasize): [LIST]
Target Audience: [WHO ARE THEY?]
Desired CTA: [FOLLOW / COMMENT / SAVE / SHARE]

RULES:
- Hook text within first 3 seconds (critical)
- New text every 5-10 seconds (retention)
- Text must be readable (large, bold, high contrast)
- Keep text short (under 10 words per screen)
- Use text to emphasize key points, not repeat audio
- CTA text in last 5 seconds
- Plan for sound-off viewing (50%+ of viewers)
- Hook text within the first 3 seconds: critical for retention.
- New text every 5-10 seconds: resets viewer attention.
- Text must be readable: large, bold, high contrast (white text with a black outline).
- Keep text short: under 10 words per screen (viewers won’t read more).
- Use text to emphasize key points, not just repeat the audio.
- CTA text in the last 5 seconds: tell viewers exactly what to do.
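The rules above are concrete enough to check mechanically. Below is a minimal Python sketch, assuming a hypothetical representation where each overlay is a start time in whole seconds plus its text; the names `TextCue` and `check_timeline` are illustrative, not part of any tool. It treats the earliest cue as the hook and the last cue as the CTA, and it only flags gaps longer than 10 seconds, since short gaps (hook to first key point) are expected.

```python
from dataclasses import dataclass


@dataclass
class TextCue:
    """One on-screen text overlay (hypothetical structure for illustration)."""
    start: int  # seconds from the start of the video
    text: str


def check_timeline(cues: list[TextCue], video_length: int) -> list[str]:
    """Return rule violations for a planned timeline; an empty list means it passes."""
    if not cues:
        return ["no on-screen text planned"]
    cues = sorted(cues, key=lambda c: c.start)
    problems: list[str] = []

    # Hook text within the first 3 seconds (assumes the earliest cue is the hook).
    if cues[0].start > 3:
        problems.append(f"hook starts at {cues[0].start}s (should be <= 3s)")

    # New text at least every 10 seconds; shorter gaps are allowed.
    for prev, nxt in zip(cues, cues[1:]):
        gap = nxt.start - prev.start
        if gap > 10:
            problems.append(f"{gap}s gap after {prev.start}s (should be <= 10s)")

    # Under 10 words per screen.
    for cue in cues:
        words = len(cue.text.split())
        if words >= 10:
            problems.append(f"{words} words at {cue.start}s (should be under 10)")

    # CTA text in the last 5 seconds (assumes the final cue is the CTA).
    if cues[-1].start < video_length - 5:
        problems.append(f"final cue at {cues[-1].start}s starts before the last 5s")

    return problems


if __name__ == "__main__":
    plan = [
        TextCue(0, "Raise your rates without losing clients"),
        TextCue(4, "Don't apologize"),
        TextCue(12, "Have a replacement client ready"),
        TextCue(27, "Use this 3-sentence script"),
        TextCue(41, "Save this for when you raise your rates"),
    ]
    for problem in check_timeline(plan, video_length=45):
        print(problem)
    # Prints:
    # 15s gap after 12s (should be <= 10s)
    # 14s gap after 27s (should be <= 10s)
```

In this sample plan the key points are readable and the CTA lands on time, but the two long stretches without new text are exactly where the retention texts from the prompt belong.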
Example inputs:
Video Topic: How to raise freelance rates without losing clients
Video Length: 45 SECONDS
Key Messages: “Don’t apologize,” “Have a replacement client ready,” “Use this 3-sentence script”
Target Audience: Freelancers earning $30-80/hour
Desired CTA: SAVE (“Save this for when you raise your rates”)
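The timing skeleton for these inputs can be worked out before any wording is written. Here is a minimal Python sketch, assuming a hypothetical helper `plan_slots` that spreads the key messages evenly across the 3-30 second window, starts the CTA 5 seconds before the end, and splits any gap longer than 10 seconds into evenly spaced retention beats. It produces timestamps and slot labels only; the actual on-screen wording still comes from the prompt.

```python
import math


def plan_slots(length: int, key_messages: list[str], max_gap: int = 10) -> list[tuple[int, str]]:
    """Build a hypothetical timestamp skeleton: hook, key points, retention beats, CTA."""
    cta_start = length - 5                      # CTA text in the last 5 seconds
    window_start = 3
    window_end = min(30, cta_start)             # key points between 3s and 30s
    step = (window_end - window_start) / len(key_messages)  # assumes at least one message

    anchors = [(0, "HOOK")]
    anchors += [(round(window_start + i * step), f"KEY: {msg}")
                for i, msg in enumerate(key_messages)]
    anchors.append((cta_start, "CTA"))

    timeline: list[tuple[int, str]] = []
    for (t, label), (t_next, _) in zip(anchors, anchors[1:]):
        timeline.append((t, label))
        gap = t_next - t
        segments = math.ceil(gap / max_gap)     # fewest splits keeping every gap <= max_gap
        for k in range(1, segments):
            timeline.append((t + k * gap // segments, "RETENTION"))
    timeline.append(anchors[-1])
    return timeline


if __name__ == "__main__":
    messages = ["Don't apologize", "Have a replacement client ready", "Use this 3-sentence script"]
    for second, slot in plan_slots(45, messages):
        print(f"{second:>2}s  {slot}")
    # Prints:
    #  0s  HOOK
    #  3s  KEY: Don't apologize
    # 12s  KEY: Have a replacement client ready
    # 21s  KEY: Use this 3-sentence script
    # 30s  RETENTION
    # 40s  CTA
```

For a 45-second video, the 21-40 second stretch after the last key point is the gap most likely to lose viewers, so the sketch drops a retention beat at 30 seconds.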
This framework improves retention by forcing the AI to plan:
- hook text (attention)
- key point texts (emphasis)
- retention texts (reset attention)
- CTA text (action)
- style recommendations (execution)
Great on-screen text doesn’t just repeat the audio — it adds emphasis and retains viewers watching without sound.
