Seedance 2.0 Prompting Playbook: Direct ByteDance's Video Model

2026-06-18

TL;DR: Seedance 2.0 is a multimodal director, not a one-line prompt box. It reads your text and reference images together and splits them into a "space layer" (what's in frame) and a "time layer" (how things change). To get clean output you write engineering-style instructions, not flowery descriptions: define each subject explicitly, lay out the action as numbered shots, and use ByteDance's symbol syntax — () for music, <> for sound effects, {} for dialogue, 【】 for on-screen subtitles. This guide systematizes the official method for text-to-video and image-to-video, including the exact fixes for the three problems everyone hits: faces that drift, subtitles you didn't ask for, and unstable motion.

Most "Seedance 2.0 review" articles tell you it's powerful but has a steep learning curve, and then leave you there. The learning curve is real — but it's not mysterious. ByteDance published a detailed prompting method, and once you internalize how the model actually thinks, the curve mostly disappears.

This is that method, translated and organized for English-speaking creators. (Source: ByteDance's official Doubao Seedance 2.0 Prompting Guide — linked at the end.)

How does Seedance 2.0 actually interpret a prompt?

It reads your prompt the way a director reads a brief, not the way a search engine matches keywords. Internally it decomposes everything you give it into two layers:

  • The space layer: what's in the frame (subjects, scene, style, lighting).
  • The time layer: what happens over time (actions, camera moves, the order in which events occur).

That's the single most useful thing to understand. A good Seedance prompt isn't a paragraph of adjectives ("a cinematic, emotional scene of a man running"). It's a structured instruction that feeds both layers deliberately: who, in what scene, doing what action, with the camera moving how, in what order. Write for the director, not the keyword index.
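If it helps to see the two layers as data, here is a minimal drafting sketch. The class and field names are hypothetical and not part of any Seedance or LinkModel API; the only idea borrowed from the guide is the space/time split. It simply reports which layer you have not fed yet.

```python
from dataclasses import dataclass, field


@dataclass
class PromptDraft:
    """Hypothetical drafting aid; not part of any Seedance API."""

    # Space layer: what is in the frame.
    subjects: list[str] = field(default_factory=list)
    scene: str = ""
    style: str = ""
    lighting: str = ""
    # Time layer: what changes, and in what order.
    actions: list[str] = field(default_factory=list)
    camera_moves: list[str] = field(default_factory=list)

    def empty_layers(self) -> list[str]:
        """Return the names of layers that have no content yet."""
        layers = {
            "space": [*self.subjects, self.scene, self.style, self.lighting],
            "time": [*self.actions, *self.camera_moves],
        }
        return [name for name, parts in layers.items() if not any(parts)]


# An adjective-only draft: subject and style, but nothing happens.
draft = PromptDraft(subjects=["a man in a grey hoodie"], style="cinematic")
print(draft.empty_layers())  # ['time']
```

A draft that comes back with an empty layer is usually the "paragraph of adjectives" case described above.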

What's the basic formula for image-to-video?

Seedance 2.0's headline strength is working from references: it can pull a face, a character, a scene, or a style from your reference images and build a brand-new video around them. The official sentence pattern is simple: name what you're extracting and what you want generated.

"Referencing <subject N> in <image N>, generate…"
"Referencing the <style / scene / composition> in <image N>, generate…"

For pure text-to-video, you skip the references and go straight to the structured description below — the same eight-element formula applies either way.

This playbook covers the two modes you run on LinkModel: text-to-video and image-to-video. Seedance 2.0's native API also has separate operations for editing and extending existing clips — those are out of scope here.
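The referencing sentence pattern from this section is easy to template. The helper below is a hypothetical sketch that only assembles text; the function name, its parameters, and the rule that image numbers start at 1 are assumptions made for illustration, not something the official guide specifies.

```python
def reference_clause(target: str, image_number: int, request: str) -> str:
    """Fill the pattern 'Referencing <target> in image <N>, generate <request>'."""
    if image_number < 1:
        # Assumption: reference images are numbered from 1, as in "image 1".
        raise ValueError("image numbers start at 1")
    return f"Referencing {target} in image {image_number}, generate {request}"


# Extract a subject from one image and a style from another.
print(reference_clause("Subject 1", 1, "a video of her walking along a beach."))
print(reference_clause("the style", 2, "a night street scene."))
```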

What's the advanced formula for a polished shot?

When you want control, the official guide gives an eight-element ordering. Think of it as the order in which a director locks down a shot:

Precise subject + action detail + scene/environment + lighting & color
+ camera movement + visual style + image quality + constraints

Lock down who is doing what first, then where, then the mood, then how it's shot, and finally tighten everything with style, quality, and constraint words. The rest of this playbook is really just how to fill each slot well.
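One way to make the ordering hard to get wrong is to fill named slots and let a small helper emit them in the formula's order. This is a sketch under assumptions: the slot names and the comma separator are choices made here, not syntax the model requires.

```python
# Slot names are hypothetical labels for the eight elements, in formula order.
SLOT_ORDER = (
    "subject",
    "action",
    "scene",
    "lighting_color",
    "camera",
    "style",
    "quality",
    "constraints",
)


def build_prompt(slots: dict[str, str]) -> str:
    """Join the filled slots in the eight-element order, skipping empty ones."""
    unknown = set(slots) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    parts = [slots[name].strip() for name in SLOT_ORDER if slots.get(name, "").strip()]
    return ", ".join(parts)


# Slots can be supplied in any order; the output follows the formula.
print(build_prompt({
    "camera": "slow push-in",
    "subject": "a woman in a red dress",
    "action": "slowly raises her hand",
    "constraints": "keep it subtitle-free",
}))
```

Because the order lives in one tuple, you can write the slots in whatever order they occur to you and still get subject first and constraints last.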

Define every subject explicitly

In a reference image with multiple people or objects, the model needs to know exactly which one you mean. Define it with 2–3 stable, static features:

"Define the woman in image 1 wearing a red dress and straw hat as Subject 1."

Then bind it consistently. For simple scenes, tag every mention as <subject>@<image> (e.g. Maya@image1). For defined subjects, reuse the same label every time ("the cop," "the thief"). The official guidance is blunt about face references: use one clean headshot plus one full-body shot — and do not use multi-view/three-view character sheets, because the model often reads the different angles as different people.
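As a sketch, the definition sentence and the @-tag can both be generated so the label never drifts between mentions. Both helpers are hypothetical; the 2–3 feature check mirrors the advice above, and the default "wearing" connector is just taken from the example sentence.

```python
def define_subject(number: int, image_number: int, noun: str,
                   features: list[str], connector: str = "wearing") -> str:
    """Build a 'Define ... as Subject N.' sentence from 2-3 static features."""
    if not 2 <= len(features) <= 3:
        raise ValueError("use 2-3 stable, static features")
    joined = ", ".join(features[:-1]) + " and " + features[-1]
    return (f"Define {noun} in image {image_number} {connector} {joined} "
            f"as Subject {number}.")


def tag(name: str, image_number: int) -> str:
    """Bind a name to its reference image, e.g. Maya@image1."""
    return f"{name}@image{image_number}"


print(define_subject(1, 1, "the woman", ["a red dress", "straw hat"]))
print(tag("Maya", 1))
```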

Write it as numbered shots, not one run-on sentence

Because the model decouples space and time, the ideal prompt for anything complex is a storyboard. Break the video into shots and describe them in order — who, where, doing what, camera how:

Bad: "A man runs nervously through the street, very cinematic."

Good:
Shot 1: Side angle in an alley, the man starts running slowly, breathing hard.
Shot 2: He knocks over a fruit stand; the camera whips and cuts to a close-up of his frightened face.
Shot 3: He vaults a low wall and disappears; the camera slowly pulls back to the empty street.

One important quirk: the model is unreliable with precise timestamps (e.g. "0–3 seconds"). Don't force exact durations — organize by shot order and let it pace the story naturally.
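Here is a minimal sketch of that habit: number the shots in order and refuse anything that looks like an explicit time range. The function name and the regular expression are assumptions made for illustration; the pattern only catches simple ranges such as "0–3 seconds" or "0-3s".

```python
import re

# Heuristic: a numeric range followed by a seconds unit, e.g. "0–3 seconds".
TIMESTAMP = re.compile(r"\b\d+\s*[-–]\s*\d+\s*(?:s|sec|seconds?)\b", re.IGNORECASE)


def storyboard(shots: list[str]) -> str:
    """Number the shots in order; reject shots that pin exact durations."""
    for number, text in enumerate(shots, start=1):
        if TIMESTAMP.search(text):
            raise ValueError(f"shot {number} uses a timestamp; describe order instead")
    return " ".join(
        f"Shot {number}: {text.strip()}" for number, text in enumerate(shots, start=1)
    )


print(storyboard([
    "Side angle in an alley, the man starts running slowly, breathing hard.",
    "He vaults a low wall and disappears.",
]))
```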

Describe action like a movement coach

  • Be specific and quantify: "slowly raise a hand," "quickly turn the head," not "moves around."
  • Prefer small, slow, continuous motion over high-energy sprints, big jumps, or violent flips — those destabilize the output.
  • Show emotion through the body, not abstract words. Instead of "very sad," write: head lowered, shoulders trembling slightly, eyes rimmed red, fingers unconsciously clutching the hem of a shirt.

One camera move per shot

The model understands standard camera terms (close-up, wide shot, slow push-in, steady pan, locked-off). Use them — but only one move per shot. Asking for push, pull, pan, and tilt all at once makes the frame unstable.
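A rough lint for this rule might look like the sketch below. It is a keyword heuristic, not anything the model exposes: the four move families come from the sentence above, the regular expression is an assumption, and it can misfire on ordinary verbs ("he pulls a lever"), so treat a hit as a prompt to reread the shot.

```python
import re

# Matches push/pull/pan/tilt and common inflections (pans, panning, pushed...).
MOVE_PATTERN = re.compile(
    r"\b(push|pull|pan|tilt)(?:e?s|ed|ing|ned|ning)?\b", re.IGNORECASE
)


def camera_moves_in(shot: str) -> list[str]:
    """Return the distinct move families mentioned in a shot, in order."""
    found = [match.lower() for match in MOVE_PATTERN.findall(shot)]
    return list(dict.fromkeys(found))


def overloaded_shots(shots: list[str]) -> list[int]:
    """Return 1-based numbers of shots that mention more than one move."""
    return [
        number
        for number, shot in enumerate(shots, start=1)
        if len(camera_moves_in(shot)) > 1
    ]


shots = [
    "Slow push-in on her face.",
    "The camera pans and tilts while pulling back.",
]
print(overloaded_shots(shots))  # [2]
```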

The symbol syntax nobody tells you about

This is the section that separates people who've read the official method from people who haven't. Seedance 2.0 uses specific brackets to tell information types apart:

Information | Symbol | Example
Music | () | (upbeat rock music plays in the background)
Sound effect | <> | <a dog barks in the distance>
Dialogue | {} | {Hello, world}; for non-English lines, name the language: says in Japanese {こんにちは}
Subtitle | 【】 | 【Chapter One: Departure】

Used well, these let you score the audio and dialogue of a shot precisely instead of hoping the model guesses. This is native audio-video generation — lean into it.
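If you assemble prompts in code, tiny wrappers keep the four bracket types from getting mixed up. The function names are hypothetical; only the bracket characters and the "says in <language>" phrasing come from the syntax described in this section.

```python
from typing import Optional


def music(text: str) -> str:
    """Music cue: round brackets."""
    return f"({text})"


def sound_effect(text: str) -> str:
    """Sound effect: angle brackets."""
    return f"<{text}>"


def dialogue(line: str, language: Optional[str] = None) -> str:
    """Dialogue: curly braces, with the language named for non-English lines."""
    tagged = f"{{{line}}}"
    return f"says in {language} {tagged}" if language else tagged


def subtitle(text: str) -> str:
    """On-screen subtitle: full-width lenticular brackets."""
    return f"【{text}】"


print(music("upbeat rock music plays in the background"))
print(sound_effect("a dog barks in the distance"))
print(dialogue("こんにちは", language="Japanese"))
print(subtitle("Chapter One: Departure"))
```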

How do I fix Seedance 2.0's most common problems?

These are the three problems (plus a few extras) that every creator hits. The official fixes are surprisingly specific.

The character's face keeps changing (ID drift)

The usual cause is a weak face reference — a face buried in a busy full-body shot, or a multi-view sheet. The fix:

  • Add a dedicated headshot (head only, neutral expression, minimal background) alongside the full-body image.
  • In the prompt, split the reference: "Subject 1's facial features reference image 1 (headshot); styling references image 2 (full body)."
  • Put the most precise references earlier in the prompt.
  • Avoid multi-view character sheets entirely.
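The split-reference fix above can be sketched as a helper that builds the clause and puts it at the front of the prompt. The function name and parameters are hypothetical; the wording of the clause follows the example sentence in the list.

```python
def with_face_reference(prompt_body: str, subject_number: int,
                        headshot_image: int, full_body_image: int) -> str:
    """Prepend a split face/styling reference so it appears early in the prompt."""
    if headshot_image == full_body_image:
        raise ValueError("use separate headshot and full-body images")
    clause = (
        f"Subject {subject_number}'s facial features reference image "
        f"{headshot_image} (headshot); styling references image "
        f"{full_body_image} (full body)."
    )
    return f"{clause} {prompt_body.strip()}"


print(with_face_reference("Shot 1: She walks into the cafe.", 1, 1, 2))
```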

Subtitles appear that you never asked for

There's no guaranteed off switch, but you can reduce the odds considerably:

  • Add explicit constraints: "keep it subtitle-free," "avoid generating any text or subtitles."
  • If your reference image/video contains text you don't need, strip it first (with an image/video editing model) before using it as input.
  • Generate in landscape when possible — subtitles appear noticeably less often in landscape than in portrait. Crop to vertical later in an editor.
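The first fix in the list above is easy to automate. The sketch below appends the constraint wording unless the prompt already asks for a subtitle with 【】, in which case a "no subtitles" clause would contradict it. The function name and that skip rule are assumptions made here, not official behavior.

```python
NO_SUBTITLE_CLAUSE = "Keep it subtitle-free; avoid generating any text or subtitles."


def add_subtitle_guard(prompt: str) -> str:
    """Append a no-subtitle constraint unless subtitles were requested."""
    if "【" in prompt:
        # The prompt deliberately asks for on-screen text; leave it alone.
        return prompt
    if NO_SUBTITLE_CLAUSE in prompt:
        # Already guarded; avoid stacking the same constraint twice.
        return prompt
    return f"{prompt.rstrip()} {NO_SUBTITLE_CLAUSE}"


print(add_subtitle_guard("Shot 1: She walks along the beach."))
```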

Motion comes out shaky or distorted

High-energy action is the usual culprit. The fix is to dial the motion down, not up:

  • Prefer small, slow, continuous motion — a slow turn, a measured walk — over sprints, big jumps, or violent flips.
  • Describe one clear action per shot instead of stacking several.
  • Pair it with one camera move per shot (see above); a busy subject plus a busy camera is what tears the frame apart.

A few more from the official guide

  • Twin/duplicate people: define each person against a specific image ("Maya (image 1)…"), add a global constraint forbidding identical duplicates, and use single-person reference photos — not three-view sheets.
  • More than 4 reference people: generate in groups of ≤4, then combine the grouped images into the final video.
  • Use HD reference images: low-resolution or heavily compressed inputs drag the output quality down with them.

One more counterintuitive tip from the docs: don't max out the reference slots. Seedance 2.0 accepts up to 12 images, but the recommended setup is 4–5 — say, 1–2 character images, 1 scene image, 1 style reference. Too many references and the model can't tell which features take priority.
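The numbers in this section (12 images at most, 4–5 recommended, people in groups of up to 4) are simple enough to check before you submit anything. The helpers below are a hypothetical pre-flight sketch; the constants come from the text above, while the function names and return strings are made up for illustration.

```python
MAX_REFERENCE_IMAGES = 12   # hard limit quoted above
RECOMMENDED_MAX = 5         # upper end of the recommended 4-5
MAX_PEOPLE_PER_GROUP = 4    # generate reference people in groups of <=4


def check_reference_count(count: int) -> str:
    """Reject counts over the limit; warn when over the recommended range."""
    if count > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if count > RECOMMENDED_MAX:
        return "over the recommended 4-5 references"
    return "ok"


def group_people(people: list[str], size: int = MAX_PEOPLE_PER_GROUP) -> list[list[str]]:
    """Split a cast into consecutive groups of at most `size` people."""
    return [people[i:i + size] for i in range(0, len(people), size)]


print(check_reference_count(8))
print(group_people(["Maya", "Leo", "Ana", "Tom", "Kai", "Zoe"]))
```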

Frequently asked questions

Is Seedance 2.0 hard to use?
It has a real learning curve because it's a multimodal, reference-first model rather than a one-line generator. But the curve is mostly about learning ByteDance's structured method — define subjects explicitly, write numbered shots, use the symbol syntax. Once you do, it becomes predictable.

What's the prompt formula for Seedance 2.0?
The official advanced formula is: precise subject + action detail + scene/environment + lighting & color + camera movement + visual style + image quality + constraints. Fill those slots in that order and you'll get far more controllable results than a single descriptive sentence.

How do I keep a character's face consistent in Seedance 2.0?
Use a dedicated headshot plus a full-body image, reference them separately in the prompt, place the face reference early, and avoid multi-view character sheets — they cause the model to read one person as several.

How do I stop Seedance 2.0 from adding subtitles?
Add explicit "no subtitles" constraints, remove text from your reference assets before using them, and generate in landscape orientation, which produces unwanted subtitles far less often than portrait.

How many reference images should I use?
Four to five, even though the limit is 12. The official guidance warns that too many references make it hard for the model to prioritize features, causing style conflicts and subject confusion.

The bottom line

Seedance 2.0 rewards people who treat it like a film crew briefing instead of a search box. Internalize the space-layer/time-layer model, fill the eight-element formula, write your shots in order, and use the symbol syntax to score audio and dialogue. Keep your reference set tight, lead with a clean headshot, and keep the official fixes for drift, subtitles, and shaky motion in your back pocket. Do that and you're not fighting the learning curve — you're directing.

This playbook systematizes ByteDance's official Doubao Seedance 2.0 Prompting Guide.

Try it now

Put this playbook to work on Seedance 2.0

Every prompt in this playbook runs on the real Seedance 2.0 from ByteDance — available on LinkModel through one API key, billed pay-as-you-go, with no per-provider sign-up and no monthly subscription. Open the Playground and generate your first video in a couple of minutes.
