Dave over at Dave’s Gaming recently began experimenting with SeeDream, a new AI rendering model from ByteDance. One of its signature capabilities is that it lets you submit up to 14 reference images with a render, which you can then refer to by number. So you can feed it existing images of a few characters and tell it things like “(Character 1) shakes hands with (Character 2)” rather than having to fit every full character description into a single prompt. This opens up a world of possibilities that range from difficult to outright impossible with CoPilot/DALLE – so we’re going to kick the tires a bit and establish exactly what this adds to our toolkit.
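To make the numbered-slot mechanic concrete, here’s a minimal Python sketch of how you might template a prompt around those slots. This is purely hypothetical bookkeeping on my end – it is not SeeDream’s actual API – but it shows the idea: placeholders get swapped for “(Character N)” and the reference images are listed in matching order.

```python
# Hypothetical prompt-templating helper -- not any real SeeDream API.
# Swaps {Name} placeholders for "(Character N)" slots, numbered by the
# order the characters (and their reference images) are listed in.
def build_prompt(template, characters):
    """characters: dict of name -> reference image path (insertion-ordered)."""
    prompt = template
    for i, name in enumerate(characters, start=1):
        prompt = prompt.replace("{" + name + "}", f"(Character {i})")
    # The images must be submitted in the same order as the slot numbers.
    images = list(characters.values())
    return prompt, images

prompt, images = build_prompt(
    "{Karlach} shakes hands with {Jasper}.",
    {"Karlach": "karlach.png", "Jasper": "jasper.png"},
)
# prompt -> "(Character 1) shakes hands with (Character 2)."
# images -> ["karlach.png", "jasper.png"]
```

The point of keeping the name-to-number mapping in one place is that reordering the dict renumbers the prompt and the upload list together, so they can’t drift apart.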
Edit: Here is a link to a Guide to SeeDream Prompts
Standard Rendering
To start, we’ll compare it to CoPilot/DALL-E on a straight text prompt. Here’s a fight scene from my latest chapter of Jasper in BG3: Karlach fighting wargs in a warehouse.
“A luis royo dark fantasy painting depicting blurred motion inside an abandoned medieval warehouse. rear view. A strong german warrior woman with a glowing axe gleefully tackles staggering large black wolves. She is a muscular tiefling, with red skin, horns, burgundy punk hair, red eyes, and pockmarked skin, wearing rugged leather armor. She has a faint flaming aura.”
When I render this with CoPilot/DALLE-3, here are a few samples:
And here’s trying the same thing with SeeDream:
There are some interpretive differences – like how it interprets my description of the hairstyle. And the artist styles don’t quite match. But if anything SeeDream is matching my overall description better than CoPilot. So let’s make it harder. Here’s one from an earlier episode with an Owlbear in action:
“A dark fantasy painting of blurred motion inside an infernal manor. An owl-bear slams a paw into a glowing marble obelisk, smashing it into flying pieces. Three demon knights stab the owlbear with spears. It has an owl head and beak, the hulking body of a grizzly bear, bear paws, owl ears, and feathers along its forelimbs. The knights wear dark gothic armor and have horns and red skin. Hellish light.”
CoPilot/DALLE-3:
And SeeDream:
That is actually… a remarkably good take on my description. Again, I’m not sure about the “style”, and I’d have to tweak the description a bit to get rid of those ridiculous ears. But other than that it’s a really faithful interpretation of what I asked for.
Let’s explore styles a bit. Here’s Jasper rendered in a few styles that I often use – “Luis Royo”, “Jeff Easley”, and “Larry Elmore”.
“A <ARTIST> portrait painting. A short small halfling smirks playfully at the camera. The stout halfling is Welsh, with a neat brown beard, wearing a swashbuckler hat and wears a fine red coat over embroidered leather armor. On a sunny forest path.”
In DALLE-3:
And SeeDream:
And getting really weird, we’ll do Jasper and the Emperor.
“A dark Luis Royo painting of a misty void. A short small halfling argues as a tall purple humanoid with a long squid head withdraws into the mist. The stout halfling is Welsh, with a brown beard, wearing a swashbuckler hat and wears a fine red coat over leather armor. The Mind Flayer has long tentacles, and tiny pinpoint eyes glowing purple. He is wearing dark chitin armor with maroon highlights and gestures threateningly.”
CoPilot/DALLE-3:

SeeDream:

And the pattern I’m starting to see emerge is that SeeDream generally does a better job of sticking to my description, while DALLE pays more attention to the artist style cues and is more willing to “riff on the theme” to give me variants. Both have their uses.
A few other odds & ends I’ve noticed so far:
- SeeDream seems to be better with facial hair descriptions. If I’m having trouble with “bald” or “clean shaven” in DALLE, SeeDream can often pick up the spare.
- SeeDream has different content filters!! This one is *huge*. One of the biggest pains in the butt when I’m doing a render episode is when CoPilot decides it just won’t do a render that it was perfectly happy with in a slightly different setup earlier. People being tied up, injuries, and people lying down often trigger this. But sometimes it just starts objecting for no obvious reason at all. And SeeDream will often happily render exactly the same prompt that sent CoPilot into fits.
Multiple Character Rendering
This one is the big selling point. Being able to feed it pre-generated character images is *huge*. It theoretically means you can get amazing consistency rendering the same character in different scenarios. It also means the potential to render entire parties together in a way that is practically impossible with DALLE (which tends to cap out at about three characters – and even that is an exercise in frustration).
An example here is generating a group shot of Grim’s Gang getting ready for Deadfire. For my first cut I simply fed it images of the characters that I liked and said “group shot” – making no effort at all to compose the scene carefully.

A few inconsistencies in terms of relative sizes, but overall a remarkably good attempt. And it’s not just cut-and-paste. Ace, for example, is in a completely different pose in the group shot – one that still fits his character.
Now if I want to put in effort to compose the shot, there are a couple of ways I could go. For a one-off, I would separately render each character in DALLE with the same style/background cues in the poses that I want, and then feed those in to make the group shot. That can work very well for a single image, but it’s a lot of rendering if you’re going to be following a cast along on an entire adventure.
An alternative is to render each character once in a neutral pose with matching style/scene, and then reuse those same reference images to generate a variety of combos across a whole episode. So let’s try that!
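As a sketch of the bookkeeping this workflow implies (the portrait paths and the request shape here are my own hypothetical scaffolding, not SeeDream’s actual API), the idea is one neutral portrait per character, with each scene prompt paired with only the references its cast needs:

```python
# Hypothetical reference-library sketch -- not a real SeeDream API call.
# One neutral portrait per character, rendered once with matching style/scene.
PORTRAITS = {
    "Grim": "portraits/grim.png",
    "Ace": "portraits/ace.png",
    "Star": "portraits/star.png",
}

def scene_request(prompt, cast):
    """Pair a prompt with only the reference images its cast needs.
    Characters are numbered by their order in `cast`."""
    refs = [PORTRAITS[name] for name in cast]
    return {"prompt": prompt, "reference_images": refs}

# A whole episode reuses the same portrait library scene after scene.
episode = [
    scene_request("Group portrait aboard a galleon.", ["Grim", "Ace", "Star"]),
    scene_request(
        "Watercolor. Argument on a deserted beach. "
        "(Character 1) is frustrated as (Character 2) is being stubborn.",
        ["Grim", "Star"],
    ),
]
```

The payoff is that each new scene costs one prompt rather than a fresh round of character renders, and the slot numbers always line up with whichever subset of the cast appears.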
First we’ll generate portraits of everyone aboard a ship, and use those as reference images.
So we’ll start with a group shot:
“Group portrait aboard a galleon. Character 1 is grim with crossed arms. Character 2 is restless. Character 3 is short. Character 4 is daydreaming. Character 5 is analytical.”

Now that’s an interesting group shot – although I can’t quite make sense of the geometry of the ship. I’ll have to tweak that a bit. It happens with DALLE too. Not quite the same style either, but then I didn’t ask for that. Tweak those and I get this:

I’m loving the personality I’m getting in those shots! The main thing I think I’ll need to be careful with is relative heights – possibly by simply listing everyone’s height when setting up the picture.
Now let’s try something more dynamic. This also checks what happens when I have reference images for characters that I don’t specify:
“Watercolor. Argument on a deserted beach. Character 1 is 6’7″ and frustrated as character 4 is 5’8″ and being stubborn.”

Could use a little tweaking to show Grim a bit less expressive and Star a bit more so, but overall a pretty good attempt! How about a fight scene?
“Watercolor blurred motion. (Character 1) 6’7″ slams a quarterstaff down towards (Character 2) 5’9″ smirks as she spins and dodges out of the way while swinging a quarterstaff.”
Single Character Rendering
So how much can SeeDream vary stance/expression from the source image? Let’s investigate by taking the same description and reference image, and changing just that.
“She is sneering in anger”

“She is angrily threatening the viewer”

“She is laughing hysterically”

“She is smirking mischievously”

Not quite as much variation as I would like, but I haven’t really experimented enough yet to know how best to coax dynamic expressions out of SeeDream.
Downsides:
- Time/Cost: Complex images can take longer to generate in SeeDream than in CoPilot/DALLE, and there is a cost. CoPilot credits regenerate daily, you get a lot of them, and you don’t have to pay anything to have a decent amount to play with. SeeDream credits are more limited and regenerate monthly – unless you pay for more.
- Artifacts: While SeeDream appears to be quite good at interpreting descriptions, it can get confused by lots of reference images – or by reference images that it just doesn’t quite understand. In my last Jasper BG3 episode I attempted to get Minsc fighting a Sahuagin by providing reference images and describing the battle. This had two issues:
Reference Images:
About 50% of the time it replaced the Sahuagin with an angry-looking fish:

The “battle motion” did not seem quite as dynamic or as well integrated when working from reference imagery as when generating the scene entirely from text. The text-prompt results just feel like higher quality images.
Using reference images:
Using a text prompt:
Overall, this looks like an excellent addition to the toolkit. I don’t think it replaces CoPilot/DALLE entirely for me – particularly given the cost and the seeming lack of support for artist styles. But I think I will use it heavily in my upcoming Deadfire run to really get to know it. It seems to do a great job with that watercolor fantasy style.