Zeno's Ziggurat


RPG characters with AI image creation

I claim no ownership or copyright of these images whatsoever. You may download and use them for whatever purpose you wish.


AI Concepts: Rendering Revisited


I recently made a discovery, while working on renders for Jasper’s BG3 adventures, that called many of my assumptions about how the AI works into question. I was attempting to quickly copy the query from one render for use in another while the AI was off thinking, and did a quick CTRL+A, CTRL+C to grab all the text and paste it somewhere else. When I did so I got something quite unexpected. In addition to the active query, I got the last 10-12 queries… some of which had been substantially re-written.

So let’s look at what actually happened:

Original prompt from me:

dark luis royo painting;(tall muscular woman) exuberantly bear-hugs a (small halfling man);(tiefling woman,horns,polish,red skin,warrior,burguny punk hair,barbarian armor);(halfling,brown beard,swashbuckler hat,leather armor,red shirt);tavern at twilight

Example of AI “translation”:
A dark, atmospheric painting in the style of Luis Royo, depicting a cozy tavern illuminated by flickering candlelight during twilight. In the foreground, a tall, muscular red-skinned warrior woman, embodying a Polish tiefling with prominent horns and punk-styled burgundy hair, exuberantly bear-hugs a small halfling man. The halfling, with a charming brown beard, is dressed in a swashbuckler hat and rugged leather armor over a vibrant red shirt, creating a playful and warm sce

The first thing I noticed was that it had taken the prompt I had written to minimize word usage (due to the AI’s word count limit) and expanded it into something quite verbose, to the point that part of the prompt was actually cut off! This tells us that being sparse with words doesn’t necessarily help much.
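To make the cut-off concrete, here is a small Python sketch. It is only an illustration: the 60-character limit, the `truncate` helper, and both shortened sample prompts are made up for the example, and I am assuming the limit is applied to the rewritten text by simply chopping off the end, which is a guess based on the translation above stopping mid-word.

```python
def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, even if that lands mid-word."""
    return text[:limit]


LIMIT = 60  # hypothetical limit, chosen only to make the effect visible

# Shortened, made-up stand-ins for a terse prompt and its verbose rewrite.
terse = "dark painting;(tall woman) hugs (small man);tavern"
verbose = (
    "A dark, atmospheric painting depicting a cozy tavern, where a tall "
    "woman hugs a small man, creating a playful and warm scene"
)

for label, prompt in (("terse", terse), ("verbose", verbose)):
    sent = truncate(prompt, LIMIT)
    lost = len(prompt) - len(sent)
    print(f"{label}: {len(prompt)} characters written, {lost} lost to the limit")
```

The terse prompt fits under the limit untouched, but once it has been expanded the tail end is what gets dropped, no matter how carefully the original was trimmed.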

The second thing I noticed is that sometimes the text translation *completely louses things up*! It can introduce additional characters. It can reverse who is doing what to whom. All the traditional sorts of “what was the AI thinking?” errors of which we’ve grown so fond. And that’s when I realized a key point about all of this. When you write an image prompt you’re not talking to “an AI”. You’re talking to four… So this is what I now think is going on “under-the-hood” (fair caveat – this is just me theorizing).

  1. You write your prompt
  2. An AI checks for words/phrases that it finds objectionable and rejects the render if it finds anything it doesn’t like
  3. ChatGPT AI translates your prompt into colorful purple prose it considers more suitable for rendering. If it’s doing four images it does this four times with different variations on your prompt.
  4. Those variants are sent to DALL-E (or whatever image AI you chose).
  5. The four images that come back are scanned with an image-interpreting AI, and any that it finds objectionable are rejected. This is why sometimes you get fewer than four results back.
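The steps above can be sketched as a toy Python pipeline. To be clear, this is just my theory written as code: every function here is a hypothetical stand-in, nothing calls a real moderation service, ChatGPT, or DALL-E, and the blocklist and rewrite wording are invented for illustration.

```python
# Toy model of the guessed pipeline. All names are hypothetical stand-ins.

BLOCKED_WORDS = {"forbidden"}  # placeholder blocklist, purely illustrative


def text_filter_ok(prompt: str) -> bool:
    """Step 2: reject the prompt if it contains a blocked word."""
    words = (w.strip(".,;()") for w in prompt.lower().split())
    return not any(w in BLOCKED_WORDS for w in words)


def rewrite_prompt(prompt: str, variant: int) -> str:
    """Step 3: stand-in for the text AI's verbose rewrite."""
    return f"Variation {variant}: an atmospheric rendering of {prompt}"


def generate_image(rewritten: str) -> str:
    """Step 4: stand-in for the image AI; returns a label, not a picture."""
    return f"<image for: {rewritten}>"


def render(prompt: str, count: int = 4, image_ok=lambda image: True) -> list:
    """Run steps 2 through 5 and return the images that survive."""
    if not text_filter_ok(prompt):
        return []  # step 2 rejected the whole render
    rewrites = [rewrite_prompt(prompt, n) for n in range(1, count + 1)]
    images = [generate_image(text) for text in rewrites]
    # Step 5: each image is checked on its own, so some may be dropped.
    return [image for image in images if image_ok(image)]


print(len(render("halfling in a tavern")))
print(len(render("halfling in a tavern",
                 image_ok=lambda image: "Variation 2" not in image)))
```

The point of the sketch is the shape: the image AI never sees what you typed, only the rewrite from step 3, and the final filter can quietly drop individual images.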

The big revelation here for me was (3).

So after much experimenting, I’ve now switched to an “artist’s prompt” method of writing, geared toward making sure the initial text re-writer understands what I’m going for, so that the translation it sends to the image generation AI best matches what I want. Here’s an example below. The old way isn’t bad; it mostly has an idea what I’m going for. But the new way nails it much harder.

Old way:

dark Luis Royo painting; (tall purple humanoid with long squid head) on knees looking up at viewer; (humanoid, Illithid, Mind Flayer, menacing, long tentacles, tiny pinpoint eyes glowing purple, moist sheen, obsidian armor with maroon highlights, elaborate cloak); rocky plateau in a misty purple astral void

New way:

A dark Luis Royo painting of a rocky plateau in a misty purple astral void. A tall purple humanoid with a long squid head is down on one knee looking up at the viewer. He is an Illithid Mind Flayer. He appears menacing, with long tentacles, tiny pinpoint eyes glowing purple, and a moist sheen. He is wearing dark chitin armor with maroon highlights and an elaborate high-collar cloak.
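If you keep your character notes as separate pieces, the new style can be assembled with a few lines of Python. This is only a sketch of the idea: the `artist_prompt` helper and its parameter names are made up for this example, and it does nothing clever beyond joining the parts into full sentences.

```python
def artist_prompt(style, setting, subject, action, details):
    """Join prompt parts into full sentences instead of keyword fragments."""
    sentences = [
        f"A {style} of {setting}.",
        f"{subject} is {action}.",
    ]
    # Each detail is already a complete clause; just close it with a period.
    sentences.extend(f"{detail}." for detail in details)
    return " ".join(sentences)


prompt = artist_prompt(
    style="dark Luis Royo painting",
    setting="a rocky plateau in a misty purple astral void",
    subject="A tall purple humanoid with a long squid head",
    action="down on one knee looking up at the viewer",
    details=[
        "He is an Illithid Mind Flayer",
        "He is wearing dark chitin armor with maroon highlights",
    ],
)
print(prompt)
```

Because every trait is tied to its subject in a sentence, the text re-writer has less room to guess which description belongs to whom.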

4 responses to “AI Concepts: Rendering Revisited”

  1. Funny thing, I had experimented with the AI text enhancement feature at OpenArt and quit using it, because I concluded it made no real difference if my prompt was more descriptive or not.

    Now looking at your two sample sets, I’d say the second set of images is broadly more detailed (more my usual preference than yours?). And the second set doesn’t have any complete misses. Although the first set only includes one total failure (the one with a central eye).
    And this is a pretty small sampling.
    I admit to not being very familiar with Luis Royo, but the first set looks more like paintings, while the second set looks more photographic.

    Can you say in your own words what all you feel the difference is that you’ve learned? Like is it less likely to fail utterly? Is it more likely to give good options? Or is it a bigger impact on the style and mood?

    1. I think the biggest difference is that I’m more likely to get what I asked for. In my old style the text AI is guessing how to connect my words and phrases into a coherent description. And sometimes it gets it quite wrong. In the new style it better understands what I’m trying to say.

      I’d have to do some digging to come up with some better examples of where the first one goes wrong. I’ve literally had it rewrite my text in such a way as to describe a completely different scene than I had in mind. Like turning my description of Karlach into two different characters and making up their interaction out of its own imagination.

      1. Oh, I believe that exactly. I’ve drawn similar conclusions from using a different AI for renders. It’s helped me pinpoint what might be wrong with my description.

        But that was all very helpful, thank you!

      2. If you can get my trick to work it can be very illuminating when things go wonky. You look at what it’s doing to your prompt and you can often see where it’s going off the rails. Then rephrase your prompt to fix the confusion.
