Preview of COLL-E

Collaborative generation of images with DALL-E and ChatGPT

When do you need COLL-E?

Sometimes you are a member of a group of people who want to create images with generative AI that serve some practical purpose, such as illustrating the ideas in a slide presentation or a website.

So the goal is not to create amazing “AI art” but to avoid having to search desperately in stock image collections, hire an artist who is available 24/7, etc.

But even with the best image generators, it usually takes a good deal of trial and error to arrive at images that illustrate your ideas clearly and in an appealing style. The image generator has at best a limited understanding of your intentions with each image and with the presentation as a whole, so you have to keep nudging it in the right direction.

If you’ve tried ad hoc methods like having group members paste generated images into shared documents, you will have noticed that the process tends to be unstructured, inefficient, and ineffective.

Enter COLL-E, which was designed specifically to support this type of collaboration within the more general Groupsheets platform.

COLL-E currently makes use of the latest available versions of OpenAI's DALL-E (for image generation) and ChatGPT (for help with prompt formulation). Other sources of image and text generation may be included in the future.

How can you use it, and what are the benefits?

The screenshots below show how COLL-E offers unique forms of support.

In the example, two colleagues, Anna and Peter, are creating a slide presentation with advice for students who are about to attend a professional conference for the first time.

(They can work in the same or different locations, at the same or different times. In these examples, they request just 2 images at a time.)

1. Set up a structure for the collaboration

What you can do

To get started, you set up a structure with (a) a reminder of the high-level topic of the set of images to be generated and (b) a set of ideas that need to be expressed with the help of the images.

Benefits

Group members can generate images for a given idea in parallel.
They can help each other elaborate their ideas.
They can generate images for different ideas on the same canvas, partly to ensure that the images in the total set fit together well.
When asked to suggest a prompt, ChatGPT can take into account not only the specific idea to be expressed but also the overall topic.

2. Generate and rate a couple of images for one idea

What you can do

To generate images for the first idea, Anna has chosen the option “Let ChatGPT suggest a prompt” from the context menu of the first idea (she could also have typed in a prompt herself).
Before using the prompt to request images from DALL-E 2, Anna has edited the prompt in the way indicated in the markup; she has left the default image style “Watercolor painting” unchanged.
The generated images are now shown in a full-window view, along with details about how they were generated.
She has assigned a 1-star rating to each of the two generated images to indicate that they are not directly usable but may be worth developing further.
Anna now has several options for developing this result further, either via the context menu for this entire generation attempt (as shown here) or via the context menu for either of the generated images.
If she chooses one of these options, Anna will see the results in a full-window box with a similar structure.

Benefits

Even though COLL-E offers a good deal of unusual functionality and representations (as can be seen from the first screenshot above), users are free at all times to use the full-window view shown here, which resembles most other UIs for image generation.
ChatGPT often helps to improve prompt quality by supplying a more detailed description of the desired image than people normally type in themselves.
The human participants can exploit their deeper understanding of the purpose of the image by editing ChatGPT’s prompt.
Because of the markup, all participants can see the respective contributions of ChatGPT and the human user to the prompt.
The star rating system enables Anna to record her assessment that both result images may be worth trying to improve upon. Otherwise, she would be more or less forced at this point to abandon one of the images and continue working with the other one.
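
As a rough illustration of how markup of this kind could be produced, the following Python sketch compares a suggested prompt with its edited version word by word. It is not COLL-E's actual implementation: the function name mark_up_edits, the [-...-] and {+...+} notation, and the example prompts are all assumptions made for this illustration.

```python
import difflib


def mark_up_edits(suggested: str, edited: str) -> str:
    """Return the edited prompt with word-level change markers.

    Hypothetical notation: [-words-] were removed from the suggestion,
    {+words+} were added by the human editor.
    """
    old_words, new_words = suggested.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(old_words[i1:i2])
            continue
        if i1 < i2:  # words dropped from the suggested prompt
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words contributed by the human editor
            parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


# Invented example prompts, for illustration only.
suggested = "A student at a conference"
edited = "A nervous student at a conference"
marked = mark_up_edits(suggested, edited)
print(marked)
```

A word-level comparison is used here because prompts are short and people tend to edit them by adding or swapping individual words; a character-level comparison would also work but is harder to read.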

3. (A bit later:) Inspect the “generation history” so far

What you can do

At any time, you can change the view to see how particular results were created.
In this view, each generation attempt is shown in a card within the tree-like structure shown in the first screenshot: A card Y is directly to the right of a card X if the generation attempt that Y displays is the result of choosing one of the generation options offered in X.
Here, we can see (in the top card on the right) how Anna tried to improve on the first image that she had obtained -- without actually getting a better result.
The card in the bottom right shows how, a bit later, Peter tried in a different way to improve on Anna’s second result image.

Benefits

The generation history helps you to learn quickly about what image generation tactics do and do not work in the specific context at hand. For example, Anna’s attempt to get more expressive and attractive faces yielded no improvement at all. Seeing this result, you would be less inclined to try the same tactic again.
The tree-like representation of generation attempts makes it possible for you to return at any time to any previous generation attempt and try to improve on it. So the group has many more possible paths to obtaining satisfactory results. (To use an analogy to computational local search procedures: You are not restricted to simple hill-climbing but can explore multiple paths as in local beam search.)
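
The tree-like generation history described in this section can be pictured with a small data structure. The Python sketch below is only an assumption about how such a history might be modeled, not a description of COLL-E's internals: the class name Attempt, its fields, and the methods branch and lineage are hypothetical, and the prompts and the ratings of the follow-up attempts are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Attempt:
    """One generation attempt, i.e. one card in the history view."""
    author: str
    prompt: str
    ratings: list[int] = field(default_factory=list)  # one star rating per image
    parent: Optional["Attempt"] = field(default=None, repr=False)
    children: list["Attempt"] = field(default_factory=list, repr=False)

    def branch(self, author: str, prompt: str, ratings: list[int]) -> "Attempt":
        """Start a new attempt from this one; it appears to the right of this card."""
        child = Attempt(author, prompt, list(ratings), parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Prompts from the first attempt down to this one."""
        prompts = []
        node = self
        while node is not None:
            prompts.append(node.prompt)
            node = node.parent
        return prompts[::-1]


# Anna's first attempt, then two independent follow-ups from the same card.
root = Attempt("Anna", "students talking at a conference", [1, 1])
anna_retry = root.branch("Anna", "students talking, expressive faces", [1, 1])
peter_retry = root.branch("Peter", "two students chatting by a poster", [2, 1])

print([child.author for child in root.children])
print(peter_retry.lineage())
```

Because every attempt keeps a link to its parent and its children, any earlier card remains a valid starting point for a new branch, which is what distinguishes this kind of exploration from simple hill-climbing.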

4. Get an overview of the most promising results obtained so far

What you can do

In this view, you see for each generation attempt only the resulting images and their ratings.
You can also filter out all images that haven’t been given a rating that reaches a particular threshold (here: 2 stars). Then some cards will be blank and others won’t be shown at all.
You can now select any of the most promising generation attempts to try to improve on its results.

Benefits

The ability to focus temporarily only on the most promising results adds further support for the parallel exploration of options just described.
It also makes it easier to select the final set of images that will actually be used -- including taking into account how well the images generated to express the different ideas fit together.
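
The rating filter described in this section amounts to a simple threshold test on each image. The following Python sketch shows one way to express it, assuming a hypothetical representation in which each card maps image names to star ratings (None for unrated images); the function name filter_cards and the sample data are invented for illustration. How COLL-E decides which empty cards stay visible as blanks and which are hidden is not specified here, so the sketch simply returns an empty list for such cards.

```python
from typing import Optional

# Hypothetical representation: card id -> {image name: stars, or None if unrated}
Cards = dict[str, dict[str, Optional[int]]]


def filter_cards(cards: Cards, min_stars: int) -> dict[str, list[str]]:
    """Keep, for each card, only the images rated at least min_stars."""
    return {
        card_id: [
            image
            for image, stars in images.items()
            if stars is not None and stars >= min_stars
        ]
        for card_id, images in cards.items()
    }


# Invented sample data.
cards: Cards = {
    "attempt-1": {"img-a": 1, "img-b": 1},
    "attempt-2": {"img-c": 2, "img-d": None},
    "attempt-3": {"img-e": 3, "img-f": 2},
}

promising = filter_cards(cards, min_stars=2)
print(promising)
```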

5. Focus on one image while still seeing the overall context

What you can do

At any time, you can tap on an image to view it in one of two larger sizes, while still being able to see how the image is related to the other results obtained so far.

Benefits

You can now decide what (if anything) to do with a given image while taking into account the other generated images and possibly their generation history.