Preview of COLL-E

Collaborative generation of images with DALL-E2 and ChatGPT

When do you need COLL-E?

How can you use it, and what are the benefits?

1. Set up a structure for the collaboration

What you can do
  • To get started, you set up a structure with (a) a reminder of the high-level topic of the set of images to be generated and (b) a set of ideas that need to be expressed with the help of the images.
  1. Group members can generate images for a given idea in parallel.
  2. They can help each other elaborate their ideas.
  3. They can generate images for different ideas on the same canvas, partly to ensure that the images in the total set fit together well.
  4. When asked to suggest a prompt, ChatGPT can take into account not only the specific idea to be expressed but also the overall topic.

2. Generate and rate a couple of images for one idea

What you can do
  • To generate images for the first idea, Anna has chosen the option “Let ChatGPT suggest a prompt” from the context menu of the first idea (she could also have typed in a prompt herself).
  • Before using the prompt to request images from DALL-E2, Anna has edited the prompt in the way indicated in the markup; she has left the default image style “Watercolor painting” unchanged.
  • The generated images are now shown in a full-window view, along with details about how they were generated.
  • She has assigned a 1-star rating to each of the two generated images to indicate that they are not directly usable but may be worth developing further.
  • Anna now has several options for developing this result further, either via the context menu for this entire generation attempt (as shown here) or via the context menu for either of the generated images.
  • If she chooses one of these options, Anna will see the results in a full-window box with a similar structure.
  1. Even though COLL-E offers a good deal of unusual functionality and representations (as can be seen from the first screenshot above), users are free at all times to use the full-window view shown here, which resembles most other UIs for image generation.
  2. ChatGPT often helps to improve prompt quality by supplying a more detailed description of the desired image than people normally type in themselves.
  3. The human participants can exploit their deeper understanding of the purpose of the image by editing ChatGPT’s prompt.
  4. Because of the markup, all participants can see the respective contributions of ChatGPT and the human user to the prompt.
  5. The star rating system enables Anna to record her assessment that both result images may be worth trying to improve upon. Otherwise, she would be more or less forced at this point to abandon one of the images and continue working with the other one.
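
To illustrate benefit 4, here is a minimal sketch of how a suggested prompt and its edited version could be compared word by word so that each side's contribution stays visible. The function name `mark_up_edits` and the `[-removed-]` / `{+added+}` notation are assumptions made for this example; the markup that COLL-E actually displays is not specified here.

```python
import difflib


def mark_up_edits(suggested: str, edited: str) -> str:
    """Return the edited prompt with the human's changes marked inline.

    Hypothetical notation: words the human removed appear as [-...-],
    words the human added appear as {+...+}; unmarked words are
    unchanged from the suggested prompt.
    """
    old_words = suggested.split()
    new_words = edited.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        if i1 < i2:  # "delete" or "replace": words taken out by the human
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # "insert" or "replace": words put in by the human
            parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


if __name__ == "__main__":
    # Both prompts are invented for this example.
    suggested = "Two children reading a book in a sunny garden"
    edited = "Two children reading a map in a sunny garden"
    print(mark_up_edits(suggested, edited))
```

A word-level comparison is used here because prompts are short and edits are usually single words or phrases; a character-level comparison would produce noisier markup.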

3. (A bit later:) Inspect the “generation history” so far

What you can do
  • At any time, you can change the view to see how particular results were created.
  • In this view, each generation attempt is shown in a card within the tree-like structure shown in the first screenshot: a card Y is directly to the right of a card X if the generation attempt that Y displays resulted from choosing one of the generation options offered in X.
  • Here, we can see (in the top card on the right) how Anna tried to improve on the first image that she had obtained -- without actually getting a better result.
  • The card in the bottom right shows how, a bit later, Peter tried in a different way to improve on Anna’s second result image.
  1. The generation history helps you to learn quickly about what image generation tactics do and do not work in the specific context at hand. For example, Anna’s attempt to get more expressive and attractive faces yielded no improvement at all. Seeing this result, you would be less inclined to try the same tactic again.
  2. The tree-like representation of generation attempts makes it possible for you to return at any time to any previous generation attempt and try to improve on it. So the group has many more possible paths to obtaining satisfactory results. (To use an analogy to computational local search procedures: You are not restricted to simple hill-climbing but can explore multiple paths as in local beam search.)
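
The tree-like history and the beam-search analogy can be sketched as a small data structure. This is an illustration under stated assumptions, not COLL-E's actual implementation: the class `Attempt`, its fields, the helper `top_candidates`, and all prompts and ratings in the usage part are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Attempt:
    """One generation attempt, i.e. one card in the history view."""

    author: str
    prompt: str
    ratings: list[int] = field(default_factory=list)  # stars, one per image
    parent: Optional[Attempt] = field(default=None, repr=False)
    children: list[Attempt] = field(default_factory=list, repr=False)

    def branch(self, author: str, prompt: str, ratings=()) -> Attempt:
        """Start a new attempt from this one (the card to its right)."""
        child = Attempt(author, prompt, list(ratings), parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[Attempt]:
        """Attempts from the root down to this one: how a result was created."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def walk(self) -> Iterator[Attempt]:
        """Yield this attempt and all attempts derived from it."""
        yield self
        for child in self.children:
            yield from child.walk()

    def best_rating(self) -> int:
        return max(self.ratings, default=0)


def top_candidates(root: Attempt, k: int) -> list[Attempt]:
    """Pick the k best-rated attempts anywhere in the tree to continue from.

    Keeping several candidates, rather than only the latest result, is
    what the analogy to local beam search refers to.
    """
    return sorted(root.walk(), key=Attempt.best_rating, reverse=True)[:k]


if __name__ == "__main__":
    # Invented data: two first images with 1 star each, then two follow-ups.
    first = Attempt("Anna", "first prompt", [1, 1])
    retry = first.branch("Anna", "more expressive faces", [0])
    other = first.branch("Peter", "different approach", [2])
    print([a.author for a in other.lineage()])
    print([a.prompt for a in top_candidates(first, 2)])
```

Because every attempt keeps a link to its parent and its children, any earlier card remains a valid starting point for a new branch, which simple hill-climbing (always continuing from the most recent result) would not allow.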

4. Get an overview of the most promising results obtained so far

What you can do
  • In this view, you see for each generation attempt only the resulting images and their ratings.
  • You can also filter out all images that haven’t been given a rating that reaches a particular threshold (here: 2 stars). Then some cards will be blank and others won’t be shown at all.
  • You can now select any of the most promising generation attempts to try to improve on its results.
  1. The ability to focus temporarily only on the most promising results adds further support for the parallel exploration of options just described.
  2. It also makes it easier to select the final set of images that will actually be used, taking into account how well the images generated to express the different ideas fit together.
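
The rating filter described above can be sketched as a recursive pass over the tree of cards. The rule used here for deciding which cards are blank and which are hidden is an assumption, not something stated in this preview: a card with no qualifying images is kept as a blank card only if some card derived from it still has a qualifying image, so that the path to that image remains visible. The dictionary layout and the name `filter_cards` are likewise hypothetical.

```python
def filter_cards(card: dict, min_stars: int):
    """Return a filtered copy of the card tree, or None if the card is hidden.

    Each card is a dict with "title", "images" (list of (name, stars) pairs)
    and an optional "children" list of cards.
    Assumed rule: a card is hidden when neither it nor any card derived
    from it has an image rated at least min_stars; otherwise it is shown,
    possibly blank (with an empty "images" list).
    """
    kept_children = []
    for child in card.get("children", []):
        filtered = filter_cards(child, min_stars)
        if filtered is not None:
            kept_children.append(filtered)

    kept_images = [(name, stars) for name, stars in card["images"]
                   if stars >= min_stars]

    if not kept_images and not kept_children:
        return None  # nothing promising here or further right: hide the card
    return {"title": card["title"], "images": kept_images,
            "children": kept_children}


if __name__ == "__main__":
    # Invented example data.
    history = {
        "title": "Anna, first attempt",
        "images": [("img1", 1), ("img2", 1)],
        "children": [
            {"title": "Anna, retry", "images": [("img3", 0)]},
            {"title": "Peter, variation", "images": [("img4", 2), ("img5", 1)]},
        ],
    }
    view = filter_cards(history, min_stars=2)
    print(view["images"])                         # blank card: no images shown
    print([c["title"] for c in view["children"]])
```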

5. Focus on one image while still seeing the overall context

What you can do
  • At any time, you can tap on an image to view it in one of two larger sizes, while still being able to see how the image is related to the other results obtained so far.
  • You can now decide what (if anything) to do with a given image while taking into account the other generated images and possibly their generation history.