AI on canvas. How to create images using Midjourney?

date: 16 February 2023

The advancement of AI has been on everyone’s lips recently, and rightly so. Given the latest developments in the field, the Design Team at Future Processing decided to test the capabilities of Midjourney – an AI program that generates images from textual descriptions. How did the AI handle the tasks it was given? What functionalities and options does this tool offer?

Marta Papiernik, our Internal Communication Specialist, talks to Robert Olszewski, who explains what to pay attention to when buying a licence for the program and how to formulate specific commands. They also touch on the issue of copyright and discuss the possible futures of the cooperation between human imagination and bots.

First steps with AI

What was your first encounter with Artificial Intelligence?

Robert: It all started with Photoshop. One day, I discovered some new tools had been added that allowed automated AI-based photo editing. These tools make it possible, for example, to cut out objects from the background, change a facial expression, and much more. Sometimes, the results are good, sometimes not so much. What I find great is that Artificial Intelligence releases users from performing the most cumbersome manual tasks. I tested the option of colouring black-and-white photos and the results surprised me. Photoshop did a really great job. With just one click.

Why did you decide to try Midjourney?

I was really lucky to come across Midjourney. Simple tests yielded promising results, so I decided to continue working with this program. Before I learned how to generate 16:9 images, I created square graphics. They cannot be generated again in other proportions, because the same input always gives a different output, and I already had some great outputs which I didn’t want to lose. To change their proportions, I used DALL-E. This AI system can generate images and edit existing ones, but Midjourney creates better artworks “from scratch”.

Midjourney generated this image in a square format.

Here, the proportions were changed using DALL-E. Notice that DALL-E did a decent job adding missing elements in the background. I just had to select a fragment of the image and write a command saying what to do next.
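The arithmetic behind extending a square image to 16:9 can be sketched in a few lines. This is only an illustration: the interview does not state the pixel size of the image, so the 1024 x 1024 input is an assumption, and the helper name `padding_for_aspect_ratio` is hypothetical rather than part of any DALL-E or Midjourney tooling. It computes how many pixels of new background would have to be filled in on each side.

```python
def padding_for_aspect_ratio(width, height, ratio_w, ratio_h):
    """Return (left, right) padding in pixels needed to widen an image
    to ratio_w:ratio_h while keeping its height unchanged."""
    target_width = round(height * ratio_w / ratio_h)
    if target_width < width:
        raise ValueError("image is already wider than the target ratio")
    extra = target_width - width
    left = extra // 2
    # Any odd leftover pixel goes to the right-hand side.
    return left, extra - left


# Assumed input: a 1024 x 1024 square image (size not given in the interview).
left, right = padding_for_aspect_ratio(1024, 1024, 16, 9)
print(left, right)
```

Under this assumption, a little under 400 pixels of new content is needed on each side, which is the area an editing tool has to invent.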

How does Midjourney work?

For a layperson, this may sound easy-peasy. You just need to type a couple of keywords, and there you go. In fact, Midjourney offers a voluminous user manual. I only got acquainted with some parts of it to pick up the basic commands and options. But I actually learned how to use the program by trial and error. I tried various prompts, for instance “Michael Barbell sitting in an old car with lots of friends”. The result was not very impressive:

Michael Barbell sitting in an old car with lots of friends.

It turns out that the images are generated as part of a process: you need to formulate the commands in a given structure and use more and more precise input every time, for instance, by defining the main element of the image and the background separately. That’s what I did: gradually, I added more accurate prompts. In the end, they looked like extensive setting descriptions from a 19th-century novel. We might be witnessing the rise of a new profession: an AI artist.

And what do we pay for in Midjourney?

You pay for GPU time: the time Midjourney needs to generate an image. The more complicated the image, the more GPU time is needed. Just learning how to use the program cost me about 225 GPU minutes.

You need to be very careful when you buy a licence – always read the small print. In the case of Midjourney, the Standard Plan gives you access to the Relax Mode, which doesn’t charge you for the GPU time used, although images take longer to generate. The subscription is renewed monthly and the Standard Plan costs a lot. On top of that, once you cancel your subscription, you can no longer use Midjourney, even if you have some hours left. Keep this in mind.

The Relax Mode in the Standard Plan is a real game changer. It allows you to play with concepts and learn without pressure. You can create sketches and drafts, search for ideas and points of reference, and then use the Fast Mode to produce a high-quality image. If you want to try this program, I recommend buying the Standard Plan. You can also use the free trial, which offers only 25 GPU minutes (0.4 GPU hour).
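To put the figures quoted in the interview side by side, here is a small sketch that converts GPU minutes to GPU hours. Only the two numbers (25 and 225 GPU minutes) come from the interview; the constant and function names are made up for illustration.

```python
FREE_TRIAL_GPU_MINUTES = 25   # free trial allowance quoted in the interview
LEARNING_GPU_MINUTES = 225    # time Robert reports spending while learning


def minutes_to_hours(minutes):
    """Convert GPU minutes to GPU hours."""
    return minutes / 60


trial_hours = minutes_to_hours(FREE_TRIAL_GPU_MINUTES)
learning_hours = minutes_to_hours(LEARNING_GPU_MINUTES)
trials_needed = LEARNING_GPU_MINUTES / FREE_TRIAL_GPU_MINUTES

print(f"Free trial: {trial_hours:.2f} GPU hours")
print(f"Learning phase: {learning_hours:.2f} GPU hours")
print(f"Learning phase equals {trials_needed:.0f} free trials")
```

In other words, the learning phase alone used roughly nine times the free trial allowance, which is why a mode that does not count against the GPU time matters so much for experimenting.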

How to create precise commands that will bring the desired effect?

First, you need to precisely define the elements that will create your image. The standard input is divided with commas. I started using square brackets and plus signs for my input. I didn’t find this in the manual but I noticed other people were doing just that. That way I managed to produce better results, as the image was divided into contextual segments.

Let’s take a look at this example of an image presenting a lemur:

/imagine [yellow lemur sitting next to bread] + [himalaian mountains] + [ferris wheel] --q 2 --ar 16:9
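The structure of such a prompt can be sketched as a small helper that wraps each contextual segment in square brackets, joins the segments with plus signs, and appends the parameters, which Midjourney expects to be typed with two hyphens (`--q`, `--ar`). The function `build_prompt` and its argument names are hypothetical. As Robert notes, the bracket-and-plus convention comes from other users rather than from the manual, so treat this as an illustration of his habit, not as documented syntax.

```python
def build_prompt(segments, modifiers=(), quality=None, aspect_ratio=None):
    """Assemble a prompt string in the style used in this interview.

    segments     -- contextual segments, each wrapped in [ ] and joined with " + "
    modifiers    -- free-form keywords appended after commas
    quality      -- value for the --q parameter, if any
    aspect_ratio -- value for the --ar parameter, if any
    """
    if not segments:
        raise ValueError("at least one segment is required")
    parts = [" + ".join(f"[{s.strip()}]" for s in segments)]
    parts.extend(m.strip() for m in modifiers)
    prompt = ", ".join(parts)
    if quality is not None:
        prompt += f" --q {quality}"
    if aspect_ratio is not None:
        prompt += f" --ar {aspect_ratio}"
    return "/imagine " + prompt


# Segment text is copied from the interview, spelling included.
lemur = build_prompt(
    ["yellow lemur sitting next to bread", "himalaian mountains", "ferris wheel"],
    quality=2,
    aspect_ratio="16:9",
)
print(lemur)
```

Keeping the segments in a list makes it easy to swap a single element, such as the background, and regenerate the prompt without retyping the rest.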

Once you enter your input, Midjourney will generate four suggestions. You can choose one of them and:

generate another four variations based on the one you’ve selected,
or opt for the version you’ve chosen and use the Upscale Feature to improve its quality.

In the Remix Mode, you can add objects to the picture – for instance, make the lemur wear glasses. However, the program may fail to place objects correctly: then, the glasses will end up somewhere they don’t belong.

This is version 4 after upscaling. Now you can finish the project, using the Remaster Feature.

This is the remastered version. As you can see, it’s more realistic and a lot of deformations have been eliminated. This high-resolution image was generated in the Relax Mode. This way, you can produce a picture without using up your GPU time allowance in Midjourney.

How did you define the style, colours, and general look of the images?

You can define the look of all the elements:

specify the type of lighting (e.g., studio, cinematic, soft, hard, night);
request a style that is a reference to popular productions or artists: to achieve that, add the style at the end of the prompt, after a comma (or after a plus sign), for instance, “Van Gogh”, “Picasso”, or “Naruto”, “Ghost in the Shell”;
describe something less precise, like a general painting technique: “watercolour”;
choose the render type: a hyperrealist photograph or the so-called octane render (including detailed illustration realism).

A lot depends on your contribution and imagination.
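The options listed above can be thought of as optional keywords appended to the subject after commas. The sketch below is one way to keep them organised. The `Look` class and the `describe` function are hypothetical names, and the ordering (style reference last) simply follows Robert's advice to add the style at the end of the prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Look:
    """Optional descriptors for the general look of an image."""
    lighting: Optional[str] = None    # e.g. "studio", "cinematic", "soft"
    technique: Optional[str] = None   # e.g. "watercolor"
    render: Optional[str] = None      # e.g. "octane render"
    style: Optional[str] = None       # e.g. "Van Gogh", "Ghost in the Shell"

    def keywords(self):
        """Return the descriptors that are set, with the style reference last."""
        candidates = [
            f"{self.lighting} lighting" if self.lighting else None,
            self.technique,
            self.render,
            self.style,
        ]
        return [k for k in candidates if k]


def describe(subject, look):
    """Join the subject and its look into one comma-separated description."""
    return ", ".join([subject, *look.keywords()])


print(describe("hawk looking down over a board game", Look(render="octane render")))
print(describe("storks flying", Look(lighting="soft", technique="watercolor", style="Van Gogh")))
```

Descriptors left unset are simply omitted, so the same helper covers both short and elaborate descriptions.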

The octane render creates an interesting style – a bit dark, very detailed: /imagine hawk looking down over a board game, realistic, octane render --ar 16:9 --q 2

An example of the use of prompts without brackets with a nice final effect. The 16:9 proportion was not defined here: /imagine orange rockman badger, energetic, music stage background, realistic photo --q 2

You selected themes that created an abstract description. How does Midjourney handle detecting all the elements and building a consistent whole out of them? Did you type in an entire command at once, with all the elements for a given image?

At the beginning of my adventure with Midjourney, I worked in iterations: I simply entered particular elements in the description, one by one. For example, to create a portrait, I typed “old car” in the input and waited to see the output. Then, I added “people” to the command. Then – “man wearing corduroy trousers”. With time, I felt ready to form extended input descriptions.
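The iterative approach described here, where each round adds one more element to the previous description, can be sketched as follows. The function name `iterative_prompts` is hypothetical; the three elements are the ones Robert mentions.

```python
from itertools import accumulate


def iterative_prompts(elements):
    """Return descriptions that grow by one element per iteration."""
    return list(accumulate(elements, lambda prompt, element: f"{prompt}, {element}"))


steps = iterative_prompts(["old car", "people", "man wearing corduroy trousers"])
for step in steps:
    print("/imagine " + step)
```

Each printed line corresponds to one round of generation, so the effect of every added element can be checked before moving on to the next.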

Extended prompts can work fine, too – there is no fixed rule. This image was created without additional parameters: /imagine Michael barbells body saying let it be, tincture person, fun and smile, always okay

A more extended input defining lighting, quality, and proportions, grouped in brackets: /imagine [little old car full of people inside, abstract man standing next to a car, head like barbells] + [crowd in far background] + [pants on top of a car], Cinematic Lighting --q 2 --ar 16:9

Several attempts at generating an autonomous spaghetti superman. The input included the style of lighting and painting technique. As you can see, Midjourney failed to represent spaghetti: /imagine storks flying, [superman, head made of spaghetti], beautiful lightning, soft Lighting, watercolor --q 2 --ar 16:9

How much time does it take to generate one image?

It’s really hard to say – from 15 minutes to an hour and a half, on average. This depends on how much impact you want to have on the final effect, how you manipulate and modify the image, how many versions you want to generate, how many objects you want to move, how many layers you’re going to add… You can also retouch the photo. In the Relax Mode, you can keep polishing a single image for a really long time: this mode is perfect for conceptual work. Then, you need to upscale the final result in the Fast Mode, which charges you for the GPU time used.

Is AI a danger to creators? Who is the real creative power?

You’ve mentioned that the final effect is largely dependent on the person’s imagination but you were also surprised by the outcomes.

Yes, all the time. We experimented with the program a lot with our team. Once, we asked it to generate a black Alfa Romeo on Tatooine. The result totally exceeded our expectations:

Midjourney V3: /imagine [alfa romeo giulia quadrifoglio on Tatooine] + [black], star wars, realistic --q 2

Midjourney V4: /imagine [alfa romeo giulia quadrifoglio on Tatooine] + [black], star wars, realistic --q 2

AI-generated images have a touch of uniqueness about them. As I’ve said, every response to the same prompt is different. I managed to generate an abstract image of a man with a barbell instead of a head, but on every other attempt the AI produced something else, like a superrealist image of a muscular guy on a beach. It was clearly based on an existing photo.

This brings us to the fundamental questions. Whose work is this? Who is the creator here? You, or AI? Or the author of the original element used to build the new picture?

Midjourney and other programs of this type use existing graphics. Without the training data, Artificial Intelligence is unable to create anything new.

How does it work exactly? First, the base is prepared for the picture. The workspace is divided into a number of sections of any shape. At this stage, the segments already bear contextual descriptions. Then, Midjourney extracts images from its database: it takes the necessary elements and fills the previously prepared sections with them. All the elements are then combined using a variety of tricks: the seams must make visual sense so that the picture won’t look like a poor cut-out or collage. At the end of the process, filters and styles are added.

The fact that Midjourney relies on an image database is the reason why artists, illustrators, and graphic designers have mixed feelings about it. In fact, more often than not, the bot exploits their works without their permission. Sometimes, deformed signatures of artists are actually visible on the generated images. So, who’s the one infringing the copyright? The creator of the bot? Or the person using Midjourney and reusing the images from the database? Or perhaps it’s the bot that should be charged with the infringement?

This issue opens a new chapter in the way people think about graphic design. I agree with the community of graphic designers who claim that a new field of law should be set up to regulate the problems connected with the use of Artificial Intelligence.

A new version of the program – Midjourney V4. The result above was achieved using the prompt: /imagine [king, baby face] + [nightforest in background], hyperrealistic --q 2

There are concerns that programs like Midjourney might drive illustrators out of the market.

And they are not unjustified. As an experiment, we entered the following command in the program: ‘King babyface night forest background’. The AI generated a stunningly beautiful face of a baby king with blemished cheeks. This artwork could actually be used, for example, in a game, without changing anything. So, it may happen that board game artwork will no longer be commissioned from humans – Midjourney can create it for a fraction of the price.

What is your opinion on this?

I don’t like the vagueness around copyright – the problem of robbing people of their achievements and creativity. In fact, when creating animations for the Future Processing campaign ‘Humans aren’t robots’, I also rely on ready-made elements to speed up my work – but these are licensed artworks that I pay for.

Do you see Midjourney as an opportunity, though?

I’m defending this program because I see it as a wonderful source of inspiration and a prop at the stage of conceptual work. This really can take a load off your mind – when you’re having a bad day, you’re unable to generate even a single good idea on the spot, and the deadlines are merciless. AI can always suggest something that can be recreated, either with the use of stock images or with images you’ve created on your own.

Then again, “humans aren’t robots”: they can’t be creative on command.

Whereas bot tools like Midjourney can. I treat my relationship with AI as a kind of symbiosis. I use the images suggested by AI (for instance, four samples generated by the same command), then I add certain objects, move them, and narrow down the result to fit in a particular style.

Doesn’t it take away something from you? Like a sense of achievement or originality?

Not really. To get the expected results, I need to put effort into this work, which is a form of achievement.

Thank you for this interview, Robert.

Thank you.
