Summary: AI shows more empathy than human doctors | “Submit” considered harmful | Europe too slow for AI | New Runway version, new music video | How to prompt AI when writing questions for user research | Canva buys AI-image generator Leonardo | Midjourney now in release 6.1
UX Roundup for August 5, 2024. (Ideogram)
AI Can Show Empathy (Two Studies)
Decels often proclaim that AI is inherently inferior to humans because it can’t exhibit empathy. It’s certainly a valid debate whether artificial empathy is equivalent to human empathy, or whether it’s a different kind of beast. However, as an empiricist and user advocate, I think the more important question is whether people feel better treated by AI than by other humans. And the answer is often “yes.”
Bridging the gap between machines and humans: two new studies show that AI often shows more empathy than humans do. (Midjourney)
The question of how people on the receiving end feel about empathy has now been studied empirically in the field of medicine. A JAMA Internal Medicine paper shows that AI is rated as showing significantly more empathy than human doctors. (In this study, empathy ratings were given by a team of licensed healthcare professionals.)
45.1% of ChatGPT responses were rated as “empathetic” or “very empathetic,” compared to only 4.6% of physician responses. This amounted to ChatGPT having a 9.8 times higher prevalence of empathetic or very empathetic responses compared to human physicians.
AI shows almost 10 times more empathy than human doctors. (Ideogram)
A different study was conducted by Yidan Yin, Nan Jia, and Cheryl J. Wakslak from the University of Southern California. In this study, participants “described a complex situation they are dealing with” and received responses from either an AI or a human responder. In a further twist, these responses were labeled as either generated by AI or by a human, but only half of these labels were true (the other half of responses were falsely labeled with the opposite of the truth).
The participants rated the messages generated by AI higher than the human responses for empathy criteria such as “feeling heard.” The AI responses scored 5.64 on a 1-7 scale, whereas the human responses scored 5.13, with the difference being statistically significant at p<0.001.
AI is better than humans at making people feel heard when they discuss their problems. (Leonardo)
Sadly, being told that a response was AI-generated (whether this was true or not) reduced the participants' feeling of being heard. This last finding raises the ethical question of whether to disclose the use of AI when communicating with customers or patients. The very fact of telling people that they're communicating with an AI will make them feel worse. I would never recommend lying to people, so if they ask whether they're talking to an AI, state the truth. But maybe don't volunteer this information if you care about users' mental health.
A sign like this will make patients feel worse. Ethically, is it better to be forthcoming and disclose AI, or is it better to improve patients’ mental health? (Ideogram)
Replace “Submit” Buttons with CTA Buttons
Zuko (an analytics vendor) has a useful article series about designing the details of web forms. As is often the case with usability, there are definitely a lot of devils in those details. I’ll report on their article about “submit” buttons since this button label has been a bugaboo of mine ever since “submit” was chosen as the default in early web browsers in the 1990s.
Zuko states that a submit button often has the highest abandonment rate on their clients' forms. (Sadly, they don't provide specific analytics data to quantify this claim, but I can respect that they honor client confidentiality.) The article lists a variety of technical issues with web forms (worth checking, for sure) and then proceeds to the UX question that's near and dear to my heart:
“Submit” is a button label that doesn’t explain what will happen if you click it. It has low (really no) information scent as to what the next step in the process will be.
It’s better to use button labels with a clear call to action: say what clicking the darn thing will do! Examples include “place order,” “register now,” and “book demo.” I’m sure you can think of examples from your own transaction flows of labels that tell people what the button will do.
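As a minimal sketch of how small this fix is in practice, here's a hypothetical checkout-form fragment (the component name, route, and label text are my own illustrative choices, not taken from the Zuko article):

```tsx
import React from "react";

// Hypothetical checkout form fragment; only the button label matters here.
export function CheckoutActions() {
  return (
    <form action="/checkout" method="post">
      {/* Weak: "Submit" carries no information scent about what happens next. */}
      {/* <button type="submit">Submit</button> */}

      {/* Better: the label states the outcome of the click. */}
      <button type="submit">Place order</button>
    </form>
  );
}
```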
If you encounter a “Submit” button in the wild, what will happen if you click it? This button label has low information scent and only survives from inertia because it was the original default label for web forms in the 1990s. (Midjourney)
Europe Adapts Too Slowly to New Tech to Keep Up With AI
McKinsey has conducted a new analysis of how labor markets in the United States and Europe are expected to change due to AI. The expectation is that 30% of current hours worked will be automated by AI by 2030. McKinsey counts what they euphemistically call “occupational transitions,” meaning people whose old jobs have become useless so that they need to be retrained for something else. They estimate the number of people needing retraining at 12 million in Europe and another 12 million in the U.S., for 24 million total.
While this sounds like a substantial change, it’s less than the pace of change in the United States before the Covid pandemic. (During Covid, occupational change in the U.S. was even faster, but that is obviously not a pace we would wish to replicate.) In the U.S., McKinsey expects 1.5% of workers to change jobs annually due to AI during the next decade, but during the period before Covid, occupational change in the U.S. was 1.9% per year.
Europe faces much worse problems. Expected occupational change due to AI is the same as in the U.S.: 1.5% per year. But Europe has traditionally only seen occupational change at a pace of 0.7% per year. In other words, Europe must more than double its speed of adapting to new technology.
Unless Europe becomes better at redeploying people from useless jobs to the jobs of the future, McKinsey predicts 10 million additional unemployed workers in Europe by 2030.
The business executives surveyed for the report say that they already face skills shortages when they try to hire today, and that they expect them to worsen.
Conclusion: we need to be better at continuing education and upskilling people so that they can continue to be useful in the new AI-driven economy. This is a much worse problem in Europe than in the United States, according to this report.
Luckily, AI is a skills booster and can be a great educational tool. But R&D in developing AI-based training tools is sorely lacking. The need for these tools will soon be overwhelming, and by then it will be too late to start developing them. Better education is not something that springs into being fully formed like Athena from Zeus’s forehead. We need to start working now on how best to use AI to reskill existing staff.
According to Greek mythology, the goddess of wisdom, Athena, was born as a full-grown woman from the forehead of Zeus, the king of the gods. She was ready for action immediately. Unfortunately, the same will not be true for discovering the best ways of using AI to retrain workers. We will need some time for R&D to discover how to do this best, and the time for that research is now: if we wait until the need becomes urgent, it will be too late. (Midjourney)
Runway Gen-3 Alpha: New Version of Prime AI Video Generator
I made a new version of my song about Dark Design, now with video from Runway Gen-3 Alpha, using image-to-video based on my old Ideogram thumbnail. Music now from Udio 1.5, as a remastered version of my original song.
The animation is clearly a good deal better than the version I made on April 21 with Runway Gen-2. (I used the same Ideogram image as the basis for both versions, so it’s a fair comparison.)
I don't know whether I prefer the original version of the music (made with Udio 1.0) or this new version made with Udio 1.5. Tell me in the comments which version you prefer.
There are clearly many rendering artifacts: the female singer’s glove comes and goes randomly, and her dress is different when zooming out. The male singer loses his trademark dark glasses in some clips.
The image created with Ideogram that I used as the image-to-video prompt for my new music video about dark design.
Runway Runs Faster
Just a few days after I made this music video with Runway Gen-3 Alpha, they launched Runway Gen-3 Turbo. The Turbo model is claimed to be 7 times faster than Alpha. I haven’t tried the new model since I’m done with video creation for now, but faster response times always improve usability.
Less waiting for time to pass when using Runway’s new Turbo model. (Midjourney)
How To Prompt AI When Writing Questions for User Research
Patrick Neeman published a useful guide to writing user research questions with ChatGPT. (The advice generalizes to most other frontier models, such as Claude, Mistral, and Falcon.)
Neeman starts with the obligatory (and true) disclaimer that AI makes mistakes and that a human should review the questions before using them with study participants. That out of the way, he presents a variant of the UX prompting diamond, with specific examples relating to this one step in the UX process: writing the questions we want to ask users.
Constructing a series of user research questions is like building a pyramid. You start with a foundation of very broad questions that don’t presuppose anything, and then you can build targeted questions on top of this foundation. (Midjourney)
1. Start wide. Decide on the broad topic you want to research, but no more. Then prompt for something like “Create 10 user research questions about XXX,” where XXX is the topic you’re researching, for example “checkout flow in luxury product ecommerce.”
2. Refine. Focus on one or more specific features. For each of the chosen features, use the same prompt, but with XXX replaced by that feature.
3. Refine further, targeting personas. “Create 10 user research questions about XXX, as a YYY,” where YYY is a persona. Don’t use “Mary” as the persona, because the AI wouldn’t know which persona the name Mary refers to. (At least right now, it wouldn’t. If we “unhobble” future generations of AI systems with long-term memory and the same AI was used to help create the personas, then it should know who Mary is.) Instead, describe the persona’s job title (for B2B) or main usage circumstances.
4. Ask for answers. For each of the questions created in step 3, feed them back to the AI and ask it to provide 3 detailed examples of answers to that question.
This fourth step does not generate actual answers from real customers, so the AI answers can’t be used as the basis for your design. (When AI simulates users, the outcome is a simulation, not reality. We conduct research to learn about reality.) The reason to have AI answer its own questions is simply as a reality check on whether those questions are likely to produce the kind of answers we want.
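To make the sequence concrete, here’s a minimal sketch of the four prompt templates as I read Neeman’s steps. Everything in it is my own illustration: the askModel helper is a placeholder for whatever chat interface you use, and the specific feature and persona strings are invented examples; only the “checkout flow in luxury product ecommerce” topic comes from the article.

```ts
// Sketch of the four prompting steps. askModel is a placeholder for whatever
// chat API or UI you use to send a prompt and receive text back.
type AskModel = (prompt: string) => Promise<string>;

// Step 1: start wide; only the broad research topic is fixed.
const broadQuestions = (topic: string) =>
  `Create 10 user research questions about ${topic}.`;

// Step 2: refine; reuse the same prompt for each specific feature.
const featureQuestions = (feature: string) =>
  `Create 10 user research questions about ${feature}.`;

// Step 3: refine further; target a persona described by job title or usage
// circumstances (not by a first name the model has never seen).
const personaQuestions = (feature: string, persona: string) =>
  `Create 10 user research questions about ${feature}, as a ${persona}.`;

// Step 4: reality check; ask the AI to answer its own questions to see whether
// they are likely to elicit the kind of answers the study needs.
const sampleAnswers = (question: string) =>
  `Provide 3 detailed example answers to this user research question: ${question}`;

// Example usage (feature and persona strings are invented for illustration).
export async function demo(askModel: AskModel) {
  console.log(await askModel(broadQuestions("checkout flow in luxury product ecommerce")));
  console.log(await askModel(featureQuestions("saved payment methods")));
  console.log(await askModel(
    personaQuestions("saved payment methods", "procurement manager at a B2B retailer"),
  ));
  console.log(await askModel(
    sampleAnswers("How do you decide whether to save a payment method at checkout?"),
  ));
}
```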
I like this series of steps for refining questions with AI help. Neeman has released a specialized AI assistant, “uxGPT Research Questions,” for this process.
Patrick Neeman has published an AI assistant to help UX researchers formulate both big-picture and very specific questions. (Ideogram)
Canva Buys Leonardo
Canva has bought Leonardo, one of the leading AI image-generating tools. Canva is a major platform for what you may call “amateur design,” with 190 million MAU (monthly active users), many of whom are marketing professionals or other non-designers who are empowered to create their own deliverables (such as slide presentations or social media posts) with Canva’s easy-to-use tools. On the other hand, Leonardo, with 19 million registered users (MAU not disclosed), is a high-end GenAI tool with many advanced settings that require substantial experience to learn; it’s probably on the order of Midjourney in UX complexity.
Selfishly, I hope that Leonardo doesn’t downgrade its capabilities after the acquisition, now that I have already spent the time to learn the tool.
I have mixed feelings about this acquisition. On the one hand, it’s good if Leonardo gets an infusion of investment cash to boost its development of better image generation, and maybe even video generation. (Though competition is super-tough right now in the AI video space.)
On the other hand, history is rife with cases where an excellent small tool built by a focused team was acquired by a bigger company only to quickly degrade as it lost that special sauce that used to endear it to users. Corporate blandness is bad for UX, especially for targeted tools.
I am a big Leonardo user, as shown by the following pie chart of the tools I used to create the 92 images I published in my newsletter between July 1 and August 1, 2024:
Midjourney is best for the pure beauty of its images and the wide range of visual styles it supports, but it has terrible prompt adherence.
Ideogram has the best prompt adherence and is the tool of choice when you want a complex scene. Ideogram also has the best usability of current AI image-generating tools.
Leonardo is a good compromise with good beauty (if not the best) and strong prompt adherence (if also not the best) after the release of its Phoenix model. (Leonardo is a very full-featured tool that supports many other models with several advanced options, but since those models have subpar prompt adherence, I’ve given up on using Leonardo with any model other than Phoenix.)
Dall-E has fallen behind in image quality. It has decent prompt adherence but is somewhat hard to control due to its insistence on rewriting the prompt behind the scenes. OpenAI supposedly has a better image model internally, but since they’re not releasing it, they apparently feel insulated from competitive pressure. The many examples of OpenAI keeping improved AI models from its customers are a key reason I resist regulatory capture of the AI market in favor of big operators.
If you can only afford one image-generating tool, I recommend Leonardo if you are sufficiently geeky to learn a complicated UI. If you are only an intermittent user (or if you know you’re bad at learning complicated tech), I recommend Ideogram because of its superior usability. (Ideogram is also the cheapest, at only US $7/month for an annual plan, whereas Midjourney costs $8 and Leonardo is $10. These prices are for the cheapest subscription plans that only allow for the creation of a limited number of images per month. I pay for upgraded plans for all 3 services.)
Midjourney Version 6.1
Midjourney has released the long-awaited version 6.1. They promise improved image quality and improved text rendering, but not improved prompt adherence, which has long been Midjourney’s weakness (see above news item). I haven’t tried the text rendering yet, but here’s my experiment with a prompt I have been using as a benchmark for all Midjourney versions since version 1: a stage performance of a 3-member K-pop group dancing in a TV production of a K-pop song, requesting either a color pencil drawing or a photorealistic image.
Top row of images generated with Midjourney 6.0. Bottom row of images generated with Midjourney 6.1. Left column shows the color pencil drawings (with stylize set to 400), whereas the right column shows the photorealistic images (stylize 600).
You can make your own judgment, but I probably prefer the top row, which came from Midjourney 6.0. So, in this experiment, there was no improvement from this dot-release. If you want to see more side-by-side comparisons, check 10 prompts in both models from MayorkingAI and a further 8 prompts in both models from OscarAI. (Note that MayorkingAI presents the 6.0 images to the left and 6.1 to the right, whereas OscarAI uses the opposite sequence.) My conclusion from this bigger sample is that 6.1 usually, but not always, does have higher image quality than 6.0. Maybe I was just unlucky with my particular benchmark example. (Now I’m stuck with it, though, and I’ll reuse it for future Midjourney releases.)
I’m being nice to Midjourney in this little experiment, because several of the images it gave me included many more than 3 singers:
One of the images Midjourney made with more than three performers. I like this one for the color pencil drawing, but I would have to be generous to interpret it as three central singers plus seven backup dancers. It’s also more watercolor than pencil.
See also the video I made comparing Midjourney’s image quality from version 1 to version 6. Since the improvement in 6.1 is so small, I won’t bother producing an updated video.
For one last experiment, I dialed the stylize parameter down to 40, giving us the following image. This time, I think it’s fair enough to interpret the picture as showing 3 main singers plus some backup dancers. Certainly a better sense of movement (if that’s what you want to communicate) than the more staged look of the higher stylize values shown above.
Stylize 40 instead of 400. (Midjourney)
Watching Less Olympics
When I was younger, I watched a lot of Olympic Games coverage, especially running and gymnastics. (My family’s only Olympic history was when my great-aunt placed 7th in women’s foil fencing at the 1932 Olympics; she told us about it for decades.) For the last twenty years, I have found myself watching less and less of the Olympics. The athletes are just as good now, but the growth of narrowcast media with specialized coverage means that YouTube is more engaging for me. This year, I’ve watched zero Olympics. (Midjourney)