
UX Roundup: 20K Subscribers | Voice Sound vs Character | Voice UI for Old Users | CoT Explaining AI | Ideogram Release 2a | Easy Avatars | Track AI in Analytics | Gemini Images

Writer: Jakob Nielsen
Summary: 20,000 newsletter subscribers | Voice sound vs character | Voice UI for old users | Does chain-of-thought help users understand AI? | Ideogram release 2a | Humva improves the usability of avatar creation | Tracking AI-derived traffic in website analytics | Gemini generates both text and images

The UX Roundup newsletter for March 17, 2025, is delivered to one of my 20,000 subscribers by a highly trained messenger tiger from UX Tigers. (Midjourney)


I know this newsletter is long, but you can watch a video summarizing the top stories from this newsletter in 3 minutes (YouTube).


20,000 Newsletter Subscribers

I recently passed 20,000 subscribers to my email newsletter. I reached the 10,000 subscriber mark in January 2024, so it’s taken me 14 months to double the subscriber base.


I made a song to celebrate this milestone (YouTube, 2 min.).


Every week, my newsletter is delivered to subscribers by swift carrier pigeons. Some people may receive their newsletter via Internet electrons instead. (Leonardo)


I now have subscribers in 148 countries and territories, up from 126 last year. A gain of 22 countries may not sound like much, but there are now very few countries left in the world without subscribers, so the total will be hard to grow much beyond 150. Still, I feel confident of reaching that mark later in 2025.


My newsletter pretty much covers the globe these days, with only a few hold-out countries. (Midjourney)


Here is the distribution of subscribers by continent:



This is almost identical to the distribution when I hit 10,000 subscribers last year. Most importantly, Europe and North America each account for one-third of subscribers, with the rest of the world making up the last third.


The only differences are that Africa and Asia have each grown by one percentage point, and Latin America has dropped by two percentage points. However, Latin America is only down in relative terms. The absolute number of my subscribers from Latin America increased by 70% during the year. The only reason this caused a relative drop is that the growth rates were higher in Africa (173%) and Asia (115%).


The ten countries with the most subscribers are:


  1. USA

  2. India

  3. United Kingdom

  4. Brazil

  5. Canada

  6. Germany

  7. Spain

  8. Australia

  9. Poland

  10. Italy


The only change in the top-10 membership is that The Netherlands dropped out (it’s now number 12 after a growth of only 96%) and was replaced by Poland (which grew by an impressive 218% during the year).


Even though email is the oldest Internet media form, it’s still the best way of reaching an international audience of committed followers regularly. Carving rune stones would have been truer to my Viking ancestry and would also achieve more permanence than fleeting email messages, which tend to be discarded after a week. But it would be a hassle to deliver 20,000 boulders to a worldwide audience. (Ideogram)


My email newsletter uses permission marketing — a term coined by digital-marketing guru Seth Godin in 1999. I only mail people who have explicitly opted in by signing up for a subscription to the newsletter. (I detest spam, and opt-out is a poor excuse for sending people unwanted emails.) Even though email is an ancient media form as Internet media go, it remains the best way to maintain customer loyalty and stay connected with fans, because email offers a direct connection with subscribers.


Every time I press that “send” button, 20,000 inboxes in 148 countries populate with my thinking. That can’t be beat. I may have 8 times as many followers on LinkedIn, but posting there only reaches a small fraction of the people who read my emails. (Or, to be more precise, who open my newsletter. How much of it they read, I don’t know.)


I get about 16 times as much exposure from sending out an email newsletter as from posting on LinkedIn. Given that I have 8x as many LinkedIn followers, we can easily calculate that an email subscriber is worth 128 times as much as a LinkedIn follower in terms of getting my message out.
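To spell out the arithmetic behind that 128x figure, here is a back-of-the-envelope sketch in Python (the two ratios are my own rounded estimates, not precise analytics):

  # Back-of-the-envelope math for the relative value of an email subscriber.
  # Both ratios are rounded estimates from my own channel statistics.
  exposure_ratio = 16   # one newsletter send gets ~16x the exposure of one LinkedIn post
  follower_ratio = 8    # but I have ~8x as many LinkedIn followers as email subscribers

  # Exposure per person is exposure divided by audience size, so the
  # per-person value multiplies the two ratios: 16 * 8 = 128.
  value_per_subscriber = exposure_ratio * follower_ratio
  print(value_per_subscriber)  # prints 128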


Email beats social media big time, at least for a thought leader like myself who prefers to post long and strongly argued messages. People who deal in the quip of the day probably have a different experience.


To round out the Internet media types, video is yet another story: Each video I post is viewed by about 10 times as many people as the number of subscribers to my YouTube channel. What drives video views is incidental discovery, powered by thumbnails that entice people to click.


Video is a “hot” media form in Marshall McLuhan’s classification: In his classic book Understanding Media (1964), McLuhan distinguished between hot and cool media based on their sensory engagement, resolution, and audience participation. He defined “hot media” as those that deliver a high amount of detailed information to a single sense, thus requiring minimal active participation from the audience. Hot media present content in high definition, leaving little room for the audience to fill in or interpret details.


These media typically emphasize clarity, precision, and completeness of sensory data, demanding relatively little imaginative or interpretative effort from the audience.

In contrast, an email newsletter is decidedly cool, and thus more intellectual. Video packs a harder punch and can emote more than I can accomplish with the newsletter format. That’s why I use both formats and have been experimenting with avatars (see story below) to widen my reach.


The email newsletter is a “cool” media type, whereas videos are a “hot” media type, in McLuhan’s classification. This may mean that the long-term impact is higher from growing my avatar videos than from the newsletter, since videos may be more persuasive and memorable. (Leonardo)


Currently, my YouTube channel racks up about the same number of weekly video views as the number of page views my articles get that week. (Mostly driven by the newsletter, but with an increasing share coming from search engines and AI agents.) However, I would not be surprised if videos were to dominate next year, with my avatar videos and UX songs getting two or three times the traffic of my written newsletter. (This assumes continued progress in AI avatars so that they will look perfect and have lip-synch as good as a video of a live influencer. Plus, further advances in text-to-speech generation so that the avatars’ AI voices become better than the best human voice actors. Both are plausible within a year and almost guaranteed within two years.)


However, the email newsletter remains nearest and dearest to my heart, so I am happy to see it grow. There’s also much to be said for reaching a core group of loyal followers instead of the fickle YouTube audience, which is attracted by thumbnails rather than loyalty. A big “thank you” to all my 20,000 subscribers — especially to the 930 people who subscribed during the first month, back in the ancient days of 2023 when I started the “Jakob Rebooted” project of becoming an independent influencer after too many years in the salt mines of UX consulting, where I had to bite my tongue to avoid offending clients.


My heartfelt thanks to all my subscribers — especially the faithful early ones. (Midjourney)


Today, I don’t care who I offend. I can be my true self again. Whoever is offended (say, because they prefer a different style of avatar) is certainly welcome to unsubscribe. I run a fully opt-in operation and don’t want to impose myself on anybody who doesn’t like my authentic self or my style. Many times when I see people complain, I feel like offering them a refund of their subscription fee, but then I remember that this would come to exactly zero dollars and zero cents, so I don’t bother cutting that check.


One last thing about my media outreach, whether by newsletter or video: if you consider a similar effort, don’t be discouraged by low subscriber numbers initially. It took me 21 months to grow my newsletter to the current 20K subscriber base, and in the first year of my video channel, weekly viewership was less than 10% of what it is now. I have done this before: I started my first newsletter in 1997 when I was a Sun Microsystems Distinguished Engineer, and we were all excited about the explosive growth of the dot-com wave. My second newsletter now has twice the subscribers that my old newsletter had at the equivalent time in its history (21 months after launch). You always start slow and then grow. Persistence is the key to newsletter success.


To assess the progress in AI video, compare the music video about the 20K newsletter subscribers with the music video I made only 2.5 months ago about hitting 150K LinkedIn followers.


Voice Sound vs. Voice Character

Olivia Moore is one of my favorite analysts covering the use of AI. She’s a venture capitalist, not a user experience professional, but so far, the UX field has failed to produce much insight into AI use, so business analysts are all we have.


Moore recently posted her insight that the advanced voice modes of ChatGPT and Grok differ in an important way:


  • ChatGPT offers many voices, but only one character (personality).

  • Grok has only one male and one female voice, but that voice can speak as one of several different characters (i.e., take on different personalities).


This difference is similar to the difference in traditional web design between superficial visual design and the tone-of-voice decisions in content strategy. You can have your website look colorful or drab, but that’s a different level of UX design than the tone of voice the site uses to communicate its message (let’s say, upbeat or straightforward).


Ideally, appearance and content are aligned, but they are separate design decisions.

The same is true for an AI’s spoken voice and its personality. You can employ a gruff voice to speak soothing words, but aligning the voice and the personality would be best.


Grok’s voice mode only offers one female and one male voice: Ara (described as “an upbeat female voice”) and Rex (described as “a calm male voice”). (Leonardo)


Two of Grok’s multiple voice mode personalities are “genius” and “unhinged.” Choosing the voice and choosing the personality for your AI are two very different ways of customizing the interaction. (Leonardo)


Voice Mode for Old Users

Søren Hafsjold Mohr (AI Lead at Nestlé Nordic) has an endearing story about how his 89-year-old mother started to use AI. (In Danish only, sorry.) She has low vision and doesn’t want to type a lot, but ChatGPT’s advanced voice mode was a perfect fit for her. Mohr reports that it was very easy for his mother to use ChatGPT and that she quickly stopped thinking of it as something special and simply engaged in conversation about topics she was interested in.


This type of natural interaction is exactly what we want. One less natural moment: when she asked the AI about its name, it answered “my name is ChatGPT.” True, but not exactly a name that promotes engagement. We should abandon these nerdy product names to reach a wider audience.


Voice mode is a natural and appealing way for older users to interact with AI, though it ought to have a better name. (Leonardo)


Does Chain-of-Thought Help Users Understand AI?

The “reasoning” models of AI typically display a scrolling readout of their chain-of-thought as they “think” through a problem. This is true for models like OpenAI’s o1, xAI’s Grok 3 in “think” mode, and DeepSeek R1.


One reason for this readout is simply to entertain the user during the currently very slow response times. For example, Grok 3 took 103 seconds to translate 92 lines of subtitles for one of my recent videos into Portuguese for my large Brazilian audience.
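To make the UI pattern concrete, here is a minimal Python sketch of such a progressive readout. (The token stream is a hypothetical stand-in for a real model API, which would yield reasoning text as the model produces it; nothing here is any vendor’s actual interface.)

  import time

  def fake_reasoning_stream():
      # Hypothetical stand-in for a reasoning model's streamed chain of thought.
      steps = [
          "Parsing the subtitle line...",
          "Considering 'tomando um ar fresco' for 'getting some fresh air'...",
          "Weighing 'novas perspectivas' vs. 'perspectivas frescas'...",
          "Settling on the final translation.",
      ]
      for step in steps:
          yield step

  # Show each reasoning step as it arrives, so the user sees progress
  # instead of a frozen screen during a 100-second response.
  for chunk in fake_reasoning_stream():
      print(chunk)
      time.sleep(0.5)  # simulated generation delay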


Here’s a short excerpt of Grok’s CoT when translating one of the subtitle lines (“getting some fresh air and fresh perspectives” in the English original):


"getting some fresh air" is "tomando um ar fresco," and "fresh perspectives" could be "novas perspectivas." So, "tomando um ar fresco e novas perspectivas."

Wait, but in Portuguese, it might sound better to say "tomando um ar fresco e buscando novas perspectivas," to make it clear that they are actively seeking new perspectives. However, the original says "getting," which could imply both taking in fresh air and gaining fresh perspectives. So, perhaps "ganhando um ar fresco e novas perspectivas" or "aproveitando um ar fresco e novas perspectivas." But "aproveitando" might be a bit too formal. Maybe "curtindo um ar fresco e novas perspectivas."

Actually, in the context of a hike, "curtindo" which means enjoying, might fit well. So, "curtindo um ar fresco e novas perspectivas."

Wait, but "fresh perspectives" might need a better translation. "Perspectivas frescas" sounds a bit literal, and in Portuguese, we might say "novas ideias" or "novos pontos de vista." But since it's a metaphor related to the fresh air, maybe keeping "perspectivas frescas" is okay, as it maintains the play on words.


Besides keeping users entertained, there’s potentially a bigger benefit from exposing the AI’s chain of thought. (Even though it’s reportedly not the full chain, for proprietary reasons: the AI labs worry that competitors may steal their approach to reasoning models.)

Jonathan Mendez recently suggested that users may be learning more about how AI works by reading through these CoT listings. I find this an intriguing possibility. Does anybody know of any studies of whether people’s mental models of AI improve after seeing their chain of thought? If not, we have discovered yet another topic for a master's thesis — or actually, there could be enough depth in this question for an entire Ph.D. However, I'd rather see the results in half a year, so I say that a bright MS student should go for it!


In general, there are so many unanswered UX questions related to the AI revolution that no graduate student has an excuse for studying the esoteric and useless topics that dominate academia. In particular, if you’re a master’s student, target your thesis work at investigating one of these unanswered questions and you more or less have guaranteed employment after graduating — in a future-proof AI-related job, no less.


AI reasoning models typically show their chain of thought to the users as they work through a problem. (Leonardo)


Ideogram Release 2a

The AI image generator Ideogram is out with an updated version called 2a. The “a” probably denotes a minor upgrade, unlike version 3, which we now eagerly await. (Let’s hope it ships faster than Midjourney version 7.)


Ideogram was already the strongest image model in terms of rendering text, which admittedly is a low bar since most image models still garble anything but the simplest typography. (I’m looking at you, Midjourney!)


Version 2a now makes Ideogram even better at accurately rendering the specified words and also uplevels its typography skills. I’m sure a skilled human visual designer would still quibble with some of Ideogram’s typographic choices, but just wait a year or two, and I expect it to be a master.


For now, here’s a logo Ideogram designed for UX Tigers. I’m not abandoning my old logo (which has advantages in applications like watermarking my YouTube videos), but this is not bad for something made in less than a minute.


Potential UX Tigers logo designed by Ideogram 2a.


Humva Improves the Usability of Avatar Creation

The last time I compared AI avatar services was in December, and I concluded that Humva had the best usability but was too limited. (Simplicity is one of the best ways to achieve good usability, but if the UI is too limited, users can’t create the outcomes they want.)

At the time, the only way to use Humva was to upload a manuscript and select which of their pre-defined avatars you wanted to read that script. Easy, but limited.


Humva has stuck to its usability mission but has added the ability to define new avatars and to upload a sound recording of what the avatar should say. I made two new avatar videos with Humva, both based on my article Vibe Coding and Vibe Design:



Two avatar versions I got from Humva, based on a still image I had made previously of “a Norwegian television presenter.” (Explainer on the left, song on the right.) While having some similarities, the two avatars are clearly different people.


The workflow to create custom avatars in Humva is slightly long, but each step is easy enough:


  1. Upload an image of a person. Can be a real photo or an AI image. I chose my “Norwegian television presenter” since the video I made with this avatar is my best-performing video on YouTube.

  2. Select from several so-called “styles,” which include outdoors, studio, virtual, and cartoon.

  3. Humva then generates 12 avatar photos based on your uploaded photo, varied within the chosen style. As you can see from the examples above, these variations do not represent the same person, but similar people. You pick the 5 you like best.

  4. These 5 photos are transformed into avatars you can use to make videos. This takes about 3 minutes.

  5. Finally, select one of the 5 avatars and upload either an audio recording of what the avatar is saying or a manuscript. In the latter case, Humva generates the speech using TTS (text-to-speech).


To create additional avatar videos, you proceed straight to step 5. After I had made my explainer video, it took me only a minute to also make the song video. (Not counting the time to generate multiple song versions in Suno and listen to about 30 seconds of each version to decide on the best performance of my song.)


Generating a 2-minute video takes about 18 minutes, which is slow indeed. But you can jump into another window and do something else during that time.


If you want easy avatars, go for Humva. It can also create the base image for the avatar, so you don’t even need Midjourney. (However, since I had several good pre-designed avatar photos, I used them as the basis of my Humva avatars.)


I also used externally made soundtracks: from ElevenLabs for my explainer and from Suno for my song. If you want to make singing avatars, you currently need to employ a music creation tool for the soundtrack, but you could stay within the Humva environment for a speaking avatar. I used ElevenLabs because I already have a subscription and because I am very impressed with ElevenLabs’ ability to create a reasonably accurate emotional reading of my manuscript, with emphasis in the right places. (ElevenLabs is by no means perfect yet and can’t currently match a good voice actor’s reading of a manuscript. However, I don’t have a voice actor available, and if I read the manuscript myself, it would come with a Danish accent — which might not be so bad for a Norwegian presenter, of course.)


While Humva is the easiest avatar tool, it lacks features you would want for more elaborate videos. Compare with two avatar videos I made with HeyGen:



For both of these videos, I generated multiple versions with HeyGen, using different zoom levels for the avatar, ranging from tight closeups to far views. (Unfortunately, HeyGen couldn’t recognize the singer in my music video when I gave it the full image of the stage, complete with the robot band. Those clips are animated with Kling and don’t have lip-synch.)

The multiple views create more engaging videos, but certainly also demand an additional time investment for editing.


To emulate this same effect in Humva, I created avatar looks based on the “C-Pop singer” I had made for my “UX in 2025” video. I then made the same song three times, using three of Humva’s avatar looks, and cut between them in the final video. Unfortunately, just as Humva had done for the Norwegian avatar, it also made different people for the Chinese avatar. So the resulting video is not super-realistic, cutting between effectively four different singers performing the same song.


Here's the song I made: Avatar Creation Usability (YouTube, 2 minutes).


I think I’ll retain Humva for simple avatar videos where I just want to get something up on YouTube quickly. Humva’s usability shines for this purpose. But the combination of HeyGen and Kling (supplemented with Veo2 and other video tools for B-roll) can’t be beaten for more elaborate videos. If you want to experiment with avatars, I recommend that you start simple and use Humva. You can make a one-minute demo video for free, but for real projects, you need at least a $19/month paid subscription, if not a higher level.


“AI Platform” As Analytics Traffic Category

I noticed that the analytics provided by Wix, the hosting platform I use for my website at www.uxtigers.com, has introduced a new category for classifying traffic sources: “AI Platform,” which includes traffic coming from the likes of Perplexity, ChatGPT, and Gemini. It joins previous categories like organic search, paid search, organic social media, and email marketing.


This is a smart move, since AI agents will be accounting for ever-larger shares of website traffic and the ability to rank well in AI agents will replace SEO as the main imperative for a successful Internet business. This changeover is still a few years away, but it’s good to train website owners to consider this new channel now, before it’s too late for them to pivot their web strategy.
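If your analytics platform doesn’t offer such a category yet, you can approximate one yourself by bucketing referrer domains. Here is a minimal Python sketch (the domain list is illustrative and incomplete, and Wix’s actual classification rules are not public):

  from urllib.parse import urlparse

  # Illustrative, incomplete list of AI-platform referrer domains.
  AI_REFERRER_DOMAINS = {
      "perplexity.ai",
      "chatgpt.com",
      "chat.openai.com",
      "gemini.google.com",
      "copilot.microsoft.com",
  }

  def traffic_category(referrer_url: str) -> str:
      # Classify a referrer URL into a coarse traffic category.
      host = (urlparse(referrer_url).hostname or "").removeprefix("www.")
      if host in AI_REFERRER_DOMAINS:
          return "AI Platform"
      if "google." in host or host == "bing.com":
          return "Organic Search"  # crude; real analytics tools use more signals
      return "Other"

  print(traffic_category("https://www.perplexity.ai/search?q=ux"))  # AI Platform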


AI will soon be directing large flows of traffic to websites. Better start tracking your AI traffic now. (Midjourney)


Gemini Image Generation

Google’s Gemini 2.0 Flash Experimental (cute name) can now generate images in addition to text. In itself, a new AI image generator doesn’t qualify as sufficiently newsworthy to be added to this week’s already overly long newsletter. But the integration of a language model with image generation is an interesting trend. The more intelligent the model, the more sophisticated the images might be.


As an experiment, I asked Gemini to make an image in the style of a satirical editorial cartoon to illustrate the point made in the above paragraph in a humorous way. Here’s what I got. I don’t think it’s particularly funny, and I had even cranked the “temperature” (willingness to be wild) of the AI up to 2.


Cartoon made by Google Gemini 2.0 Flash Experimental in image mode. It does fairly well on text rendering, though one word is misspelled.
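For readers who want to reproduce this experiment programmatically, the call looks roughly like this with Google’s genai Python SDK. (A sketch based on my reading of the documentation; the experimental model ID and parameter names may well change, so treat the details as assumptions and check the current docs.)

  from google import genai
  from google.genai import types

  client = genai.Client(api_key="YOUR_API_KEY")  # assumes you have a Gemini API key

  response = client.models.generate_content(
      model="gemini-2.0-flash-exp",
      contents="Draw a satirical editorial cartoon about AI image generation.",
      config=types.GenerateContentConfig(
          response_modalities=["TEXT", "IMAGE"],  # request both text and image output
          temperature=2.0,  # maximum "willingness to be wild"
      ),
  )

  # The response interleaves text parts and inline image data.
  for part in response.candidates[0].content.parts:
      if part.text:
          print(part.text)
      elif part.inline_data:
          with open("cartoon.png", "wb") as f:
              f.write(part.inline_data.data)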


Gemini showed me the description it used to generate this cartoon:


  • Panel 1: A pompous editor is talking to his overworked subordinate sitting in front of a computer that is drawing the cartoon we are currently looking at.

  • Caption: Editor: "...And THAT is why a text-generating model that also generates images is too mundane to be news."


I fed this to Ideogram with some edits, including the instruction to make a funny editorial cartoon. Here’s Ideogram’s version:


Alternate cartoon by Ideogram. The drawing style is nice but not as funny as Gemini’s and the speech bubble is erroneously given to the “subordinate sitting in front of a computer” instead of the “pompous editor” (who’s also not drawn pompously enough to be funny). One thing I do appreciate about Ideogram is that it doesn’t superimpose a watermark like Gemini does.

 
