Jakob Nielsen

UX Roundup: Sora: Not Impressed | AI Startups Growth | Hiring Staff with AI Skills | 150K LinkedIn Followers | AI Training on YouTube | Apple Intelligence Useless | Dr. AI | New o3 model

Summary: First impressions of the Sora video model are unfavorable | AI startups grow fast | How to identify talent with strong AI skills in the hiring process | 150,000 LinkedIn followers | Preference setting for AI training from a YouTube channel’s videos | Apple’s AI features rated useless by most users | AI beat human doctors in diagnosing difficult cases | New o3 AI model is a SuperGeek

UX Roundup for December 23, 2024. Merry Christmas from UX Tigers! (Ideogram)


Sora: Not Impressed

I made a short video for my LinkedIn posting to promote this newsletter. Mostly I used animations of the hero image (a tiger with Christmas presents watching a train go by) from Kling 1.5, but I also wanted to try OpenAI’s new Sora video model.


I could not get Kling to make a clip showing the tiger waving at the train. Yes, Kling could make the tiger wave, all right, but its paw turned into a human hand in every attempt. So I turned to Sora for this scene. This time I got a paw (good!), but the train didn’t move, which somewhat spoiled the scene.


I made a small compilation movie of Sora’s bad video generations, set to a simple song lamenting the poor results.


I wanted a B-roll video clip of a tiger waving its paw at a passing train, but two leading AI video models gave me human hands waving or couldn’t animate a moving train. (Ideogram)

AI Startups Grow Fast

Y Combinator is probably the leading startup investor in Silicon Valley. They posted an interesting 2024 year-in-review video (YouTube, 38 min.). Five tidbits:


  • Their AI startups are now growing an average of 10% monthly. In the pre-AI era, such hypergrowth was only realized by the very best startups, but now it’s the average growth rate. (See the sketch after this list for what that rate compounds to over a year.)

  • They now believe that most of the value in AI will come from targeted applications, not general foundation models (e.g., ChatGPT or Claude).

  • In early 2024, the thinking was that AI was too risky for most enterprise applications because of hallucinations, but by the end of 2024, that conclusion had been reversed, thanks to new models (like o1) with better reasoning that make fewer errors. Humans make mistakes, too, so the real goal is to reach an acceptable error rate, not zero errors. (My comment is that we already have quality-assurance methods like Six Sigma to overcome human errors and reduce problems to the required level. We may need more work on equivalent QA methods for AI.)


In 2023, AI sometimes seemed to have drunk a bottle of whiskey before answering your questions. By 2025, straitlaced business AI has gone on the wagon and insists on avoiding hallucinogenic substances. (Leonardo)


  • They recently funded a startup that aims to train an AI model on icon aesthetics, so that it can design new, good-looking icons. (My comment is that AI will soon be able to design attractive icons, but aesthetics are the smallest part of UX design. For icons, the bigger questions are whether users recognize what the image is supposed to show and whether they associate it with the underlying feature the icon represents. See my previous coverage of icon usability.)

  • Whereas traditionally, almost all Silicon Valley startups focused on software, many of the recent startups they have funded are based on hardware innovation, mainly in robotics. A common test case seems to be doing the laundry, which robots can now do. (Certainly a use case I’ll pay for.)
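
To put the 10%-per-month figure from the first bullet in perspective, here’s a minimal Python sketch of what that compounding implies over a year (the 10% rate is from the video; the rest is plain arithmetic):

```python
# Compounding a 10% average monthly growth rate over 12 months.
monthly_growth = 0.10
annual_multiple = (1 + monthly_growth) ** 12

print(f"Size after one year: {annual_multiple:.2f}x the starting point")
# -> about 3.14x, i.e., roughly 214% year-over-year growth
```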


After watching this discussion of the AI startup scene, I strongly urge you to go work for an AI startup (or start your own) if you’re currently stuck in a company that’s not hardcore AI-First. While legacy companies are stuck in meetings to discuss what to do about AI, AI startups are building a new world.


Delicate tasks like folding laundry used to be impossible for robots, but recent AI advances have made this an easy job for current robots. (Midjourney)


How to Hire AI-Capable Talent

Besides the 2024 trends I covered above, Y Combinator’s year-end video discussed how to hire staff with strong AI skills. It’s now a given that any ambitious company only wants to hire staff who are great at using AI, but how do you identify this skill?


Last year, I recommended hiring UX staff with skills in using AI for design, and identifying them by expecting candidates to use AI in the hiring exercises. Y Combinator takes this idea one step further by requiring job candidates to use AI when solving the hiring exercises. The discussion in the video focused on hiring great programmers, and the Y Combinator folks recommended setting up a “pair programmer” scenario where the job candidate had to solve a problem in collaboration with AI.


As they point out in the video, a job candidate can’t fake deep AI expertise when asked to solve even a slightly complicated problem with AI. People who dabble with simple prompts cannot pull advanced AI use out of their hat on the fly. Nontrivial advanced AI skills for a certain domain — whether programming or product design — require time to build and can only be acquired by doing real projects with AI.


To hire programmers who are great with AI development tools, Y Combinator recommends an interviewing exercise where the candidate is asked to code in a “pair programmer” scenario, with AI as the teammate. We can probably use similar exercises to identify product design candidates who are good at using AI in the UX process. (Midjourney)


150,000 LinkedIn Followers

I recently passed the threshold of 150,000 followers on LinkedIn. (If you haven’t done so already, follow me on LinkedIn.)


For comparison, 150K is about the population of Alexandria, VA in the United States, Grenoble in France, or Cairns, Queensland in Australia.


I crossed the 100,000-follower mark in December of last year, so it took me a year to gain 50,000 followers, for a 50% year-over-year growth rate. (In contrast, I added 40,000 followers in 2023, so my absolute growth was 10K higher this year, even though the percentage growth rate was lower.)
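
For the numerically inclined, here’s that follower arithmetic as a quick Python sketch. (The 60K base at the end of 2022 is inferred from the counts above, since adding 40K in 2023 brought me to 100K.)

```python
# LinkedIn follower growth: absolute vs. percentage terms.
end_2022, end_2023, end_2024 = 60_000, 100_000, 150_000

for year, start, end in [(2023, end_2022, end_2023), (2024, end_2023, end_2024)]:
    added = end - start
    print(f"{year}: +{added:,} followers ({added / start:.0%} growth)")
# -> 2023: +40,000 followers (67% growth)
# -> 2024: +50,000 followers (50% growth)
```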


Gaining 150,000 LinkedIn followers worldwide. A nice milestone to celebrate at the end of the year. (Midjourney)


I am very grateful to every single UX fan who clicked that button to follow me. Followers are what keep an influencer like me posting, since I make zero money from my content. In fact, when I see people complain about something or other in one of my posts, I feel like saying, “I’ll refund your subscription fee: your check for zero cents is in the email.” (But I don’t do so: people are entitled to like or dislike what they want, and I’ll keep posting what I like, regardless.)


The check I feel like sending to people who complain about an image or don’t like my content, to refund what they paid for it. (Leonardo)


Even though it’s gratifying to see strong growth in my LinkedIn followers, the truth is that LinkedIn doesn’t offer much value to its content providers. Only a few of my posts this year passed 100,000 views, and I don’t think any single post reached all 150K followers.


In contrast, my email newsletter (which you’re reading now) grew 80% in 2024, though from a smaller base. More importantly, the newsletter is the better way to reach my fans and generates many times more pageviews for my articles than LinkedIn. Each newsletter subscriber is worth about 200 times as much as a LinkedIn follower. (This is in terms of pageviews. Since I’m no longer selling anything, the dollar value is zero in either case, but if I ran a consulting business, the relative value of the newsletter subscribers would likely be even higher because of the effect of sustained loyalty on likelihood-of-doing-business.)


Growing a follower base on LinkedIn may not offer much true value to content creators. (Google ImageFX)


Watch the music video I made with a song about this event (YouTube, 2 min.). For this music video, I made the singer with HeyGen’s new “motion avatars,” which have animated features beyond lip-synch. This allowed my singer to play his banjo, though not in tune with the music (they only synchronize the vocals, not the instrumentals). The B-roll was made with Kling 1.6, last week’s upgrade to the 1.5 release.

We’ve been getting new video capabilities literally every week in December, and these two upgrades did improve my music video relative to those I made a week ago. (CapCut has a new release almost daily, though mainly with microfeatures such as animated stickers, which I hardly use because they’re targeted more at inane TikTok videos.)


Allow AI Training on Your YouTube Videos

YouTube has introduced a setting to control whether “third-party” AI models can use your videos as training data. This preference lives under Settings (at the bottom of the left rail in YouTube Studio) > Advanced Settings; scroll to the bottom to find the heading “Third-party training.”


I have to say that I find the phrase “third-party” suspicious. It does not seem that we can decide whether to allow Google or YouTube itself to train on our videos. In fact, these two companies are not included in the list of AI companies that channel owners can opt in to or out of for AI training.


This setting is currently off by default, meaning that AI companies other than Google can’t train on your videos. If you want to allow such training, and thus level the playing field between AI models, you need to take action. Personally, I just allowed all AI models to train on all UX Tigers videos.


I have two arguments for why I encourage you to allow AI to use your work as training data, whether YouTube videos, articles, or other media formats:


  • AI will create an immense lift in humanity’s standard of living, especially in poor countries, which will benefit the most from AI-driven education and healthcare. We all have a moral duty to accelerate this lift and make the AI experience as good as possible for the billions of fellow humans using AI. (People will use AI whether or not it has been improved by training on your data, but it’ll be a little better if your work is included in the training set.) The superintelligent AI generation we’re expecting around 2030 will also hyper-accelerate research in all fields, from AI engineering itself (so AI will advance even faster in the 2030s than in the 2020s) to physics, chemistry, space exploration and exploitation, and medicine. Faster-improving science and engineering will lift humanity even more.

  • Adding your thoughts to AI’s intelligence base means that a little of you will live forever and form part of billions of people’s experience. Excluding your work from AI is equivalent to excluding your website from Google in the early 2000s: you might as well not exist. When AI has absorbed your thinking, you become part of humanity’s collective eternal wisdom.


The hero of Homer’s Iliad, Achilles, was obsessed with his reputation and what people would think of him after he was gone: the ancient Greek concept of kleos (κλέος), which translates as eternal fame or renown. Allowing AI to train on your ideas gives you a modest measure of kleos without dying in battle as Achilles did. (Midjourney)


Apple’s AI Features Are Rated Useless by Most Users

In a new survey of 1,000 users of Apple iPhones with AI capabilities, 73% rated “Apple Intelligence” as “not very valuable” or “adding little to no value.” Only 11% of users said that Apple Intelligence is “very valuable” to their iPhone experience.


Apple fumbled the AI future, and then frantically tried to catch up by adding underpowered AI to its phones. The main lesson is that AI is not like peanut butter: you can’t simply smear a layer of AI over a user interface and expect high usability.


Smearing peanut butter over an apple will not make it delicious. Similarly, adding bad AI to a smartphone won’t make it good. (Midjourney)


AI Beats Human Doctors in Diagnosing Difficult Cases

By now, we have many research studies showing that AI performs better than human physicians at many clinical problems. One more study confirmed this pattern and added insight into AI’s ability to order diagnostic tests to clarify its diagnosis.


The paper “Superhuman performance of a large language model on the reasoning tasks of a physician” (PDF, 25 pages) is by Peter G. Brodeur, Thomas A. Buckley, and 16 coauthors from various hospitals, Harvard Medical School, and Stanford University. (Hat tip Deedy Das.)


The researchers used OpenAI’s o1-preview model from September 2024 to analyze 143 difficult clinical cases that were documented in writing. The AI was asked to list likely diagnoses and order additional tests that would determine which of the potential diagnoses was correct.

It would have been interesting to have data on the logical third step: What would the AI do when presented with the results from the tests it had ordered? Unfortunately, this was not a live clinical project. Instead, it was based on old documented cases, so no sick patients were present to be tested.


However, for the two initial steps (identify likely diagnoses for a difficult clinical case and order tests to determine the one correct diagnosis), AI performed great.


In 78.3% of cases, o1-preview produced a correct set of diagnoses (scored as having the correct final diagnosis included as one of the potential diagnoses it suggested). For comparison, human clinicians only achieved a diagnostic accuracy of 33%. Remember, these were very difficult cases. In other words, AI was 2.4 times better than the human doctors.


AI was substantially better than human doctors at diagnosing difficult cases based on clinical reports. (Leonardo)


In a previous study, GPT-4 diagnosed a subset of the difficult cases. For that subset, o1-preview was 88.6% correct, whereas GPT-4 was 72.9% correct, a lift of 15.7 percentage points in diagnostic accuracy (statistically significant at p = 0.015). This is an impressive improvement in AI clinical performance in only 18 months (from the release of GPT-4 in March 2023 to the release of o1-preview in September 2024): a gain of 0.9 percentage points per month.
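
To make these derived figures easy to check, here’s a minimal Python sketch of the arithmetic; all input numbers are taken from the paper as quoted above:

```python
# Arithmetic behind the headline numbers from the diagnosis study.
o1_accuracy, human_accuracy = 0.783, 0.33
print(f"o1-preview vs. human clinicians: {o1_accuracy / human_accuracy:.1f}x")
# -> 2.4x

# GPT-4 (March 2023) to o1-preview (September 2024) on the shared
# subset of cases: the two releases are 18 months apart.
gpt4, o1, months = 0.729, 0.886, 18
lift_pp = (o1 - gpt4) * 100
print(f"Lift: {lift_pp:.1f} percentage points, or {lift_pp / months:.1f} pp/month")
# -> 15.7 percentage points, or 0.9 pp/month
```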


In 87.5% of cases, o1-preview ordered the correct follow-up tests, as determined by expert physicians. In 11% of cases, the test was judged to be helpful, if not decisive, in determining the diagnosis. Only 1.5% of the tests ordered by AI were deemed unhelpful. Anybody who’s ever had a difficult-to-diagnose condition will know that 1.5% wasted tests is a good score.


(Though since the tests weren’t actually carried out, we don’t know whether the AI would have been able to perform the final steps of interpreting the test results and designing the best treatment for each patient.)


Academic research always lags behind the practical world, even more so in the fast-moving field of AI. Two months later, and we’ve already moved on from o1-preview to o1 Pro as the top AI model. Most likely, o1 Pro is even better at medical diagnosis than o1-preview. And for sure, the next-gen AI we’ll get in 2025 will be better yet.


I hate to write this since I detest the American lawsuit system and the expense of malpractice suits that’s driving up healthcare costs for everybody. But I know that the trial lawyers are smart enough to figure things out for themselves, so I’ll go ahead and say it: the day is coming soon when misdiagnosed patients will sue for malpractice if AI was not consulted for their diagnosis.


AI is moving fast. Its capabilities improved by 0.9 percentage points per month in the current study of clinical judgment. (Midjourney)


The New o3 AI Model Is a Super-Programmer

OpenAI announced its new o3 model on Friday. I’ve been critical of their product names in the past: Which is the more powerful AI model, 4o or o1? Those names seemed chosen by the Abominable Snowman. Given OpenAI’s terrible track record in product naming, o3 is not that bad a name for the successor to o1. (Reportedly, they skipped o2 because of a potential trademark conflict with the British telecoms company O2.)


The o3 model is currently kept under lock and key for supposed “safety” reasons, meaning independent users can’t assess it. This is despite the lesson from the first two years of practical AI that there are no real safety concerns. Better to let the world hammer away at new AI products and get any weaknesses disclosed publicly so that they can be fixed than to rely on secrecy and a closed group of “safety” fanatics to hobble our AI.


Because of the secrecy, we have to take OpenAI’s word for the capabilities of o3. If we can trust OpenAI, it’s a powerful model; the scores they’ve published are indeed impressive:


  • On the Codeforces programming competition, o3 achieved an Elo score of 2727, which puts it in the 99.5th percentile of competitive software developers who have attempted the contest.

  • Even more impressively, in the history of the Codeforces competition, only 174 software developers in the entire world have achieved a score higher than o3’s score. In other words, on this metric, o3 is the world’s 175th-best programmer.

  • On Epoch AI’s FrontierMath test, o3 achieves a 25% score. This may not sound impressive, but the best any previous AI could do on these insanely hard mathematics problems was 2%. Some of the world’s leading mathematicians have described the test as “extremely challenging.”


The o3 AI model announced 3 days ago is extremely good at mathematics and software development. It remains to be seen how good it is at other tasks. (Leonardo)


Do these results mean that software developers are destined for the unemployment lines? No, there’s more to software development than coding. But it does mean that software developers will see vastly increased productivity, which in turn means falling prices for software and more software being produced and bought. That, in turn, means more design will be needed. SuperGeeks create demand for SuperUXers.
