Summary: New podcast creation feature by ElevenLabs | Figma plugin with usability guidelines | Traditional business models being disrupted by AI | New AI models: o1 and o1 Pro
UX Roundup for December 9, 2024. (Ideogram).
New Podcast Creation Feature by ElevenLabs
ElevenLabs is the leading AI-voice company. After Google stole the limelight with its NotebookLM podcast-creation feature, ElevenLabs has launched its own version, called GenFM. (What’s with these stupid names for AI products and features?)
As an experiment, I created a new podcast with ElevenLabs (YouTube, 8 min.). This podcast is based on my article “Design Leaders Should Go ‘Founder Mode.’”
I picked this topic because I had recently made a podcast based on that same article with NotebookLM (YouTube, 5 min.), so you can compare the two.
ElevenLabs’ strength is its voices, and indeed the new podcast hosts sound great. However, they do sound like individual presenters who take turns speaking, not like the bantering hosts in Google’s podcast. I feel that the new podcast is considerably less dynamic and fun to listen to.
However, exactly because the ElevenLabs podcast is based on turn-taking, I decided to create the video as if the two hosts were sitting in different locations and connecting via an Internet service like Zoom. I’ve recorded plenty of podcasts myself in this manner. Treating the two hosts as separate video feeds allowed me to animate them with HeyGen, which again offered lip synchronization, making the hosts look much more real. (Even if HeyGen’s lip movements are still not 100% natural.)
I added a bit of B-roll animated with Kling 1.5.
The ElevenLabs podcast is closer to a Q&A format than a discussion. On balance, Google’s NotebookLM podcast is more entertaining, but ElevenLabs’ GenFM podcast packs more actual information and hews more closely to my underlying article.
I used photorealistic hosts for my latest adventure in AI-generated podcasts because I am using lip-synching. (Midjourney)
Figma Plugin with Usability Guidelines
Talk about integrating answers within the workflow: Baymard Institute has released a free Figma plugin with its usability guidelines for web design. I strongly recommend getting this plugin if you design websites and use Figma.
Baymard Institute is the world’s best source for research-backed usability guidelines for e-commerce websites. If you run an e-commerce site with at least a quarter million dollars in annual revenue, you should actually subscribe to their paid service. Even though it costs $2,388 per year, you should be able to improve sales by at least 1% by applying the full set of guidelines. (A 1% lift on $250,000 in annual revenue is $2,500, which already covers the subscription fee.)
For non-e-commerce websites, it’s harder to justify the cost. However, since many of the usability issues in any website are covered in the e-commerce research, the very biggest non-e-commerce sites should still subscribe. The new free service is a great offer for anybody else (which means most websites).
The benefit of embedding evidence-based usability findings right within your UX workflow is that they provide a just-in-time design rationale when team members or stakeholders question your design decisions. External resources are fine, but that extra step of having to go elsewhere for the research findings means that many people won’t bother.
Ease of use is paramount for wide take-up of usability guidelines, so I applaud this new initiative by the Baymard Institute.
Embedding advice within the workflow increases the likelihood that people will bother paying attention. The new Baymard plugin for Figma will help many web design teams produce more usable websites. (Leonardo)
Companies Being Killed by AI
The Economist published a short analysis of some of the companies that AI has hurt the most so far. (Subscription required.) Some facts from the article:
Stock valuation has dropped 99% for Chegg, an online education service, as students get homework help from ChatGPT instead of Chegg.
Stack Overflow traffic has been cut in half. Programmers used to turn to this website for answers to their coding questions. Now, AI development tools provide the answers.
RWS share price is down 57%, as customers get documents translated by AI instead of employing this translation company.
The key insight from these stats is that AI disruption is big, with effects ranging from cutting a company in half to virtually eradicating it. Next year, when we expect to get the next generation of AI with higher capabilities, even more industries will be disrupted.
AI as a teacher is one of my four metaphors for working with AI. Just-in-time learning with AI beats legacy instruction by human teachers. Business as usual won’t cut it for education and training companies.
Elementary schools will probably continue to employ human teachers to motivate the kids and serve as role models, even as AI will handle the actual instruction with its ability to individualize the material for each student’s IQ and interests.
Many legacy business models (and legacy companies) will be eradicated by AI over the next 3-5 years. On the other hand, AI creates opportunities for many more new companies, but they’ll be smaller and more efficient than legacy corporations. (Midjourney)
New AI Models: o1 and o1 Pro
OpenAI finally launched the full release of o1 (no longer a preview), their AI model that uses enhanced inference-time reasoning to improve output. Of more interest, they also launched a higher-end model, called o1 Pro, which requires an upgraded subscription ($200 per month instead of $20). The Pro model is allowed to consume even more inference-time compute, which should lead to even better results.
If we assume that the prices are “fair” (i.e., reflect the cost of providing the respective services), Pro is likely to consume about 10 times as much compute as the regular o1 model for each output.
In the release propaganda from OpenAI, o1 was said to be superior at math, coding, and writing. I put it to the writing test by asking it to write the same story I used as a recent test of ChatGPT 4o: a short children’s story about the wildebeest who thought he was an impala.
The new story is reprinted below. Is it any better? I do like the beginning, which employs more engaging writing. But I like the ending much less this time: the (unnamed) giraffe simply tells the wildebeest that he’s a wildebeest, and he turns around and accepts this fact. In the previous version, the wildebeest was playing with his impala friends and saved them from a lion by utilizing the unique strength of the wildebeests. That seems a much stronger ending, and one more likely to have the stated effect of making the wildebeest accept his biological nature.
From this small test, I’m not impressed by the supposed writing improvements in o1. In particular, you would imagine that story structure and plot progression would be exactly the kind of thing that additional inference-time reasoning should strengthen.
The “Pro” model is a more interesting development. In the product design field, it may help us with some thornier problems, such as system design or information architecture. I will be interested to see what design teams report as they upgrade their subscriptions and deploy the Pro model on tough problems.
However, I expect bigger gains for AI in product design next year as we get true next-generation AI models with higher levels of intelligence, as opposed to simply “thinking” harder for a longer time at the 2023 level of intelligence.
Information architecture design is one of the activities that may benefit from the additional reasoning offered by the new o1 and (especially) o1 Pro models. Some degree of human oversight will likely be recommended for the next 3 years until we have advanced two more generations of AI intelligence levels. (Leonardo)
Early reports from AI influencers indicate that o1 Pro may be only about 10% better than the “regular” o1. If I’m right that Pro consumes about 10x the compute, this may seem like a disappointing return, but I see two reasons not to write it off. First, since o1 Pro is a new type of AI model with substantially more emphasis on using inference compute for deeper reasoning, it’s likely to take users some time to discover the best way to apply it to their problems.
Second, it’s actually not that unexpected that a 10x increase in compute only produces 10% better results. Remember that the AI scaling law estimates that it takes 100x more compute to move to the next generation of AI. AI intelligence grows as the logarithm of the effective compute. So to advance two generations of intelligence requires 100x100 = 10,000 times more effective compute.
We shouldn’t expect that much improvement in AI performance from only spending 10x more compute. 10% better sounds about right, though I expect that we’ll inch up toward 20-25% with better understanding of inference-time AI reasoning.
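To make the back-of-the-envelope numbers concrete, here is a minimal sketch (in Python, purely illustrative) of the logarithmic-scaling assumption above: it treats every 100x increase in effective compute as one AI generation, so 10x more compute buys only half a generation, while two generations require 10,000x.

```python
import math

# Illustrative assumption from the scaling-law argument above:
# AI capability grows with the logarithm of effective compute,
# and roughly 100x more compute is needed per AI "generation."
COMPUTE_PER_GENERATION = 100

def generations_gained(compute_multiplier: float) -> float:
    """How many AI 'generations' a given compute multiplier buys."""
    return math.log(compute_multiplier, COMPUTE_PER_GENERATION)

print(generations_gained(10))      # 0.5 -> 10x compute (o1 Pro) is half a generation
print(generations_gained(10_000))  # 2.0 -> two generations need 100 x 100 = 10,000x
```

On this crude model, o1 Pro’s extra compute buys only a fraction of a generation, which is why a modest improvement on the order of 10% is roughly what we should expect.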
The o1 models (including o1 Pro) are an attempt to make AI think harder about each problem by consuming additional inference-time compute. However, in the long run, I expect more progress from using 100x more training compute to train next-generation AI that can think smarter. Of course, we should use both approaches since it often doesn’t matter whether the AI spends a few minutes more to work through a problem. (Ideogram)
Pay 10x more to gain 10% better AI performance? For some people, this is a poor bargain, but for many business applications, 10% better performance is easily worth $180 higher AI subscription fees per month. For sure, this is true for the use of AI in software development: getting better code written faster — even if only by 10% — is worth roughly $2,000 per month, assuming that a software developer costs the company $240,000 per year in salary and overhead expenses.
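For readers who want to check the arithmetic, here is a quick sketch (the salary, overhead, and 10% figures are the assumptions from the paragraph above, not measured data):

```python
# Worked example of the ROI claim above (assumed numbers, not measured data)
annual_developer_cost = 240_000                      # salary + overhead per year
monthly_developer_cost = annual_developer_cost / 12  # = $20,000 per month
productivity_gain = 0.10                             # assumed 10% improvement from o1 Pro

monthly_value_of_gain = monthly_developer_cost * productivity_gain  # = $2,000
extra_subscription_cost = 200 - 20                   # Pro tier minus standard tier = $180

print(monthly_value_of_gain)                             # 2000.0
print(monthly_value_of_gain / extra_subscription_cost)   # ~11x return on the upgrade
```

Even if the true productivity gain turns out to be a few percentage points lower, the upgrade still pays for itself many times over under these assumptions.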
Design leaders should insist on having their companies pay for a few “Pro”-level AI subscriptions at the $200/month level for selected members of the design team. It’s a no-brainer to pay this much for the developers’ AI subscriptions: the ROI will be immediate and huge. Unfortunately, design will fall even further behind our engineering friends if we’re left with underpowered AI while they get the big-boy toys.
If only developers get upgraded subscriptions to the new “Pro” AI models, the gap between design and engineering will widen further. The design team will fall behind unless it also experiments with ways of accelerating its processes with o1 Pro and the other upgraded models that will no doubt soon be released by other companies. (Ideogram)
“The Wildebeest Who Thought He Was an Impala,” as Told by o1
Once in the wide, grassy savanna lived a young wildebeest named Wundu. From the moment he opened his big, brown eyes, Wundu admired the graceful impalas leaping across the plains. He watched how they sprang into the air as if their hooves were made of tiny springs. They seemed so elegant, so swift, and Wundu thought, “I must be one of them!”
Wundu tried to copy the impalas. He bent his knees and attempted a mighty jump, but he didn’t soar; he simply plopped awkwardly into a patch of tall grass. He tried again and again—each time, his jumps looked more like wobbly hops. The impalas watched, tilting their slender heads, and wondered what this wildebeest was doing.
Still, Wundu refused to believe he wasn’t an impala. He practiced striding along the savanna with his head held high, as graceful as could be. But whenever he broke into a run, he thundered along the ground, scattering dust and alarming the bright-winged birds.
One afternoon, as Wundu stood by a watering hole, an old giraffe lowered her long neck and said, “Little one, why are you trying so hard to be an impala?”
Wundu answered, “Because impalas are elegant and swift. I want to be just like them!”
The giraffe smiled kindly. “You are a wildebeest. Strong legs, sturdy shoulders, and a proud, shaggy mane make you who you are.”
Wundu looked down at his strong, steady legs. He realized that he might never leap as an impala does, but he could thunder proudly across the plains, true to himself.
(260 words)
The wildebeest who thought he was an impala. But he can’t jump. (Leonardo)