Summary: AI-Assisted Video Editing with CapCut | Comparing services for generating AI avatars | The core NotebookLM team leaves Google | Monthly subscriptions are better than annual subscriptions for AI services
UX Roundup for December 5, 2024. (Ideogram).
AI-Assisted Video Editing with CapCut
I have finally transitioned from Adobe Premiere to CapCut, ByteDance's AI-enhanced video editor, for my recent video projects.
It took me about 5 hours to learn the basics of CapCut, so it’s not as easy as touted. I still don’t quite understand the model it uses to relate video clips and the associated subtitles.
In CapCut, you can edit a video clip by editing an AI-generated transcript of that clip. The AI identifies the filler words and repeated words that are so common in live speech, enabling you to tighten up the video with a single click to remove those words.
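This transcript-driven workflow can be thought of as filtering timestamped words and keeping only the time ranges of the words that survive. A minimal sketch in Python, assuming a simple (word, start, end) transcript format (not CapCut's actual data model):

```python
# Illustrative sketch of transcript-based video tightening (an assumption
# about the general technique, not CapCut's implementation).

FILLERS = {"um", "uh", "er", "you know"}

def tighten(transcript, fillers=FILLERS):
    """Drop filler words and immediate repeats ("the the")."""
    kept = []
    prev = None
    for word, start, end in transcript:
        w = word.lower()
        if w in fillers or w == prev:
            prev = w
            continue  # this word's time range will be cut from the video
        kept.append((word, start, end))
        prev = w
    return kept

def keep_ranges(kept, gap=0.05):
    """Merge adjacent kept words into contiguous clip ranges for the editor."""
    ranges = []
    for _, start, end in kept:
        if ranges and start - ranges[-1][1] <= gap:
            ranges[-1] = (ranges[-1][0], end)  # extend the current clip
        else:
            ranges.append((start, end))        # start a new clip
    return ranges

transcript = [
    ("So", 0.0, 0.2), ("um", 0.2, 0.5), ("the", 0.5, 0.6),
    ("the", 0.6, 0.7), ("demo", 0.7, 1.0), ("works", 1.0, 1.3),
]
kept = tighten(transcript)
print([w for w, _, _ in kept])  # ['So', 'the', 'demo', 'works']
print(keep_ranges(kept))        # [(0.0, 0.2), (0.5, 0.6), (0.7, 1.3)]
```

The one-click "tighten" feature then amounts to cutting the video down to these kept ranges, so the user edits text while the tool handles the timeline surgery.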
Many of CapCut’s AI features are more useful for its core use case of editing TikTok dance videos than for editing my videos, so I have not experimented with them in depth. However, the application offers a wealth of AI-driven features to change the appearance of the on-screen talent. For example, you can look slimmer. (I might benefit from that, but only in a full-body shot, not in the headshots I was editing.)
AI-enhanced video editing brings many opportunities to use intent-based outcome specification to improve a video. (Midjourney)
A key point is that the features employ a true AI intent-driven user interface. CapCut doesn’t offer old-school command-based features to, for example, resize the video by x% in the horizontal dimension. Instead, it has a feature specifically for appearing slimmer, which applies image recognition to identify the talent’s body so that it only changes that and not other elements in the video. The user specifies his or her intent for the desired outcome, not the steps needed to achieve this outcome.
CapCut’s features for touching up makeup exemplify intent-based outcome specification. I don’t think this “killer smoky” eye shadow improves my looks, so I didn’t use it in my videos. Still, I did use the ability to lighten my skin color, because the camera’s color balance had gone rogue when recording the raw video and didn’t faithfully reproduce my proper pale countenance.
From a usability perspective, the key to good AI features is to base them on a task analysis of user needs. Yes, AI offers intent-based outcome specification, but for this interaction style to work well in practice, designers must understand likely user intents and design to facilitate them.
Old-school human factors ride again: even if you don’t perform a detailed time-and-motion study of users, you should conduct task analysis to identify the main pain points and tasks where people spend inordinate amounts of time to accomplish goals that can be taken over by AI if you can communicate the user’s intent and have the AI produce the desired outcome. (Midjourney)
AI Avatars: 3 Services Compared
I finally broke down and created an AI avatar video with HeyGen, which is the most widely lauded service for this task among the AI creator community. I used other avatar services for my initial attempts because HeyGen seemed overly complex.
Using HeyGen confirmed my suspicions: it was indeed harder to use than the other avatar tools I tried. Here are the 3 avatar videos I have created, so that you can watch them and compare the outcome:
Summary of my article on why design leaders should go “Founder Mode.” 2-minute video made with Humva.
Summarizing the controversial points in my keynote at the Y Oslo conference. 2-minute video made with D-ID.
Highlights from 6 of my articles about AI creativity. 2-minute video made with HeyGen.
Still image used as the basis for my avatar video experiment with HeyGen. (Midjourney)
I used Kling to add B-roll style animations to the last two videos for additional visual interest. Given the current state of avatar animation, it’s too boring to watch the same talking head for two minutes straight.
I used CapCut to reformat the videos into portrait aspect ratios for social media: Founder Mode video on Instagram, Keynote summary video on Instagram, AI creativity video on Instagram.
Here’s my analysis of the 3 AI avatar tools:
Humva: By far the easiest to use, with essentially no learning curve. It provides a long scrolling list of pre-defined avatars in each of two orientations (16:9 landscape and 9:16 portrait aspect ratios). Select the one you want and upload the script for what you want the avatar to say. That’s it. The only additional feature is the option to add an emotion, such as “excited,” to the avatar’s facial expressions, but I found these modifications too exaggerated to be useful. Usability comes from the lack of features: each avatar has a given look, stage set, and voice, so you have to take the full package the way it comes. Animation is good, though the avatar’s movements aren’t fully natural. Voice is of medium quality but sounds fairly natural. (Watch the avatar video I made with Humva.)
D-ID: Almost as easy to use as Humva, especially considering its additional features. In addition to pre-defined avatars, you can use image-to-video to upload your own preferred avatar look, and you can use text-to-video prompts to have the tool generate a custom avatar. Each visual can be combined with a large number of available voices, with many different accents. The animation quality is lacking: the lip sync is rather scary and clearly not natural. Also, if you look closely at my avatar, you will see that only the top part has been animated, whereas the lower part of the video is a still image. This is easiest to see in the avatar’s hair. (Watch the avatar video I made with D-ID.)
HeyGen: The most features with the highest quality, but the hardest to use, as is often the case for feature-rich applications. HeyGen also provides both pre-designed avatars and the ability to upload your own look. You can even upload existing videos as the basis for training a custom LoRA, which allows the creation of an avatar that looks very similar to an actual person. If you want to go through the bother of first using a service like Kling to produce multiple videos from an image, you can also use this feature for a supposedly richer custom-designed avatar. (This workflow is too convoluted for me to have attempted it yet.) The selection of available voices is even more expansive than D-ID’s and includes multiple languages, not just English. Premium subscribers can generate custom voices to avoid having their avatar sound like other companies’ avatars. The video quality is the best of the 3 tools and includes 4K resolution at the highest subscription tier. The audio quality is also the best. (Watch the avatar video I made with HeyGen.)
All 3 tools offer a free mode for creating a few short avatar videos, which is all you need to evaluate which one best serves your needs.
My recommendation: If you just want to make a few avatar videos for fun, use Humva at the free level. If you want to produce a full run of avatar videos for serious business purposes, take out a paid HeyGen subscription.
None of these AI avatars are fully human quality. We will need improved voice expressiveness and visual gesture richness before the avatars become as engaging as professional TV studio hosts. (However, even now, AI avatars are better than most humans, who are frightfully dull on camera.)
AI avatars are still not as engaging as the best human presenters. But just wait a year. (Leonardo)
The Core NotebookLM Team Leaves Google
NotebookLM was the one AI-first product from Google that took the world by storm. In particular, the podcast-style audio overviews went viral big time. I made 5 of these overviews, which shows you how much I liked the product:
The first two overviews (“Founder Mode” and the 4 metaphors for working with AI) were the most successful because they had a focused topic. The last 3 covered more ground, and for that exact reason, I was less happy with the materials the AI selected from the broader set of content I had provided as the basis for the podcast.
Last week, Raiza Martin, who was the product manager for NotebookLM, announced that she had left Google together with the lead UX designer and the lead developer for the product. This core team is starting a new company to build a currently-unannounced product.
Of course, it’s bad news for Google that they have lost the one core team that has been able to ship a useful AI product. But Google has so many other good people that it only needs to unleash the existing talent from suffocating internal politics to get many other great products.
On balance, I think this is good news for the world and for users. The future is not for the highest talent to work at legacy companies but for them to join up as small superTeams in independent companies. AI empowers such small teams to accelerate beyond anything previously possible. In the past, there were benefits to being nestled within the superior resources of legacy companies. Now, legacy structures only impede the most powerful talent, who can get the same support from AI.
Raiza Martin has flown the nest, together with a superDesigner and a superGeek. Leaving BigCo for a startup bodes well for what this superTeam will ship next. (Midjourney)
Monthly Subscriptions Are Better Than Annual Subscriptions for AI Services
Stephen Gates from the CRZY design agency presented a great keynote yesterday at the ADPList AI Summit. I’ll report on one pragmatic issue and one deep issue he mentioned in his talk.
The pragmatic point is that Gates advised against taking out annual subscriptions to AI services. Even though they come with a hefty discount, it’s more prudent to subscribe on a month-to-month basis because things change so quickly in AI. A service that’s leading now may easily be overtaken by a competing service or a completely new company next month, particularly regarding the features that matter to your job.
By New Year’s Eve 2025, how many of the best-in-class AI services as of December 2024 will retain their crown? Not enough to make annual subscriptions a safe investment. (Leonardo)
Gates’s more profound insight was the need for repeatability in AI-based design. He used a client project as an example: for an artisanal whiskey product, he needed “photos” of 12 different cocktails for the brand’s recipe collection without spending $50,000 on a photo shoot.
For professional use, creating one good AI design is insufficient if you can’t repeat that feat on demand many more times.
In contrast, for amateur use, such as my own situation, one-time creations are fine. You’ve probably noticed that I don’t employ brand standards but experiment with different styles for almost all my illustrations. That’s more fun! For the same reason, I sometimes publish haikus, sometimes rock songs, sometimes podcasts, and lately, I’ve been on an avatar kick, creating newscasts about some of my articles — but with a different avatar each time.
But repeatability separates the pros with a real design business from amateurs like myself.
Unfortunately, repeatability is difficult with AI, which is inherently stochastic. I recommend embracing AI’s probabilistic uncertainty: giving us different results each time we ask the same question is one of the reasons AI is such a great ideation tool.
AI is like throwing dice in a craps game. Sometimes, you win, and sometimes, you have to cast the dice a few more times. (Midjourney)
For business, we want repeatability. Some AI tools have features that support consistency, such as generating persistent characters, which was almost impossible in 2023, somewhat supported in 2024, and hopefully a standard (and easy-to-apply) feature by 2025.
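At a technical level, repeatability in stochastic systems usually comes down to pinning the random seed: the same seed plus the same prompt yields the same output, which is why many image generators expose a seed parameter. A toy Python illustration (the seed controls and behavior of real image models vary by service):

```python
import random

def generate(prompt, seed):
    """Stand-in for a stochastic generator: same prompt + same seed
    always produces the same 'output'."""
    rng = random.Random(seed)  # isolated, seeded generator
    return f"{prompt}-{rng.randint(0, 9999)}"

a = generate("whiskey cocktail photo", seed=42)
b = generate("whiskey cocktail photo", seed=42)
c = generate("whiskey cocktail photo", seed=7)
print(a == b)  # True: identical seed gives identical generation
print(a == c)  # False: a new seed gives a new variation
```

Saving the seed alongside the prompt is what turns a lucky one-off generation into a repeatable asset, which is exactly the gap between amateur and business use described above.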
Gates recommended engaging in prompt management, a more systematic and formalized version of my long-running recommendation to maintain a prompt library of prompts you’ve liked in the past. His agency currently does prompt management in a shared spreadsheet, which is clearly not ideal. Let’s hope we get better support for prompt management in 2025. It’s not clear to me whether this should be a separate tool that works across the individual AI services, or whether prompt management is better offered as an integrated feature within each AI tool.
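To make the idea concrete, here is a minimal sketch of what a structured prompt library could record. The fields are my assumptions about useful metadata, not the schema of Gates's actual spreadsheet:

```python
# Hypothetical prompt-library structure for prompt management
# (an illustration, not any specific agency's tooling).
from dataclasses import dataclass, field

@dataclass
class PromptRecord:
    name: str                # human-readable label
    tool: str                # which AI service the prompt targets
    prompt: str              # the exact prompt text that worked
    settings: dict = field(default_factory=dict)  # model, seed, style refs
    tags: list = field(default_factory=list)      # for retrieval

class PromptLibrary:
    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def find(self, tag):
        """Retrieve every saved prompt carrying a given tag."""
        return [r for r in self._records if tag in r.tags]

lib = PromptLibrary()
lib.add(PromptRecord(
    name="cocktail hero shot",
    tool="Midjourney",
    prompt="photo of an old fashioned cocktail, warm bar lighting",
    settings={"seed": 42},
    tags=["whiskey", "photo"],
))
print(len(lib.find("whiskey")))  # 1
```

Capturing the settings (especially the seed) next to the prompt is what makes a saved entry reusable for the kind of repeatable, on-brand output a client project demands; a shared spreadsheet holds the same fields, just without search or validation.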
We need repeatable AI for systematic business use of AI in design. (Midjourney)