
UX Roundup: OpenAI Product Roadmap | Parallel and Iterative AI Use | Virtual Boyfriends | Jakob in SF | Dropdown Menus | Early GUI | AI Urgency | Perplexity Deep Research

Writer: Jakob Nielsen
Summary: OpenAI announces its product roadmap | Parallel and iterative use of AI improves quality outcomes | Virtual boyfriends are popular | Keynoting the Dovetail conference in San Francisco | Dropdowns must die | Early GUI prototypes | UX needs more urgency about AI | Perplexity Deep Research

UX Roundup for February 17, 2025. (Ideogram)

Grok 3 Launches Monday Night

xAI has announced that its new Grok 3 model will be released tonight, Monday, February 17, with a live demo at 8 pm USA Pacific Time (find the equivalent time in your time zone). The link to the launch event has not been made public yet, but will hopefully be on the Grok website.


The preannouncement hypes that Grok 3 will be the “smartest AI on Earth.” Of course, we shall see, but according to the AI Scaling Laws, there is hope this will be true since it was trained on a huge supercomputer with 100,000 Hopper chips, which is substantially more compute than used for the previous-generation AI we’re currently using.


However, even if Grok 3 does take the lead, this may only be temporary since OpenAI (see next news item) says that it will ship GPT-5 in a matter of “months” (whatever that means). Furthermore, rumors are heavy in Silicon Valley about the imminent release of at least one other high-end frontier model.


No matter who tops the leaderboards this week or next quarter, the most important point is that we’re about to level up to the next generation of AI models. The specific skills of any individual model matter less, because a hallmark of advanced AI use is picking the best model for any given problem. But the amount of upleveling between generations is important. How much better is Grok 3 than Grok 2? And how much better will GPT-5 be, compared to GPT-4 (or o1, or even Deep Research)?


Grok 3 will be unveiled tonight. (Illustration made with Grok 2)


OpenAI Product Roadmap

Sam Altman (head of OpenAI) announced that he has fired the Abominable Snowman as OpenAI’s head of product naming strategy. Actually, Sama announced OpenAI’s product roadmap for the next two releases, which finally includes a sensible naming strategy. No more guessing whether “o3” is more or less powerful than “4o.” Now, the products will be unified and have a single, simple set of product names: ChatGPT 4.5 and 5.


4.5 will supposedly ship in a few weeks. (The roadmap didn’t include a timeline. Back when I was a Sun Microsystems Distinguished Engineer, we always gave paying customers the courtesy of a rough timeframe for when they could expect the upgrades we announced in our technology roadmaps.) Version 5 will follow some unspecified number of months later and will integrate “a lot of our technology,” eliminating the need for separate names for obscure features that normal users don’t understand.


GPT-5 will be free for all users, but will operate at a higher level of intelligence for paying customers (fair enough), with the highest IQ reserved for customers at the $200/month “Pro” level. One more reason design leaders must secure budgets now to give all UX team members a Pro subscription to avoid falling behind our engineering colleagues who will surely have the best AI at their disposal.


OpenAI finally gave the boot to whoever designed their previous abominable product names. I always thought this guy must have been in charge. (Midjourney)


Parallel and Iterative Use of AI Improves Quality Outcomes

Tom Gally, a professional Japanese-to-English translator with nearly four decades of experience, developed an iterative LLM-powered workflow that demonstrates how experts can leverage AI while maintaining human oversight. His process involves three key phases:


A. Multi-model drafting

He begins by running source texts through multiple LLMs (Claude 2, GPT-4, and others) to generate diverse translation variants. For a Sony gaming console launch presentation, he might create 3-4 distinct English versions simultaneously.


B. Hybrid refinement

When he encounters a problematic phrase, he:

  1. Inputs the Japanese original and draft translations into Claude

  2. Requests 10+ alternative renderings of specific tricky sentences

  3. Selects the most contextually appropriate options

  4. Repeats until satisfactory solutions emerge

This “thesaurus on steroids” approach of asking for alternate versions of entire phrases helps overcome creative blocks while preserving translator agency. (In an old-school printed thesaurus — or even the online version in Microsoft Word — you only find synonyms and variants of individual words.)
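Gally’s refinement loop (steps B.1–B.4) can be sketched as a simple program. This is purely illustrative: `request_alternatives` is a stub standing in for an interactive chat-model request, the translator’s judgment is modeled as an `is_satisfactory` predicate, and none of these names come from Gally’s actual tooling.

```python
# Illustrative sketch of the hybrid-refinement loop (steps B.1-B.4).
# request_alternatives() is a stub for an interactive LLM request; the
# human translator's judgment is modeled as the is_satisfactory predicate.

def request_alternatives(source_ja, draft_en, n=10):
    """Stand-in for asking an LLM for n alternative renderings of a
    tricky sentence, given the Japanese original and the current draft."""
    return [f"{draft_en} (variant {i})" for i in range(1, n + 1)]

def refine(source_ja, draft_en, is_satisfactory, max_rounds=5):
    """Repeat the request/select cycle until a satisfactory rendering
    emerges (step B.4), or give up after max_rounds."""
    current = draft_en
    for _ in range(max_rounds):
        if is_satisfactory(current):
            return current
        candidates = request_alternatives(source_ja, current)
        # Step B.3: the human picks the most contextually appropriate option.
        chosen = next((c for c in candidates if is_satisfactory(c)), None)
        if chosen is not None:
            return chosen
        current = candidates[0]  # nothing satisfied; iterate from the closest attempt
    return current
```

In real use, the “is it satisfactory?” judgment is the human translator reading the candidates — which is exactly the agency-preserving part of the workflow.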


C. Validation loop

Final quality checks involve:

  • Running translations through secondary LLMs for consistency analysis

  • Using text-to-speech playback to catch unnatural phrasing

  • Comparing newer models like DeepSeek-R1 against established ones

  • Maintaining human judgment as the final arbiter


Gally’s workflow exemplifies how human experts can use AI as a collaborative tool rather than replacement — maintaining 40 years of expertise while augmenting efficiency through strategic automation.


The full workflow outlined here only makes sense for very important translation projects, such as translating a major speech by the head of Sony or the prime minister. For everyday projects, an abbreviated version will suffice.


While this specific workflow is for translation, it showcases two elements that should be used in any design project, following the UX prompting diamond:


  1. Parallel design: We start by asking AI for many divergent ideas in parallel.

  2. Iterative design: The first draft design comes from merging the best ideas from the parallel design step, but we’re not done. Then we proceed through many steps of iterative refinement. In the case of a UI design, user testing is essential for driving these iterations.


Left: Parallel design is like a traditional footrace: you let several runners loose, and the winner is whoever crosses the finish line first. However, unlike in sports, in UX we don’t need to anoint only a single winner, because we can combine the best ideas from several of the parallel designs.

Right: Iterative design is like a relay race where each runner passes the baton to a fresh runner who then runs the next leg of the race. Typically, four runners each run a quarter of the distance, meaning that they can run much faster than if a single athlete were to attempt the full distance alone. (The world record for the 4×100 m relay is 36.8 s, whereas the record for one person running all 400 meters is 43.0 s — 17% slower.) (Midjourney)


Virtual Boyfriends Are Popular

I admit that I’ve been guilty of referring to AI companion products as being mainly “virtual girlfriends,” but data published by Olivia Moore shows that there’s also a big market for virtual boyfriends. 30% of the users of these products are women. (Of course, this means that 70% of users are men, so it’s true that the modal AI companion is a virtual girlfriend.)


These stats are for the number of users. In terms of making money, remembering your female users is still recommended: the number-one AI companion app in terms of consumer spend is LoveyDovey, which targets women in Korea, Japan, and South-East Asia.

Maybe more interesting than the narrow category of companion apps is that the users of AI products are currently overwhelmingly male: AI writing tools, for example, have 85% male users and only 15% female. That one was a surprise to me, since writing is a female-dominated profession.


The only product category that surpasses AI companions in its share of female users is education products, for which 31% of users are women. That’s still a low percentage, considering that 89% of elementary school teachers in the United States are women. (High school teachers are 60% female.)


I would hope that women become bigger users of AI soon. In the early days, AI was surely a technology best suited for nerds, who are predominantly men. But AI is getting to be more useful for business and for everyday tasks, so it needs to gain more female users.


For AI to succeed in business, it needs to attract as many women as men as users. (Leonardo)


(And for the hold-out women to have careers as knowledge workers after 2027, they need to start building strong AI skills now. It takes years to understand all the subtle ways of using AI beyond simple prompting.)


Speaking at the Dovetail Conference, San Francisco

I am the plenary speaker at Dovetail’s annual “Insight Out” Conference in San Francisco, April 23-24. I will be interviewed by Felix Lee, the head of ADPList. Come with your questions prepared!


Leonardo’s interpretation of the one-word prompt “Insight.” It seems to envision user research like landing a space probe on a foreign planet. Pretty good metaphor, for two reasons: First, we do engage in customer outreach to learn about something new and (to us) strange. Second, current user research is like space probes in only scratching the surface and exploring a fairly small area.


Dropdowns Must Die

Fantastic article about dropdown usability by the Baymard Institute. Baymard is the world’s leading expert on ecommerce website design, but more to the point, its analysts have observed thousands of users interacting with almost any web design style you can imagine.


Conclusions on GUI widgets like dropdowns generalize from ecommerce to other genres of websites, exactly because they’re based on testing a wide variety of design styles. You can’t generalize from, say, Amazon.com to your website. The differences are too vast. But it’s valid to generalize from tests of hundreds of sites to your site. (Remember Jakob’s Law of the Internet User Experience: users spend most of their time on other websites than your site, so their cumulative experience from those sites forms their expectations for how your site should work.)


Scrolling through a dropdown with a long list (for example, of all the countries in the world) makes for an unpleasantly slow and error-prone user experience. (Leonardo)


Only deviate from such general usability findings if you have conducted your own user research with your own customers performing their specialized tasks. Maybe you’re in the 10% of cases where the best design is something other than what best practices dictate. But the odds (and Jakob’s Law) are against you. Furthermore, Baymard has observed usability problems with dropdowns in user studies since 2010. When a usability finding remains the same for 15 years, it’s almost always because it reflects an underlying truth that’s unlikely to change in the future.


Jakob’s Law haunts designers who attempt to go against the common norms found on other websites. (Leonardo)


(Hat tip to Luke Wroblewski for alerting me to this article.)


A good (that is to say, bad) example of dropdown usability is the country selector in an address form. There are more than 200 countries and territories in the world, and it’s both time-consuming and error-prone to find and select your country in the long scrolling list presented in a dropdown menu. Particularly if you live in a country like the USA or the UK, it’s much faster to type the country name than to use a dropdown. But even if the user lived in, say, the Wallis and Futuna Islands (how many customers do you have there anyway?), a design that offered autocomplete would be better.
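To make the comparison concrete, here is a minimal sketch of the type-ahead matching that replaces dropdown scrolling. The `autocomplete` function and the tiny sample list are hypothetical; a real form would match against the full list of 200+ countries and territories.

```python
# Illustrative type-ahead matching over country names (tiny sample list;
# a real form would use all 200+ countries and territories).

COUNTRIES = [
    "United States", "United Kingdom", "Ukraine", "Uganda",
    "Wallis and Futuna", "France", "Germany", "Japan",
]

def autocomplete(query, options=COUNTRIES):
    """Return options matching the typed query, prefix matches first."""
    q = query.casefold()
    prefix = [o for o in options if o.casefold().startswith(q)]
    inside = [o for o in options if q in o.casefold() and o not in prefix]
    return prefix + inside
```

Typing just “wal” already isolates Wallis and Futuna — three keystrokes instead of a 200-item scroll.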


Autocomplete is like a superhero who swoops to the rescue when you need to enter a long country name in a text box, making it a better UI than scrolling through a dropdown menu. (Leonardo)


When there are only a few options, radio buttons are faster and more usable than a dropdown.


The only time dropdowns may be acceptable is for selecting among a number of options that is too big to show as radio buttons and yet small enough that the dropdown menu doesn’t scroll. Even so, I encourage you to do your own usability testing before imposing a dropdown on your suffering users.


Early GUI Prototypes

The Computer History Museum has published a fascinating walkthrough of screenshots from the development of Apple’s Lisa (YouTube, 24 min.) with Bill Atkinson, one of the main designers of the early Apple GUI that later became the Macintosh.


The graphical user interface that we all love today underwent many rounds of iterative design in the 1970s, both at Apple (as shown in the video I linked) and at Xerox PARC. (Leonardo)


These screenshots are literally Polaroid photos of the screens, because they were taken so early that there were no system features for software-based screenshots yet. The early designs go back to 1979 and start by showing the design of proportional display fonts. (Fonts in which letters have different widths: for example, an “I” is narrower than a “W.” We take such fonts for granted now, but computers used to show monospaced fonts in which all characters were the same width.)


It is fascinating to watch the evolution of basic GUI design elements such as windows and pop-up menus, with many false starts. True iterative design. For example, Atkinson’s early design for resizing a window involved dragging the mouse for a distance that equated to the new size you wanted for the window. The trouble was that if people just wiggled the mouse with the button down, the computer would register a very short drag and make the window a few pixels wide. This made users think that the computer had crashed because their working window had “vanished.” He redesigned the window-resize feature to work by dragging the lower right corner of the window, so that the drag distance specified the change to the window’s size rather than its new absolute size. With this design, wiggling the mouse would change the window size slightly but not cause a usability disaster.
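The difference between the two resize designs boils down to absolute versus relative sizing. The sketch below is a hypothetical reconstruction, not Atkinson’s actual code: it shows why a tiny mouse wiggle collapses the window in the first design but is harmless in the second.

```python
# Hypothetical reconstruction of the two window-resize designs
# (not Atkinson's actual code).

MIN_SIZE = 1  # smallest width/height the system allows

def resize_absolute(drag_dx, drag_dy):
    """First design: the drag distance IS the new window size.
    A tiny wiggle collapses the window to a few pixels."""
    return max(MIN_SIZE, drag_dx), max(MIN_SIZE, drag_dy)

def resize_relative(width, height, drag_dx, drag_dy):
    """Redesign: dragging the lower right corner applies the drag
    distance as a DELTA to the current size, so a wiggle only
    nudges the window slightly."""
    return max(MIN_SIZE, width + drag_dx), max(MIN_SIZE, height + drag_dy)
```

A 3-pixel wiggle yields a 3×3 window (the “crashed” look) under the first design, but merely turns a 400×300 window into 403×303 under the redesign.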


The first design for resizing a window was a disaster and often made users think their computer had crashed, as the window seemingly disappeared, having been erroneously resized to a few pixels in size. (Leonardo)


The Lisa was launched in 1983, four years after the early design prototypes shown in this video. The Macintosh used essentially the same GUI and was launched in 1984. Unfortunately, the Lisa was too expensive: it cost $10K in 1983 (equivalent to about $30K now), which was too much for a personal computer.


This is not how the Macintosh GUI was designed. Bill Atkinson, Larry Tesler, and others proceeded through many false starts and design iterations during 4 years of designing the Lisa, which in effect became the prototype for the Mac. (Ideogram)


UX Needs to Feel More Urgency About AI

New explainer video: Why UX should have more of a sense of urgency about AI. (YouTube, 3 min.)


Time is running out. We need more AI in the UX design process and better UX design of AI products NOW. (Midjourney)


Perplexity Deep Research

Perplexity has introduced its own deep research feature, also called “Deep Research,” using the same name as Google and OpenAI. It seems that Deep Research is becoming a generic name for an AI feature that does extensive research on a topic and synthesizes the findings into a single report.


Based on my testing so far, Perplexity Deep Research is very good, but not quite as good as OpenAI Deep Research, which gives more extensive and thought-through answers with better analysis. On the other hand, Perplexity Deep Research provides faster answers, probably because it’s based on DeepSeek R1, which is a very fast AI model, whereas OpenAI uses o3 for its Deep Research. Response times matter for AI usability, and I could easily see users turning more frequently to Perplexity Deep Research for questions where the last level of polish and insight isn’t worth an extra five minutes’ wait.


Most important for most people, Perplexity’s Deep Research is included in its cheap $20/month subscription plan, whereas OpenAI requires a $200/month high-end subscription for its most advanced features like Deep Research. That said, if you have a company job as a UX professional, I still recommend having your company pay for the $200 OpenAI subscription: it’s well worth the cost in terms of increased productivity and quality work.


DeepSeek has a whale for its logo, but since Deep Research is becoming a generic feature from many AI providers, I decided to use a deep-sea fish as my illustration. (Midjourney)


To give you an idea of the depth of insight provided by Perplexity Deep Research, here’s a report I had it make about my usability heuristic number 6, “Recognition Rather than Recall.” (See the list of all 10 usability heuristics, if you don’t remember them offhand.) Unfortunately, Perplexity included a number of supporting statistics that are likely to have been hallucinations because they aren’t supported by the provided references. I edited these items out from the copy below, since I don’t want to perpetuate possible errors and give them the credibility of having been published on my website.



Recognition beats recall. (Leonardo)


Recognition Rather Than Recall: A Comprehensive Analysis of Jakob Nielsen's Sixth Usability Heuristic (Made with Perplexity Deep Research)

Key Findings Summary

Jakob Nielsen's sixth usability heuristic, "Recognition Rather Than Recall," emphasizes reducing cognitive load by designing interfaces that leverage users' ability to recognize information rather than rely on memory recall. Grounded in cognitive psychology, this principle advocates for visible cues, contextual prompts, and intuitive navigation to minimize mental effort. Empirical studies, including Barbara Tversky's research on encoding processes, demonstrate that recognition-based interfaces enhance usability by 34–51% compared to recall-dependent systems. Real-world applications span menu structures, autocomplete features, and e-commerce personalization tools, while violations — such as requiring users to remember transient information — result in measurable frustration and task abandonment.


Theoretical Foundations of Recognition Over Recall

Cognitive Psychology of Memory Retrieval

Human memory retrieval operates through two distinct mechanisms: recognition and recall. Recognition involves identifying familiar stimuli through external cues, while recall requires active reconstruction of information from memory without prompts. The cognitive advantage of recognition lies in its reliance on spreading activation, where cues trigger associated memory nodes. For example, seeing a menu item like "Save As..." activates related concepts (file formats, storage locations), whereas recalling the same command without visual aids demands isolated memory retrieval.


Nielsen's Formulation and Industry Impact

Nielsen's heuristic emerged from observational studies of user frustration with early 1990s software interfaces. Systems like DOS command lines, which required memorizing syntax, exhibited 72% higher error rates than GUI-based alternatives with visible menus. By 1994, Nielsen codified the principle as part of a broader framework, arguing that interfaces should externalize memory demands through:


  1. Visibility of actions (e.g., persistent navigation bars)

  2. Contextual help (e.g., tooltips on hover)

  3. Progressive disclosure (e.g., search filters that refine options dynamically)


These strategies reduce working memory strain, which is limited to 7±2 chunks of information.


Design Principles and Implementation Strategies

Visual Affordances and Feedback Loops

Effective recognition-oriented design employs perceptual signifiers that guide users intuitively:


1. Menu Systems and Iconography

Standardized icons (e.g., magnifying glass for search) achieve faster recognition than text labels alone. Google Docs exemplifies this by pairing toolbar icons with hover-activated labels, ensuring both novice and expert users understand functions without memorization.

2. Autocomplete and Predictive Input

Search interfaces using autocomplete — like Google's query suggestions — reduce keystrokes and mitigate tip-of-the-tongue phenomena where users know a concept but cannot recall its full name. E-commerce platforms like eBay leverage this by displaying recent searches and trending products, shortening purchase paths.

3. History and Continuity Features

Systems that surface past interactions—such as Netflix's "Continue Watching" or messaging apps' recent contacts—eliminate recall demands.


Case Studies and Empirical Validation

Adobe Photoshop's Recent Files List

Photoshop's "Open Recent" feature exemplifies recognition by displaying the last 10 edited files. This often reduces file retrieval time, since it’s common for users to return to a file they worked on recently.


E-Commerce Personalization

Amazon's "Frequently Bought Together" section leverages recognition through associative recommendations. Users purchasing cameras are shown compatible tripods and lenses, sometimes increasing order value.


Comparative Analysis with Related Heuristics

Synergy with Error Prevention (Heuristic 5)

Recognition interfaces inherently prevent errors by constraining choices. For example, date pickers eliminate invalid entries (e.g., February 30), aligning with Nielsen's fifth heuristic.


Tension with Aesthetic Minimalism (Heuristic 8)

Balancing recognition with minimalist design poses challenges. Overloading interfaces with cues can cause clutter, but solutions like collapsible menus or gesture-based navigation (e.g., Pinterest's hidden search filters) maintain cleanliness while preserving discoverability.

Future Directions and Emerging Trends

AI-Driven Predictive Interfaces

Generative AI models like GPT-4 can anticipate user needs, offering recognition-based prompts. For example, GitHub Copilot suggests code completions, reducing developers' recall burden.


Augmented Reality (AR) Cueing

AR navigation apps (e.g., Google Maps Live View) overlay directional arrows on real-world environments, cutting wayfinding recall demands.


Conclusion

Jakob Nielsen's sixth heuristic remains a cornerstone of user-centered design, bridging cognitive theory and practical interface optimization. By transforming recall tasks into recognition opportunities, designers reduce cognitive load, enhance efficiency, and foster inclusive digital experiences. As AI and AR technologies evolve, the principle will continue to shape innovations—from adaptive UIs to neurodiverse-friendly platforms—ensuring interfaces align with the innate strengths of human memory.

 
