Jakob Nielsen

AI With “Computer Use”: Agents, Accessibility Fix, Facilitating Usability Tests

Summary: AI’s new capability to operate computers directly could redefine user experience, from aiding disabled users to automating complex tasks. AI will soon be able to run usability studies, once it can observe the user’s actions and the computer’s response.

 

The Claude AI foundation model has launched a new feature called “Computer Use.” Not very imaginative branding, but I like names that simply say what the product does rather than introduce unnecessary mystery. In this case, “Computer Use” means that the AI can now use your computer. It does what it says on the tin!


There are two aspects to Claude with Computer Use:


  • Claude can see what happens on the computer screen and use this visual information to interpret what the computer is doing.

  • Claude can act on the objects present on the computer screen: it can pull down menus, click commands, buttons, and links, and it can type text into form fields.


In other words, AI operates the user interface much like a human user would: it understands the system state and evaluates its options for the next step, after which it executes the appropriate action.
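To make this observe-and-act cycle concrete, here is a minimal sketch in Python of how such a loop could be structured. All the helper names (capture_screenshot, plan_next_action, execute_action) are hypothetical stand-ins for the real screenshot capture, the model call, and the OS-level mouse and keyboard automation; this illustrates the pattern, not Anthropic’s actual API.

# A minimal sketch of the observe-act loop behind "Computer Use".
# The helpers are hypothetical stand-ins: in a real system,
# capture_screenshot() would grab the actual screen, plan_next_action()
# would send the screenshot and task to the AI model, and
# execute_action() would drive the OS (mouse, keyboard).

from dataclasses import dataclass

@dataclass
class Action:
    kind: str         # e.g., "click", "type", "done"
    target: str = ""  # UI element description or coordinates
    text: str = ""    # text to type, if any

def capture_screenshot() -> bytes:
    """Stand-in: return the current screen as an image."""
    return b"<screenshot bytes>"

def plan_next_action(task: str, screenshot: bytes, history: list[Action]) -> Action:
    """Stand-in for the model call: given the task, the latest screenshot,
    and prior actions, decide the next UI step."""
    return Action(kind="done")

def execute_action(action: Action) -> None:
    """Stand-in: perform the click or keystroke on the real UI."""
    print(f"Executing: {action}")

def run_task(task: str, max_steps: int = 20) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                     # 1. see the screen
        action = plan_next_action(task, screenshot, history)  # 2. decide the next step
        if action.kind == "done":
            break
        execute_action(action)                                # 3. act, then observe again
        history.append(action)

run_task("Open the settings menu and enable dark mode")

The point of the loop is that the AI never acts blindly: every action is preceded by a fresh look at the screen, just as a human user would re-check the UI after each click.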


With “Computer Use,” the AI can take control of the user’s computer and operate the user interface by clicking or typing, because it sees and understands the UI elements and content on the screen. (Leonardo)


AI Agents

The intended application of “Computer Use” is to facilitate the much-hyped AI agents, where the AI takes action on behalf of the user. For example, I was recently stuck in Heathrow Airport overnight after missing my connection due to significant delays of my incoming flight from Oslo. I needed a place to stay for the night, and an AI agent could perform the following steps on my behalf:


  1. Research what hotels are close to Heathrow and have an available room for that night.

  2. Evaluate these room options based on my preferences and choose the best place for me to stay.

  3. Book that room on the hotel’s website, including entering my credit card information, loyalty program membership number, etc.


Steps 1 and 3 are relatively easy for current AI, and will be child’s play for the next-generation AI we expect to get in a few months. I am less certain about the feasibility of step 2, at least until we move up the AI capability ladder by another two or even three generations (i.e., until 2027 or even 2030). Human preferences are highly changeable and not so easy for an AI to learn, even after intense observation of its human.


AI agents are supposed to coordinate multiple applications and websites on behalf of the user, to make a desired action happen. Many analysts think this is the next big thing in AI. (Midjourney)


For example, in this specific case, I had to catch my rebooked flight the next morning. Therefore, I prioritized a hotel within walking distance of Terminal 5, because I didn’t want the added uncertainty of a ground transfer. After checking in for my flight, I planned to have breakfast in the lounge, meaning that I did not want a room rate that included breakfast, nor did I care whether the hotel served a good or bad breakfast. (Normally, I care about breakfast and prefer hotels with good food.)


For the next few years, I think the better use of AI agents is to refrain from full autonomy, at least for complicated tasks that depend on user preferences. It would be very useful to have AI agents that could perform research tasks without supervision, then present a shortlist of options in a comparison table with explanations that relate to known user needs, and then perform any follow-up work (such as making the reservation) after the user has made his or her choice.
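As a sketch of this semi-autonomous pattern, the outline below keeps the research step unsupervised but leaves the final choice to the user. The names research_hotels, summarize_for_user, and book_room are hypothetical placeholders, not a real booking site or AI API.

# A minimal sketch of the semi-autonomous agent pattern described above:
# research runs unsupervised, but the decision stays with the human.

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    walk_to_terminal_min: int
    rate_usd: int
    notes: str

def research_hotels(constraints: dict) -> list[Option]:
    """Unsupervised step: gather candidate options (placeholder data)."""
    return [
        Option("Hotel A", 5, 210, "Walking distance to Terminal 5"),
        Option("Hotel B", 25, 150, "Requires shuttle bus"),
    ]

def summarize_for_user(options: list[Option]) -> None:
    """Present a shortlist as a comparison table tied to known user needs."""
    print(f"{'Hotel':<10}{'Walk (min)':<12}{'Rate ($)':<10}Notes")
    for o in options:
        print(f"{o.name:<10}{o.walk_to_terminal_min:<12}{o.rate_usd:<10}{o.notes}")

def book_room(option: Option) -> None:
    """Follow-up step, executed only after an explicit user choice."""
    print(f"Booking {option.name}...")

options = research_hotels({"near": "LHR Terminal 5", "breakfast": "not needed"})
summarize_for_user(options)
choice = options[0]        # in practice, the human picks from the table
book_room(choice)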


Accessibility Fix

AI “Computer Use” has the potential to drastically improve usability for disabled users within a year or two, while we wait for the ultimate solution to the accessibility problem.

As I described in a previous article, the old-school approach to accessibility has failed to provide disabled users with acceptable usability. Traditional accessibility is so convoluted (and thus so expensive) that most companies never build proper systems for this relatively small customer base.


In the long run, the solution must be to abandon the idea of using assistive technology to transform a user interface that’s designed for non-disabled users into one that presents information and actions in a completely different medium. It’s a lost cause to take a graphical user interface (a two-dimensional visual medium) and try to read it out loud for a blind user (using a one-dimensional auditory medium). Usability will always be terrible, and the only reason disabled users agree to use these systems is that they have been the only option until now.


In the long run, generative UI offers the proper way to empower disabled users: AI can generate a custom user interface for each user, optimized for his or her specific circumstances. For example, if the user is blind (or is a jogger wanting to keep eyes on the trail and interact by voice and hearing), generative UI will create a fully auditory user experience, optimizing for this medium rather than treating it as a poor second cousin.


Unfortunately, it will likely take us about 5 years before AI can realize this vision.

In the meantime, “Computer Use” can come to the rescue in perhaps only a year or so. During this interregnum (starting next year and ending in about 5 years), user interfaces will remain the same, but usability for disabled users will improve immensely by delegating the job of operating those user interfaces to an AI.


The AI will interpret the GUI, even if the user can’t see it. The AI will summarize the available options and whatever content it thinks the user will be interested in. (Sorry to reveal this, but most users only care about a small fraction of your content. Sighted users scan your text, but blind users currently suffer under linear reading-aloud, which makes it hard to skip to the stuff they want.)


Of course, the user can always ask the AI to elaborate on anything it summarized too briefly.

After reviewing the options, the user will decide how to proceed and tell the AI what he or she wants to happen. The AI will then issue the corresponding commands to the computer, and the interaction cycle continues to the next step.
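This interaction cycle could be sketched roughly as follows. Here, summarize_screen and map_intent_to_actions are hypothetical stand-ins for model calls, and console input and output stand in for speech.

# A minimal sketch of the accessibility interaction cycle described above,
# assuming hypothetical helpers rather than any real API.

def summarize_screen(screenshot: bytes) -> str:
    """Stand-in: describe the options and relevant content, not the whole page."""
    return "Search results: 3 flights found. Options: sort by price, select a flight, go back."

def map_intent_to_actions(intent: str, screenshot: bytes) -> list[str]:
    """Stand-in: translate what the user wants to happen into concrete UI commands."""
    return [f"click element matching: {intent}"]

def accessible_session(get_user_input, speak, capture_screenshot):
    while True:
        screenshot = capture_screenshot()
        speak(summarize_screen(screenshot))   # AI reads a summary, not the raw page
        intent = get_user_input()             # e.g., spoken: "sort by price"
        if intent in ("stop", "quit"):
            break
        if intent.startswith("tell me more"):
            speak("Elaborating: ...")         # expand on anything summarized too briefly
            continue
        for command in map_intent_to_actions(intent, screenshot):
            print("Executing:", command)      # AI operates the UI on the user's behalf

# Example wiring, with console I/O standing in for speech:
accessible_session(
    get_user_input=lambda: "stop",
    speak=print,
    capture_screenshot=lambda: b"<screenshot>",
)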


AI with “Computer Use” can operate a complicated user interface on the user’s behalf. This frees users with disabilities from needing to know the specific controls and from having to directly interact with something they can’t see or have difficulties touching. The user can focus on the desired outcome while making the AI do the detailed work. (Ideogram)


Facilitating Usability Tests

The final benefit I foresee from “Computer Use” is rather a specialized one, but it’s my specialty, so I care about it. AI will soon be able to serve as the facilitator in a usability study.


Until now, it’s been a distinct limitation of the use of AI in user research that it didn’t know what the user was doing. AI can analyze questionnaire responses and interview transcripts. But analyzing a usability test session was beyond AI, because it could only know what the user was saying and not what the user was doing. (As I’ve said a million times, it’s vastly more important in user research to watch what the users are doing than to listen to what they are saying.)


However, step one of “Computer Use” is the ability for the AI to observe the computer screen and understand what’s happening with the UI. We need a few additional capabilities to turn this into a full-fledged observer of a usability test: the AI must also observe and interpret the user’s actions. I think this added feature may take a year to materialize, because helping us UX professionals is probably not a high priority for the AI labs. However, the feature should be easy enough to build, because “Computer Use” can already identify actions in a UI and analyze the relationship between those actions and their results.


Initially, “Computer Use” may simply serve as a kind of wingman for a human study facilitator: The AI will watch and interpret the study and take notes that help the human write up the results.
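A rough sketch of that wingman role might log observations like this; interpret_step is a hypothetical stand-in for the model’s interpretation of each step, not a real API.

# A minimal sketch of an AI note-taker for a usability session: each user
# action is recorded together with the resulting screen state and a
# timestamped note for the human facilitator.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Observation:
    timestamp: datetime
    user_action: str   # what the participant did
    screen_state: str  # what the UI showed in response
    note: str          # the AI's interpretation

def interpret_step(task: str, user_action: str, screen_state: str) -> str:
    """Stand-in for the model call that relates the action to the task."""
    return f"While trying to '{task}', the user did '{user_action}'."

@dataclass
class SessionLog:
    task: str
    observations: list[Observation] = field(default_factory=list)

    def record(self, user_action: str, screen_state: str) -> None:
        note = interpret_step(self.task, user_action, screen_state)
        self.observations.append(
            Observation(datetime.now(), user_action, screen_state, note)
        )

log = SessionLog(task="Find the return policy")
log.record("clicked 'Support' in the footer", "Support page with FAQ list")
log.record("scrolled past the FAQ without expanding any item", "FAQ list, nothing expanded")
for obs in log.observations:
    print(obs.timestamp.isoformat(timespec="seconds"), "-", obs.note)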


But stepping up from observer to facilitator should happen shortly afterwards.


Enabling an AI to run usability studies will drop the cost of user research, allowing us to conduct more research. User research will also finally become fully international, representing all the world’s users and not simply the ones who speak the same language as the design team. It’s trivial for AI to translate to and from any major or mid-sized language. (Truly small languages, such as Faroese, the language of the 48,000 people on the Faroe Islands, do lag behind.)


While I believe that AI with “Computer Use” will soon be able to facilitate user test sessions, I think it will take a few more years before we can trust it to analyze the data and present findings and recommendations to the design team. We’ll likely still need human UX experts to interpret the outcome of AI-facilitated test sessions.


Let me end by warning about a possible additional step, beyond having an AI facilitate usability studies with real (human) customers. Do not replace those test users with an AI. The facilitator can easily be AI, once AI capabilities allow it to observe user actions adequately. However, study participants must be real, representative users, not simulated users. The entire purpose of any user research is to uncover the behavior of target customers — and they are always unpredictable, which is why we continue to need user research after these many years of accumulating usability insights.


YES: We will soon have AI capable of observing (and even facilitating) users during a usability study. (Midjourney)


NO: Do not use AI to test another AI that pretends to be a human user. (Midjourney)


Conclusion

Many current digital barriers will dissolve as AI learns to see and touch our virtual world. AI interface manipulation presents three key opportunities: automated task completion, enhanced accessibility solutions, and streamlined user research. We won’t get all these benefits immediately, but a timeframe of a year or two is realistic, now that the basic capability has been built.
