Summary: User productivity was 158% higher when answering questions with ChatGPT than with Google. Satisfaction scores were also much higher for AI users than for search users. As with previous research, AI use narrowed the skill gap between users at different education levels.
For more than 25 years, search has been the dominant way for Internet users to get answers to their questions. Google has profited from this user behavior to the tune of a $1.6 trillion market cap as of August 2023, as have many other search engine companies around the world.
Does the venerable search still reign supreme in gratifying our relentless quest for knowledge? Not according to new research by Ruiyun (Rayna) Xu and colleagues from Miami University, Hong Kong Polytechnic University, and The University of Hong Kong. The researchers recruited 95 participants and randomly assigned 48 to use ChatGPT and 47 to use Google.
The findings echo those of previous research on the productivity impact of using AI:
Users are faster with AI than with traditional tools (here, Google)
AI narrows skill gaps, disproportionately aiding those who struggle most with traditional tools
Users like AI more than legacy technology
How do users get their burning questions answered? Used to be, they would turn to a search engine. Now, people increasingly ask AI. (“Question mark” by Leonardo.AI)
Study Setup
In both experimental conditions, participants attempted the same three tasks:
Who was the first woman in space?
Identify 5 websites that can be used to book a flight between Phoenix and Cincinnati.
Check the accuracy of three statements about the 2009 Copenhagen Climate Change Conference: the summit dates, how well the agreement reached at the conference matched the expectations of the UK government, and the proposal presented by the United States government.
Task 1 was a simple information-finding problem, which was made a little more difficult by the fact that many websites in the United States (where the study was conducted) describe the first American woman to travel into space and not the first human woman (who was from the Soviet Union). Task 2 was possibly a little biased in favor of Google by asking for a list of websites, which is exactly what a traditional search engine produces by default. Finally, task 3 was a complex fact-checking task where users could easily be led astray.
Users were timed, the correctness of their answers was scored, and they answered a subjective-satisfaction questionnaire.
ChatGPT Beats Google, Big Time
The research used ChatGPT version 3.5, not the much better version 4 that’s the current product. Despite this considerable handicap for the AI side, it defeated Google resoundingly. The time to answer the three questions was:
ChatGPT: 5.79 minutes
Google: 14.95 minutes
This corresponds to a productivity gain of 158% when using AI instead of Google. This is the largest productivity gain I have seen so far in the studies I have analyzed. (The runner-up is the 126% productivity gain for programmers using GitHub Copilot.) The difference between these two task times was statistically significant at p<0.01.
Productivity is calculated as follows: with ChatGPT, users can perform 10.4 of these task sets per hour, whereas with Google, they can only perform 4.0 task sets per hour. 10.4/4.0 = 2.58, which is the ratio between the amount of work produced with the two tools, corresponding to a lift of 158% for ChatGPT.
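The arithmetic above can be reproduced with a minimal sketch. The two task times are the ones reported in the study; the function and variable names are illustrative only and do not come from the paper.

```python
# Time (in minutes) to complete the full three-task set, as reported in the study.
CHATGPT_MINUTES = 5.79
GOOGLE_MINUTES = 14.95


def task_sets_per_hour(minutes_per_set: float) -> float:
    """Throughput: how many complete task sets fit into 60 minutes."""
    return 60.0 / minutes_per_set


def productivity_lift_percent(baseline_minutes: float, new_minutes: float) -> float:
    """Percentage gain in throughput of the new tool over the baseline tool."""
    ratio = task_sets_per_hour(new_minutes) / task_sets_per_hour(baseline_minutes)
    return (ratio - 1.0) * 100.0


chatgpt_rate = task_sets_per_hour(CHATGPT_MINUTES)
google_rate = task_sets_per_hour(GOOGLE_MINUTES)
lift = productivity_lift_percent(GOOGLE_MINUTES, CHATGPT_MINUTES)

print(f"ChatGPT: {chatgpt_rate:.1f} task sets per hour")
print(f"Google: {google_rate:.1f} task sets per hour")
print(f"Productivity lift: {lift:.0f}%")
```

Note that the lift is measured on throughput (work produced per hour), not on time saved. The same two task times expressed as time saved come to roughly 61%, which is why the two figures should not be compared directly.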
Even though people performed the tasks much faster with AI than with search, the quality of their answers to the questions was unchanged. The users’ solutions to the tasks were scored on a scale of 0–10, where 0 would indicate a totally wrong solution and 10 a completely perfect one. ChatGPT scored 8.55, and Google scored 8.77. The difference between these two numbers was within the margin of error in the study and not statistically significant.
Thus, the difference is likely a simple matter of randomness. But Google might just possibly have edged out ChatGPT in answer accuracy by a whisker. If so, the likely culprit is the use of ChatGPT 3.5 in the study. It produces one of the notorious AI hallucinations when asked, “Is the following statement true or false? ‘The 2009 United Nations Climate Change Conference, commonly known as the Copenhagen Summit, was held in Copenhagen, Denmark, between 7 and 15 December.’” Even though the statement is false, ChatGPT 3.5 claims that it’s true, and the users in the study repeated this claim and consequently received a low score for their answers to task 3.
I just put the same question to ChatGPT 4, which correctly answered: “The statement is false. The 2009 United Nations Climate Change Conference, commonly known as the Copenhagen Summit, was held in Copenhagen, Denmark, but the dates were December 7 to December 18, 2009, not December 7 to December 15 as stated.”
Thus, if the study had been conducted with the currently-best AI version, ChatGPT would have outscored Google for answer accuracy.
Finally, users’ subjective satisfaction was far higher for AI than for search. On a 1–7 scale, with 7 indicating the highest level of satisfaction, the two technologies scored as follows:
Measure | ChatGPT | Google | Statistical significance |
Information quality | 5.90 | 4.62 | p<0.01 |
Ease of use | 6.00 | 5.57 | p<0.1 |
Usefulness | 6.19 | 5.30 | p<0.01 |
Enjoyment | 5.87 | 4.74 | p<0.01 |
Satisfaction | 6.06 | 5.27 | p<0.01 |
The only question where ChatGPT didn’t totally dominate Google was ease of use. For this question, we should remember that most users have a decade or more of Google experience but only a few months of experience using ChatGPT. Familiarity with a user interface breeds usability. So it’s actually a shockingly poor performance on the part of Google that it scored below a new and unfamiliar tool for ease of use.
(Google users self-assessed their prior experience with search engines as 4.98 on a 7-point scale, whereas ChatGPT users self-assessed their prior experience with this tool as 2.83, indicating a huge — and expected — experience advantage for the Google users.)
AI as Egalitarian Catalyst
Much previous research has shown that using AI narrows the gap between the best performers and the worst performers. This holds true for productivity gains and for creativity and ideation tasks. The gaps are narrowed because AI helps poor performers more than it helps the highest-skilled humans.
This new study roughly confirms that this finding holds for question-answering as well. On task 1 (find a fact), the AI users performed equally well, no matter their educational level, whereas highly-educated Google users performed better than their less-educated counterparts. (Education was scored as a 5-value parameter: no college, some college, bachelor’s degree, master’s degree, doctorate.)
Similarly, on task 3 (assess the truth of statements about the climate summit), the AI users performed roughly the same, no matter their education. In contrast, the Google users performed better the more educated they were.
Task 2 (find websites for buying airline tickets) was inconclusive regarding the impact of education on performance.
Even though the conclusion is not as strong in this study as in the previous research, it’s still the same: using AI narrows the skill gap between poorly educated and highly educated users.
Battle of the Bots: Google and ChatGPT are duking it out for question-answering superiority. In the recent study, AI beat search, which bodes poorly for Google’s future, even though I expect them to survive the fight. (“Fighting robots” by Leonardo.AI)
Is Google Doomed?
The research study I’m discussing clearly shows that AI is superior to traditional Google search, particularly if using ChatGPT 4 instead of the 3.5 version employed in the study. AI gives faster results, users like it much better, and it is particularly helpful for people without graduate degrees.
My guess is that Google is well aware of this situation, having conducted its own internal competitive usability study of Google search vs. ChatGPT and other modern AI tools. (I have no internal information from Google to confirm the veracity of my guess; if I did possess confidential information, I would not be writing this article.) This, in turn, means that Google’s upper management would have seen the writing on the wall several months ago, when ChatGPT 4 was released on March 14, 2023.
In any case, the publicly available usability data that we now have (and which Google management surely had their staff collect for them in March unless they’re completely incompetent) conclusively shows that it was not an option for Google to rest on its old-school search laurels. If they had done so, they would definitely be doomed.
Now, Google has a chance to create an integration of search and AI that combines the best of both worlds, with fewer hallucinations and more up-to-date information. Will this suffice for them to survive? I think they will, but in much-reduced circumstances.
First, dedicated AI providers move much faster and don’t have the legacy of a hundred thousand employees with old-school thinking. Currently, Google is estimated to have 47,400% more employees than OpenAI. (I severely criticize OpenAI’s lack of UX staff and their correspondingly inadequate usability, but even if OpenAI hired an entire 50-person UX department, that wouldn’t change the relative legacy-thinking drag on Google by much.) AI may be able to integrate updated information reasonably soon.
Second, even if Google does achieve parity with startup AI providers, its cash cow will suffer a severe wound. Traditional search is the only user experience on the Internet perfect for advertising: users tell you precisely what they are about to buy, and the search engine serves up a list of places to buy, many of which are paying advertisers.
The AI user experience seems much closer to using a traditional content website, where banner blindness has long dictated dramatically lower advertising rates than what Google has been enjoying. Thus, it’s likely that Google will survive, but it will no longer be a cash machine.
Google’s fountain of infinitely flowing gold from search ads is about to run dry, as many users switch to AI, which is likely much less lucrative in terms of advertising revenue. Replacing search dollars with AI pennies, as it were. (Gold fountain by Leonardo.AI)
(Full disclosure: I was on the advisory board for Google back when the company was a start-up so small that we held board meetings around the ping-pong table because it was the only table big enough in the entire company to host a meeting. I have sold all the stock I received from them and now have no financial interest in the company.)
Reference
Ruiyun Xu, Yue Feng, and Hailiang Chen (2023): “ChatGPT vs. Google: A Comparative Study of Search Performance and User Experience.” arXiv:2307.01135, DOI: 10.48550/arXiv.2307.01135