
DeepSeek: Does a Small AI Model Invalidate Big Models?

By Jakob Nielsen
Summary: In several challenges, the new Chinese AI model DeepSeek R1 performed at the level expected of the best current-generation American AI models, despite consuming 45x less compute. However, much better AI is needed in the future, so cheaper AI at any given level will still require immense investments to reach superintelligence.

 

Two of the main predictions for AI in 2025 were the emergence of both small AI models and extremely large AI models. Events in the year’s first month have proven both right with a vengeance.


First, OpenAI announced plans to invest half a trillion dollars ($500,000,000,000) in its “Stargate” project to build AI compute over a 4-year period, starting with $100 billion in 2025. Not to be left behind, Mark Zuckerberg said that Meta (Facebook) will spend $60-65 billion on its AI datacenters this year, and Microsoft is spending $80 billion. Google and xAI can’t be far behind, so we’ll spend well over a quarter trillion dollars ($300,000,000,000 is more likely) building AI compute in the United States alone this year. Big AI, indeed.


Shortly after these big-bucks announcements, the Chinese AI company DeepSeek released its new R1 AI model, which reportedly cost less than $10M to train and runs inference at about 1/45 the cost of the leading American AI models. Ben Thompson has a great article on the technical and strategic implications of DeepSeek’s work.


Ben Thompson’s “Stratechery” newsletter always provided some of the best analysis of the previous technology revolution (websites, the Internet, and social media), partly informed by a user-experience angle, even though his main strength was a deep understanding of the new business models enabled by ubiquitous networking. (“The network is the computer,” as my former company Sun Microsystems used to say, back when that was a revolutionary insight.) I’m pleased to see Ben step up to the challenge of providing thought leadership now that we’re onto the next techquake.


DeepSeek uses a whale in its logo — though a more playful version than my whale. This animal seeks wisdom in the deep, and not in the lofty heights of OpenAI spending. (Imagen)


Given that DeepSeek’s AI is so much cheaper than the American AI models, is it any good? I put DeepSeek R1 to the test, and my conclusion is that it’s about as good as other leading AI models as of late January 2025: OpenAI’s ChatGPT 4o, Claude Sonnet 3.5, and Google Gemini 2 Experimental. The details of my testing are at the end of this article, but first, let’s look at the implications of cheaper AI.


Will Cheap AI Doom Expensive AI?

Shortly after the launch of DeepSeek R1, the stock market punished NVIDIA with the loss of half a trillion dollars in market cap. Coincidentally, this is the same amount that OpenAI stated they would invest in AI datacenter buildout, where we expect much of that money to be spent on the purchase of expensive NVIDIA chips. In contrast, DeepSeek is mostly prevented from buying those cutting-edge expensive GPUs due to the American-led AI blockade of China.


Rumors abound about whether DeepSeek smuggled some Hopper chips in anyway, but most likely they are indeed using cheaper chips and consuming 45x less compute to deliver the same results as the American AI models. (Read the Ben Thompson article I linked above for a popularized exposition of how DeepSeek’s supergeeks were able to use low-level coding to squeeze more juice out of less powerful chips.) Reportedly, almost all of DeepSeek’s staff are former top students from Tsinghua or Peking University. These schools are so elite that if I had been Chinese, I would have been one of the more stupid students at Tsinghua and probably wouldn’t have landed a job at DeepSeek. That’s how good the DeepSeek people are.


At first sight, it seems reasonable to conclude that if the same level of AI can be provided at 45x less cost, it’s stupid to plan to invest trillions of dollars in AI during a 4-year period of tech optimism. However, I disagree with the stock market’s conclusion. I think big AI still has its place.


DeepSeek frugally trained a state-of-the-art AI model for about $6M, at the same time as OpenAI is swimming in money and planning to spend $500B in a 4-year period. Cheap AI and Big AI both have their place, and in fact, the cheaper we can make any given level of AI performance, the more it makes sense to invest in achieving superintelligence faster. (Leonardo)


Remember the AI scaling laws: we move a step up in intelligence every time we increase compute by a factor of 100x. The scaling laws work on a logarithmic scale, meaning that to move two steps up, compute must grow by 100 × 100 = 10,000x. And to move the three steps required to reach superintelligence means 100 × 100 × 100 = 1 million times more compute. (In fact, there are currently 3 AI scaling laws: pre-training, post-training, and inference-time. We can improve the delivered intelligence experienced by users by scaling at any of these 3 stages of the development of an AI model. I would not be surprised if a 4th scaling law were discovered soon, but the point remains: to improve intelligence by one generation requires 100x more compute.)


It’s highly commendable that the DeepSeek supergeeks have invented ways of cutting the compute needs for a given level of AI by 45x. But on the logarithmic scale of the scaling laws, 45x brings us only about 80% of the way (log 45 / log 100 ≈ 0.83) to the next level of AI intelligence.
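The scaling-law arithmetic above can be sketched in a few lines. The 100x step factor and the 45x efficiency gain are the figures from the text; everything else is plain arithmetic:

```python
import math

STEP_FACTOR = 100  # compute multiplier per intelligence step (from the text)

def compute_multiplier(steps: int) -> int:
    """Total compute multiplier needed to climb `steps` intelligence levels."""
    return STEP_FACTOR ** steps

print(f"{compute_multiplier(2):,}x")  # two steps up -> 10,000x
print(f"{compute_multiplier(3):,}x")  # three steps to superintelligence -> 1,000,000x

# On the logarithmic scale of the scaling laws, DeepSeek's 45x saving
# covers only part of a single 100x step:
print(math.log(45) / math.log(100))   # roughly 0.83 of one step
```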


Current AI, as of January 2025, is about as good as a smart high school graduate. Later in 2025, we expect the release of the next generation of AI, which will be as good as a smart university graduate with a BS degree. Ph.D. level AI is expected around 2027, and true superintelligence will probably not happen until 2030, though some people have pulled their prediction forward to 2027 based on the new DeepSeek achievements in more efficient use of the available compute.


To reach superintelligence, we will need 10,000x more compute than was spent on the next-generation AI expected in early 2025. Grok 3 already finished pre-training on xAI’s supercluster of 100,000 Hoppers. It’s currently in post-training and is expected to launch soon. The other labs are more secretive about the state of their next-generation models, but I would expect them to launch not much later than Grok 3.


A simple calculation says that 10,000 x 100,000 = 1 billion Hopper-equivalent GPUs needed to train superintelligence. It will almost certainly be impossible to build a billion-chip supercluster by 2029 (which is when the compute would need to be available to start training for a 2030 release).


Of course, we won’t need a billion Hoppers. We’ll have much better chips by 2029. Even so, the training clusters for superintelligence will be immense. Cutting their size by 45x by benefitting from the DeepSeek technology breakthroughs will be welcome but won’t negate the need for big-AI investments.


For sure, we also expect many other improvements in algorithmic efficiency over the next 5 years, especially after 2027 when Ph.D.-level AI will truly start to take over the development of cutting-edge software. If DeepSeek and other supergeeks (including AI supergeeks) can improve AI software by a factor of a thousand over the next 5 years, that still means that the superintelligence training clusters will need a million GPUs each.


NVIDIA’s Blackwell GB200 Superchip costs around $50K. A million of those beasts will thus sum to about $50 billion. And to build a supercluster with a million chips requires huge expenditures on many other components, not least the power plants needed to feed and cool the many chips.
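The back-of-the-envelope numbers in the last few paragraphs chain together as follows. All the figures (the 10,000x scaling requirement, the 100,000-Hopper cluster, a 1,000x software-efficiency gain, and ~$50K per GB200) come from the text:

```python
# Chain of estimates from the text: superintelligence needs 10,000x the
# compute of next-generation models, which trained on ~100,000 GPUs.
superintelligence_multiplier = 10_000   # two more 100x scaling-law steps
grok3_cluster_gpus = 100_000            # xAI's Hopper supercluster

naive_gpus = superintelligence_multiplier * grok3_cluster_gpus
print(f"{naive_gpus:,} Hopper-equivalents")       # 1,000,000,000 Hopper-equivalents

algorithmic_gain = 1_000                # assumed software-efficiency gain by 2029
gpus_needed = naive_gpus // algorithmic_gain
print(f"{gpus_needed:,} GPUs per cluster")        # 1,000,000 GPUs per cluster

cost_per_gb200 = 50_000                 # ~$50K per Blackwell GB200 Superchip
chip_cost = gpus_needed * cost_per_gb200
print(f"${chip_cost / 1e9:.0f}B in chips alone")  # $50B in chips alone
```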


So just for training, we’ll need many hundreds of billions of dollars to be spent. Inference comes on top of that and will probably be even more expensive, for two reasons. First, very few companies actually use AI extensively yet. Most places will probably need 100x more AI in 5 years if they want to stay in business. Second, advanced AI is getting more and more dependent on reasoning models like OpenAI’s o1 and o3, which require dramatically more inference compute.


To conclude: yes, all those investment dollars announced by Big AI will be needed.


Price Elasticity for AI

AI is just another product. It obeys the same economic rules as anything else. In particular, AI has a price elasticity curve: the cheaper it becomes, the more people will buy. So if a given level of AI becomes 45x cheaper, more AI will be consumed.


Consider, for example, the trend toward service-as-software, which I wrote about recently (or watch my explainer video if you only want to spend 4 minutes on this idea). Many services that currently require expensive human experts can be delivered by cheap AI. And the cheaper the AI, the more services will shift to become AI software.


However, the concept of only needing a fixed level of AI capability is as much of a fallacy as the “fixed work” hypothesis. For jobs, as humans get more productive, fewer are needed to do the old jobs, but new jobs are created by the new affluence and the new abilities of people to do more than before.


Current AI is actually still quite stupid, amazing as it is compared with the state of AI before ChatGPT 4 launched in March 2023. For example, if you want AI to be your lawyer or physician, it’s currently wise to limit such service-as-software products to simple use cases, such as drafting an uncomplicated will or diagnosing red eye. Complex business contracts or obscure cancers should not be left to current AI. However, as AI becomes more capable and cheaper, the border between AI and human services will shift in favor of more use cases going to AI.


45x cheaper AI at a specific capability level doesn’t mean 45x less money spent on AI. It means much more use of AI and much more compute available to uplevel the quality of results from reasoning models. It’s not a stretch to expect that cheaper AI will result in more money being spent on AI in the long run.
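The elasticity argument can be made concrete with a toy model. This is a minimal sketch assuming a constant-elasticity demand curve, Q = k · P^(−e); the elasticity value of 1.5 is an illustrative assumption on my part, not a measured figure for the AI market:

```python
def spend_multiplier(price_cut: float, elasticity: float) -> float:
    """Change in total spend when price falls by `price_cut`x, under a
    constant-elasticity demand curve Q = k * P**(-e).
    Total spend = P * Q, which scales as price_cut**(elasticity - 1)."""
    return price_cut ** (elasticity - 1)

# Elastic demand (e > 1): a 45x price cut *increases* total spend.
print(spend_multiplier(45, 1.5))   # roughly 6.7x more money spent on AI

# Inelastic demand (e < 1): the same cut would shrink total spend.
print(spend_multiplier(45, 0.5))   # roughly 0.15x
```

The stock market’s reaction implicitly assumed inelastic demand; the argument in the text is that demand for intelligence is elastic.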


Putting DeepSeek to the Test: Heuristic 9 Challenge

I decided to use my usability heuristic 9 as a case study of the capabilities of DeepSeek R1.


First, I asked it to write an article about Jakob Nielsen’s Usability Heuristic Number 9, “Help users recognize, diagnose, and recover from errors.” The article behind this link is not the raw output from DeepSeek: I added some illustrations and also wrote about a point that it had missed. However, what you see is probably 80% DeepSeek and only 20% added human polish. The result is probably not as good as what I could have created if I had spent two days on the article, but I’m pleased with what I got out of spending two hours instead of two days.


Second, I asked DeepSeek to write the manuscript for an avatar explainer of the error message heuristic. (YouTube, 3 min.) I posed the same challenge to ChatGPT 4o, and the video you’ll find if you click this link is half based on DeepSeek (the first half) and half based on ChatGPT (the concluding half). I think the two parts of the video are about equally good.


Finally, I asked both DeepSeek R1 and ChatGPT 4o to write Haikus about the heuristic. I also asked for Haikus from the other DeepSeek model, V3, and from the other main OpenAI model, o1, but neither produced as good Haikus as the ones from R1 and 4o that I’m showing you here.


Battle of the Haikus: I challenged DeepSeek R1 and ChatGPT 4o to write Haikus about the same topic. Which AI model did best? (Leonardo)


DeepSeek R1 Haikus


Mistakes bloom in red,

Steps to mend the break, unveiled—

No fear in the fall.

 

Mistakes whisper fear,

Kind words map the route back home—

No dead ends remain.

 

Codes speak in riddles,

Clarity breaks through the noise—

A fix takes one step.

 

ChatGPT 4o Haikus


Words untangle knots,

Clear error leads to repair—

Calm after the storm.

 

Mistakes gently named,

Solutions bloom from plain words—

Paths become clearer.

 

Error whispers truth,

Language soft, guiding the way—

Users rise again.

 

Which set of Haikus is best? Please let me know in the comments what you think.

 

AI Judgment of Haiku Quality

Roslyn Zolandor gave me the idea to ask AI to judge the quality of the two sets of Haikus rather than simply relying on my own preferences. The following is the response from DeepSeek R1 when I asked it to score its own Haikus compared with those by ChatGPT 4o. (To keep it honest, I used the pseudonyms “Author A” for DeepSeek and “Author B” for ChatGPT.)


As you can see, DeepSeek scored itself 62 points out of the 75 possible on the rubric it invented for scoring non-fiction poetry, while it gave ChatGPT 73 points, a good deal higher. Personally, I don’t think the ChatGPT Haikus are that much better than DeepSeek’s. In fact, I have a slight preference for DeepSeek’s Haikus. Judgment calls and quality assessment remain a weak point for current-level AI models.

 

[[Start of DeepSeek’s Haiku analysis.]]

 

Rubric for Scoring Haikus on a Non-Fiction Topic (Jakob Nielsen’s Heuristic 9)

Categories (5 points each, 25 total):

  1. Adherence to Haiku Structure (5-7-5 syllables)

  2. Imagery and Metaphor (Originality, vividness, relevance to the theme)

  3. Clarity of Message (How effectively the haiku communicates the heuristic’s principles)

  4. Emotional Resonance (Ability to evoke feelings tied to error recognition/recovery)

  5. Relevance to Heuristic 9 (Alignment with "Help Users Recognize, Diagnose, Recover from Errors")

 

Scores & Analysis

Author A


Haiku A-1

  • Structure: 5 (5-7-5)

  • Imagery: 4 (“Mistakes bloom in red” is vivid, but “unveiled” feels abstract.)

  • Clarity: 4 (Recognizes errors and recovery, but “No fear in the fall” is slightly vague.)

  • Emotion: 4 (Balances fear and reassurance.)

  • Relevance: 5 (Directly addresses error recognition and recovery.)


    Total: 22/25


Haiku A-2

  • Structure: 5

  • Imagery: 4 (“Kind words map the route back home” is strong; “dead ends” is less vivid.)

  • Clarity: 4 (Focuses on guidance but lacks explicit diagnosis.)

  • Emotion: 4 (Comforting but lacks urgency.)

  • Relevance: 5 (Emphasizes solutions and clarity.)

    Total: 22/25


Haiku A-3

  • Structure: 5

  • Imagery: 3 (“Codes speak in riddles” is creative but overly abstract.)

  • Clarity: 3 (Less explicit about diagnosis; “A fix takes one step” is underdeveloped.)

  • Emotion: 3 (Minimal emotional engagement.)

  • Relevance: 4 (Touches on clarity but misses recovery.)


    Total: 18/25


Overall for Author A: 62/75

 

Author B


Haiku B-1

  • Structure: 5

  • Imagery: 5 (“Words untangle knots” and “Calm after the storm” are vivid and cohesive.)

  • Clarity: 5 (Directly links error messages to repair.)

  • Emotion: 5 (Evokes relief and resolution.)

  • Relevance: 5 (Perfectly encapsulates recognition and recovery.)


    Total: 25/25


Haiku B-2

  • Structure: 5

  • Imagery: 5 (“Solutions bloom from plain words” is elegant and thematic.)

  • Clarity: 5 (Clearly connects error communication to clarity.)

  • Emotion: 4 (Soothing but lacks urgency.)

  • Relevance: 5 (Strong alignment with diagnosis and guidance.)


    Total: 24/25


Haiku B-3

  • Structure: 5

  • Imagery: 4 (“Error whispers truth” is strong; “Users rise again” is slightly cliché.)

  • Clarity: 5 (Emphasizes guidance and recovery explicitly.)

  • Emotion: 5 (Uplifting and empowering.)

  • Relevance: 5 (Perfectly mirrors the heuristic’s intent.)


    Total: 24/25


Overall for Author B: 73/75

 

Winner: Author B (73/75)

 

Analysis Supporting the Decision

  1. Consistency in Imagery: Author B uses cohesive, natural metaphors (e.g., “untangle knots,” “bloom,” “paths”) that align seamlessly with the heuristic’s focus on clarity and guidance. Author A’s imagery is strong but less unified (mixing tech terms like “codes” with abstract phrases like “whisper fear”).

  2. Clarity of Message: Author B’s haikus explicitly reference error communication (“Clear error leads to repair”), diagnosis (“Solutions bloom from plain words”), and recovery (“Users rise again”). Author A occasionally leans into abstraction (e.g., “A fix takes one step”), weakening directness.

  3. Emotional Impact: Author B balances urgency and reassurance (“Calm after the storm,” “gently named”), while Author A’s emotional tone is more uneven, blending fear (“whisper fear”) with resolution (“No dead ends”).

  4. Structural Precision: Both adhere to 5-7-5, but Author B’s phrasing feels more fluid and intentional (e.g., “Language soft, guiding the way” vs. Author A’s “Clarity breaks through the noise”).


Conclusion: Author B’s haikus excel in marrying poetic form with the technical theme, offering clarity, emotional depth, and thematic unity. Author A’s work is compelling but less consistent in bridging artistry with the heuristic’s practical demands.


[[End of DeepSeek’s Haiku analysis.]]
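For the record, DeepSeek’s rubric totals add up correctly. A quick sketch summing the per-category scores it reported:

```python
# Per-haiku scores copied from DeepSeek's analysis above, in rubric order:
# structure, imagery, clarity, emotion, relevance (5 points each).
scores = {
    "Author A (DeepSeek R1)": [
        [5, 4, 4, 4, 5],   # Haiku A-1 -> 22
        [5, 4, 4, 4, 5],   # Haiku A-2 -> 22
        [5, 3, 3, 3, 4],   # Haiku A-3 -> 18
    ],
    "Author B (ChatGPT 4o)": [
        [5, 5, 5, 5, 5],   # Haiku B-1 -> 25
        [5, 5, 5, 4, 5],   # Haiku B-2 -> 24
        [5, 4, 5, 5, 5],   # Haiku B-3 -> 24
    ],
}
totals = {author: sum(sum(h) for h in haikus) for author, haikus in scores.items()}
print(totals)  # totals out of 75: Author A = 62, Author B = 73
```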


Conclusion: DeepSeek Did Well

I gave DeepSeek 3 tests, all relating to my 9th usability heuristic: write an article, write the script for a video explainer, and compose Haikus. I also asked DeepSeek to analyze the resulting poems.


In all my tests, DeepSeek did well and performed at the level expected of the best current-generation AI. (This also means that DeepSeek didn’t do a perfect job on any of the challenges. There’s much room for improvement, both later in 2025 and as we approach superintelligence by around 2030.)


I’m quite star-struck with DeepSeek. It came from nothing to become one of the best AI models, despite the U.S. blockade against selling the best AI chips to China. Maybe because of this limitation, DeepSeek makes vastly more efficient use of compute resources. (Leonardo)


As a final challenge to DeepSeek, I asked it to create the bullet list for announcing this article on LinkedIn. Again, it did as well as the AI models I’ve been using for this job in recent months (mainly Claude and Gemini). Here’s what it gave me:


🧠 Same IQ, 1/45th cost: AI’s new math

💸 Why NVIDIA stock investors panicked over a tiny Chinese AI

💎 Big AI’s paradox: Spend more to stay ahead

📊 100x compute = 1 IQ leap — AI’s brutal math

🎓 AI today: High school grad. 2027: Ph.D.

📉 Cheaper AI = more users. Price elasticity is basic economics!

📝 DeepSeek writes articles—80% as good as Jakob?

🌸 Haiku battle! Which AI’s more poetic?

🚀 Scale-tastic race to the future: superintelligence is coming
