Developers at leading U.S. AI firms are praising the DeepSeek AI models that have leapt into prominence while also trying to poke holes in the notion that their multi-billion dollar technology has been bested by a Chinese newcomer’s low-cost alternative.
Chinese startup DeepSeek on Monday sparked a stock selloff, and its free AI assistant overtook OpenAI’s ChatGPT atop Apple’s App Store in the U.S., harnessing a model it said it trained on Nvidia’s lower-capability H800 processor chips for under $6 million.
As worries about competition reverberated across the U.S. stock market, some AI experts applauded DeepSeek’s strong team and up-to-date research but remained unfazed by the development, said people familiar with the thinking at four of the leading AI labs, who declined to be identified as they were not authorized to speak on the record.
WHY IS DEEPSEEK CAUSING A STIR?
The release of OpenAI’s ChatGPT in late 2022 caused a scramble among Chinese tech firms, who rushed to create their own chatbots powered by artificial intelligence.
But after the release of the first Chinese ChatGPT equivalent, made by search engine giant Baidu (9888.HK), there was widespread disappointment in China at the gap in AI capabilities between U.S. and Chinese firms.
The quality and cost efficiency of DeepSeek’s models have flipped this narrative on its head. The two models that have been showered with praise by Silicon Valley executives and U.S. tech company engineers alike, DeepSeek-V3 and DeepSeek-R1, are on par with OpenAI and Meta’s most advanced models, the Chinese startup has said.
They are also cheaper to use. DeepSeek-R1, released last week, is 20 to 50 times cheaper to use than OpenAI’s o1 model, depending on the task, according to a post on DeepSeek’s official WeChat account.
But some have publicly expressed scepticism about DeepSeek’s success story.
Scale AI CEO Alexandr Wang said during an interview with CNBC on Thursday, without providing evidence, that DeepSeek has 50,000 Nvidia H100 chips, which he claimed would not be disclosed because that would violate Washington’s export controls that ban such advanced AI chips from being sold to Chinese companies. DeepSeek did not immediately respond to a request for comment on the allegation.
Bernstein analysts on Monday highlighted in a research note that DeepSeek’s total training costs for its V3 model were unknown but were much higher than the $5.58 million the startup said was used for computing power. The analysts also said the training costs of the equally-acclaimed R1 model were not disclosed.
WHO IS BEHIND DEEPSEEK?
DeepSeek is a Hangzhou-based startup whose controlling shareholder is Liang Wenfeng, co-founder of quantitative hedge fund High-Flyer, based on Chinese corporate records.
Liang’s fund announced in March 2023 on its official WeChat account that it was “starting again”, going beyond trading to concentrate resources on creating a “new and independent research group, to explore the essence of AGI” (Artificial General Intelligence). DeepSeek was created later that year.
ChatGPT maker OpenAI defines AGI as autonomous systems that surpass humans in most economically valuable tasks.
It is unclear how much High-Flyer has invested in DeepSeek. High-Flyer has an office located in the same building as DeepSeek, and it also owns patents related to chip clusters used to train AI models, according to Chinese corporate records.
High-Flyer’s AI unit said on its official WeChat account in July 2022 that it owns and operates a cluster of 10,000 A100 chips.
HOW DOES BEIJING VIEW DEEPSEEK?
DeepSeek’s success has already been noticed in China’s top political circles. On January 20, the day DeepSeek-R1 was released to the public, founder Liang attended a closed-door symposium for businesspeople and experts hosted by Chinese Premier Li Qiang, according to state news agency Xinhua.
Liang’s presence at the gathering is potentially a sign that DeepSeek’s success could be important to Beijing’s policy goal of overcoming Washington’s export controls and achieving self-sufficiency in strategic industries like AI.
A similar symposium last year was attended by Baidu CEO Robin Li.
OpenAI CEO Sam Altman wrote on X that R1, one of several models DeepSeek released in recent weeks, “is an impressive model, particularly around what they’re able to deliver for the price.” Nvidia said in a statement DeepSeek’s achievement proved the need for more of its chips.
Software maker Snowflake decided Monday to add DeepSeek models to its AI model marketplace after receiving a flurry of customer inquiries.
With employees also calling DeepSeek’s models “amazing,” the U.S. software seller weighed the potential risks of hosting AI technology developed in China before ultimately deciding to offer it to clients, said Christian Kleinerman, Snowflake’s executive vice president of product.
“We decided that as long as we are clear to customers, we see no issues supporting it,” he said.
Meanwhile, U.S. AI developers are hurrying to analyze DeepSeek’s V3 model. DeepSeek in December published a research paper accompanying the model, the basis of its popular app, but many questions such as total development costs are not answered in the document.
China has now closed the gap with state-of-the-art AI models developed in the U.S. from 18 months to six months, one person said. Yet with DeepSeek’s free release strategy drumming up such excitement, the firm may soon find itself without enough chips to meet demand, this person predicted.
DeepSeek’s strides did not flow solely from a $6 million shoestring budget, a tiny sum compared to the $250 billion analysts estimate big U.S. cloud companies will spend this year on AI infrastructure. The research paper noted that this figure referred specifically to chip usage on its final training run, not the entire cost of development.
The training run is the tip of the iceberg in terms of total cost, executives at two top labs told Reuters. Determining how to design that training run can cost orders of magnitude more money, they said.
The paper stated that the training run for V3 was conducted using 2,048 of Nvidia’s H800 chips, which were designed to comply with U.S. export controls released in 2022, rules that experts told Reuters would barely slow China’s AI progress.
Sources at two AI labs said they expected earlier stages of development to have relied on a much larger quantity of chips. One of the people said such an investment could have cost north of $1 billion.
Some American AI leaders lauded DeepSeek’s decision to launch its models as open source, which means other companies or individuals are free to use or change them.
“DeepSeek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen – and as open source, a profound gift to the world,” venture capitalist Marc Andreessen said in a post on X on Sunday.
The acclaim garnered by DeepSeek’s models underscores the viability of open source AI technology as an alternative to costly and tightly controlled technology such as OpenAI’s ChatGPT, industry watchers said.
Wall Street’s most valuable companies have surged in recent years on expectations that only they had access to the vast capital and computing power necessary to develop and scale emerging AI technology. Those assumptions will come under further scrutiny this week and the next, when many American tech giants will report quarterly earnings.