DeepSeek – Impact on the AI Market
GLG – 06/02/2025
As companies around the world race to keep up with the latest developments in AI, Chinese firm DeepSeek jolted the space on January 20 when it debuted its "R1" large language model (LLM), sending reverberations across markets.
To understand the wide-ranging impact of DeepSeek-R1’s launch, GLG’s Evan Moore sat down with industry veteran William Fong, PhD, whose more than 26 years of experience at Microsoft included leading the company’s AI and digital transformation efforts.
Here’s a summary of the key takeaways from the conversation:
Can you unpack the "mixture-of-experts" approach that DeepSeek used? What are its advantages and pitfalls compared to conventional large language models?
Generally, you have a smaller language model, train it to be a specialized expert (like a mathematician, for example), and then, when you want to ask questions about that subject, you go to that specific small language model. But DeepSeek has managed to do that within the large model they published. They have multiple types of these experts within their 671 billion parameters. That is an advantage for users, because you don’t need to go to multiple models – you just go to one and you get really high-definition results from it.
There are other advantages. If you’re looking at an expert inside the model, you’re not using all the parameters. You’re only using whatever part that expert happens to live in. Your latency, your inferencing, your costs – all that goes down. You don’t need high-intensity chips to randomly take a stab in the dark on your 671 billion parameters. You know exactly where to go, because it knows exactly what your question is about.
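The routing idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not DeepSeek's actual implementation; the dimensions, expert count, and single-matrix "experts" are made up for clarity): a router scores every expert, only the top-k actually run, and the rest of the parameters sit idle, which is where the latency and cost savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Hypothetical tiny "experts": each reduced to a single weight matrix here.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    # x: one token of shape (d_model,). The router scores each expert.
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[top])
    gates = weights / weights.sum()    # softmax over only the chosen experts
    # Only top_k of n_experts execute; the other experts' parameters stay idle.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
```

With top_k = 2 of 8 experts active, only a quarter of the expert parameters are touched per token, which is the mechanism behind the lower inference cost the speaker describes.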
There are some disadvantages, too. As you start to have more and more experts, computing gets complicated, expensive, and clogged up, because you have multiple experts struggling to do the work, and multiple sets of parameters live in memory at a time, because you’re not serving just one person. You have to remember this: You’re serving whoever is inferencing from that model – whoever, in any given instant, happens to be hitting a server somewhere. If you have multiple experts all working together, it can get very difficult to manage and coordinate.
It’s a balance. With DeepSeek, you get flexibility and adaptability in one model that does a lot, especially when you have multiple experts that you’ve trained specifically. But it gets difficult to maintain. And as you grow the number of experts, the amount of computing you use and the traffic controlling gets confusing as well. But overall, what they did is an absolute plus, in my opinion.
How do you think DeepSeek’s cost efficiency could impact Generative AI spending levels, model API revenue generation, and chip demand?
I’m very skeptical of the cost they’ve mentioned, which I don’t think is equivalent to the CapEx of a company. They’re just saying, “This is the computing cost.” It’s only what they’ve done to organize their training materials, pre-train the materials into their model, and do some fine-tuning.
They haven’t told us the costs regarding where they got the data from. That data is expensive. There’s no disclosed cost for everything around it. Here’s the point: Even if the overall cost is 4-5 times what they stated – including the cost of data, overhead, etc. – the economics still shift. OpenAI is charging $200 per user per month for Operator, and that pricing absolutely will change, because you’re not going to be able to charge that moving forward.
You can build an operator at a much cheaper price than OpenAI’s Operator. To be fair, OpenAI’s Operator is a little different, because it can see your screen, browse, and do things like that. It’s a little more advanced. But you watch: Six months down the road, you’re going to have a DeepSeek Operator, I think, charging you $5 a month per user. Or a DeepSeek mathematician for $5 a month, or $2 a month.
Look at Copilot. It was $30 per user per month. It kind of still is for the enterprise users, because it’s protecting your privacy and your data behind the firewall. But everyone else, it’s only an extra $3 a month now. If you have Office 365, the price goes up by $3 per month and you get the entire suite of Office 365 Copilot. You’re going to start seeing this shift downstream, where it becomes a lot more affordable.
The other question that might be coming up in terms of pricing is, what about these GPUs? They didn’t do the frontier research. They simply copied, I think. Actually, they didn’t just simply copy – they used a lot of the techniques that the hyperscalers used, that Llama 3 has, and then organized the way they trained it really efficiently. Whether they used H800s, H100s, or GPU-as-a-service, who knows? It doesn’t matter. The fact is that they were able to efficiently optimize the training.
In the future, you may not need an H100. You might simply use an older GPU to do the same job; it just may take a little bit longer. Or you may not need to spend $50,000 for a Blackwell chip – you just buy the NVIDIA DIGITS device, which is $3,000 and has a GB10 Grace Blackwell superchip inside it, and you stack these together. Jensen just announced this. Is there a need for everyone to have access to an H100? In my opinion, no. Moving forward, assuming what they publish is right, accurate, and fully transparent, you can do a lot with a lot less using the techniques that DeepSeek published in their white paper.
How quickly might we see other models catch up to DeepSeek or surpass OpenAI’s o1 model?
Very quickly. It’s not because they already have it – they don’t. The fact is that DeepSeek, the actual foundation model itself, is different, because it’s been built on a mixture of experts. Not one large foundation model, but multiple. The foundation itself has been modified, and they modified Llama to do that. Any other company, especially the closed-source ones, can do that.
I have a feeling that before you know it, o3, o4, Gemini 2, 2.1, all of that is going to have these features built into it. Why would you go to a frontier model if it doesn’t even offer a mixture of experts? If you don’t offer precision optimization for computing, or multi-head latent attention, all these technical things – if you don’t have that in your closed models – you are cutting yourself out of a big chunk of business. I suspect they will have it in their updates very soon – if not tomorrow, next week, or next month.
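The multi-head latent attention mentioned above can be illustrated with a small sketch. This is a hypothetical toy, not DeepSeek's published implementation (all dimensions and weight names here are invented): the core idea is that instead of caching full per-head keys and values, you cache one small shared latent vector per token and reconstruct keys and values from it at attention time, shrinking the inference-time memory cost.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent, seq = 64, 8, 16, 16, 10

# Down-projection to a shared latent, plus up-projections back to full K and V.
w_down = rng.standard_normal((d_model, d_latent)) * 0.1
w_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1
w_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1

x = rng.standard_normal((seq, d_model))  # hidden states for a toy sequence

# Cache only the latent: seq x d_latent floats instead of seq x 2*n_heads*d_head.
latent_cache = x @ w_down

# At attention time, reconstruct full keys and values from the cached latent.
k = latent_cache @ w_up_k
v = latent_cache @ w_up_v

full_cache_floats = seq * 2 * n_heads * d_head  # what a standard KV cache stores
latent_cache_floats = latent_cache.size         # what this scheme stores
```

With these toy numbers the cache shrinks from 2,560 floats to 160, a 16x reduction – the kind of inference-cost lever the speaker argues closed-model providers will have to adopt.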