A Peek At DeepSeek

Recently the market reacted sharply to DeepSeek, which wiped out roughly $1 trillion in market cap and sent NVIDIA shares plunging, among other companies. As a relatively new (2023) Chinese company with fewer than two hundred people, DeepSeek's impact is both political (amid competitive tensions between the United States and China) and a challenge to the approach large tech companies have been taking with AI. If we set aside geopolitical differences for a moment and consider approaches to AI as a technology, we should not be surprised at this outcome. Optimization of AI models is inevitable. It just took a start-up mentality to get there. Fueling this trend, as with many technologies that have come before, are three key pillars to consider: constraints, innovation, and open source.

Constraints

The progress of AI so far has largely been gauged by Large Language Models (LLMs) such as ChatGPT, Gemini, and Llama. These models were developed with deep investments from some of the largest tech companies (ChatGPT by OpenAI, backed by Microsoft; Gemini by Google; and Llama by Facebook) on some of the most expensive and demanding infrastructure. Generational progress has been underpinned by innovative methods, but largely attained with more and more chips processing ever more data. Access to vast amounts of data is a key reason why large tech companies have been at the forefront. Feeding this trend are some of the most powerful GPUs ever built. As models have grown, so have the memory and number of chips needed to run them. The conundrum came when DeepSeek, despite limited resources (roughly 200 people and about $6M in training costs) and without access to any of the latest and greatest GPUs, released its latest model, DeepSeek-V3, which rivals best-in-class LLMs including GPT-4o. Resource constraints forced DeepSeek to optimize, and this path has paid off. DeepSeek's cost is $0.55 per 1M tokens, whereas OpenAI's is $15 per 1M tokens, a roughly 96% reduction. Considering the performance is comparable, it makes for a compelling option.
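The cost comparison above is easy to verify with a little arithmetic (using the per-million-token prices quoted in this article):

```python
# Prices quoted above, in dollars per 1M tokens
deepseek_cost = 0.55
openai_cost = 15.00

# Relative saving of DeepSeek versus OpenAI
reduction = (openai_cost - deepseek_cost) / openai_cost
print(f"{reduction:.0%}")  # prints "96%"
```

At comparable quality, a 96% price cut changes which applications are economical to build at all, not just which provider to pick.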

Innovation

As any startup can affirm, constraints can lead to innovation. A company doesn't need to be first nor the largest; constraints can focus efforts on providing a minimal product with direct customer value. DeepSeek attained parity with other LLMs and then differentiated through efficiency. It is hard to know where DeepSeek got its data or exactly how its models are trained (there is some speculation); it is possible to distill a model from the output of an existing model. If this is the case, it is more copying than innovation. Nonetheless, newcomers benefit from existing solutions, able to start with the state of the art and take it further. DeepSeek's design uses a Mixture-of-Experts (MoE) architecture, which combines multiple smaller models. MoE not only allows better specialization, but lowers the cost to serve a response, as it only needs to exercise the specific models that are actually needed. Overall, this provides savings on processing cost without compromising the response. This isn't a new technique, but many of the competing LLMs are monolithic in nature and therefore require more memory to serve. Once you have an expert model in a given area, you have that expertise moving forward. What's more, it can readily be recombined with other models to handle increasing complexity. At the end of the day, MoE is a key innovation for DeepSeek that not only drives down costs, but opens the door to combining many different models in the future.
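The routing idea behind MoE can be sketched in a few lines. This is a minimal toy illustration, not DeepSeek's actual architecture: the expert count, dimensions, and the `moe_forward` function are all illustrative assumptions. The point it demonstrates is that a learned gate picks only the top-k experts per token, so most expert weights are never touched on a given forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 8 "expert" weight matrices, but each token is routed to
# only its top-2 experts, so the other 6 stay idle for that token.
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # gating network

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ router                    # score every expert for this token
    chosen = np.argsort(logits)[-top_k:]   # keep only the k best-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only top_k of the n_experts matrices are multiplied: roughly k/n of
    # the compute of a dense (monolithic) layer of the same total size.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # prints "(16,)"
```

Here each token costs 2/8 of the expert compute while the model as a whole still holds all eight experts' capacity, which is the memory-versus-compute trade the paragraph above describes.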

Open Source

Closed-source development can maintain a competitive edge and protect intellectual property (at least for a while), but there is an alternative. Open source projects inherently allow collaboration and recombination of elements. Developing on top of existing solutions is as simple as pulling a repo and forking the code. Open software draws development from a more diverse set of contributors across the industry. This not only accelerates development itself, but makes for a more robust and secure solution. Although there are several open source LLMs in the market, such as Llama and Mistral, the difference with DeepSeek is that it is comparable to the best closed-source models. For AI, open source data, models, and weights are all accelerators, and once introduced, they become universally available. Being open source, DeepSeek can now draw upon all of these positive factors to further accelerate artificial intelligence, and this is only the beginning.