The new Chinese language model R1 from the company DeepSeek is currently being rolled out worldwide. The model is comparable to large language models such as ChatGPT from OpenAI and Gemini from Google.
At DTU, many researchers work with artificial intelligence (AI), and DTU also teaches students how to use it. At DTU Compute, AI is a major research area, and the launch of R1 has been met with enthusiasm, says Professor Ole Winther.
But why?
About R1: Three quick questions for Ole Winther
What is R1?
R1 from the Chinese company DeepSeek is a language model similar to ChatGPT and Gemini, which were developed by the American tech giants OpenAI and Google. Unlike the American language models, R1 is open source: DeepSeek has made the mathematical recipe and code behind the model openly available.
The training recipe DeepSeek used is described at https://arxiv.org/abs/2501.12948, and the trained models can be downloaded from https://github.com/deepseek-ai/DeepSeek-R1.
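As a concrete illustration of what "openly available" means, the released weights can be loaded with standard open-source tooling. A minimal sketch, assuming the Hugging Face transformers library and the small distilled variant deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (the full R1 model is far too large to run on a single workstation):

```python
# A minimal sketch of loading one of the released R1 models with the
# Hugging Face transformers library. The small distilled variant is an
# illustrative choice; the full R1 model needs data-centre hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ask a question and let the model generate an answer.
inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```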
When we use the language model, it also thinks aloud, so we can follow the process and see how it reasons its way to an answer.
In general, all language models are trained in two stages. In stage 1, the model is trained on vast amounts of data from the internet: given a context, it is asked to predict the next word. This pre-training has been done in advance.
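To make "predict the next word" concrete, here is a toy sketch (not DeepSeek's actual pipeline) of the kind of training pairs a stage-1 model learns from:

```python
# A toy sketch of stage-1 training data: for every position in a text, the
# input is the context so far and the target is simply the next word.
text = "DTU Compute researches artificial intelligence".split()
for i in range(1, len(text)):
    context, target = " ".join(text[:i]), text[i]
    print(f"context: {context!r}  ->  next word: {target!r}")
```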
In stage 2, the model is fine-tuned. Here, the large monopoly language models partially draw a curtain, so we only get an answer that summarizes what the model has thought. DeepSeek's R1 differs: the company shows exactly how the stage-2 fine-tuning is done, and it gets the model to think aloud.
It is similar to when we think aloud and say, "oh no, that was wrong – maybe I should try this way..." The process of correcting oneself to finally arrive at an answer or solution can be followed with DeepSeek. DeepSeek calls this behavior an "aha moment."
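This thinking aloud is visible directly in the model's output. A sketch with a made-up output string, assuming R1's convention of wrapping its reasoning in <think> tags before the final answer:

```python
# A sketch of how the "thinking aloud" looks in practice. R1 wraps its
# reasoning in <think> ... </think> tags before the final answer, so the
# self-correcting trace can be separated with plain string handling.
# (The output text below is invented for illustration.)
raw_output = """<think>
17 * 24... I'll try 17 * 20 = 340 and 17 * 4 = 68, so 340 + 68 = 408.
Oh no, let me double-check: 408 / 24 = 17. That was right after all.
</think>
17 * 24 = 408."""

reasoning, _, answer = raw_output.partition("</think>")
print("Reasoning trace:", reasoning.removeprefix("<think>").strip())
print("Final answer:", answer.strip())
```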
Much of our work as researchers has been trying to reproduce what others have done inside their language models. That can be quite frustrating. DeepSeek has now set this knowledge free.
How does open source benefit research and businesses?
Because the code is open source, everyone can get the 'brain' behind the language model. We can then fine-tune it ourselves to do new things.
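What such fine-tuning could look like in practice: a hedged sketch using LoRA adapters via the Hugging Face peft library. The model name, hyperparameters, and one-line dataset are illustrative assumptions, not DeepSeek's own recipe.

```python
# A sketch of fine-tuning an open model with LoRA adapters (peft library).
# Everything here is illustrative: a real fine-tune needs far more data.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small trainable adapter matrices instead of updating all weights.
model = get_peft_model(
    model,
    LoraConfig(r=8, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# A single toy training example for the new task.
data = Dataset.from_dict(
    {"text": ["Question: What is DTU? Answer: The Technical University of Denmark."]}
).map(lambda x: tokenizer(x["text"], truncation=True), remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-finetuned", num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```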
It is a Chinese model, so we cannot ask it about everything. For example, it will not talk about the demonstrations in Tiananmen Square in 1989, which ended in violence. But because the code is open, we can modify the model ourselves so that it answers what R1 has been set up not to answer.
DeepSeek’s language model is good at Danish. When we ask it something, it translates the question into English, finds the answer, and quickly writes back in Danish.
Besides my research career, I am involved in a Danish startup that develops Danish-language search and chat. In that context, we could use DeepSeek's language model and build Danish applications on top of it. Other companies can do the same. Similarly, the government's Digitalisation Strategy states that Denmark should have its own Danish language model; here, too, we could in principle take DeepSeek's open-source model and build on top of it.
There has been much talk about whether we should send data to China, and since DeepSeek is a Chinese company, data does end up in China if we use its app. However, if we take R1 and run it ourselves on our own data, there is a barrier in between, and no data is passed on.
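A sketch of that barrier in code: once the weights have been downloaded, the model runs entirely on our own hardware, and the transformers library can even be told to refuse all network access. The model name and prompt are again illustrative assumptions:

```python
# Once the weights are downloaded, the model runs entirely locally.
# local_files_only=True makes transformers refuse any network access,
# so nothing we type is sent to DeepSeek's servers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)

prompt = "Summarise this confidential document: ..."
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=100)[0]))
```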
Why is the launch of R1 called an eye-opener?
There are several reasons.
For us researchers, R1 feels like a kind of liberation, because we have constantly heard the monopoly companies say that only they could develop such complicated language models. Now it is quite clear that what they have built is not as extraordinarily difficult as the sales pitches claim. It shows that this is something Europe can certainly take part in as well.
Moreover, DeepSeek used only about 2,000 GPUs to train its base model, DeepSeek-V3, whereas the American counterpart GPT-4 reportedly required nearly ten times as many. This has challenged the belief that a massive expansion of AI data centres is needed. Not surprisingly, Nvidia's share price dropped immediately after R1 was released.
At the same time, we must acknowledge that language models will soon be so advanced that they can be let loose as bots on social media. This can be dangerous because it increases the risk of influence operations and interference in societal matters such as elections.
For example, it will be possible to create a 'bot farm' of 100,000 new fake profiles and instruct the language model to behave in a certain way and express specific opinions while appearing reasonable and well-argued. If we humans cannot tell that a bot is writing, we risk being manipulated by a mass of bots created for the occasion. The rise of language models should therefore also push social media platforms to develop methods for detecting and excluding fake profiles. And that is something we as a society must demand.