The Internet is Freaking out about DeepSeek (English, Russian)

Tuesday, January 28, 2025

Original post is below.

The internet is freaking out about DeepSeek. NVIDIA stock is down 15%. Let's cut through the hype and noise: what is DeepSeek, and why does it matter?

DeepSeek R1 is a new open-source model released by a small team working at a Chinese quant hedge fund, purportedly as a side project.

The model achieves a spectacular price/performance ratio, coming close to OpenAI's o1 model on benchmarks at roughly 1/30 of the price.
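To make the "roughly 1/30 of the price" figure concrete, here is a back-of-envelope comparison in Python. The per-million-token prices are assumptions (approximate list prices published around R1's January 2025 launch, using R1's cache-miss input price) and may have changed since. The workload size and function name are hypothetical.

```python
# Back-of-envelope API cost comparison.
# ASSUMPTION: per-million-token list prices in USD, approximately as published
# around R1's January 2025 launch; illustrative only and may have changed.
PRICES_PER_MILLION = {
    "o1": {"input": 15.00, "output": 60.00},
    "deepseek-r1": {"input": 0.55, "output": 2.19},  # cache-miss input price
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload (hypothetical helper)."""
    p = PRICES_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M prompt tokens and 1M generated tokens.
    o1 = workload_cost("o1", 1_000_000, 1_000_000)
    r1 = workload_cost("deepseek-r1", 1_000_000, 1_000_000)
    print(f"o1: ${o1:.2f}, R1: ${r1:.2f}, ratio: {o1 / r1:.1f}x")
```

With these assumed prices the ratio comes out at about 27x, in line with the post's "roughly 1/30". The real gap for a given task also depends on how many tokens each model generates.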

It's so efficient that its smaller distilled versions, released alongside R1, can run on a single computer with a gaming GPU. No data center required.

DeepSeek claims it cost only $5M to train (vs. $500M for o1), although that claim seems highly suspect.

--
Why this is so important:

1/ R1 is roughly equivalent to o1, which is one generation behind the state of the art (o3). Because it's fully open-source, anyone can use it in their projects. This means the value of "SOTA minus 1" models is effectively brought down to zero.

2/ This is fantastic news for PointOne and others building at the application layer. It will force all model providers to release the latest models faster and cheaper.

3/ The speed of model distillation is staggering. Within a month of a new SOTA model coming out, it can be distilled down to a small model that can run at 1% of the cost.

4/ The reinforcement learning training methods used in R1/o1/o3 (which I've written about previously) are also now open-sourced, which allows the entire AI community to build better reasoning models faster.

5/ The commoditization of SOTA intelligence shifts value away from the model layer toward other kinds of capabilities, like agentic behavior.
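Point 3 mentions distillation. As a rough illustration of the idea, the classic recipe trains a small "student" model to match a large "teacher" model's output distribution. Below is a minimal sketch of the soft-target loss from Hinton et al. (2015) over a single token's logits, in plain Python. This is a swapped-in illustration: DeepSeek's released distilled models were produced by supervised fine-tuning smaller Qwen and Llama models on samples generated by R1, not by matching logits. The function names and the temperature default are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in Hinton et al. (2015)."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [3.0, 0.0, -3.0]
    print(distillation_loss([2.5, 0.0, -2.5], teacher))  # close student: small loss
    print(distillation_loss([-3.0, 0.0, 3.0], teacher))  # reversed student: large loss
```

The loss is zero only when the student's softened distribution matches the teacher's; training the student minimizes it across many tokens.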
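Point 4 refers to the reinforcement learning recipe. The R1 paper describes its RL algorithm, Group Relative Policy Optimization (GRPO, introduced in DeepSeekMath), which drops the separate value network and instead scores each sampled answer relative to the other answers sampled for the same prompt, using rule-based rewards such as answer accuracy. Here is a minimal sketch of just that group-relative advantage step. The function names and the exact-match reward are hypothetical simplifications. The full method also includes a clipped policy-ratio objective and a KL penalty against a reference model, both omitted here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each sampled answer's reward against its group:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


def rule_based_reward(answer: str, reference: str) -> float:
    """Hypothetical accuracy reward: 1.0 if the final answer matches exactly."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


if __name__ == "__main__":
    # Four sampled answers to the hypothetical prompt "What is 17 * 3?"
    samples = ["51", "54", "51", "50"]
    rewards = [rule_based_reward(s, "51") for s in samples]
    print(group_relative_advantages(rewards))
```

Correct answers end up with positive advantages and wrong ones with negative advantages, so the policy update pushes probability toward the better samples in each group. If every sample gets the same reward, all advantages are zero and the group contributes no learning signal.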

--
Cutting through the hype:

- Will the Chinese government steal my data? Only if you use DeepSeek's official app or hosted API. But the model weights themselves are fully open-source, so you can run them anywhere you want and keep your data private.

- Does this destroy the value of NVIDIA? In my opinion, not at all. The long-term value of NVIDIA will mostly be in inference, not training. Imagine how much compute is needed to run the 10B (or 100B or 1T) AI agents who will be doing all the world's knowledge work in 20 years. We're still going to need GPUs to run that. Even if future models are a lot more efficient than today's, we should not underestimate the latent demand for intelligence.

- Does this threaten all the closed-source model companies? It certainly lights a fire under them to move faster. I still think that advances at the cutting edge will be made by closed-source labs, and there will be value to be captured there. But those advances will percolate into open source at a much faster rate than any of us expected.

--
Overall: this is a great development for humanity.