Microsoft is now hosting DeepSeek R1, even as it investigates the startup for alleged data abuse

A hot potato: Microsoft is raising eyebrows after announcing that it will host DeepSeek R1 on its Azure cloud service. The decision comes just days after OpenAI accused DeepSeek of violating its terms of service by allegedly using ChatGPT outputs to train its system, allegations Microsoft is currently investigating.

DeepSeek R1 began making waves in the AI world when it launched last week. Chinese developer DeepSeek touted it as a freely available simulated reasoning model that rivals OpenAI’s o1 in performance at a fraction of the training cost. While OpenAI prices its o1 model at $60 per million output tokens, DeepSeek lists R1 at just $2.19 per million – roughly a 27-fold difference, and one that sank the stocks of AI-adjacent companies like Nvidia.
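To put that gap in perspective, here is a quick back-of-the-envelope comparison in Python using the list prices above; the workload size is purely illustrative:

```python
# Back-of-the-envelope comparison of the list prices cited above.
O1_PRICE = 60.00   # USD per million output tokens (OpenAI o1)
R1_PRICE = 2.19    # USD per million output tokens (DeepSeek R1)

print(f"R1 is roughly {O1_PRICE / R1_PRICE:.0f}x cheaper per output token")

# Illustrative workload: generating 10 million output tokens on each model.
tokens_millions = 10
print(f"o1 cost: ${O1_PRICE * tokens_millions:,.2f}")  # $600.00
print(f"R1 cost: ${R1_PRICE * tokens_millions:,.2f}")  # $21.90
```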

Microsoft’s decision to host R1 on Azure is not too unusual on its surface. The tech giant already offers over 1,800 AI models through its Azure AI Foundry, giving developers access to a variety of AI systems for experimentation and integration. Microsoft doesn’t discriminate, since it profits from any AI model operating on its cloud infrastructure. However, the decision seems ironic given that OpenAI has spent the last week aggressively criticizing DeepSeek for allegedly distilling ChatGPT outputs.

🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today!

🐋 1/n pic.twitter.com/7BlpWAPu6y

– DeepSeek (@deepseek_ai) January 20, 2025

OpenAI claims the AI startup violated its terms of service through “distillation,” as reported by Fox News. Distillation is the practice of training a new AI model on outputs from a more advanced one. Suspicions arose after users discovered that an earlier model, DeepSeek V3, sometimes referred to itself as “ChatGPT,” suggesting that DeepSeek used OpenAI-generated data to fine-tune its system.
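For readers unfamiliar with the technique, here is a minimal sketch of what output-based distillation can look like in practice. It uses the official OpenAI Python client; the teacher model name, prompts, and output path are illustrative assumptions, and nothing here reflects DeepSeek’s actual pipeline:

```python
# Minimal sketch of output-based "distillation": harvest a stronger model's
# answers, then reuse them as supervised fine-tuning data for a smaller model.
# Teacher model name, prompts, and file path are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain quicksort in two sentences.",
    "Summarize the causes of the 2008 financial crisis.",
]

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # the "teacher" model (illustrative)
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each (prompt, answer) pair becomes one training example for
        # fine-tuning a smaller "student" model on the teacher's outputs.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

The resulting file is the kind of dataset a developer could feed into a supervised fine-tuning run for a smaller “student” model – the behavior OpenAI’s terms of service prohibit when the outputs are used to build a competing system.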

Microsoft’s move also seems somewhat hypocritical, considering that its own security researchers reportedly launched an ethics probe into DeepSeek on Wednesday. Anonymous sources claim the investigation focuses on whether DeepSeek extracted substantial amounts of data through OpenAI’s API in the fall of 2024.

Despite the frustrations with DeepSeek, OpenAI CEO Sam Altman has publicly welcomed the competition. In a tweet on Monday, Altman acknowledged R1’s cost efficiency, calling it “an impressive model” while vowing that OpenAI would soon deliver “much better models.” Analysts expect the company to release a new model, o3-mini, as early as today.

deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price.

we will obviously deliver much better models and also it’s legit invigorating to have a new competitor! we will pull up some releases.

– Sam Altman (@sama) January 28, 2025

OpenAI’s outcry over DeepSeek’s data practices is notable given its own history of alleged data abuse. The New York Times has filed a lawsuit against OpenAI and Microsoft, accusing them of using copyrighted journalism without permission. OpenAI has also struck deals with publishers and online communities – such as The Associated Press – to access their content and user-generated data for training.

The whole situation exposes the AI industry’s hypocritical relationship with data ownership. Investment firm Andreessen Horowitz, another OpenAI investor, argued in a 2023 legal filing that training AI models should not be considered copyright infringement because they merely “extract information” from existing works. If OpenAI truly believes in that principle, then DeepSeek is just playing by the same rules.

The current landscape of the AI industry is more or less a free-for-all. No laws on the books govern AI directly, and the laws that touch it indirectly, such as copyright and trade law, are twisted into favorable interpretations by the very AI firms accused of breaking them.
