DeepSeek-R1

DeepSeek-AI, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948, arXiv, 22 Jan. 2025. arXiv.org, https://doi.org/10.48550/arXiv.2501.12948. Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Large language models (LLMs) have made remarkable strides in mimicking human-like cognition, but their ability to reason through complex problems—from math proofs to coding challenges—remains a frontier. In a recent breakthrough, DeepSeek-AI introduces DeepSeek-R1, a family of reasoning-focused models that leverages reinforcement learning (RL) to unlock advanced reasoning capabilities, without relying on traditional supervised fine-tuning (SFT) as a crutch. The paper “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” unveils a paradigm shift in how we train LLMs to think critically, with implications for both research and real-world applications. ...

2025-01-25 · 4 min · Cong Chan

CoT on BBH - Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

CoT on BBH:M. Suzgun et al., ‘Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them’. arXiv, Oct. 17, 2022. Available: http://arxiv.org/abs/2210.09261 Method Applying chain-of-thought (CoT) prompting to BIG-Bench Hard tasks Evaluate few-shot performance via standard “answer-only” prompting and chain-of-thought prompting on BIG-Bench Hard Benchmark Results/Analysis/Findings Benchmark: BIG-Bench Hard (BBH). These are the task for which prior language model evaluations did not outperform the average human-rater. many tasks in BBH require multi-step reasoning ...

2022-11-13 · 4 min · Cong Chan

Efficient Training of Language Models to Fill in the Middle

Bavarian, Mohammad, et al. Efficient Training of Language Models to Fill in the Middle. arXiv:2207.14255, arXiv, 28 July 2022. arXiv.org, http://arxiv.org/abs/2207.14255. data: https://www.github.com/openai/human-eval-infilling TL:DR Autoregressive language models can effectively learn to infill text by moving a span of text from the middle of a document to its end, without harming the original generative capability. The training models with this technique, called fill-in-the-middle (FIM), is useful, simple, and efficient, and should be used by default in future autoregressive language models. The study provides best practices and strong default settings for training FIM models and releases infilling benchmarks to aid future research. ...

2022-11-11 · 8 min · Cong Chan