DeepSeek-R1

DeepSeek-AI, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948, arXiv, 22 Jan. 2025. arXiv.org, https://doi.org/10.48550/arXiv.2501.12948.

Large language models (LLMs) have made remarkable strides in mimicking human-like cognition, but their ability to reason through complex problems, from math proofs to coding challenges, remains a frontier. In a recent breakthrough, DeepSeek-AI introduces DeepSeek-R1, a family of reasoning-focused models that leverages reinforcement learning (RL) to unlock advanced reasoning capabilities (a reward sketch follows below), without relying on traditional supervised fine-tuning (SFT) as a crutch....

2025-01-25 · 4 min · Cong Chan
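The paper drives this RL stage with simple rule-based rewards rather than a learned reward model: an accuracy check on the final answer plus a format check that the reasoning appears inside think tags. A minimal sketch of that idea, where the tag-parsing rule and the 1.0/0.1 weights are illustrative assumptions, not the paper's exact specification:

```python
import re

# Rule-based reward in the spirit of DeepSeek-R1: accuracy of the final
# answer plus a small bonus for emitting reasoning inside <think>...</think>.
# Parsing rule and weights are illustrative assumptions.
THINK_RE = re.compile(r"<think>.+?</think>", re.DOTALL)

def reward(completion: str, gold_answer: str) -> float:
    """Score one sampled completion; RL updates the policy toward
    completions that earn higher reward."""
    format_ok = THINK_RE.search(completion) is not None
    # Treat whatever follows the last closing tag as the final answer.
    final_answer = completion.split("</think>")[-1].strip()
    return float(final_answer == gold_answer) + (0.1 if format_ok else 0.0)

print(reward("<think>3 * 4 = 12</think> 12", "12"))  # 1.1
print(reward("12", "12"))                            # 1.0, no format bonus
```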

CoT on BBH - Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Suzgun, M., et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. arXiv:2210.09261, arXiv, 17 Oct. 2022. arXiv.org, http://arxiv.org/abs/2210.09261.

Method: apply chain-of-thought (CoT) prompting to BIG-Bench Hard tasks, evaluating few-shot performance under both standard "answer-only" prompting and CoT prompting (the two formats are sketched below). Results/Analysis/Findings: the benchmark is BIG-Bench Hard (BBH), the tasks on which prior language-model evaluations did not outperform the average human rater; many tasks in BBH require multi-step reasoning...

2022-11-13 · 4 min · Cong Chan
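To make the comparison concrete, here is a hypothetical sketch of the two few-shot prompt formats the paper contrasts; the exemplar task and wording are made up for illustration, not drawn from BBH.

```python
def answer_only_prompt(exemplars, question):
    """Standard few-shot prompting: each exemplar maps a question
    straight to its answer, with no intermediate reasoning shown."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a, _ in exemplars)
    return f"{shots}\n\nQ: {question}\nA:"

def cot_prompt(exemplars, question):
    """Chain-of-thought prompting: each exemplar shows its reasoning
    steps before the answer, nudging the model to do the same."""
    shots = "\n\n".join(
        f"Q: {q}\nA: {r} So the answer is {a}." for q, a, r in exemplars
    )
    return f"{shots}\n\nQ: {question}\nA:"

# One made-up exemplar: (question, answer, rationale).
exemplars = [(
    "Alice has 3 apples and buys 2 more. How many does she have?",
    "5",
    "Alice starts with 3 apples; buying 2 more gives 3 + 2 = 5.",
)]

print(answer_only_prompt(exemplars, "Bob had 4 pens and lost 1. How many remain?"))
print(cot_prompt(exemplars, "Bob had 4 pens and lost 1. How many remain?"))
```

Under answer-only prompting the model must emit the answer immediately; under CoT it is shown how to reason first, which is what lifts performance on the multi-step BBH tasks.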

Efficient Training of Language Models to Fill in the Middle

Bavarian, Mohammad, et al. Efficient Training of Language Models to Fill in the Middle. arXiv:2207.14255, arXiv, 28 July 2022. arXiv.org, http://arxiv.org/abs/2207.14255. Data: https://www.github.com/openai/human-eval-infilling

TL;DR: autoregressive language models can effectively learn to infill text by moving a span of text from the middle of a document to its end (sketched below), without harming the original generative capability. Training models with this technique, called fill-in-the-middle (FIM), is useful, simple, and efficient, and should be used by default in future autoregressive language models...

2022-11-11 · 8 min · Cong Chan
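The core FIM transformation is easy to picture in code. A minimal sketch, assuming character-level split points and placeholder sentinel-token names, in which the document is reordered as prefix-suffix-middle so that infilling reduces to ordinary next-token prediction:

```python
import random

# Placeholder sentinel names; the actual special tokens are an assumption here.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def fim_transform(doc: str, rng: random.Random) -> str:
    """Cut a random middle span out of `doc` and move it to the end,
    marking the pieces with sentinel tokens (prefix-suffix-middle order)."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Trained left-to-right on this string, the model learns to generate
    # `middle` conditioned on both the preceding and the following context.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

rng = random.Random(0)
print(fim_transform("def add(a, b):\n    return a + b\n", rng))
```

Applying this transform to a fraction of training documents teaches infilling while the rest keep the ordinary left-to-right objective, which is how the paper preserves the original generative capability.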