One of the most significant AI advances since the last Radar is the breakthrough and proliferation of reasoning models. Also marketed as "thinking models," these models have achieved top human-level performance on benchmarks such as frontier mathematics and coding.
Reasoning models are usually trained through reinforcement learning or supervised fine-tuning, which strengthens capabilities such as step-by-step thinking (chain-of-thought, CoT), exploring alternatives (tree-of-thought, ToT) and self-correction. Examples include OpenAI’s o1/o3, DeepSeek R1 and Gemini 2.0 Flash Thinking. These models should be seen as a distinct category of LLMs rather than simply as more advanced versions.
This increased capability comes at a cost. Reasoning models require longer response times and consume more tokens, leading us to jokingly call them "Slower AI" (as if current AI weren’t slow enough). Not all tasks justify this trade-off. For simpler tasks like text summarization, content generation or fast-response chatbots, general-purpose LLMs remain the better choice. We advise using reasoning models in STEM fields, complex problem-solving and decision-making, for example when using LLMs as judges or improving explainability through explicit CoT outputs. At the time of writing, Claude 3.7 Sonnet, a hybrid reasoning model, had just been released, hinting at a possible fusion between traditional LLMs and reasoning models.
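The trade-off above can be sketched as a simple routing rule. This is a minimal illustration, not a recommended implementation: the `TaskKind` categories, the `choose_model_class` function and the returned labels are hypothetical names invented for this example, and the mapping only restates the task types named in the text.

```python
from enum import Enum


class TaskKind(Enum):
    # Hypothetical task categories, taken from the examples in the text.
    SUMMARIZATION = "summarization"
    CONTENT_GENERATION = "content_generation"
    FAST_CHATBOT = "fast_chatbot"
    STEM = "stem"
    COMPLEX_PROBLEM_SOLVING = "complex_problem_solving"
    DECISION_MAKING = "decision_making"
    LLM_AS_JUDGE = "llm_as_judge"


# Task kinds where the extra latency and token cost is assumed to pay off.
REASONING_TASKS = frozenset(
    {
        TaskKind.STEM,
        TaskKind.COMPLEX_PROBLEM_SOLVING,
        TaskKind.DECISION_MAKING,
        TaskKind.LLM_AS_JUDGE,
    }
)


def choose_model_class(task: TaskKind) -> str:
    """Return which class of model to use for a task (illustrative only)."""
    if task in REASONING_TASKS:
        return "reasoning"
    # Simpler tasks default to the faster, cheaper general-purpose model.
    return "general-purpose"


if __name__ == "__main__":
    for kind in TaskKind:
        print(f"{kind.value}: {choose_model_class(kind)}")
```

In practice the boundary is rarely this clean. A real router would likely also weigh latency budgets and cost per request, which this sketch leaves out.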