Can LLMs plan like humans? New prompting technique redefines possibilities

CO-EDP, VisionRI | Updated: 29-01-2025 17:09 IST | Created: 29-01-2025 17:09 IST

Planning - the ability to devise, adapt, and execute long-term strategies - is a hallmark of human intelligence. While Large Language Models (LLMs) such as GPT-4 have excelled in natural language processing, reasoning, and problem-solving, their capability to autonomously generate detailed, long-horizon plans has remained limited.

Recognizing this critical gap, researchers Bilgehan Sel, Ruoxi Jia, and Ming Jin from Virginia Tech have introduced a groundbreaking solution in their study, "LLMs Can Plan Only If We Tell Them," which will be presented at ICLR 2025. Their approach, called AoT+ (Algorithm-of-Thoughts Plus), offers a transformative way to overcome the inherent challenges faced by LLMs in planning tasks.

The challenge of autonomous planning in LLMs

Despite their sophistication, LLMs often falter in tasks requiring autonomous planning. For example, benchmarks like Blocksworld and logistics scenarios demand multi-step reasoning, accurate state tracking, and dynamic adaptability. Traditional models, including GPT-4, frequently fail to match human performance in these areas, achieving just 30% accuracy on planning benchmarks compared to a human baseline of 78%.
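To make the state-tracking burden concrete, here is a minimal, illustrative Blocksworld sketch (a toy written for this article, not the benchmark's actual code): every move changes which block rests on which, and a planner must keep that state exactly right across many steps.

```python
# Minimal Blocksworld sketch (illustrative; not the benchmark's actual code).
# A state maps each block to what it rests on ("table" or another block).
# Planning requires tracking this state accurately across many moves --
# exactly where LLMs tend to lose the thread and hallucinate.

def is_clear(state, block):
    """A block is clear if nothing rests on top of it."""
    return all(on != block for on in state.values())

def move(state, block, dest):
    """Move `block` onto `dest` ('table' or a clear block); returns a new state."""
    if not is_clear(state, block):
        raise ValueError(f"{block} is not clear")
    if dest != "table" and not is_clear(state, dest):
        raise ValueError(f"{dest} is not clear")
    new_state = dict(state)
    new_state[block] = dest
    return new_state

# Initial stack: C on B on A on the table. Goal: A on B.
state = {"A": "table", "B": "A", "C": "B"}
state = move(state, "C", "table")   # unstack C
state = move(state, "B", "table")   # unstack B
state = move(state, "A", "B")       # achieve the goal
```

Even in this three-block toy, a single misremembered "what is on what" fact invalidates every later step, which is why long-horizon instances overwhelm models that track state only implicitly.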

This shortcoming stems from three primary challenges. Chain-of-Thought (CoT) prompting, which facilitates step-by-step reasoning, works well for simpler tasks but struggles with problems that demand iterative backtracking or exploration of alternative paths. Reliance on external tools for state tracking and verification, such as the Tree-of-Thoughts (ToT) and LLM-Modulo frameworks, adds significant computational complexity and cost, limiting their practical application. Finally, LLMs often hallucinate states, misrepresenting intermediate steps as the accumulating context overloads them during planning. These challenges highlight the need for a more efficient, self-contained approach that lets LLMs plan effectively on their own.
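The backtracking point can be sketched with a toy search (illustrative only; the names and graph here are invented for this article): a depth-first search can undo a bad choice and try another branch, whereas a single left-to-right chain of thought commits to one path.

```python
# Illustrative contrast (not from the paper): planning often requires
# backtracking. Depth-first search can abandon a dead end and try the
# next branch; a single linear chain of reasoning cannot.

def dfs_plan(state, goal, successors, path=None, seen=None):
    """Find a sequence of actions from state to goal, backtracking on dead ends."""
    path = path if path is not None else []
    seen = seen if seen is not None else {state}
    if state == goal:
        return path
    for action, nxt in successors(state):
        if nxt in seen:
            continue
        result = dfs_plan(nxt, goal, successors, path + [action], seen | {nxt})
        if result is not None:
            return result  # success somewhere down this branch
        # otherwise: backtrack and try the next action
    return None

# Toy graph where the first-listed action is a dead end.
edges = {"start": [("left", "dead_end"), ("right", "mid")],
         "mid": [("forward", "goal")], "dead_end": [], "goal": []}
plan = dfs_plan("start", "goal", lambda s: edges[s])
```

The search tries "left" first, hits the dead end, backtracks, and recovers with "right" then "forward"; a linear chain that began with "left" would simply fail.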

Introducing AoT+: A game-changing approach

AoT+ builds upon the earlier Algorithm-of-Thoughts framework, empowering LLMs to autonomously generate, refine, and execute plans without external intervention. By introducing innovations in state management and trajectory augmentation, AoT+ delivers remarkable improvements in planning accuracy and efficiency.

AoT+ incorporates periodic structured state generation, a mechanism that restates the current problem state during the reasoning process. This approach minimizes cognitive load by allowing the model to focus on relevant information without being overwhelmed by the entire context history. Consequently, it ensures that the model maintains an accurate and consistent representation of the problem state, significantly reducing errors and hallucinations. Additionally, AoT+ employs random trajectory augmentation, which replaces carefully curated human-authored examples with random search trajectories interspersed with correct solution steps. This method enhances the generalizability of the model while simplifying the prompting process, enabling it to explore solutions more effectively without heavy reliance on human intuition.
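The idea of periodic structured state generation can be sketched in a few lines (a hypothetical prompt format written for this article; the paper's actual prompts may differ): after each action in the reasoning trace, the current state is restated explicitly, so the model reads it directly instead of reconstructing it from the full history.

```python
# Sketch of periodic structured state generation (hypothetical prompt
# format; the paper's actual prompts may differ). After each action the
# trace restates the current state explicitly.

def format_trace(initial_state, actions, apply_action):
    """Build a reasoning trace that restates the state after every action."""
    lines = [f"Initial state: {initial_state}"]
    state = initial_state
    for act in actions:
        state = apply_action(state, act)
        lines.append(f"Action: {act}")
        lines.append(f"Current state: {state}")  # the periodic restatement
    return "\n".join(lines)

# Toy domain: state is a tuple of stacks; an action moves the top block
# of one stack onto another.
def apply_action(state, act):
    src, dst = act
    stacks = [list(s) for s in state]
    stacks[dst].append(stacks[src].pop())
    return tuple(tuple(s) for s in stacks)

trace = format_trace((("A", "B"), ()), [(0, 1), (0, 1)], apply_action)
```

Because each "Current state" line is authoritative, errors cannot silently compound across the long history that precedes it.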

Another critical enhancement in AoT+ is its use of memoization for state management. Drawing inspiration from dynamic programming, AoT+ periodically caches intermediate problem states, allowing the model to retrieve relevant information efficiently without the need to reprocess the entire context. This innovation reduces computational overhead and ensures clarity in planning, addressing a major limitation in prior approaches.
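As a rough analogy for the caching idea (a sketch inspired by the article's description, not the paper's implementation), intermediate states can be memoized by action-sequence prefix, so revisiting a shared prefix during search retrieves the stored state instead of replaying every step.

```python
# Sketch of memoized state management (hypothetical; inspired by the
# description above, not taken from the paper's code). Intermediate
# states are cached by action-sequence prefix, so a revisited prefix
# is retrieved rather than recomputed from scratch.

from functools import lru_cache

def make_state_fn(initial, apply_action):
    @lru_cache(maxsize=None)
    def state_after(actions):
        """State reached after a tuple of actions, computed incrementally."""
        if not actions:
            return initial
        prev = state_after(actions[:-1])  # cache hit on any shared prefix
        return apply_action(prev, actions[-1])
    return state_after

# Toy domain: the state is an integer and each action adds to it.
state_after = make_state_fn(0, lambda s, a: s + a)
total = state_after((1, 2, 3))   # computes 0 -> 1 -> 3 -> 6
shared = state_after((1, 2))     # retrieved from the cache, not replayed
```

This is the dynamic-programming intuition the researchers draw on: pay for each intermediate state once, then reuse it.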

Performance and breakthroughs

The introduction of AoT+ has led to significant advancements in multiple benchmarks. In the Blocksworld domain, AoT+ achieved an accuracy rate of 82%, surpassing human-level performance and outperforming previous methods like CoT and LLM-Modulo. The use of periodic state updates and efficient trajectory design allowed the model to excel in this classic planning task. Similarly, in logistics scenarios involving multi-step transportation, AoT+ demonstrated dramatic improvements, achieving 80% accuracy compared to the 14% accuracy of CoT. Its autonomous management of complex dependencies between actions eliminated the need for computationally expensive external verification mechanisms.

Beyond planning, AoT+ proved its versatility in reasoning benchmarks such as List Functions and Abstract Causal Reasoning (ACRE). By leveraging memoization, it enabled precise tracking and manipulation of logical states, showcasing its potential in diverse applications. Moreover, AoT+ not only improved accuracy but also significantly reduced computational costs. By avoiding iterative prompting and external state verification processes, it achieved a sixfold decrease in computational expense compared to methods like ToT and LLM-Modulo, making it a practical and scalable solution for real-world applications.

Implications for AI development

The success of AoT+ marks a pivotal step in bridging the gap between human and machine planning capabilities. Its advanced prompting techniques demonstrate that LLMs can achieve state-of-the-art performance in planning tasks without relying on external tools or extensive computational resources. This breakthrough opens up possibilities in various domains. In autonomous robotics, AoT+ can be applied to robots performing long-term operations, such as warehouse management or disaster response, where efficient and adaptive planning is essential. Similarly, industries like logistics, healthcare, and manufacturing could benefit from LLMs capable of managing complex scheduling and resource allocation challenges. Additionally, by unlocking latent reasoning abilities, AoT+ paves the way for AI systems to tackle creative problem-solving in fields like science, engineering, and the arts.

Future directions

While AoT+ represents a significant advancement, the study highlights opportunities for further improvement. Enhancing its real-time adaptability, expanding its application to larger-scale models, and integrating it with dynamic learning environments are promising areas for exploration. Moreover, investigating its performance in creative and inductive reasoning tasks could unlock new potentials for AI in abstract domains. These directions underscore the untapped potential of AoT+ in redefining the boundaries of AI reasoning and planning.

FIRST PUBLISHED IN: Devdiscourse