Prompt engineering has been a hot topic in the AI industry over the past year, but Anthropic now appears to be developing a tool to automate it, at least in part.
Anthropic released several new features Tuesday to help developers build more useful applications with its language model, Claude. According to a company blog post, developers can now use Claude 3.5 Sonnet to generate, test, and evaluate prompts, applying prompt engineering techniques to create better inputs and improve Claude’s answers on specialized tasks.
Language models are fairly forgiving in how you phrase a task, but even a small change in a prompt’s wording can make a big difference in the results. Normally you would have to work out the wording yourself or hire a prompt engineer, but this new feature provides quick feedback that makes finding improvements easier.
This feature is housed in the Anthropic Console, under the new Evaluate tab. The Console is the startup’s test kitchen for developers, built to attract companies looking to build products with Claude. One of its features, unveiled in May, is Anthropic’s built-in prompt generator, which takes a brief description of a task and uses Anthropic’s own prompt engineering techniques to construct a much longer, more specific prompt. Anthropic’s tool won’t completely replace prompt engineers, but the company says it will help new users and save time for experienced ones.
Within Evaluate, developers can test how effective their AI application’s prompts are across different scenarios. Developers can upload real-world cases to the test suite or ask Claude to generate a range of test cases. They can then compare the effectiveness of different prompts side by side and rate sample answers on a five-point scale.
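The workflow described above can be sketched as a small evaluation harness. This is a hypothetical illustration, not Anthropic's implementation: the `run_model` call is stubbed out (in practice it would call Claude through the Anthropic API), and the `grade` rubric is a toy stand-in for the Console's five-point rating.

```python
# Hypothetical sketch of an Evaluate-style harness: run several prompt
# variants over a shared test suite and grade each answer out of 5.
# run_model and grade are stubs, not Anthropic's actual tooling.

def run_model(prompt: str, case: str) -> str:
    """Stand-in for a real Claude call; returns a canned answer."""
    return f"Answer to {case!r} using a prompt of {len(prompt.split())} words"

def grade(answer: str) -> int:
    """Toy 1-5 grader; a real suite would use a rubric or human review."""
    return min(5, max(1, len(answer.split()) // 3))

def evaluate(prompts: dict[str, str], cases: list[str]) -> dict[str, float]:
    """Average grade per prompt variant across all test cases."""
    return {
        name: sum(grade(run_model(p, c)) for c in cases) / len(cases)
        for name, p in prompts.items()
    }

# Two variants of the same prompt, compared side by side.
prompts = {
    "terse": "Summarize the ticket.",
    "detailed": "Summarize the support ticket in 2-3 sentences, noting urgency.",
}
cases = ["ticket-001", "ticket-002", "ticket-003"]
scores = evaluate(prompts, cases)
```

A one-line change to a prompt variant is then re-scored against every test case at once, which is the time-saving behavior the Console feature is built around.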

In one example from Anthropic’s blog post, a developer found that the application was giving answers that were too short for several test cases. The developer was able to adjust one line in the prompt to lengthen the answers and apply the change to all test cases simultaneously. This can save developers a lot of time and effort, especially those with little or no experience in prompt engineering.
Dario Amodei, CEO and co-founder of Anthropic, said in an interview at Google Cloud Next earlier this year that prompt engineering is one of the most important things for widespread enterprise adoption of generative AI. “It sounds simple, but 30 minutes with a prompt engineer can often make an application work that wasn’t working before,” Amodei said.