Best practices for publishing and deploying agents
To ensure that your agent is robust, production-ready, and aligned with enterprise standards, follow these best practices before publishing and deploying to Orchestrator. These steps cover validation, governance, and readiness across the full agent lifecycle.
Before publishing, confirm that these foundational checks are complete:
| Essential gate | What to check | Where to do this |
|---|---|---|
| Prompts and examples finalized | System/User prompt includes role, constraints, and 3–5 input-mapped examples | Agent Builder → System and User Prompt |
| Tools described and bound | All tools have a name, description, and input/output schema | Agent Builder → Tools |
| Guardrail logging enabled (optional) | Tool calls are logged for audit and debugging (enable in the guardrail configuration) | Tools → Guardrail builder |
| Context sources connected | At least one relevant knowledge base is grounded | Context Grounding → Sources |
| ≥30 interactive tests conducted | Manual tests cover typical, edge, and malformed inputs | Agent Builder → Test Run |
| Evaluation set(s) created | ≥30 curated test cases covering real-world usage | Agent Builder → Evaluations tab |
| Evaluation performance validated | Evaluation set(s) score ≥70% with no regressions | Agent Builder → Evaluations tab |
Before publishing, ensure your agent is fully scoped, prompt-aligned, and context-aware.
- Define scope and boundaries: List in-scope and out-of-scope intents in the system prompt. Ensure tools and escalation paths match these boundaries to avoid scope creep.
- Refine prompts and arguments: Write structured system and user prompts. Use realistic examples mapped to input arguments. Validate inputs to guard against malformed or adversarial data.
- Apply the least-context principle: Only pass essential context to the LLM. Use Context Grounding to avoid bloated payloads.
- Complete tool descriptions and guardrails: For each tool, define name, purpose, schema, and side effects. Add logging, filters, retries, and escalation behavior.
- Normalize tool output: Ensure all tools return a consistently structured response to prevent runtime issues (see the sketch after this list).
- Connect relevant context sources: Add necessary indexes and tune thresholds for relevance and freshness.
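The sketch below is a minimal, hypothetical illustration of input validation and normalized tool output; the tool name, schema, and `ToolResult` type are assumptions for the example and are not part of the UiPath SDK.

```python
# Hypothetical sketch: a tool with an explicit input schema and a normalized
# output envelope. ToolResult, lookup_invoice, and the schema are illustrative.
from dataclasses import dataclass, field
from typing import Any

import jsonschema


@dataclass
class ToolResult:
    """Envelope every tool returns, so the agent never has to special-case
    raw strings, bare dicts, or unhandled exceptions."""
    ok: bool
    data: dict[str, Any] = field(default_factory=dict)
    error: str | None = None


INVOICE_LOOKUP_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string", "pattern": "^INV-[0-9]{6}$"},
        "include_line_items": {"type": "boolean"},
    },
    "required": ["invoice_id"],
    "additionalProperties": False,  # reject unexpected or adversarial fields
}


def lookup_invoice(payload: dict[str, Any]) -> ToolResult:
    # Validate before doing any work; malformed input becomes a structured
    # error the agent can reason about instead of an unhandled exception.
    try:
        jsonschema.validate(instance=payload, schema=INVOICE_LOOKUP_SCHEMA)
    except jsonschema.ValidationError as exc:
        return ToolResult(ok=False, error=f"Invalid input: {exc.message}")

    # ...call the downstream system here...
    return ToolResult(ok=True, data={"invoice_id": payload["invoice_id"], "status": "paid"})
```

Returning the same envelope from every tool keeps downstream parsing deterministic and makes guardrail logs easier to correlate.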
Validate performance, resilience, and reasoning quality through interactive testing and evaluation.
- Run interactive tests: Test at least 30 varied scenarios, including edge cases, malformed inputs, and multilingual examples.
- Evaluate with curated test sets: Create ≥30 test cases with clearly defined evaluators, using methods such as LLM-as-a-judge, exact match, or trajectory scoring.
- Ensure performance stability: Track scores across prompt or tool changes. Target a consistent evaluation score ≥70% before deploying (a minimal scoring sketch follows this list).
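As a rough illustration of the ≥70% gate, the sketch below scores a set of test cases with a simple exact-match evaluator; `run_agent` and the test-case format are placeholder assumptions, and in practice the Evaluations tab in Agent Builder is the supported way to run and track these scores.

```python
# Hypothetical evaluation gate using exact match. run_agent and the test-case
# structure are placeholders for whatever your evaluation harness provides.
from typing import Callable


def evaluation_gate(run_agent: Callable[[dict], str],
                    test_cases: list[dict],
                    threshold: float = 0.70) -> bool:
    passed = 0
    for case in test_cases:
        actual = run_agent(case["input"])
        # Exact match after light normalization; swap in an LLM-as-a-judge or
        # trajectory scorer for open-ended outputs.
        if actual.strip().lower() == case["expected"].strip().lower():
            passed += 1
    score = passed / len(test_cases)
    print(f"Evaluation score: {score:.0%} ({passed}/{len(test_cases)} passed)")
    return score >= threshold  # block deployment when the gate is not met
```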
Validate downstream integrations and infrastructure readiness.
- Run smoke tests from workflows: Trigger the agent from Studio or Maestro to verify end-to-end data flow and success handling (see the example after this list).
- Verify platform readiness: Confirm credentials, folders, RBAC, and tenant setup in Orchestrator.
- Inspect traces and logs: Review execution traces for long prompts, inefficient tool usage, or over-retrieved context.
- Enable human-in-the-loop escalation: Configure escalation apps and verify outcome handling. Pass relevant transcripts and memory updates.
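A minimal smoke test might look like the sketch below. `invoke_agent` is a stand-in for however your workflow actually triggers the agent (Studio activity, Maestro step, or API call); the stub only illustrates the response shape the assertions check, and is an assumption of this example.

```python
# Hypothetical smoke test. invoke_agent is a stand-in for the real trigger in
# your environment; the stub below only mimics the response envelope so the
# assertions have something to run against.
def invoke_agent(payload: dict) -> dict:
    # Replace with the actual invocation (workflow activity, API call, queue item).
    return {"status": "success", "output": {"summary": "stub"}, "escalated": False}


def test_agent_smoke():
    result = invoke_agent({"invoice_id": "INV-000123"})
    assert result["status"] == "success"           # end-to-end call completed
    assert "summary" in result["output"]           # expected output field is present
    assert isinstance(result["escalated"], bool)   # escalation flag is well-formed
```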
Treat your agents like enterprise software: versioned, reviewed, and owned.
- Maintain versioning and change logs: Use semantic versioning and track changes for behavior audits and rollback.
- Capture approval workflows: Get sign-off from security, ops, and product teams before production deployment.
- Draft operational documentation: Create a runbook and quickstart guide. Include inputs/outputs, credential rotation, and recovery steps.
- Train support teams: Walk through agent logic, escalation handling, and fallback procedures.
Agent quality doesn’t stop at launch. Bake in continuous improvement.
- Plan a gradual rollout: Use canary deployments or traffic splitting to validate behavior at low volume (see the sketch after this list).
- Schedule continuous evaluation: Re-run evaluation sets periodically. Monitor traces for drift and degraded performance.
- Review regularly: Revisit prompts, tools, and context quarterly to reflect changes in business rules or data sources.
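As an illustration of traffic splitting, the sketch below routes a small, deterministic share of requests to a new agent version; the version labels and the 5% share are assumptions, and the actual routing would live wherever requests enter the agent (workflow, API gateway, or queue).

```python
# Hypothetical canary split: send a small, deterministic share of traffic to
# the new agent version. Version labels and the 5% share are illustrative.
import hashlib


def pick_agent_version(request_id: str, canary_share: float = 0.05) -> str:
    # Hash the request ID so the same request always lands on the same version,
    # which keeps traces comparable while the canary is being evaluated.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "agent-v2" if bucket < canary_share * 100 else "agent-v1"
```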