TRiSM - Safety and Security for Agents

Published: 1/20/2025

TRiSM - Safety and Security for Agents

Cognitive Architecture Overview

Our cognitive framework stands apart by enabling truly Agentic AI—systems that think, learn, and adapt with minimal human supervision. Over the next few blogs, we’ll break down the major pillars behind our approach. In this article, we introduce the first key pillar: TRiSM—the bedrock that keeps our agents self-aware, secure, and free from disruptive issues like prompt injection or hallucination.

Picture

TRiSM: Self-Understanding & Self-Governance

At its core, TRiSM is a system for self-understanding. It grants AI agents an internal model of what they’re allowed to do, how to react to malicious or misleading inputs, and how to avoid generating false or harmful outputs.

Unlike a typical “moderation layer,” TRiSM isn’t just a filter. Instead, it effectively empowers the agent to think critically about the information it’s processing.

Real-World Example: EduTutor at RIT: A clear illustration of TRiSM in action is our EduTutor project, developed in collaboration with the Rochester Institute of Technology (RIT). EduTutor harnessed an earlier version of TRiSM to handle sensitive educational content and user interactions:

Picture

In this screenshot, EduTutor detects a modern prompt injection attack aimed at the Gemini model. The tutor immediately recognizes and deflects the malicious request, protecting the user experience.

The tutor’s effectiveness stems from TRiSM’s self-governing checks. If a user tries to slip in harmful or extraneous instructions, the agent doesn’t just blindly comply—it identifies the risk internally and then opts to reject or pivot away from the threat.

Comparing Research Agents: Ours vs. Perplexity

TRiSM’s importance becomes even clearer when we pit our Research Agent against other AI solutions that sometimes hallucinate or produce nonsense.

For instance, Perplexity is a well-known answer bot, yet in certain contexts, it struggles with factual accuracy or becomes prone to confusion.

Query: Tell me about Brayden Levangie’s recent X posts

Perplexity’s Response

Picture

Perplexity Results Perplexity not only hallucinated the post from the “unspecified date,” but the link also only navigates to my X profile, instead of the mentioned post.

Our Research Agent’s Response

Picture

Research Agent Results In this case, the Research Agent did not hallucinate—it was able to provide me up-to-date tweet information, including the direct links to each post it summarized.

What Makes the Difference? The TRiSM layer enforces a constant “sanity check.” It cross-references known data sources, mindful that generating an incorrect or misleading statement would violate its internal rules. As a result, your AI becomes more reliable—an essential quality for use cases ranging from academic research to regulated enterprise workflows.

Why TRiSM Matters

Agentic AI, by definition, will operate with a degree of autonomy that some organizations may find unsettling. The idea of giving AI a high level of freedom means you must have confidence in its behavior. That’s where TRiSM’s safety net becomes vital.

It solves issues like:

  • Prompt Injection: Preventing users or malicious actors from hijacking the agent’s logic.
  • Hallucinations: Curbing the agent’s tendency to fabricate or mix in unverified data.
  • Unethical Requests: Blocking or sanitizing instructions that lead to potentially harmful outcomes or compliance breaches.

TRiSM ensures that while your AI agent is fully capable of self-management, it still strictly follows your organizational policies and ethical boundaries.

Looking Ahead

TRiSM is only one piece of our broader cognitive architecture, which also includes advanced agent memory, multi-agent orchestration, and more.

We’ll delve into those in future posts. For now, consider TRiSM your agent’s built-in compass—steering it away from pitfalls and ensuring it remains both powerful and trusted as it operates in real-world applications.

Stay tuned for our next update, where we outline Universal MCP—a universal integration layer that frees your agents from rigid connectors or custom code.