Autessa Blog

Notes on building AI systems that work in production.

Long-form writing on AI infrastructure, agent evaluation, vision-based automation, capability-based agent architecture, internal tooling, drift detection, data security, and unified agent memory.


Building

8 posts
[Figure: four fragmented systems versus one converged platform. Left: PostgreSQL (relational records), Pinecone (vector embeddings), Kafka (event streams), and S3 (object storage) joined by a tangled integration layer with four auth models, four encryption configs, and four audit trails. Right: a single AutessaDB box, a PostgreSQL-based converged data layer, holding relational, vector, event, and object primitives as internal layers: one access model, one encryption layer, one audit trail.]
Infrastructure

Why Do AI Projects End Up With So Many Infrastructure Systems?

How AI infrastructure sprawl happens, what it actually costs, and why convergence beats consolidation.

9 min read
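The integration-layer caption in the diagram already hints at the arithmetic. A back-of-the-envelope sketch: the system list comes from the diagram, but the pairwise-sync count is my own addition, not a figure from the post.

```python
from math import comb

SYSTEMS = ["PostgreSQL", "Pinecone", "Kafka", "S3"]

# Each independent system brings its own auth model, encryption config,
# and audit trail: three security surfaces per system.
sprawl_surfaces = len(SYSTEMS) * 3

# Plus an integration path between every pair of systems that must
# agree on the same data.
sprawl_sync_paths = comb(len(SYSTEMS), 2)

# A converged layer keeps the four primitives but exposes one boundary.
converged_surfaces = 1 * 3
converged_sync_paths = 0
```

Twelve security surfaces and six sync paths collapse to three and zero; that difference is the convergence argument in miniature.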
[Figure: a radar chart plotting groundedness, task completeness, argument faithfulness, and efficiency for agent v1 (baseline) and agent v2 (new retrieval strategy). v2 is stronger on completeness (+12%) but weaker on groundedness (−8%); the ship decision requires weighing both axes.]
Evaluation

How Should You Evaluate AI Agents in Production?

Why spot-checking fails, which four dimensions actually matter, and how continuous evaluation catches regressions before your users do.

10 min read
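The radar chart's trade-off can be made concrete with a weighted score. The per-dimension numbers below are invented; only the +12 and −8 point deltas echo the chart.

```python
DIMENSIONS = ["groundedness", "task_completeness",
              "argument_faithfulness", "efficiency"]

# Hypothetical scores (0-100) for the two agent versions.
v1 = {"groundedness": 80, "task_completeness": 62,
      "argument_faithfulness": 70, "efficiency": 68}
v2 = {"groundedness": 72, "task_completeness": 74,
      "argument_faithfulness": 70, "efficiency": 68}

def score(agent, weights):
    """Weighted average across the four evaluation dimensions."""
    total = sum(weights.values())
    return sum(agent[d] * weights[d] for d in DIMENSIONS) / total

equal = {d: 1 for d in DIMENSIONS}
grounded_first = {"groundedness": 3, "task_completeness": 1,
                  "argument_faithfulness": 1, "efficiency": 1}
```

Under equal weights v2 wins; weight groundedness three times as heavily and v1 wins. The ship decision is really a decision about the weights.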
[Figure: two copies of the same login form. Left: the CSS selectors a traditional RPA script memorizes (input#user_email, input[name="pw"], .form-actions > button.primary); a vendor CSS refactor changes the selectors and the script breaks. Right: the visual labels a vision-based agent reads ("Email", "Password", "Sign in"); the same refactor leaves the labels unchanged and the automation still works.]
Automation

Why Do Automated UI Workflows Break When an Application Updates?

Traditional RPA sees the DOM. Humans see the screen. That gap is why your automations break every time a vendor ships a redesign.

8 min read
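The selector-versus-label contrast can be sketched with a toy page model. The dicts below stand in for a real DOM and a real vision model; the selectors and labels are the ones from the figure, everything else is invented for illustration.

```python
# Toy representation of the same login page before and after a CSS refactor.
PAGE_V1 = {
    "selector_map": {".form-actions > button.primary": "submit"},
    "visible_text": {"Email": "email_field", "Password": "pw_field",
                     "Sign in": "submit"},
}
# Vendor ships a refactor: selectors change, visible labels do not.
PAGE_V2 = {
    "selector_map": {".auth-footer > button.btn-cta": "submit"},
    "visible_text": {"Email": "email_field", "Password": "pw_field",
                     "Sign in": "submit"},
}

def rpa_click(page, selector):
    """Selector-based automation: returns None once the selector disappears."""
    return page["selector_map"].get(selector)

def vision_click(page, label):
    """Label-based automation: finds the element by what a human reads."""
    return page["visible_text"].get(label)
```

The memorized selector resolves on v1 and returns None on v2; the label "Sign in" resolves on both. That is the whole brittleness argument in four lines.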
[Figure: hardcoded decision tree versus capability-based agent. Left: a deeply nested if-else tree (order question? refund request? window open? loyalty tier?) where every new business rule adds a branch and a sprint ticket; 243 paths at 5 decision points, 729 at 6, with emergent behavior nobody designed. Right: a flat set of declarative capabilities (check_order_status with an order_id input, refund.standard capped at $500, refund.loyalty_exception requiring a loyalty tier, escalate_human with no constraints), each defined by a description, inputs, and constraints; the agent selects among them at runtime, so a new rule means adding a capability, not a branch.]
Agent Architecture

Why Do AI Agents Become Brittle Workflow Engines Instead of Intelligent Systems?

Most agent architectures devolve into decision trees with an LLM in the middle. Here's why, and what a capability-based alternative looks like.

8 min read
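The flat capability set in the diagram can be sketched in a few lines. The capability names and constraint values come from the figure; the dataclass, registry, and eligibility filter are a hypothetical illustration, not the post's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Capability:
    """A declarative capability: description, inputs, constraints."""
    name: str
    description: str
    inputs: list
    constraints: dict = field(default_factory=dict)

REGISTRY = [
    Capability("check_order_status", "Look up an order", ["order_id"]),
    Capability("refund.standard", "Refund up to the cap",
               ["order_id", "amount"], {"max_amount": 500}),
    Capability("refund.loyalty_exception", "Refund above the cap",
               ["order_id", "amount"], {"requires": "loyalty_tier"}),
    Capability("escalate_human", "Hand off to a person", []),
]

def eligible(registry, context):
    """Keep only the capabilities whose constraints the context satisfies.

    The agent then reasons over the survivors at runtime; a new business
    rule is a new entry in the registry, not a new branch in a tree.
    """
    out = []
    for cap in registry:
        c = cap.constraints
        if "requires" in c and not context.get(c["requires"]):
            continue
        if "max_amount" in c and context.get("amount", 0) > c["max_amount"]:
            continue
        out.append(cap)
    return out

# A $750 refund request from a customer with no loyalty tier:
names = [c.name for c in eligible(REGISTRY, {"amount": 750})]
```

For that request only check_order_status and escalate_human survive; add a loyalty tier to the context and refund.loyalty_exception becomes available, with no code path rewritten.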
[Figure: a line chart of internal tool requests climbing faster than engineering delivery over four quarters (Q1 to Q4, 0 to 40 tools). The widening gap between the Requested and Delivered lines is the growing backlog; a dotted line branching off marks shadow IT (spreadsheets, Airtable, unofficial tools) filling the gap.]
Internal Tools

Why Is the Internal Tools Backlog Always Six Months Long?

Engineering cannot keep up with internal tool demand. People build shadow IT to cope. AI-generated tooling changes the math.

8 min read
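The chart's "demand outpaces supply" dynamic is simple queue arithmetic. The per-quarter numbers below are invented, chosen so the resulting wait works out to the six months in the title.

```python
# Hypothetical demand vs. throughput, per quarter.
arrivals  = [7, 8, 9, 10]   # new tool requests
delivered = [5, 6, 5, 6]    # engineering throughput

backlog, history = 0, []
for a, d in zip(arrivals, delivered):
    backlog += a - d        # whatever is not delivered joins the queue
    history.append(backlog)

# At current throughput, a newly filed request waits roughly:
wait_quarters = history[-1] / delivered[-1]   # two quarters = six months
```

As long as arrivals exceed throughput the backlog grows every quarter, so the wait never shrinks on its own; people stop waiting and build shadow IT instead.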
[Figure: a line chart of an AI application's groundedness score declining gradually from near 100% at launch to a 60% abandonment threshold over six months, through three bands: still trusted (above 85%), trust eroding (70 to 85%), quietly abandoned (below 70%). No alert fires and no ticket is filed; users start double-checking, the tool feels "off", and people stop using it while it keeps running and costing money. Continuous scoring catches the 3 percent monthly slope weeks before users complain.]
Operations

Why Do Internal AI Applications Stop Working After Six Months?

AI applications do not crash. They drift. Here's how gradual degradation erodes trust, and what it takes to keep AI tools aligned with a changing business.

8 min read
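The "3 percent monthly slope" in the chart is exactly the kind of signal a scheduled job can watch. A minimal sketch with made-up monthly scores and a made-up alert threshold:

```python
def monthly_slope(scores):
    """Least-squares slope of monthly groundedness scores (0-100).

    Returns points lost or gained per month; negative means drift.
    """
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Scores drifting about 3 points per month: no single month looks
# alarming, but the trend crosses the threshold long before users notice.
scores = [96, 93, 91, 87, 85, 82]
slope = monthly_slope(scores)
DRIFT_THRESHOLD = -2.0   # alert if losing more than 2 points/month
drifting = slope < DRIFT_THRESHOLD
```

Month over month, each drop is within noise; the fitted slope is not. That is the difference between spot-checking and continuous scoring.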
[Figure: four perimeters versus one. Left: the same customer record replicated into four systems, each with its own padlock and access control: customer.ssn masked in PostgreSQL but not in its vector embedding, the conversation stream carrying the ssn in plaintext with no retention policy, and the object store holding contract.pdf under AES-256. Four access models, four encryption configs; the gaps live in the seams. Right: one perimeter around AutessaDB, where field-level policy travels with the data: customer.ssn is masked everywhere, its embedding respects the masking, and the conversation stream follows the same retention rule. One access model, one encryption layer; the security boundary is the data itself.]
Security

How Do You Secure AI Application Data When It Is Spread Across Multiple Systems?

The real security risk in AI applications is not in any single system. It lives in the seams between four.

9 min read
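One way to read "the policy travels with the data": masking is declared once per field, and every consumer consults the same table. A hypothetical sketch; the POLICY table, field names, and masking rule are invented for illustration.

```python
# Field-level policy declared once; every subsystem applies the same rule.
POLICY = {
    "customer.ssn":   {"mask": True,  "retention_days": 30},
    "customer.email": {"mask": False, "retention_days": 365},
}

def mask(value):
    """Keep the last four characters, mask the rest."""
    return "*" * max(len(value) - 4, 0) + value[-4:]

def read_field(record, table, column, policy=POLICY):
    """Apply the field's policy no matter which subsystem is reading.

    The same check runs for the relational row, the embedding pipeline,
    and the event stream, so there is no seam for plaintext to leak through.
    """
    rule = policy.get(f"{table}.{column}", {})
    value = record[column]
    return mask(value) if rule.get("mask") else value

row = {"ssn": "123-45-6789", "email": "a@example.com"}
```

With four separate systems, this check has to be reimplemented four times and kept in sync by hand; the diagram's argument is that the gaps appear precisely where those reimplementations disagree.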
[Figure: an agent in the center reaches into a unified memory layer (one search path, one access model) containing two kinds of memory. Dynamic memory is what the agent has learned, observed, and accumulated: recent conversation history, customer account state, processed events such as team updates and invoices, patterns recognized over time; it changes minute to minute. Static memory is curated knowledge humans maintain: policy documents, contracts, product manuals and specs, compliance guidelines, folder hierarchy and cross-references; it changes rarely and is versioned. The agent reasons across both in one pass, not four separate lookups.]
Agent Memory

Why Do AI Agents Forget Everything Between Conversations, and Why Can They Not Search Your Files Properly?

Most agent limitations are not model limitations. They are memory limitations. Here's what unified dynamic and static memory unlocks.

11 min read
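The "one query, two kinds of memory" idea can be sketched with a single search path over both stores. The entries below are invented, and the keyword match is a naive stand-in for the similarity search a real unified memory layer would use.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    kind: str     # "dynamic" (learned/observed) or "static" (curated)
    source: str
    text: str

MEMORY = [
    MemoryEntry("dynamic", "conversation",
                "Customer asked about a refund for order 1042"),
    MemoryEntry("dynamic", "events",
                "Invoice 1042 marked disputed"),
    MemoryEntry("static", "policy/refunds.md",
                "Refunds over $500 need loyalty-tier approval"),
    MemoryEntry("static", "manual/billing.md",
                "Disputed invoices pause auto-billing"),
]

def recall(query, memory=MEMORY):
    """One search path over dynamic and static memory alike."""
    terms = query.lower().split()
    return [e for e in memory if any(t in e.text.lower() for t in terms)]

hits = recall("refund")
```

A query for "refund" returns both what the agent observed in conversation and the curated policy that governs it, in one pass rather than one lookup per store.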