QuotaCheap Playbook

MCP Observability: What to Log When Agents Call Tools


Without observability, an MCP workflow turns into a black box: the agent calls a tool, the tool fails, cost goes up, and nobody knows why.

A guide to logging MCP tool calls for production AI agents: timeline, input/output redaction, latency, retries, user/session/workflow ids, cost correlation, and failure taxonomy.

MCP lets agents do more, but it also makes workflows harder to debug.

A single request can involve multiple model turns, multiple tool calls, multiple retries, and multiple side effects.

If your logs only say "request failed", you are close to blind.

Good observability answers three questions:

1. What was the agent trying to do?
2. Which tool was called, with what input and output?
3. How many tokens, how much time, and how much money did the workflow cost?

Tool call timeline

Every workflow should have a timeline:

- user request
- model call 1
- tool call A
- tool result A
- model call 2
- tool call B
- final answer/action

The timeline shows whether the agent is looping, which tool is slow, and whether an error happened before or after model reasoning.
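A minimal sketch of such a timeline in Python. The `TimelineEvent` and `WorkflowTimeline` names, the event kinds, and the "same tool called three or more times" loop heuristic are all assumptions made for illustration, not part of MCP or any SDK.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TimelineEvent:
    # kind is one of: user_request, model_call, tool_call, tool_result, final_answer
    kind: str
    name: str = ""            # tool or model name, empty when not applicable
    duration_ms: float = 0.0  # only meaningful for model_call and tool_result
    ts: float = field(default_factory=time.time)


@dataclass
class WorkflowTimeline:
    workflow_id: str
    events: list = field(default_factory=list)

    def record(self, kind, name="", duration_ms=0.0):
        """Append one event in the order it happened."""
        self.events.append(TimelineEvent(kind=kind, name=name, duration_ms=duration_ms))

    def repeated_tools(self, threshold=3):
        """Hypothetical loop heuristic: tools called at least `threshold` times."""
        counts = Counter(e.name for e in self.events if e.kind == "tool_call")
        return sorted(name for name, n in counts.items() if n >= threshold)

    def slowest_tool(self):
        """Name of the tool with the slowest single result, or None if no tool ran."""
        results = [e for e in self.events if e.kind == "tool_result"]
        if not results:
            return None
        return max(results, key=lambda e: e.duration_ms).name


# Example: the timeline from the list above, with made-up durations.
timeline = WorkflowTimeline("wf-123")
timeline.record("user_request")
timeline.record("model_call", "model-1", duration_ms=900.0)
timeline.record("tool_call", "tool_a")
timeline.record("tool_result", "tool_a", duration_ms=120.0)
timeline.record("model_call", "model-1", duration_ms=700.0)
timeline.record("tool_call", "tool_b")
timeline.record("final_answer")
```

Keeping events in a single ordered list per workflow is what makes the "before or after model reasoning" question answerable: the position of a failed event relative to the surrounding model calls is visible directly.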

Correlation ids

Without shared ids, logs are very hard to join.

You should have:

- request id
- session id
- workflow id
- user id
- workspace id
- tool call id
- model request id

When you use QuotaCheap as the model gateway, store the request id and model usage alongside the workflow id so you can join model cost with tool events.
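One way to carry these ids is to attach the same context object to every log line. The sketch below uses only the Python standard library; `CorrelationContext`, `new_id`, `log_event`, and the sample id values are hypothetical names chosen for illustration, and the `model_request_id` stands in for whatever id your gateway returns.

```python
import json
import logging
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CorrelationContext:
    request_id: str
    session_id: str
    workflow_id: str
    user_id: str
    workspace_id: str


def new_id(prefix):
    """Generate a short random id such as 'tc_3f9a1c0b7e2d'."""
    return f"{prefix}_{uuid.uuid4().hex[:12]}"


def log_event(logger, ctx, event, **fields):
    """Emit one JSON log line that always includes the shared ids."""
    record = {"event": event, **asdict(ctx), **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent")

ctx = CorrelationContext(
    request_id=new_id("req"),
    session_id=new_id("sess"),
    workflow_id=new_id("wf"),
    user_id="user_42",       # assumed to come from your auth layer
    workspace_id="ws_7",     # assumed to come from your auth layer
)

# A tool event and a model usage event share workflow_id, so they can be joined later.
log_event(logger, ctx, "tool_call", tool_call_id=new_id("tc"), tool="search_issues")
log_event(logger, ctx, "model_usage", model_request_id="gateway-request-id",
          input_tokens=1200, output_tokens=300)
```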

Input/output must be redacted

You need to log enough to debug, but you must never log secrets.

Log:

- tool name
- schema version
- input summary
- resource ids
- output summary
- error code/message

Do not log:

- access tokens
- cookies
- raw private content, unless it is needed
- payment credentials
- full customer PII

Latency and retries

Every tool call should record:

- start time
- duration
- status
- retry count
- timeout flag
- upstream latency, if available

A lot of cost leaks come from tool timeouts and model retries.

If you do not log retries, you will not know why the bill went up.
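The redaction and latency/retry fields can be captured in one wrapper around the tool call. This is a sketch under stated assumptions: `redact`, `call_tool_logged`, the `SENSITIVE_KEYS` set, and the 80-character truncation limit are hypothetical, and the tool function is assumed to raise `TimeoutError` on timeout rather than being interrupted by the wrapper.

```python
import time

# Assumed list of sensitive field names; extend it for your own tools.
SENSITIVE_KEYS = {"access_token", "authorization", "cookie", "password", "card_number"}


def redact(payload):
    """Return a copy that is safe to log: secrets masked, long strings truncated."""
    safe = {}
    for key, value in payload.items():
        if key.lower() in SENSITIVE_KEYS:
            safe[key] = "[REDACTED]"
        elif isinstance(value, dict):
            safe[key] = redact(value)
        elif isinstance(value, str) and len(value) > 80:
            safe[key] = value[:80] + "...[truncated]"
        else:
            safe[key] = value
    return safe


def call_tool_logged(tool_name, tool_fn, tool_input, max_retries=2, sink=print):
    """Call tool_fn, retrying only on TimeoutError, and emit one log record."""
    start_time = time.time()
    started = time.perf_counter()
    result, status, error_type = None, "error", None
    retry_count, timed_out = 0, False
    for attempt in range(max_retries + 1):
        retry_count = attempt
        try:
            result = tool_fn(tool_input)
            status = "ok"
            break
        except TimeoutError:
            timed_out = True
            status = "timeout"
        except Exception as exc:  # non-timeout errors are not retried in this sketch
            status = "error"
            error_type = type(exc).__name__
            break
    record = {
        "tool": tool_name,
        "input_summary": redact(tool_input),
        "start_time": start_time,
        "duration_ms": (time.perf_counter() - started) * 1000,
        "status": status,
        "retry_count": retry_count,
        "timeout_flag": timed_out,
        "error_type": error_type,
    }
    sink(record)
    return result, record
```

Because the record is emitted once per logical tool call with `retry_count` attached, a workflow that quietly retries three times shows up in the logs as such, instead of looking like a single slow call.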

Failure taxonomy

Do not lump every error into ToolError.

Classify errors as:

- validation error
- auth error
- permission denied
- not found
- rate limited
- timeout
- upstream error
- unsafe action blocked
- human approval required

Classifying errors helps the agent decide whether to fix the input, retry, stop, or ask the user.
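The taxonomy maps naturally onto an enum plus a default decision table. In the sketch below the category names come from the list above, while `NextStep`, `DEFAULT_NEXT_STEP`, `decide`, and the specific category-to-action mapping are assumptions; your own policy may differ, for example on whether "not found" should stop rather than fix the input.

```python
from enum import Enum


class ToolFailure(Enum):
    VALIDATION_ERROR = "validation_error"
    AUTH_ERROR = "auth_error"
    PERMISSION_DENIED = "permission_denied"
    NOT_FOUND = "not_found"
    RATE_LIMITED = "rate_limited"
    TIMEOUT = "timeout"
    UPSTREAM_ERROR = "upstream_error"
    UNSAFE_ACTION_BLOCKED = "unsafe_action_blocked"
    HUMAN_APPROVAL_REQUIRED = "human_approval_required"


class NextStep(Enum):
    FIX_INPUT = "fix_input"
    RETRY = "retry"
    STOP = "stop"
    ASK_USER = "ask_user"


# Assumed default policy, one entry per failure category.
DEFAULT_NEXT_STEP = {
    ToolFailure.VALIDATION_ERROR: NextStep.FIX_INPUT,
    ToolFailure.AUTH_ERROR: NextStep.STOP,
    ToolFailure.PERMISSION_DENIED: NextStep.ASK_USER,
    ToolFailure.NOT_FOUND: NextStep.FIX_INPUT,
    ToolFailure.RATE_LIMITED: NextStep.RETRY,
    ToolFailure.TIMEOUT: NextStep.RETRY,
    ToolFailure.UPSTREAM_ERROR: NextStep.RETRY,
    ToolFailure.UNSAFE_ACTION_BLOCKED: NextStep.STOP,
    ToolFailure.HUMAN_APPROVAL_REQUIRED: NextStep.ASK_USER,
}


def decide(failure, attempts, max_retries=2):
    """Pick the next step; retryable failures stop once the retry budget is spent."""
    step = DEFAULT_NEXT_STEP[failure]
    if step is NextStep.RETRY and attempts >= max_retries:
        return NextStep.STOP
    return step
```

Capping retries inside `decide` ties the taxonomy back to cost: a rate-limited or timing-out tool cannot keep the agent retrying indefinitely.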

Cost correlation

MCP logs only show the tool side.

Model gateway logs show the token/cost side.

Production needs to join the two.

Every workflow should have a summary:

- total model calls
- input tokens
- output tokens
- cache read tokens, if any
- tool calls
- failed tool calls
- total latency
- estimated cost

This is where QuotaCheap adds clear value: request logs, token usage, latency, and billing visibility through an OpenAI-compatible API.
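The per-workflow summary is essentially a join of two log streams on workflow id. The sketch below assumes both streams are already available as lists of dicts; the field names, the `summarize_workflows` function, and the per-1,000-token prices are hypothetical and do not reflect any real gateway's log schema or pricing. Total latency is computed as a plain sum, which assumes calls run sequentially.

```python
from collections import defaultdict


def _empty_summary():
    return {
        "model_calls": 0,
        "input_tokens": 0,
        "output_tokens": 0,
        "cache_read_tokens": 0,
        "tool_calls": 0,
        "failed_tool_calls": 0,
        "total_latency_ms": 0,
        "estimated_cost": 0.0,
    }


def summarize_workflows(model_usage, tool_events, price_per_1k_input, price_per_1k_output):
    """Join model usage and tool events on workflow_id and total them up."""
    summaries = defaultdict(_empty_summary)

    for usage in model_usage:  # token/cost side, from the model gateway logs
        s = summaries[usage["workflow_id"]]
        s["model_calls"] += 1
        s["input_tokens"] += usage["input_tokens"]
        s["output_tokens"] += usage["output_tokens"]
        s["cache_read_tokens"] += usage.get("cache_read_tokens", 0)
        s["total_latency_ms"] += usage.get("latency_ms", 0)

    for event in tool_events:  # tool side, from the MCP logs
        s = summaries[event["workflow_id"]]
        s["tool_calls"] += 1
        if event["status"] != "ok":
            s["failed_tool_calls"] += 1
        s["total_latency_ms"] += event.get("duration_ms", 0)

    for s in summaries.values():
        # Rough estimate only: ignores cache discounts and per-model price differences.
        s["estimated_cost"] = round(
            s["input_tokens"] / 1000 * price_per_1k_input
            + s["output_tokens"] / 1000 * price_per_1k_output,
            6,
        )
    return dict(summaries)
```

Dividing `estimated_cost` by the number of workflows that finished successfully gives a cost-per-successful-task figure, which is usually more informative than total spend.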

Minimum dashboard

An MCP/agent dashboard should have:

- top workflows by cost
- top tools by error rate
- top tools by latency
- retry-heavy workflows
- unsafe actions blocked
- cost per successful task

Conclusion

MCP without observability is a black box with arms and legs.

It can get work done, but when it goes wrong nobody knows where.

Log by workflow, join tool calls with model usage, and treat cost as part of observability rather than something you only look at when the month ends.

Applying this in a small team

If you are the founder or technical lead of a small team, do not start MCP with an oversized platform.

Pick one workflow with clear value and moderate risk, for example: reading internal docs, looking up issues, creating draft tickets, pulling a usage report, or checking the status of a service.