Research Explainers May 30, 2026 10 min read

Agent Skills Need a Trust Layer

The paper’s practical point: agent skills are becoming the packaging layer for procedural AI work, which means they need governance before they become another unsafe software supply chain.

Source note: Renjun Xu and Yang Yan. “Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward.” arXiv:2602.12430, 2026-02-12. https://arxiv.org/abs/2602.12430

Why This Paper Matters

Agent skills look deceptively simple.

At the surface, a skill is just a small package: a SKILL.md file, some instructions, maybe scripts, maybe references, maybe assets. The agent loads it when the user’s task matches the description. The skill tells the agent how to approach a kind of work.

That sounds like prompt engineering with better filing.

It is not.

The paper argues that skills are turning into a new abstraction layer for AI agents. Instead of expecting the model to carry every workflow in its weights, or stuffing every instruction into the prompt, a system can load procedural knowledge on demand. The agent can become better at a task without retraining because the task knowledge lives in a portable module.

That gives teams leverage. It also creates risk.

A skill can be documentation, but it can also change what context the agent sees. It can tell the agent which tools to use. It can include executable code. It can point at external resources. It can influence permissions and behavior before the user ever sees an output.

So “How do we write good skills?” is only the first question.

The important question is “What kind of trust system do skills need?”

This survey matters because it puts architecture, learning, deployment, and security in one frame. Skills are not a cute productivity feature. They are starting to look like package management for agents.

The Idea in Plain English

The simplest way to understand a skill is this: a tool does something, while a skill teaches the agent how to do something.

A tool might expose a function: search the web, query a database, edit a file, send an email, or run a script. The tool has inputs and outputs.

A skill is higher level. It can say: when you need to analyze a spreadsheet, first inspect the sheet structure, then check formulas, then summarize anomalies, then create a chart, and use these scripts if the file is large. It can bundle the workflow, reference material, code, example files, and constraints.

The paper describes this as progressive disclosure.

The agent does not load every skill in full all the time. It first sees lightweight metadata: name and description. If a task matches, it loads the skill instructions. If needed, it loads deeper resources such as scripts or reference files.

That matters because context is expensive. A large agent could have hundreds or thousands of possible skills. If every instruction, script, and reference file were loaded into the context window at startup, the system would collapse under its own library. Progressive disclosure turns skills into a searchable procedural memory.

The paper also separates skills from the Model Context Protocol. MCP connects agents to tools, data sources, and external systems. Skills describe what to do with those capabilities. In the paper’s framing, MCP is the connectivity layer and skills are the procedural layer.

That distinction is useful. An MCP server can expose the CRM. A skill can tell the agent how to run a renewal-risk review using the CRM, support tickets, meeting notes, and an approval step.

What the Researchers Tested

This is a survey paper, not a new model release.

The authors map the emerging agent-skills field across four areas.

First, they describe the architecture: the SKILL.md specification, progressive context loading, the execution lifecycle, and the relationship between skills and MCP.

Second, they survey skill acquisition. Some skills are written by humans. Some are learned through reinforcement learning. Some are discovered by agents exploring software environments. Some are represented as structured execution graphs. Some are composed from smaller skills. Some multi-agent workflows can even be compressed into single-agent skill libraries.

Third, they review deployment in computer-use agents. This includes GUI agents, visual grounding, OSWorld, WindowsAgentArena, AndroidWorld, ScreenSpot-Pro, SWE-bench Verified, and other benchmarks where agents need to operate software through perception, planning, and action.

Fourth, they review security. This is the sharpest part of the paper. Skills create a new attack surface because they combine natural-language instructions, code, references, and agent trust. The paper synthesizes recent studies on prompt injection through skills, vulnerabilities in community skill registries, and confirmed malicious skill packages.

The authors then propose a Skill Trust and Lifecycle Governance Framework: verification gates, trust tiers, and ongoing runtime monitoring.

What They Found

The survey’s core finding is that skills are becoming a packaging layer for agent expertise.

That creates leverage and supply-chain risk.

Skills Are More Than Better Prompts

The paper draws a clean line between prompt engineering, tool use, and skill engineering.

Prompt engineering helps shape behavior inside a single interaction. Tool use gives the model access to external functions. Skill engineering packages procedural expertise so the agent can load it when relevant.

That means a skill can encode how work should be done rather than only which API should be called.

This is a meaningful shift. Many real tasks do not need one function call. They need a sequence: inspect, plan, retrieve, transform, verify, handle edge cases, and report. A skill can carry that workflow in a reusable format.

For organizations, that makes skills a way to encode institutional knowledge. A good skill can preserve how a team handles invoices, sales research, code review, slide cleanup, support triage, compliance checks, or internal reporting.

The upside is repeatability. The risk is that the agent starts trusting procedural packages it did not properly verify.

Progressive Disclosure Is the Architectural Trick

The paper’s most important architectural idea is progressive disclosure.

At level one, the agent sees lightweight metadata. At level two, it loads the SKILL.md instructions. At level three, it loads supporting files such as scripts, references, and assets.

This design solves a real context problem. A skill library can be large without dumping all of its content into the model at once.

But the same design creates a security problem.

The dangerous instruction may be absent from the visible description. It may be buried in the deeper skill file. The risky behavior may appear only after a script is loaded. The permission mismatch may show up when the agent follows the workflow in a real environment.

So a governance system has to understand the levels. It cannot simply scan the short description and assume the skill is safe.

Skills and MCP Solve Different Problems

The paper frames skills and MCP as complementary rather than competing.

MCP standardizes access to tools, resources, and prompt templates. Skills tell the agent how to use capabilities in a task-specific way.

This matters because a lot of agent talk blurs tools, workflows, memory, prompt design, and orchestration into one pile. The paper gives a cleaner stack.

The model supplies general reasoning. MCP supplies connectivity. Skills supply procedural knowledge. The agent runtime decides when to load skills, when to execute them, and how to monitor them.

That division makes the system easier to inspect. If an agent fails, the question becomes more precise: was the model wrong, was the tool wrong, was the skill wrong, was the context selection wrong, or was the permission boundary wrong?

Skill Learning Is Moving Beyond Human-Written Files

The survey covers several ways skills can be acquired.

Human-authored skills are the most immediately useful. They are easy to version, inspect, share, or adapt.

But the paper also points to systems that learn skills from experience. SAGE uses reinforcement learning with a skill library and reports higher AppWorld task performance with fewer steps and fewer tokens. SEAgent learns from software exploration and improves OSWorld success rates across novel environments. Other approaches encode computer-use knowledge as structured execution graphs or compose modular reasoning skills for math and planning.

The practical takeaway is that skill ecosystems will not stay manually authored forever.

Agents will learn useful procedures. The hard part will be externalizing those procedures into artifacts humans can inspect, test, permission, reuse, or revoke.

An agent that “learns” a skill internally is useful. An agent that can turn that learned behavior into a governed package is much more deployable.

Security Is Not a Footnote

The paper’s security section is the reason this topic deserves operator attention.

One cited line of work shows that malicious instructions can be embedded inside skills and supporting files. That creates prompt-injection paths through the mechanism the agent is supposed to trust.

Another large-scale study found that 26.1% of analyzed community skills contained at least one vulnerability. The paper says these issues included prompt injection, data exfiltration, privilege escalation, and supply-chain risk. Skills with bundled executable scripts were 2.12 times more likely to contain vulnerabilities than instruction-only skills.

A separate study verified malicious skills from community registries and identified attack patterns such as credential exfiltration and agent decision hijacking.

The lesson is blunt: once skills become installable packages, skill security starts to look like package security, with the added complication that the package can manipulate an agent’s reasoning.

Why It Happens

Skills create risk because they sit between instruction and execution.

A normal software package can be dangerous because it runs code. A skill can be dangerous because it may run code and because it can persuade the agent to act differently. It can shape the agent’s interpretation of the task before any visible action happens.

That makes the trust model unusually sensitive.

If the agent treats a loaded skill as authoritative context, then a malicious or sloppy skill can redirect behavior. If the skill has access to scripts, files, network calls, or tool permissions, the damage is not limited to a bad answer. It can become data leakage, unauthorized action, or persistent workflow corruption.

The problem gets worse as skill libraries scale.

At ten skills, humans can roughly reason about what is installed. At hundreds or thousands of skills, selection, review, versioning, dependency management, and permissioning become their own operational system. A bad skill may be rarely triggered, which makes it harder to notice. A useful but outdated skill may silently encode stale policy. A skill generated by an agent may work on one task but fail when composed with another.

This is why the paper’s governance framework matters.

The authors propose verification gates: static analysis, semantic inspection, behavioral sandboxing, and permission-manifest validation. They propose trust tiers, where unvetted community skills receive minimal access and highly verified skills receive wider capabilities. They also propose lifecycle monitoring, where skills can be promoted, demoted, or revoked based on runtime behavior.

That is the right mental model. Skills need lifecycle management after installation.

What This Means for Builders

Builders should treat skills as production artifacts.

That means each skill should have a clear owner, version, purpose, trigger description, required permissions, allowed tools, expected inputs, expected outputs, and tests. A skill that can run scripts should be reviewed differently from a skill that only contains instructions.

The paper also implies that skill loading should be observable.

For any agent run, a builder should be able to answer: which skills were considered, which skill was loaded, which files were read, which tools became available, which permissions were used, and what output changed because the skill was present.

Without that trace, debugging becomes guesswork.

Builders should also avoid treating skills as a dumping ground for context. A skill should be a small, focused procedural package. If it grows into a mini-wiki, it will become harder to route, harder to audit, and easier to poison with contradictory instructions.

The strongest systems will likely separate four jobs:

Skill discovery: finding the right package for the task.

Skill loading: bringing only the needed instructions and resources into context.

Skill execution: using tools and scripts within explicit boundaries.

Skill governance: testing, versioning, permissioning, monitoring, plus revocation.

That is more engineering than “write a Markdown file,” but that is the point. The Markdown file is the interface, not the whole system.

What This Means for Buyers and Operators

Buyers should ask vendors about skills the way they ask about integrations, permissions, and data retention.

The useful questions are concrete.

Can we see which skills are installed?

Can we inspect the full contents of a skill before enabling it?

Are skills signed, versioned, and tied to provenance?

Can different skills receive different permissions?

Can a skill run code?

Can a skill access files, credentials, browsers, SaaS tools, or internal APIs?

Is there a sandbox?

Are skill-triggering decisions logged?

Can a skill be disabled globally after suspicious behavior?

Can users see when a skill influenced an action?

Those questions may sound operational, but they are product-risk questions. If an agent uses skills to perform real work, then skills become part of the control surface of the business.

For operators, the bigger point is that agent capability libraries need governance before they become invisible infrastructure.

The history of software ecosystems is predictable. Package managers created enormous leverage. Then security failures, dependency confusion, typosquatting, abandoned packages, and transitive risk followed. App stores created distribution. Then review, permissions, sandboxing, reputation systems, and revocation became necessary.

Agent skills will follow the same path, only faster, because the artifact is easier to create and the agent may be trusted to act on it.

What to Watch Next

The first thing to watch is whether skill standards converge across agents.

A skill that only works in one vendor’s environment is useful, but the ecosystem becomes much more interesting if skills become portable across runtimes. Portability will force clearer specifications for metadata, file layout, permissions, tests, and tool dependencies.

The second thing to watch is skill selection at scale.

The paper notes that large skill libraries can hit routing limits. If the agent loads the wrong skill, misses the right one, or loads too many, quality drops. Skill search, ranking, composition, and conflict resolution will become real infrastructure.

The third thing to watch is permission design.

The current implicit-trust model is too loose for serious deployments. Skills need capability-based permissions: this skill can read these paths, call these tools, use this network access, write to these destinations, and only under these approval rules.

The fourth thing to watch is learned skills.

If agents can discover useful procedures through experience, the next hard problem is making those procedures inspectable. A learned behavior that cannot be audited will be hard to trust. A learned behavior that can be exported into a testable skill package could become a major operating advantage.

Limitations and Caveats

This is a broad survey, so its value is synthesis rather than definitive measurement.

Several numbers come from fast-moving benchmark and security studies. They should be treated as evidence of direction, not permanent constants. Agent benchmarks change quickly, skill registries change quickly, and the security picture will shift as marketplaces, scanners, and runtimes mature.

The paper also leans into the agent-skill abstraction at a moment when the ecosystem is still young. Some of the proposed standards may change. Some vendor-specific behavior may not generalize. Some skills may end up being replaced by deeper runtime primitives, workflow engines, or policy systems.

There is also an adoption caveat.

Skills are useful when they encode real procedural knowledge. A badly written skill can add ceremony without improving outcomes. A large uncurated skill library can create more routing and governance overhead than value. The case for skills depends on disciplined packaging, not the mere existence of a SKILL.md file.

Still, the paper’s central warning holds. If skills become the way agents acquire procedural capability, then the trust layer around skills will matter as much as the skills themselves.

Source

Renjun Xu and Yang Yan. (2026). Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward. arXiv preprint arXiv:2602.12430. Available at: https://arxiv.org/abs/2602.12430

Research Browse Research & Deep Dives

Move through market maps, company deep dives, cross-profile patterns, papers, reports, and technical explainers.

Start Here Find the best entry point

Use the site map to choose a path through AI, operations, strategy, profiles, and series.

Topic Explore AI systems

Read essays on AI adoption, agents, business systems, and the changing shape of work.