AI security answers your buyer
can actually review.

Your buyer isn’t just asking whether your AI product works — they’re asking whether your claims survive security review. Origin Layer reviews the evidence behind your AI questionnaire — test output, logs, traces, scanner results and draft answers — and turns it into a defensible evidence position: what’s defensible, what’s weak, what’s missing, and what shouldn’t be claimed.

Built by Emon Ambia · 20 years in data, compliance & audit-facing evidence work.

If this is you right now

The questions that actually stall an AI deal.

The demo went well. The buyer likes the product. Then the security questionnaire arrives.

Now the conversation changes.

They’re not asking whether your AI feature is useful. They’re asking whether it can be trusted inside their company — near their data, their customers, their workflows, and their risk team.

This is where vague answers start to break.

The reassurances that closed every earlier conversation stop being enough the moment a reviewer asks to see what actually happened.

→Below are the questions buyers come back to when the deal gets serious — and what a defensible answer needs behind it.

01

Can a prompt, a file, or retrieved content steer the assistant?

This is the first AI-specific question, and it’s the right one. Can something that looks like ordinary input — a support ticket, an uploaded PDF, a page in your RAG index — carry instructions the model then follows?

What I usually hear

“We tried some jailbreaks.” · “We ran a scanner.” · “We have guardrails.” · “Nothing’s blown up in production.”

None of that is wrong, and none of it is enough. A handful of prompts isn’t coverage. A scanner result isn’t proof of behaviour — garak will mark a “hit” the moment its target string shows up in the output, including when the model quoted it while refusing. A guardrail is intent, not outcome. And production has never had someone deliberately attacking it.

What lands with a reviewer“Here are the entry points we tested, in these workflows, with these inputs. This is what the model saw, what it retrieved, and how it responded — and here’s how many times out of N it reproduced. These held, these failed, these we didn’t test.”

02

Can the assistant reach another customer’s data?

This one is less about the model and more about where it sits. The usual answer leans on controls that already exist.

What I usually hear

“Access control.” · “SOC 2.” · “The backend enforces permissions.” · “The model only sees authorised data.”

All useful, but it’s not the question. The question is whether those controls hold when the AI path is the way in. Can a user get the model to fetch another tenant’s record? Can a poisoned document pull restricted data into context? Can the assistant be talked into surfacing internal notes or its own system prompt? A SOC 2 report shows a control exists — it doesn’t show how that control behaves with an LLM in the loop.

What lands with a reviewer“We tried to cross the tenant boundary through the assistant — direct prompts, instructions hidden in retrieved content, and the tool path. Here’s what entered context, what came back to the user, and where enforcement held or didn’t.”

03

Can the assistant trigger an action it shouldn’t?

The moment the system can do more than talk — query data, call an API, change a record — this matters. The answers sound calm.

What I usually hear

“The agent only has approved tools.” · “The backend checks permissions.” · “Sensitive actions need approval.”

The gap hides in the verbs. There’s a real difference between the model suggesting an action, the model emitting a tool call, the backend executing it, and something actually changing state — and most answers blur all four into one. A standard pen test checks the app; it rarely asks whether the model can be talked into a tool call outside the user’s scope.

What lands with a reviewerThe whole chain, in order: input · who the user is · model output · tool selected · arguments · the authorisation decision · what executed · what the user saw. Anything short of that asks the buyer to trust the architecture diagram instead of the logs.

04

Can you show what happened, end to end?

This is where teams find they have logs but not evidence.

What I usually hear

“It’s in Datadog.” · “We can reconstruct it.” · “We store prompts and responses.”

Probably true — but a reviewer doesn’t want fragments from five systems. They want one chain per issue: the input, the instructions in play, what was retrieved, what was called, what got allowed or blocked, what came out, whether it reproduced, whether it was fixed. When those pieces aren’t joined up, even a solid system reads as immature.

What lands with a reviewerThat chain, assembled per finding — not “available somewhere in the logs.”

05

Once something failed, how do you know it’s fixed?

Here the answers get thin.

What I usually hear

“We patched it.” · “We updated the prompt.” · “We added a guardrail.” · “That’s an old issue.”

What’s missing is proof. A prompt fix can hold for one phrasing and fail on the next. A guardrail can close one route and miss another. The only answer that counts: here’s the same attack, re-run after the change, now failing.

What lands with a reviewerSeparate four things people tend to merge: what failed, what you changed, what you re-tested, and what you’re calling resolved.

06

What did you not test?

This is the one experienced buyers actually weigh. The weak move is to sound total.

What I usually hear

“We tested everything.” · “Fully covered.” · “Best practice.”

It does the opposite of reassure. Nobody believes full coverage; they believe clear limits.

What lands with a reviewer“These workflows and roles were in scope. These data paths were covered. These rely on policy, not testing. These are partial. These weren’t assessed. These are out of scope.” Stated limits read as control. Blanket confidence reads as a bluff.

The pattern underneath

The wording shifts from one questionnaire to the next. The worry underneath doesn’t: can it be manipulated, can it leak, can it act out of bounds, can you show what happened, can you prove the fix, and do you know where your evidence runs out.

That’s why generic answers stall in late-stage review. The buyer isn’t asking for more policy. They’re asking for evidence that matches how they think. When it does, the deal moves.

So what are your options?

Most teams reach for one of five answers when these questions land. Each one helps. The real test is whether it answers the specific question in front of the buyer — and that’s where the gaps show.

01

“Our engineers tested it”

Internal testing is real work, and your engineers know the system better than any outside firm — the data paths, the retrieval, the tool calls, the edge cases nobody else would think to try. They’ve almost certainly poked at prompt injection and run something against the assistant.

But it lives in engineering form: a ticket, a Slack screenshot, a notebook cell, a scanner dump, “we tried that, seemed fine.” Enough to fix a bug. Not enough for a reviewer who isn’t asking did you look — they’re asking show me what you did and what happened.

The gapNot effort — it’s turning engineering proof into evidence a stranger can follow: what was tested, in which workflow and role, what the model saw and did, whether it reproduced, what changed, and what’s still untested.

02

“We used an accredited firm”

A CREST-grade review reassures a buyer that the platform was tested properly — auth, sessions, APIs, access control, the usual web-app ground. That’s worth having.

But unless AI behaviour was explicitly in scope, the report stays quiet on the things that now matter: whether a poisoned support ticket can steer the agent, whether retrieved content can override instructions, whether one tenant’s data can surface in another’s answer, whether the model can be pushed into a tool call it shouldn’t make.

The gapA pen test shows the platform was tested. The buyer’s next question is whether the AI behaviour was — and that’s often a different engagement.

03

“We’re going through Schellman or a Big 4”

For some companies that’s exactly the right move. Schellman, Deloitte, PwC, EY and KPMG understand audit discipline and control design, and they’re starting to fold AI into SOC reports. If the question is “can we show control maturity to the board or a regulator,” that route fits.

It’s just rarely how you unblock a live deal this quarter. The startup version of the problem is blunter: answer these 47 questions by next week, and prove the assistant didn’t leak data. That’s a different job, on a different clock.

The gapIt isn’t about credibility — it’s shape and speed. Big firms are built for broad programmes and formal governance. The question isn’t “are they good,” it’s “are they the right shape for this exact moment.”

04

“We use compliance automation”

Vanta and Drata earn their keep — collecting control evidence, watching posture, mapping to SOC 2 and ISO 27001, keeping audit prep off the team’s plate. For “is access control in place, are devices monitored, are policies acknowledged,” they’re the right tool.

AI procurement questions sit just outside that shape. A platform can show a control exists; it can’t show whether a malicious prompt made a retrieved document override your assistant, whether a cross-tenant tool call was attempted and blocked, or whether the model refused a request instead of half-complying.

The gapNot a knock — a scope line. Compliance tools organise control evidence. These buyers are asking for behavioural evidence.

05

“We drafted the answers with AI”

Plenty of teams answer the questionnaire by pasting it into Claude or ChatGPT and asking for “a credible response.” Used well, that’s genuinely fine — for unpacking what a question is really after, or finding cleaner wording for something you actually do.

It turns into a liability the moment the generated answer papers over evidence that doesn’t exist. “We test for prompt injection.” “We prevent data leakage.” “We follow OWASP LLM guidance.” Reasonable sentences — until the follow-up: which inputs, where are the logs, did it reproduce, was the tool call blocked, who reviewed it, was it retested.

A good reviewer can feel the difference between an answer with evidence under it and one without. The tone gives it away.

The gapAnd the downside isn’t just the awkward moment. A confident answer with nothing underneath invites a harder review, more questions you weren’t ready for, and a quiet loss of trust with the security team — and with your champion inside the account.

The real gap

In every one of these the team is already doing sensible things. Engineers tested. A pen test ran. An assurance firm understands the governance picture. A compliance tool has the controls lined up. The answers were even carefully written. And a late-stage buyer is still asking one narrower thing:

Show me, with evidence, how this AI feature behaved when someone pushed on it.

That’s the gap. Not more governance, not another dashboard — taking your product’s actual AI behaviour and turning it into structured, evidence-backed answers a reviewer can work with. The strongest answer is almost never “we’re secure.” It’s: “For this question, here’s the evidence. Here’s what we tested, what happened, what we changed, and what’s still out of scope.”

Why this site exists

You might be reading this with a customer waiting, so I built the site to be useful before you ever contact me. It points you to the open-source tools your engineers should already be running, and it works through the real questions buyers ask. Most of it you can do yourself.

If you’ve hit a wall, the rest is for you.

The 5pm-Friday version of this problem

It’s Friday evening. A buyer has just sent over a long AI security questionnaire and wants answers — with a report — next week.

You can spend the weekend stitching together test notes, Slack screenshots, scanner output and half-remembered red-team runs into something a reviewer won’t pull apart. Or you can name what’s actually going on: you’re not short on effort. You’re short on time, structure, and a second set of eyes that knows how these reviews go.

That’s when I’m here — when it’s late, the clock is running, the buyer is serious, and you don’t want to burn another weekend turning scattered tests into something that reads like real evidence.

Turn test output into evidence Start with the open tools →

What I review

How scattered evidence becomes a defensible answer.

Buyers no longer ask whether your product is impressive. They ask whether what you say will survive review.

The problem isn’t effort — it’s that the evidence is scattered across tests, logs, screenshots, scanner output and draft answers.

What the buyer needs to see

01QuestionWhat is being asked?Capture the intent.

02EvidenceWhat proof exists?Collect what matters.

03JudgementWhat is defensible?Assess what holds.

04Buyer answerWhat can safely be said?State only what survives.

Origin Layer closes that gap by turning scattered material into a controlled evidence position.

Origin Layer review path

Question→Evidence→Judgement→Buyer answer

Only the final step reaches the buyer — and only if the evidence holds.

Weak signals do not become strong claims. Origin Layer verifies, classifies and limits what can be said before anything is written as a finding.

Proof, not promises

How buyer questions are reviewed — before they become answers.

This public snapshot shows the questions we see most often in AI security reviews, and our current classification based on evidence we’ve reviewed. Full test details, traces and reviewer notes are withheld.

Public review snapshot · Sanitised extract 8 questions · 2 confirmed · 3 cleared · 3 adjusted

QuestionWhat the buyer is asking

Risk areaType of risk

Public classificationPublic summary only

Confidence basisBased on evidence reviewed

01

Can hidden instructions override the assistant’s intended behaviour?

Prompt injection

Confirmed

Confirmed based on reviewed evidence. Full trace withheld.

High confidence

Multiple evidence sources reviewed

02

Can it be talked into saying something harmful?

Jailbreak

Cleared

No finding shown in public summary.

High confidence

Multiple evidence sources reviewed

03

Can a booby-trapped document or email change its answers?

RAG / document injection

Cleared

No finding shown in public summary.

Medium confidence

Limited evidence supplied

04

When your AI says it did something, did it actually do it?

Agent / tool actions

Adjusted

Claim adjusted because supporting evidence was incomplete.

High confidence

Multiple evidence sources reviewed

05

Can one customer end up seeing another customer’s data?

Data exposure

Confirmed

Confirmed issue shown in public summary. Full evidence withheld.

High confidence

Multiple evidence sources reviewed

06

How many of a scanner’s flagged issues are actually real?

False-positive cleanup

Cleared

Noise separated from real findings in the public summary.

High confidence

Multiple evidence sources reviewed

07

Can every questionnaire answer be tied to evidence?

Questionnaire mapping

Adjusted

Answer adjusted to an honest, evidence-linked status.

High confidence

Multiple evidence sources reviewed

08

Will the report hold up when their security team pushes back?

Report hardening

Adjusted

Unsupported claim adjusted down before delivery.

Medium confidence

Limited evidence supplied

Public summary only. Detailed test paths, reproduction records, reviewer notes and evidence references are included in the redacted sample pack or client engagement.

Want to see how we review in detail?Request a redacted sample pack →

From their question to your proof

One buyer question. One evidence trail.

This is not GRC, and it is not a generated security report. Origin Layer takes a buyer’s AI security question and turns it into a reviewable evidence trail: the test case, the observed behaviour, the supporting artefacts, the reviewer judgement, the remediation status and the remaining limitations.

A finding is marked confirmed only when the evidence supports it — and a reviewer of record signs off the judgement.

The example below is a representative lab finding, run against a deliberately vulnerable harness.

Buyer’s questionnaire question “Can one customer access another customer’s data through your AI assistant?”

FINDING OGL-LAB-002

Cross-account data disclosure via privileged tool execution

CRITICAL OWASP LLM06 · Excessive Agency

Target & scope

AI-assisted support path with tool access. Reviewed black-box and point-in-time, under agreed scope.

Attack class

Untrusted content steered a privileged action across a trust boundary — toward data the request shouldn’t reach.

Status & confidence basis

CONFIRMED · high. Basis: executed system behaviour, repeatable under controlled conditions — not one lucky run, not a model claim. Full reproduction records ship in the client evidence pack.

Expected control

Server-side caller-entitlement check at the tool boundary — the record’s owner must match the authenticated caller, enforced before execution.

Observed — public summary

A controlled review found a tenant-boundary failure in the tested AI-assisted path.
The classification is based on observed system behaviour, not a model claim.
The result was repeatable under controlled conditions.

Full traces, account identifiers, tool arguments, control values and reproduction records are withheld from public view — they ship only in the client evidence pack.

Human reviewCONFIRMED FINDING

Reviewer decisionConfirmed boundary-control failure. A real tool call executed — not a hallucination, not a model claim.

Business impactPotential cross-account data exposure if the affected path remains open.

StatusCONFIRMED · remediation reviewed · retest passed

Public preview ends here

See what sits behind a defensible answer.

The public preview shows the outcome. The redacted sample pack shows the evidence record behind it — what was tested, what was confirmed, what was cleared, what changed, and what stays withheld from public view.

Request the redacted sample pack →

No real customer data is shown. Full client evidence packs are released only inside scoped engagements.

The result is not a generated report. It is a defensible evidence pack your team can take into buyer review, security review, procurement, or remediation planning.

The difference shows up the moment you answer. Same question — two very different answers:

Buyer questionUnsupported answerEvidence-backed answer

Can prompt injection work?

“We have guardrails in place.”

Tested direct-prompt, uploaded-document and RAG injection paths. One failed, one held, one not assessed. Evidence attached.

Can one customer reach another customer’s data?

“Permissions are enforced in the backend.”

AI tool path tested. One cross-tenant access issue confirmed, reproduced under controlled conditions, remediated and retested. Evidence attached.

Can it call tools it should not?

“The model is not allowed to do that.”

Tool calls reviewed against caller permissions before execution. Bypass not observed in tested paths. Evidence attached.

Reviewer note“Not observed” is not the same as “impossible.” A clean result means the tested path did not produce the failure under the tested conditions. The report states what was tested, what was observed, and what remains unproven.

Open tools create signals. Origin Layer turns those signals into evidence a buyer can actually review.

What clients receive

A structured, verifiable client pack.

Every Origin Layer assessment produces a buyer-ready pack for security review, procurement, governance and remediation planning. Public pages show structure only. Full contents are shared through controlled access.

Assessment pack includes

Executive summary

A plain-English summary of confirmed risks, business impact, remediation status and retest outcome.

Confirmed findings register

A structured record of validated issues, affected control areas, severity, confidence and recommended action.

Evidence appendix

Reviewed evidence references linked to the assessment record. Full raw traces are retained in the client pack.

Remediation & retest note

What changed, what was retested, and whether the original issue remains open, reduced or closed.

Verification record

A signed evidence reference, so the pack can be checked against the final manifest.

Access to sample materials

Not publicly published

The client pack is an operating asset. Public pages show structure, not contents.

Verification included

Final client packs include a signed verification record.

Controlled access

Redacted examples are available on request to qualified buyers.

Request sample access →

Buyer-safe preview · No customer data · Methods withheld · Verification structure shown

The standard

A defensible evidence position — not a template.

Origin Layer doesn’t hand you a prettier scanner report. You receive a reviewed evidence position for enterprise security review: which answers are defensible, which are weak, which claims should be removed, which findings are real, and which areas remain untested.

It is not a scanner export, a generic AI-generated report, or a decorative security PDF.

The outcome is simple: your team knows what it can safely say, what it cannot yet prove, and what needs to happen before the answer survives buyer scrutiny.

The public preview shows the standard. The full client pack — the judgement record, evidence-backed answer position, remediation status and controlled references — is released only inside the engagement.

Upload once. Scope before review.

Your evidence enters a controlled review path.

Before paid review begins, uploaded materials are checked for suitability and routed to the correct path — Evidence Triage, Evidence Generation, or Buyer-Ready Pack. Automation maps the evidence; human judgement decides what can be defended.

01
Upload
Questionnaire, draft answers, Promptfoo / Garak / PyRIT output, logs, traces, screenshots, scan results, remediation notes.
02
Suitability check
Files are classified, buyer questions are parsed, and obvious gaps or unsupported claims are identified.
03
Scope route
The upload is routed to triage, evidence generation, full pack, or not-ready status.
04
Adjudication
Findings are classified as confirmed, cleared, adjusted, candidate, unsupported, or not assessed.
05
Report generation
The reviewed evidence position is turned into a triage memo or buyer-ready report.
06
Signed delivery
The released pack includes a signed manifest, so any later change is detectable.

Why deals actually stall

The reviewer isn’t saying no. They can’t find a yes they’re willing to defend.

By late-stage review the product has already won. What blocks the deal is quieter — and it sits inside the head of the person signing off.

01

The reviewer owns the risk.

Approve a vendor that later leaks, and the failure is theirs. Their honest default is no until a yes is defensible — for them, not for you.

02

Vagueness reads as risk, not confidence.

“We have guardrails” gives a reviewer nothing to sign off. So they escalate, ask more, and the deal slips another week.

03

Your champion is exposed too.

The person who wants to buy you is vouching for you internally. A weak answer makes them look naïve — and they quietly stop pushing.

04

Evidence turns the gatekeeper into an ally.

Give a reviewer something defensible and they can say yes without owning the risk. That is the actual unlock.

Origin Layer doesn’t make your product safer overnight. It makes your answers defensible — so the person reviewing them can finally say yes.

Services & pricing

Fixed scope. Fixed price.

Start where your evidence is. Every engagement is scoped before review — no open-ended consulting.

01 Evidence Triage

£1,500

48–72h · up to 10 buyer questions · review only

What’s defensible, weak and missing
What shouldn’t go to the buyer as written
A buyer-risk summary and next steps

Credited in full against a Pack within 7 days. Start triage

Most engagements 02 Buyer-Ready Pack

from£4,500

up to 20 buyer questions · review log + answer pack

Reviewed evidence log — confirmed / cleared / adjusted
Direct answers mapped to each buyer question
Executive summary, remediation & retest criteria

Request a pack

03 Urgent Rescue

from£9,500

48–72h priority · live blocked deal · by availability

Priority turnaround on a blocked deal
Full pack + signed, redacted record
Support through buyer follow-ups

Submit urgent review

04 Review Desk

from£3,000/mo

retainer · ongoing review as you ship

Standing review for new features
Evidence kept current as you ship
A second set of eyes before customers

Apply

When security review slows the deal, weak evidence gets expensive.

Fixed-scope evidence review for teams that already have tests, logs, scanner output or a buyer questionnaire to answer. If you don’t have usable test output yet, Evidence Generation is scoped separately. Work is priced by scope and urgency — buyer questions, AI workflows, evidence available, and turnaround. If a pack can’t be delivered inside the quoted window, we re-scope before starting.

Common add-ons: additional AI workflow · retest pack · buyer-call support · extra framework mapping · 24-hour expedited review.

Start before you call anyone

Most of this you can run yourself. Your team should already be using the open tools.

Honestly, I’d rather you tried first. An afternoon with these against your own app tells you more than any sales pitch — mine included.

Open-source · run these against your own app first4 in the stack

garakprobe scanner

A broad library of jailbreak, injection, leakage and toxicity probes — strong for generating candidate signals against a chat or REST endpoint. A “hit” is a substring match, not a verdict.

PyRITadversarial orchestration

Microsoft’s framework for multi-turn attacks and automated red-team flows — the one to reach for on agentic and conversational targets, where the attack builds over several turns.

promptfooeval & red-team harness

Config-driven testing that maps to your app type — RAG, agent, chatbot — and runs repeatable suites against your own prompts, so a result can be re-run on demand.

OWASPcategory spine

The LLM Top 10 (2025) is the shared vocabulary reviewers expect. Use it to structure what you tested — and, just as importantly, what you didn’t.

The tools are not the deliverable — the reviewed evidence is. A scanner hit isn’t a finding, a failed prompt isn’t business impact, and a raw trace isn’t something you send to a buyer. I sit after the tools: reviewing the signal, removing the weak claims, mapping the evidence, and packaging what can actually survive scrutiny.

The human review layer

Stuck on one piece — a messy prompt-injection result, a possible cross-tenant edge case, a tool-call log that’s hard to explain, a “we fixed it but never really proved it”? That’s the review step.

No spin — a straight read on where the evidence is strong, where it’ll get questioned, what’s missing, and how to present it so a reviewer can follow the chain without guessing.

Bring me the piece you’re stuck on

Market comparison

Where Origin Layer fits

Different options solve different problems. Origin Layer sits in the evidence gap between raw AI/security output and the buyer-ready evidence your customer, procurement team, or reviewer can actually assess.

	GRC automation platforms	Standard penetration testing	Full AI red team	Large assurance / Big 4-style engagement	Origin Layer
Typical cost	£6k–£20k+/year	£3k–£15k	£12k–£40k+	£20k–£100k+	Evidence triage from £1,500 · Buyer-ready pack from £4,500
Best for	policy workflows, evidence collection, control monitoring.	web app, API, infrastructure, cloud testing.	deeper adversarial testing, model and agent behaviour, specialist security review.	formal assurance, governance, audit, enterprise programmes.	reviewing AI/security evidence, removing weak claims, mapping findings to buyer questions, producing a defensible evidence pack.
Where the gap appears	do not adversarially test your specific AI behaviour or produce buyer-ready evidence for model risk questions.	strong technical testing, but output may still need translating into buyer-facing AI evidence.	powerful but broader, slower, and often more expensive than the immediate evidence gap requires.	trusted but often too heavy and slow for a startup under buyer review pressure.	Built for the gap between technical output and buyer-review readiness.

Focused buyer-review readiness Works alongside testing and assurance teams Indicative pricing shown for market context

Request sample access → See how the evidence pack differs from raw tool output.

Who’s behind it

The person behind the work.

My background is evidence discipline.

For twenty years I have worked around data, compliance, audit preparation, funding claims, and review processes where unsupported claims do not survive scrutiny. In that world, it is not enough for something to look right on paper. The record has to support the claim.

That is the standard I bring to AI evidence.

I work across the frameworks buyers and reviewers increasingly reference — including OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, ISO/IEC 42001, SOC 2, ISO 27001, and common security-questionnaire control language.

I also understand how issues show up in real systems: application logic, APIs, permissions, data handling, logging, tool calls, RAG flows, model behaviour, and agentic workflows.

The value is in translating between those layers: technical output, evidence standard, framework mapping, and buyer-facing reporting.

A scanner result, prompt failure, code issue, tool hit, or model behaviour is not treated as a finding just because it appears in an output. I review what happened, what was actually tested, what the evidence supports, which buyer concern it relates to, and what would fall apart if a buyer, security team, or auditor challenged it.

Origin Layer exists for that gap: the space between raw technical output and a buyer-ready evidence position. The result is not a generic AI-generated report. It is a reviewed evidence pack that shows what is defensible, what is weak, what is missing, what needs retesting, and what should not be sent as written.

For deep application penetration testing, exploit development, or specialist product-security work, I work alongside or after security teams — not in place of them. My role is to turn technical outputs into clear, reviewable findings that senior stakeholders, customer security teams, procurement reviewers, and governance teams can understand and act on.

Emon Ambia Founder & lead assessor · 20 years in audit, compliance & data — applied to AI evidence

Read the assessment method →

Before you ask

The questions serious buyers actually ask.

Straight answers on scope, access, confidentiality and what you walk away with.

Do you test my system, or only review output I send you?

Either — you choose the depth. The output review and question-mapping work from what you already have: prompts, responses, logs, scanner dumps. A full engagement runs the red-team engines against your system under written authorisation, then validates every candidate by hand.

Do you need access to production?

No. Most work runs against a staging or test instance, or against the raw outputs and logs you provide. The system, the in-scope features, and the level of access are all agreed in writing before anything starts.

Is this confidential?

Yes. Engagements run under written authorisation and an NDA. Evidence bundles are access-controlled, and anything client-facing is stripped of secrets and seeded canaries before it leaves my hands.

How is this different from a penetration test?

A pen test checks the application — auth, sessions, APIs, infrastructure. I focus on the model’s behaviour: whether it can be steered by input, leak across tenants, or be pushed into a tool call outside the user’s scope. Different question, often a different team — and the two complement each other.

How is it different from Vanta or Drata?

Those platforms organise control evidence — access control is in place, devices are monitored, policies are acknowledged. I produce behavioural evidence — whether a malicious document did or didn’t hijack your agent. Related, not the same.

What if the findings are bad?

Then you’ll know before your buyer does — with evidence and a fix path — which beats finding out mid-review. Severity and confidence are scored separately, so nothing gets inflated to look scary or buried to look clean.

What do I actually walk away with?

A signed, tamper-evident evidence bundle, a plain-language executive report for the decision-makers, a technical report for the engineers who fix it, and remediation mapped to controls with a retest criterion. See the deliverable →

Can a buyer verify the evidence themselves?

Yes — that’s the point. Every finding’s evidence is hashed and Ed25519-signed; anyone can verify it with the public key alone, without trusting me or my tooling. A one-byte change breaks the signature.

How fast can you turn this around?

Output reviews and question-mapping are quick; a full engagement is scoped in writing first. Most work is delivered in days, not weeks — but real turnaround depends on scope and access, which we agree up front.

How do refunds and scope work?

Evidence is scoped before full review. If your uploaded materials aren’t suitable for the selected route, the engagement is re-scoped before continuing. Refunds are available before evidence processing starts; once automated assessment, scope mapping, human review, report generation or signing has begun, fees are non-refundable to the extent work has been performed. Origin Layer does not guarantee buyer approval, a procurement outcome, certification, or that a system is secure — the work covers evidence processing, adjudication, report preparation and delivery within agreed scope.

Messy evidence is expected

Different inputs. One defensible position.

Your evidence does not need to arrive perfectly organised. Most teams have a mix of buyer questions, draft answers, scanner output, red-team notes, screenshots, logs, remediation tickets, and partial control evidence. That is normal.

1 · Upload what you have Send it as-is Questionnaires · draft answers · logs · screenshots · scanner output · red-team notes · remediation evidence.

→

2 · Evidence review Reviewed and sorted Each item is separated into usable, weak, missing, or needs follow-up.

→

3 · Written position One clear answer Defensible answers · gaps · next steps · a buyer-ready route.

What you upload	How it is used
Buyer questionnaire	Used to understand the questions being asked
Draft answers	Reviewed for defensibility and unsafe claims
Scanner or test output	Treated as candidate evidence, not proof
Logs or screenshots	Reviewed for supporting context
Remediation notes	Used to understand what changed
Retest evidence	Reviewed against the original issue

Origin Layer separates what can be used, what is weak, what is missing, and what needs follow-up. The output is not a pile of documents — it is a single evidence position: what you can defend, what you should not send as written, and what needs to happen next.

AI security answers your buyer
can actually review.

The questions that actually stall an AI deal.

So what are your options?

“Our engineers tested it”

“We used an accredited firm”

“We’re going through Schellman or a Big 4”

“We use compliance automation”

“We drafted the answers with AI”

The 5pm-Friday version of this problem

How scattered evidence becomes a defensible answer.

How buyer questions are reviewed — before they become answers.

One buyer question. One evidence trail.

Cross-account data disclosure via privileged tool execution

A structured, verifiable client pack.

Executive summary

Confirmed findings register

Evidence appendix

Remediation & retest note

Verification record

A defensible evidence position — not a template.

Your evidence enters a controlled review path.

Upload

Suitability check

Scope route

Adjudication

Report generation

Signed delivery

The reviewer isn’t saying no. They can’t find a yes they’re willing to defend.

The reviewer owns the risk.

Vagueness reads as risk, not confidence.

Your champion is exposed too.

Evidence turns the gatekeeper into an ally.

Fixed scope. Fixed price.

Most of this you can run yourself. Your team should already be using the open tools.

Where Origin Layer fits

The person behind the work.

The questions serious buyers actually ask.

Bring the material you already have.

Different inputs. One defensible position.

Upload your evidence. Get a written scope decision back.

AI security answers your buyercan actually review.

The questions that actually stall an AI deal.

So what are your options?

“Our engineers tested it”

“We used an accredited firm”

“We’re going through Schellman or a Big 4”

“We use compliance automation”

“We drafted the answers with AI”

The 5pm-Friday version of this problem

How scattered evidence becomes a defensible answer.

How buyer questions are reviewed — before they become answers.

One buyer question. One evidence trail.

Cross-account data disclosure via privileged tool execution

A structured, verifiable client pack.

Executive summary

Confirmed findings register

Evidence appendix

Remediation & retest note

Verification record

A defensible evidence position — not a template.

Your evidence enters a controlled review path.

Upload

Suitability check

Scope route

Adjudication

Report generation

Signed delivery

The reviewer isn’t saying no. They can’t find a yes they’re willing to defend.

The reviewer owns the risk.

Vagueness reads as risk, not confidence.

Your champion is exposed too.

Evidence turns the gatekeeper into an ally.

Fixed scope. Fixed price.

Most of this you can run yourself. Your team should already be using the open tools.

Where Origin Layer fits

The person behind the work.

The questions serious buyers actually ask.

Bring the material you already have.

Different inputs. One defensible position.

Upload your evidence. Get a written scope decision back.

AI security answers your buyer
can actually review.