How to Evaluate Candidates Whose Work Was AI-Assisted
As AI tools become standard in professional work, the question shifts from "did they use AI?" to "what judgment did they apply on top of it?" — and that judgment is a behavioral pattern visible in the work itself, regardless of how much AI assistance was involved. The best candidates in 2026 and beyond will produce work that's AI-assisted by default — code written with Copilot, analyses structured with ChatGPT, designs iterated with Midjourney, documents polished with Claude. Penalizing AI usage penalizes adaptability. Ignoring AI usage means you can't tell what the person actually contributed. The productive response is evaluating the judgment layer — the decisions, direction, quality control, and creative choices that the human applied on top of AI output. Evidence-based assessment from Heimdall AI is structurally suited for this because it evaluates behavioral patterns — how someone thinks, reasons, and creates value — rather than attributing specific outputs to human or AI authorship.
This isn't a future problem. Every knowledge worker's output is increasingly AI-touched. The evaluation framework needs to catch up.
The New Reality: AI-Assisted Is the Default
What AI-Assisted Work Looks Like Across Professions
Software engineering: Code written with GitHub Copilot, architecture discussions scaffolded by ChatGPT, debugging assisted by AI analysis. A senior engineer using AI effectively might write 3x more code — but the value isn't in the code volume; it's in the architectural decisions, the problem decomposition, the system design thinking that directed what the AI produced.
Product management: Strategy documents drafted with AI assistance, competitive analyses structured by LLMs, user research synthesized with AI tools. A strong PM using AI might produce more polished documents faster — but the value is in the strategic judgment: which problems to solve, what to prioritize, how to frame the opportunity.
Data science: Models built with AI-assisted feature engineering, analyses scaffolded by code generation tools, reports polished by LLMs. The value isn't in the code or the report — it's in the analytical judgment: which questions to ask, how to interpret ambiguous results, when to challenge the model's output.
Writing and content: Drafts generated or enhanced by AI, research compiled by LLMs, editing assisted by AI tools. The value of a writer using AI isn't in the prose production — it's in the editorial judgment: what's worth saying, how to structure the argument, what to cut, and what the AI-generated draft got wrong.
The Pattern
In every profession, AI handles the production layer with increasing capability. The human value concentrates in the judgment layer — the decisions about what to produce, how to direct the production, how to evaluate the output, and when to override it. This judgment layer is exactly what evidence-based assessment measures: professional judgment traits like assumption challenging, adversarial reasoning, creative synthesis, depth of insight, and intellectual honesty.
Why Detection Is Wrong (Again)
Just as AI-generated CV detection was the wrong response to AI-written resumes (see our guide on AI-generated CVs), AI-assistance detection is the wrong response to AI-assisted work.
You can't reliably detect it. AI-assisted code, writing, and analysis are increasingly indistinguishable from purely human-produced work — especially when the human edits, directs, and quality-controls the output (which is the whole point of AI assistance).
Detecting it penalizes exactly what you want to hire for. An engineer who uses Copilot to produce 3x more code while maintaining architectural quality is demonstrating the AI tool leverage that's one of the two pathways to AI readiness. Penalizing AI assistance in work samples is penalizing the most adaptable candidates.
It's the wrong question. "Did they use AI?" is meaningless. "What judgment did they apply?" is everything. A brilliant architectural decision is a brilliant architectural decision regardless of whether the implementation code was AI-assisted. A lazy, unconsidered analysis is lazy regardless of whether it was typed by hand or generated by a prompt.
What to Evaluate Instead: The Judgment Layer
Decision Quality
What problems did they choose to solve? How did they decompose complex challenges? Where did they prioritize? The choices that direct work are human judgment — AI generates options, humans choose among them. Look for evidence of good decision-making about what to work on and how to approach it, visible in project documentation, strategy artifacts, and the choices embedded in the work.
Direction and Orchestration
How did they direct the work? In an AI-assisted context, the human's role is increasingly about orchestration — combining AI outputs, directing iterative refinement, deciding when the output is good enough and when it needs human intervention. Strong orchestration is visible in the coherence of the final output: does it hang together as a unified piece of thinking, or does it read like disconnected AI outputs stitched together?
Quality Control and Adversarial Reasoning
Did they stress-test the output? AI tools produce plausible-sounding work that can be subtly wrong — code that compiles but handles edge cases poorly, analyses that are structurally sound but based on flawed assumptions, writing that's fluent but says nothing. The human value is in catching these failures: reviewing AI output with the adversarial reasoning to identify what it got wrong. Evidence of quality control — documented edge cases, corrected assumptions, identified limitations — is evidence of the judgment AI can't provide.
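To make this concrete, here is a minimal, hypothetical sketch in Python (not drawn from any real candidate submission) of the failure mode described above: AI-drafted code that runs fine on typical inputs but mishandles an edge case, alongside the kind of correction a reviewer exercising adversarial reasoning would make.

    # Hypothetical AI-drafted helper: plausible, runs, passes the happy path.
    def percent_change(old: float, new: float) -> float:
        """Percentage change from old to new."""
        return (new - old) / old * 100  # raises ZeroDivisionError when old == 0

    # Human-reviewed version: the zero-baseline edge case is surfaced
    # explicitly instead of crashing or silently returning a misleading value.
    def percent_change_reviewed(old: float, new: float) -> float:
        """Percentage change from old to new, with the edge case handled."""
        if old == 0:
            raise ValueError("percent change from a zero baseline is undefined")
        return (new - old) / old * 100

The diff is trivial; the judgment that prompted it, noticing that a zero baseline is a live possibility in the data, is the human contribution an evaluator should be looking for.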
Creative Synthesis
AI excels at producing standard solutions to well-defined problems. It struggles with genuinely novel combinations — connecting ideas from unrelated domains to produce something neither domain generates alone. Evidence of creative synthesis in someone's work is evidence of distinctly human contribution, regardless of how much AI assistance was involved in the execution.
Intellectual Honesty About AI's Contribution
The most sophisticated AI users are transparent about what AI did and didn't contribute. They document where AI generated the initial draft vs. where they directed the approach, where they overrode AI suggestions, and where the AI's limitations required human intervention. This transparency is itself a positive signal — it demonstrates the metacognitive awareness that predicts effective AI collaboration.
Practical Evaluation Framework
For Hiring
1. Don't ask "did you use AI?" Ask "walk me through how you approached this." The process narrative reveals the judgment layer. Someone who directed AI tools with clear purpose, evaluated the output critically, and made specific quality-control decisions is demonstrating exactly the capability you want.
2. Ask about decisions, not execution. "Why did you choose this approach over alternatives?" "What did you consider and reject?" "Where did you override a tool's suggestion and why?" These questions evaluate judgment regardless of how much of the execution was AI-assisted.
3. Request work that demonstrates the judgment layer explicitly. Strategy documents, decision logs, architecture rationale, post-mortems — artifacts where the human thinking is the content, not the execution. AI can't produce a genuine post-mortem of why a product decision was wrong, because that requires honest reflection on a specific human decision process.
4. Use evidence-based assessment. When Heimdall AI evaluates work evidence, it derives behavioral patterns — how someone reasons, challenges assumptions, synthesizes across domains, exercises judgment under uncertainty. These patterns are visible in the work regardless of AI assistance level, because they describe the human judgment directing and evaluating the work, not the execution of the work itself.
For Internal Evaluation
5. Evaluate outcomes and judgment, not effort. If an employee produces excellent results using AI tools in half the time, that's a feature, not a concern. Evaluate the quality of their decisions, the sophistication of their thinking, and the impact of their output — not how many hours of manual work went into producing it.
6. Create opportunities to demonstrate AI orchestration skill. Assignments that specifically require directing AI tools to produce complex outputs — and evaluating the result — test the judgment layer directly. The ability to get good results from AI while catching what it gets wrong is itself a valuable, assessable capability.
Frequently Asked Questions
Should candidates disclose AI usage in their work samples?
Ideally, yes — transparency about process demonstrates metacognitive awareness and intellectual honesty. But don't make it a requirement that penalizes disclosure. Create a safe context: "We encourage candidates to describe their working process, including any tools they use. We evaluate judgment and decision quality, not whether work was produced with or without AI assistance." This framing encourages honesty without penalizing AI usage.
What if ALL the candidate's work is AI-generated with minimal human input?
That's a finding — and it's visible in the work itself. AI-generated output without substantive human direction tends to be competent but generic: it lacks the specific, contextual decision-making that human judgment produces. Evidence-based assessment detects this pattern — the behavioral profile shows limited evidence of assumption challenging, creative synthesis, depth of insight, or adversarial reasoning, because those traits require genuine human judgment to manifest in work output. A thin behavioral profile on someone who submitted substantial work evidence is itself diagnostic: the volume was there but the judgment wasn't.
How will this change as AI gets more capable?
The production layer will be increasingly automated. The judgment layer will become more valuable, not less. As AI handles more of the execution, the human contribution concentrates further on: what problems to solve, how to evaluate AI output, when to override it, how to combine AI capabilities with domain expertise, and how to apply ethical and contextual judgment that AI can't provide. Evidence-based assessment is designed for this future: it evaluates the judgment patterns that become more important as AI handles more production.
Does this apply to creative work too?
Yes — arguably more so. A designer who uses Midjourney to generate 50 options and then selects, combines, and refines the three that serve the creative vision is exercising judgment: aesthetic selection, brand alignment, contextual appropriateness, and creative direction. The AI generated options. The human exercised taste. Evaluating the judgment — the quality of the final creative direction, not the production of individual assets — is how you assess creative capability in an AI-assisted world.
Heimdall AI is an evidence-based talent intelligence platform that derives behavioral profiles from actual work product — projects, writing, code, and professional evidence — rather than self-report questionnaires. It uses dual scoring (potential ceiling + validated floor) to preserve uncertainty as actionable signal, and quantifies how much of a candidate's value conventional processes would miss. It's designed to complement existing hiring tools by adding a layer of insight nothing else provides.