Why AI demands a redesign of assessment—not just a policy response

Every institution I speak with has updated its AI policy in the last eighteen months. Most have updated it more than once. The policies are getting longer, more specific, more carefully worded. And the problem they are trying to solve is getting worse.

That is not a coincidence. It is a diagnostic.

When a policy response makes a problem more complex without making it smaller, the policy is mis-scoped. It is acting on the wrong part of the system. And in the case of AI and academic integrity, I think that mis-scoping is now visible enough to name directly: we are regulating the behaviour of students inside an assessment model that AI has already rendered unreliable. The policy is downstream of the problem. The problem is in the design.

Assessment has always run on proxies

To understand why this matters, it helps to be honest about what assessment has always been doing.

Assessment doesn’t measure learning directly. It never has. It measures outputs—submissions, responses, performances—and treats those outputs as proxies for the things we actually care about: thinking, capability, understanding. That proxy relationship has always involved an assumption: that a student who produces the output has done the cognitive work the output is supposed to represent.

Submission as proxy for thinking. Output as proxy for capability. Completion as proxy for learning. Generative AI has broken all three simultaneously.

A student can now produce a submission that is well-structured, coherent, and technically sound—meeting every criterion on the rubric—without engaging in any of the thinking the task was designed to elicit. The output has been decoupled from the cognition it was supposed to evidence.

When a proxy breaks, you have two options. You can try to patch it—add verification layers, detection tools, honour statements, AI disclosure requirements. Or you can recognise that the model relying on the proxy has to change. The first option is a policy response. The second is a design response. And in this case, only the second actually addresses the problem.

Why you can’t regulate your way out of a design failure

Policy acts after the task is set. It governs how students are expected to approach an assessment that already exists—what tools they can use, what assistance is permissible, what constitutes a violation. It operates within the assessment model.

What AI has done is change what the task is. Not just how students can complete it, but what completing it now means. A task designed to evidence reasoning through a written product no longer evidences reasoning through a written product, because the written product can now be generated independently of any reasoning. The task has changed at the level of its design assumptions. Policy cannot reach that level. Only redesign can.

This is not an argument against policy—institutions need clear expectations and fair processes. It is an argument about the limits of what policy can achieve when the underlying design is the source of the problem. You cannot regulate your way out of a design failure. You can only obscure it for a while.

An institution that invests primarily in policy and detection creates the appearance of an integrity system while the actual signal quality of its assessments continues to erode.

The grades are being awarded. The credentials are being issued. The proxy is broken. And the gap between what the qualification claims and what it represents widens quietly—until it becomes visible in a context no institution wants.

The shift that is actually needed: from evaluating outputs to evidencing thinking

Assessment needs to move from evaluating outputs to evidencing thinking. Not as an alternative to output-based tasks, but as a structural shift in what counts as valid evidence of learning. The output—the essay, the report, the case analysis—is not sufficient on its own. What matters is whether the student can demonstrate the thinking behind it.

This is where oral assurance becomes not just one tool among many but a design principle. When students know they will be asked to explain their reasoning, defend their decisions, or respond to a question about their own work, the nature of the task changes fundamentally. AI can contribute to the output. It cannot substitute for the student in the moment of demonstration. That moment is where genuine capability becomes visible—and where the proxy relationship between output and understanding is replaced by something more direct.

Good assessment design in an AI-enabled world needs to do four things that many current designs do not.

It needs to make thinking visible—to require students to show their process, not just their product. Staged work, documented decision-making, reflective commentary on how and why, not just what.

It needs to embed progression—to assess learning over time rather than at a single point. Tasks that build on each other, that require students to demonstrate developing understanding rather than a single performance that can be fabricated in isolation.

It needs to require application and contextualisation—to ground tasks in the specific, the local, the personal. Generic tasks invite generic responses, which is precisely what AI produces best. Tasks that require students to apply concepts to their own placement, their own institution, their own field are significantly harder to outsource meaningfully.

And it needs to align to purpose—to make the reason the task exists legible to students. Assessment that students experience as genuinely purposeful, connected to the capability they are developing and the world they are entering, is assessment they are more likely to engage with authentically.

Reframing AI’s role

None of this requires treating AI as the enemy. That framing has never been particularly useful, and it is becoming less so as AI becomes a standard feature of professional practice in almost every field universities prepare students for.

AI is fine if it supports thinking. It is a problem when it replaces the demonstration of thinking. That distinction is a design question, not a policy question.

An assessment that requires genuine demonstration of understanding—that makes thinking visible, demands live explanation, or requires contextualised application—can accommodate AI as a tool in the process. A student who uses AI to explore ideas, generate early drafts, or stress-test an argument, and who can then demonstrate and defend their understanding, has learned something real. The tool is not the issue. The question is whether the design creates the conditions under which genuine learning is demanded and visible.

Assessment policy that prohibits AI without redesigning what assessment demands is a temporary friction that resolves nothing structurally. Assessment design that changes what counts as valid evidence of learning is a response proportionate to the problem.

What this requires

This transition is not without cost, and intellectual honesty requires acknowledging that. Redesigning assessment at institutional scale is demanding work—it requires curriculum expertise, staff capability development, and governance structures that treat assessment quality as a strategic priority rather than an administrative function. The institutions making real progress on this are investing accordingly.

But the alternative cost is also real, and I think it is underestimated. An institution that continues to issue credentials on the basis of a broken proxy is making a claim about its graduates that it cannot fully support. As employer scrutiny of graduate capability increases, as accreditation frameworks sharpen their focus on demonstrable outcomes, and as students themselves begin to question what their qualification actually represents, the reputational exposure of that position grows.

The good news is that the design tools exist. The research base is strong. The regulatory direction—from TEQSA and increasingly from international equivalents—is aligned with where assessment design needs to go. What is required now is not more policy. It is the institutional will to treat assessment redesign as the core response to this moment, rather than the thing that happens after everything else has been tried.

AI has not created a new problem. It has made an existing design problem impossible to ignore. That is, in the end, a clarifying gift—if institutions are willing to use it.

-

Brigitte Elliott is Co-CEO of Cadmus, an end-to-end assessment platform built for higher education. She writes about the structural design of assessment systems, the conditions that produce learning at scale, and the relationship between institutional architecture and educational quality.
