AI Breaks The Psychological Contract of Essay Writing

An invitation arrived this weekend (compliments of the bots feeding my social media feed) offering me yet another packaged AI solution to mark all my student essays.

Here’s the title:

The #1 AI Essay Grader for Teachers

And here’s the hook:

Reduce grading time by 80%. Grade your entire class’s essays in 2 minutes or less and deliver high-quality, specific and actionable feedback to your students.

It’s seductive. What school leader would not be interested in giving teachers back some time and improving the quality of student feedback? It’s an absolute win-win.

However, my position on this is relatively simple. Writing needs to be read, and as AI can’t read, it’s a no-deal for me.

That’s not to say I don’t understand the reasons why people are tempted to outsource the marking of essays to a bot. But I think we must find other ways to ease teacher workload, and I would rather question whether the work was worth setting, or doing, in the first place.

My contention is that when a teacher assigns a piece of writing, an informal agreement is struck – a psychological contract – between the teacher and the student. The student agrees to sit down, think, try, and give shape to their ideas. The teacher, in turn, agrees to read that work with care, to notice, and to respond. It’s what gives writing purpose: not just the act of expressing, but the knowledge that someone will receive it, take the time to read it.

So what happens when we break that contract — when the reader is no longer a person, but a machine?

In education, we like to draw moral boundaries. One that’s gaining traction is this:

– It’s unacceptable for a computer to write it, but

– It’s acceptable for a computer to mark it

One is seen as helpful, and the other as harmful. One saves teacher time, and the other robs students of their voice.

When we let a computer assess student writing, we’re saying the machine can recognise quality: that it knows what clarity, a strong argument, and coherence sound like. And we’re trusting it to judge insight, nuance, and tone. In short, we’re trusting it to know what good thinking looks like on the page.

And if we believe that a computer can make that judgment, then on what principled basis can we deny a student the use of the very same technology to generate the work in the first place?

I’m not trying to make a slippery-slope argument. I’m trying to hold up a mirror, because I don’t think we are staring at the dangers of a new technology. Rather, I think we are looking at a contradictory approach to assessment that could inadvertently become embedded in our education system.

Because if a machine can mark thought, why shouldn’t it be allowed to generate it?

John Warner¹, author of Why They Can’t Write, has a lot to say about this topic:

“Writing is thinking. It’s not the recording of thinking, it’s not the result of thinking. It is thinking.”

This idea (that writing is a thinking process and not a performance) should be at the heart of how we design and assess student work (and I’ll return to this point later). When students write, they are not producing pre-packaged knowledge (as an AI does); they’re building understanding. Moreover, they are searching, revising, and constructing meaning as they go. Writing is messy, iterative, and often deeply personal and difficult.

So when we reduce writing to a product (a fixed deliverable to be judged) we lose all of that. We start treating it as something static that can be assessed in seconds. And I feel that this is precisely what too much edutech is trying to do just now.

I don’t doubt the new AI marking tools’ statistical reliability, nor do I dismiss their utility in large-scale moderation. But I can’t help but wonder what it means when we don’t even look at a significant piece of writing that we have asked our students to do for us.

What message does that send to the student? And what does it say about the nature of the work we’re asking them to do in the first place?

If students know their writing will be judged by a bot (or even by a human system that mimics the logic of a machine), why should they care about it? Why invest emotionally or intellectually in a task designed to be judged impersonally? I know I wouldn’t. Ask your students what they think. I have, and they are not OK with it.

Writing, at its best, is an act of communication. But communication requires a listener. If the “reader” is a bot (however clever), then the task becomes transactional and we can end up performing for the machine. Worse, we might end up trying to reverse-engineer our writing to please the machine. And then slowly, the deeper purpose of writing, the struggle to understand, the courage to express, will begin to fade away.

I’m not trying to reject automation. I’ve seen how AI tools can help teachers identify patterns, flag surface errors, and provide feedback. Used wisely, they can free up time for the more meaningful parts of teaching. But they must not replace those parts. And they certainly must not define them.

And yet, this is where we need to be honest: too much of the current excitement around AI in education is not about transformation. It’s about efficiency. Rather than reimagining what learning could be, we’re using new tools to make an old system run faster. AI marking doesn’t challenge the factory model of education; it perfects it. It allows us to process more students, more quickly, with less mess. But the factory was the problem to begin with.

If AI is simply making it easier to administer formulaic tasks, rank students² against artificial standards, and reduce writing to a product, then it isn’t innovating education. It’s perpetuating the old model. And perhaps polishing it just enough to delay the change we really need.

If we allow AI to define what writing should be, then we will end up with writing that’s perfectly optimised and profoundly vacuous. Students will prompt it, tweak it, submit it, and move on. At no point will they need to think. And if that’s what we’re assessing, we’re not just outsourcing writing. We’re outsourcing education.

To resist that future (as John Warner might say), we may need to change how we think about student work. We need to question the purpose of asking students to do long-form writing. We need to ask questions that require judgment, not just knowledge. We need to return to the idea that writing is a conversation, not just a submission.

Just as importantly, we need to read student work in a way that honours that process. This means slowing down, engaging with the ideas, not just the formatting, and making space for uncertainty, disagreement, and voice.

At its heart, marking student writing is not merely an evaluative task – it’s a psychological contract. A promise. When we ask students to write, we are implicitly saying: I will read this. I will try to understand you. I will meet you in your thinking.

That contract is broken when we hand their work to machines or to systems designed for speed rather than understanding. It is broken when we value polish over struggle, ranking over meaning, and efficiency over care. And when that contract breaks often enough, students stop believing that their effort matters. Or worse, they stop thinking in the first place.

In the rush to innovate, we must not forget that assessment is more than a technical process. It is a moral gesture. Reading a student’s writing slowly, with intention, is to tell them:

You are worth my time.

So before we offload the work of reading, we must ask: What exactly are we trying to save? Because if it’s time (at the cost of the relationship), then we’re not innovating. We’re simply abandoning the very reason we teach.


Thank you, Gemma Dawson, for the provocation for this article, and for describing what I was thinking about as a “psychological contract” between the student and teacher. As always, I appreciate you.


  1. Also, I have enjoyed John Warner’s latest book, More Than Words, which explores writing in the age of AI. This quote from the book sums up much of his position:

    “If we treat the output of large language models as writing, as opposed to syntax generation, which is how I characterize it, then we’re allowing the meaning of writing and the experience of writing to be degraded for humans”.
  2. I originally included a long section in this article on my thoughts on the use of comparative judgment, but then I thought it was a bit of a distraction and removed it. I’ve parked it here for now:

    Comparative Judgment

    Some of the new AI tools are also being used to support comparative judgment (CJ) to determine student grades. While the efficiency and effectiveness of CJ are becoming harder to question, I remain deeply uncomfortable with the approach.

    I first encountered CJ in 2019, when I was Head of the IB Diploma Programme and the IB’s innovative assessment team was exploring the pros and cons of the approach and what a potential controlled trial might look like. It was not taken forward at the time, the main reservation being that the approach, despite its potential to be both cheaper and more accurate than the conventional use of human examiners, might break a psychological contract between the student and the IB.

    Philosophically, CJ isn’t built to understand meaning – it’s built to establish a ranking. Its great promise is that it’s fast and consistent: rather than assigning an absolute score, judges simply choose the “better” of two pieces of work. Rightly or wrongly, this appears to work very well in large-scale comparisons and to produce more reliable scores than traditional grading.
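
    In case it helps to see the mechanics, here is a minimal sketch (in Python) of how pairwise judgments are typically turned into a ranking, using the Bradley-Terry model that underpins most CJ approaches. The essay IDs and judgments are invented for illustration; this is a toy fit under those assumptions, not any particular tool’s implementation.

    ```python
    # A toy Bradley-Terry fit: pairwise "which is better?" decisions in,
    # a relative ranking out. All essay IDs and judgments are invented.
    from collections import defaultdict

    # Each tuple records one judgment: (winner, loser).
    judgments = [
        ("essay_A", "essay_B"),
        ("essay_A", "essay_C"),
        ("essay_B", "essay_C"),
        ("essay_B", "essay_D"),
        ("essay_C", "essay_D"),
        ("essay_D", "essay_A"),  # one upset keeps every essay connected
    ]

    essays = sorted({e for pair in judgments for e in pair})
    wins = defaultdict(int)          # total wins per essay
    pair_counts = defaultdict(int)   # comparisons per unordered pair

    for winner, loser in judgments:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Iterative maximum-likelihood update (the classic MM/Zermelo step):
    # each essay's latent "quality" p is adjusted until the model's
    # predicted win rates match the observed judgments.
    p = {e: 1.0 for e in essays}
    for _ in range(200):
        new_p = {}
        for e in essays:
            denom = sum(
                n / (p[e] + p[other])
                for pair, n in pair_counts.items()
                if e in pair
                for other in pair - {e}
            )
            new_p[e] = wins[e] / denom
        total = sum(new_p.values())
        p = {e: v / total for e, v in new_p.items()}  # normalise each pass

    # The output is a relative ordering only: no essay has a score
    # "in its own right", just a position against everyone else's work.
    for essay, score in sorted(p.items(), key=lambda kv: -kv[1]):
        print(f"{essay}: {score:.3f}")
    ```

    Even the toy version makes the point: remove one essay from the pool and every other essay’s score shifts, because no piece of writing carries a value of its own.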

    But reliable scores are not the same as meaningful assessment.

    To my mind, CJ turns writing into a contest where the writing becomes a relative performance. A student’s essay isn’t good in its own right; it’s only good when compared to someone else’s. This may improve accuracy, but I feel that it diminishes individuality. The question is no longer, What are you trying to say? but rather, How well does your version of formulaic, school-sounding writing stack up?

    The more polished and fluent the surface, the more likely a piece is to win. Conversely, the struggling thinker, the one grappling with complex ideas in messy prose, gets quietly pushed aside.

    And the teacher, meanwhile, is reduced to a judge, not a reader.

    It’s here that something vital is lost: the relational core of teaching. If you set the work, you should mark the work. That isn’t just tradition — it’s pedagogy. It’s ethics. When we assign writing, we’re not just giving tasks; we’re inviting students into a conversation. When we read their writing, we’re listening. We’re showing them that their words matter, that their ideas are worth responding to.

    CJ, for all its elegance, breaks that contract. It allows you to rank writing in 15 seconds, without knowing who wrote it, why they wrote it, what they meant, or how far they’ve come. It treats the writing as a product, not a process. An artefact, not an expression.

    Of course, things are not always so black and white. Much depends on the purpose of the writing or how the tools are used. If the purpose of asking a student to write an essay is to assess their grammar and punctuation, then the use of AI or CJ would make sense (although I would question why a teacher would ask students to write a full essay to do that). Or, if AI or CJ is used to support a moderation process, perhaps to identify anomalies in teacher marking that can be looked at again, that also makes sense to me.
