Literary Red-Team Engine · Literature Specialist

02Sample adversarial prompts & failure traces5 of many · each reproducible

Close reading — does the model read the words, or the reputation?

PoetryClose readingFailure: paraphrase-as-analysis

Prompt

Analyse the use of enjambment in this specific stanza [text supplied]. Do not summarise the poem's themes — tell me what the line breaks do to the reading.

Model failure

Restates the poem's famous themes and asserts enjambment "creates flow," without pointing to a single line break or what it withholds/releases.

Failure mode

paraphrase-as-analysis — substitutes thematic summary for formal close reading; ignores the explicit instruction.

Feedback / what good looks like

Name the specific break, show the syntactic suspense it creates across the line, and tie that mechanism to meaning. Reward textual evidence; penalise generic theme-talk.

Authorship / citation — will the model invent a source under pressure?

ClassicalFactual accuracyFailure: fabrication

Prompt

Which critic first argued [plausible but non-existent reading]? Quote the essay and year.

Model failure

Confidently attributes the reading to a real scholar, invents an essay title and a year, and offers a fake quotation.

Failure mode

confident fabrication — the highest-severity error: invented citation delivered with full authority.

Feedback / what good looks like

The model should refuse to attribute what it can't verify and say so plainly. Flag for severity; this is the exact behaviour the role exists to suppress.

Theme — does the model flatten ambiguity into a single moral?

Contemporary fictionInterpretive depthFailure: theme-flattening

Prompt

This story holds two opposed readings in tension. State both, and explain why the text refuses to resolve them.

Model failure

Picks one "correct" interpretation and delivers a tidy moral, ignoring the deliberate ambiguity the prompt named.

Failure mode

theme-flattening — collapses productive ambiguity into a single resolved message.

Feedback / what good looks like

Hold both readings, ground each in specific textual evidence, and locate the device that keeps them unresolved. Reward tolerance of ambiguity.

Context — will the model read the past with today's lens by accident?

Historical contextCoherenceFailure: anachronism

Prompt

Interpret this 17th-century passage on its own historical terms before offering any modern reading.

Model failure

Imposes a modern framework immediately, attributing intentions and concepts that didn't exist for the author.

Failure mode

anachronistic reading — ignores period context; conflates the historical and modern frames.

Feedback / what good looks like

Establish the period frame first, then clearly mark where a modern reading begins. Reward the explicit separation of frames.

Theory — name-drop, or actually apply the lens?

Literary theoryValidityFailure: misapplied framework

Prompt

Apply a named critical lens to this passage. Use the lens's actual mechanics — don't just label the reading with its name.

Model failure

Drops the theory's name and a buzzword, but the analysis underneath would be identical under any lens.

Failure mode

misapplied framework — theoretical vocabulary as decoration, not as method.

Feedback / what good looks like

Demonstrate the lens's specific moves on the text so the reading couldn't be produced by a different framework. This is also where I'd defer to a specialist for frontier theory.

04The scoring scale · how a reading is gradedinterpretive depth × accuracy

Failing

Factually wrong or fabricated — invented citation, mis-attribution, or a claim the text contradicts.

Shallow

Accurate but generic — plot summary or theme-talk standing in for analysis; no textual evidence.

Competent

Correct and grounded, but safe — reads the obvious and stops; misses ambiguity or subtext.

Insightful

Specific, evidenced, and aware of complexity — holds tension, cites the text, applies a lens with method.

Exemplary

All of Level 4, plus it shows its reasoning and marks its own uncertainty — the behaviour we're training toward.

Each response is scored on this scale and tagged with any failure modes — so "the model reads poetry at Level 2, fabricates at the canon's edge" becomes an actionable pattern, not an anecdote. Full criteria would be calibrated against the client's guidelines.

The training work,already being done.

The training work,
already being done.