What AI Can (and Can’t) Do in Journal Production Today

AI & Automation

From copyediting to metadata tagging — a realistic look at where AI adds genuine value in the production pipeline, and where human judgment still wins. No hype. No hand-waving. Just an honest accounting.

DrPaper Team · 7 min read · February 2026

AI in publishing: the gap between the pitch and the reality

Spend enough time at publishing conferences and you’ll hear two contradictory things about AI. The first is that it’s going to automate everything — production costs will collapse, timelines will shrink to days, and editorial teams will be half the size they are now. The second is that it can’t be trusted with anything important — hallucinations, quality inconsistencies, and legal grey areas make it too risky to deploy at scale.

Neither of these is accurate, and acting on either will lead you astray. The publishers finding real value from AI right now are neither all-in evangelists nor cautious bystanders. They’re teams that have made a careful, task-by-task assessment of where AI performs reliably, where it needs supervision, and where it shouldn’t be in the loop at all.

That kind of assessment is harder to do when the conversation around AI is dominated by vendor promises on one side and reflexive skepticism on the other. So let’s try to do it clearly, starting with the failure modes we see most often when teams skip that assessment:

  • Production teams adopting AI broadly without distinguishing high- from low-reliability tasks
  • Quality incidents from AI-generated content errors that weren’t caught before publication
  • Missed efficiency gains from teams avoiding AI entirely in areas where it performs well
  • No clear internal framework for deciding which tasks to automate, assist, or keep fully manual
  • Vendor claims that overstate AI capability, leaving teams disappointed and distrustful after deployment
  • 60% of journal production time is spent on tasks AI tools can already assist with reliably
  • $8B: projected value of AI in academic publishing by 2030, driven by production efficiency
  • 31% of publishers report quality issues from AI deployment without human review gates

The honest answer: AI is excellent at the right tasks and unreliable at the wrong ones

The most useful frame for evaluating AI in production isn’t capability — it’s reliability at scale. A tool that gets something right 90% of the time sounds impressive until you’re running 500 articles a year and that 10% failure rate translates into 50 errors reaching readers. The question isn’t whether AI can do a task. It’s whether it does it consistently enough, under editorial volume, that it can be trusted without a human checking every output.
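The arithmetic behind that 500-article example is worth making explicit. A trivial sketch, using the numbers from the paragraph above:

```python
# Back-of-the-envelope reliability math: per-task accuracy times volume.
volume = 500             # articles per year
accuracy = 0.90          # per-article success rate of the AI step
expected_errors = round(volume * (1 - accuracy))
print(expected_errors)   # 50 errors reaching readers without a review gate
```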

“The publishers winning with AI aren’t the ones who’ve automated the most. They’re the ones who’ve automated the right things — and kept humans exactly where they still need to be.”

— The pattern across successful AI deployments in scholarly publishing

By that standard, AI today performs very differently depending on the task. Structured, rule-based production tasks — formatting, metadata extraction, reference checking, consistency flagging — are where AI earns its keep. These are high-volume, low-ambiguity operations where the cost of a miss is low and the efficiency gain is significant. Unstructured judgment tasks — deciding whether a figure caption accurately represents the data it accompanies, assessing the scholarly significance of a claim, evaluating whether a disclosure is complete — are where AI still falls short of the reliability threshold that publishing requires.

Why the distinction matters for your production team

Getting this wrong in either direction is costly. Over-trusting AI on judgment tasks creates quality and credibility risk. Under-trusting it on mechanical tasks means your team is spending hours on work that doesn’t require their expertise — burning editorial capacity on formatting fixes instead of editorial decisions. The teams that have found the right balance didn’t do it by deploying a single AI platform and hoping for the best. They mapped their production pipeline task by task and made deliberate choices about where automation belongs.

A clear-eyed map of where AI earns its place — and where it doesn’t

Rather than generic capability claims, here’s how AI actually performs across the core tasks in a journal production workflow — based on the patterns emerging from real deployments in scholarly publishing today.

DrPaper’s AI layer is built around exactly this distinction. Every AI-assisted step in the platform is designed with appropriate human review gates — so you get the efficiency gains where they’re reliable, without the quality risk where they’re not.

AI’s reliable zone vs. where human judgment stays essential

AI performs reliably
  • Metadata extraction and structuring from manuscript text
  • Reference format checking against citation style guides
  • Consistency flagging — terminology, abbreviations, heading levels (a minimal sketch follows this list)
  • Plagiarism and duplicate submission screening
  • Initial manuscript compliance checks against journal scope and formatting requirements
  • Keyword suggestion based on subject classification taxonomies
  • Figure and table numbering verification
  • Language quality triage — flagging manuscripts that need copyediting attention
Human judgment stays essential
  • Evaluating scholarly significance or novelty of a contribution
  • Assessing whether a figure accurately represents its underlying data
  • Judging completeness and accuracy of author disclosure statements
  • Copyediting for meaning, nuance, and field-specific conventions
  • Reviewer selection based on expertise, relationships, and conflicts of interest
  • Final accept/reject decisions and editorial correspondence
  • Handling ethical concerns — data integrity, authorship disputes, misconduct
  • Interpreting ambiguous or novel content that falls outside training patterns
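To make the first column concrete, here is a minimal sketch of one of those rule-based checks: consistency flagging for terminology. The function name and variant lists are illustrative; a production checker would load its term lists from a journal’s style sheet.

```python
import re
from collections import defaultdict

def flag_term_inconsistencies(text: str, variants: dict[str, list[str]]) -> list[str]:
    """Flag terms used in more than one spelling within the same manuscript.

    `variants` maps a canonical term to the spellings to watch for. Purely
    illustrative: a real checker would be driven by a journal's style sheet.
    """
    flags = []
    for canonical, forms in variants.items():
        hits = defaultdict(int)
        for form in forms:
            count = len(re.findall(rf"\b{re.escape(form)}\b", text, re.IGNORECASE))
            if count:
                hits[form] = count
        if len(hits) > 1:  # more than one variant in use: flag for a human
            flags.append(f"{canonical!r}: mixed usage {dict(hits)}")
    return flags

# Example: "data set" and "dataset" mixed in one manuscript gets flagged.
sample = "We compiled the data set in 2024. The dataset has 10k records."
print(flag_term_inconsistencies(sample, {"dataset": ["dataset", "data set"]}))
```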

How to build AI into production without losing quality control

1. Audit your pipeline task by task

Map every recurring production task and assess it against two criteria: volume (is it high enough that automation would save meaningful time?) and ambiguity (does it require judgment, or is it rule-based?). High-volume, low-ambiguity tasks are your AI candidates.
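A sketch of what that triage can look like in practice. The thresholds and field names here are placeholders to tune against your own pipeline, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class ProductionTask:
    name: str
    monthly_volume: int   # how often the task recurs
    ambiguity: float      # 0.0 = fully rule-based, 1.0 = pure judgment

def triage(task: ProductionTask,
           volume_floor: int = 50, ambiguity_ceiling: float = 0.3) -> str:
    """High-volume, low-ambiguity tasks are the AI candidates."""
    if task.ambiguity > ambiguity_ceiling:
        return "keep manual"             # judgment task: humans own it
    if task.monthly_volume >= volume_floor:
        return "automate, with a review gate"
    return "AI-assist"                   # rule-based but low volume

print(triage(ProductionTask("reference format check", 400, 0.1)))
# -> automate, with a review gate
```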

2. Deploy AI with defined review gates

Every AI-assisted output should have a defined review step before it affects a published record. The gate can be lightweight — a scan, not a recheck — but it should exist. AI that operates without any human review point is where quality incidents happen.
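In code terms, the gate is simply a required step between AI output and the published record. A minimal in-memory sketch, with hypothetical names throughout:

```python
from typing import Callable

pending: dict[str, dict] = {}    # AI output awaiting review
published: dict[str, dict] = {}  # confirmed records
manual_queue: list[str] = []     # escalations for full human handling

def publish_with_gate(record_id: str, ai_output: dict,
                      review: Callable[[str, dict], bool]) -> bool:
    """Hold AI output until a human review step signs off."""
    pending[record_id] = ai_output
    if review(record_id, ai_output):  # lightweight scan, not a full recheck
        published[record_id] = pending.pop(record_id)
        return True
    del pending[record_id]
    manual_queue.append(record_id)    # failed the gate: escalate to a human
    return False
```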

3. Measure error rates, not just throughput

The obvious AI metric is speed — how many tasks completed per hour. But the metric that matters for publishing is error rate: how often does AI output require correction before it’s acceptable? Track both, and set thresholds that trigger a human review escalation.
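One way to keep that threshold honest is to track corrections per task type and trip the escalation automatically. A sketch, with placeholder threshold and sample-size values:

```python
from collections import defaultdict

class ErrorRateMonitor:
    """Per-task correction tracking with an escalation threshold (illustrative)."""

    def __init__(self, threshold: float = 0.05, min_samples: int = 20):
        self.threshold = threshold        # max acceptable correction rate
        self.min_samples = min_samples    # don't judge on tiny samples
        self.totals: dict[str, int] = defaultdict(int)
        self.corrections: dict[str, int] = defaultdict(int)

    def record(self, task: str, needed_correction: bool) -> None:
        self.totals[task] += 1
        if needed_correction:
            self.corrections[task] += 1

    def needs_escalation(self, task: str) -> bool:
        n = self.totals[task]
        if n < self.min_samples:
            return False
        return self.corrections[task] / n > self.threshold
```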

4. Expand AI scope as reliability is demonstrated

Start with the tasks where AI reliability is already well-established — metadata, formatting, compliance checks. As you build confidence in your specific workflows and content types, you can extend AI assistance to adjacent tasks where it shows consistent performance in your context.

What the right AI deployment actually delivers

  • Significant reduction in production hours spent on mechanical, rule-based tasks
  • Faster manuscript intake — compliance issues caught before they enter the editorial queue
  • More consistent metadata quality across the entire article catalogue
  • Editors and production staff freed to focus on tasks that genuinely require their expertise
  • A clear, auditable record of which steps were AI-assisted and which were human-reviewed

Frequently asked questions about AI in journal production

Can AI replace copyeditors in academic publishing?

Not reliably — not yet. AI tools can flag consistency issues, catch common grammatical errors, and identify manuscripts that need attention, which makes copyeditors faster and more consistent. But field-specific conventions, nuanced meaning, and the kind of judgment required to edit for clarity without distorting scientific content still require human expertise. The better framing is AI-assisted copyediting, not AI replacement.

What is AI metadata tagging in academic publishing?

AI metadata tagging refers to automated extraction and structuring of bibliographic data — author affiliations, keywords, subject classifications, funding statements, and reference lists — directly from manuscript text. It’s one of the most reliable and high-value AI applications in journal production, reducing manual data entry and improving the discoverability of published content.
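For a sense of what the output looks like, here is the kind of structured record such tagging produces from raw manuscript text. Field names and values are illustrative, not tied to any specific schema:

```python
# Illustrative output of AI metadata tagging; not a real schema.
article_metadata = {
    "title": "Example Article Title",
    "authors": [
        {"name": "A. Author", "affiliation": "Example University"},
    ],
    "keywords": ["scholarly publishing", "metadata"],
    "subject_classification": "Information Science",
    "funding_statements": ["Supported by Example Funder, grant XYZ-123."],
    "reference_count": 42,
}
```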

How do publishers maintain quality when using AI in production?

The key is designing human review gates into the workflow rather than treating AI output as final. Publishers using AI successfully in production have defined checkpoints — typically lightweight scans rather than full re-reviews — where a human confirms AI output before it affects the published record. They also track error rates per task type and escalate to fuller human review when rates exceed acceptable thresholds.

Is DrPaper’s AI fully automated, or does it keep humans in the loop?

DrPaper’s AI layer is designed around human-in-the-loop principles. AI handles the high-volume, rule-based steps — compliance checks, metadata structuring, consistency flagging — while editorial decisions, final approvals, and any judgment-dependent tasks remain with your team. Every AI-assisted action is logged with a clear audit trail, so you always know exactly what was automated and what was human-reviewed.

AI that knows its place in your workflow.

DrPaper brings AI assistance to the production tasks where it’s reliable — with human review exactly where it still belongs.

Request early access
No commitment required · Setup in days, not months