No hallucinated citations. Ever.
If the AI writes a sentence that claims a paper said something, you can click through and read the actual paper. Here's exactly how that works — and where the seams still show.
The Claim
Every AI-generated suggestion in AllScience links back to a real, published paper in your source library. You can click through any claim, read the original source, and verify it before you accept the suggestion.
This is the minimum standard we hold ourselves to for AI-assisted writing in a research context. No hallucinated citations, by design. Not as a goal we are working toward — as an architectural constraint that is built into the writing pipeline.
The Architectural Constraint
Our writing AI does not generate text from its training data. It retrieves passages from papers you have saved to your source library, then paraphrases and cites them. If a claim appears in an AllScience-generated paragraph, it came from a specific source in your library — we can show you which one, which sentence, and which section.
How Source Tracing Works
The writing pipeline has four stages, and every stage logs a provenance chain we can replay on demand:
- Retrieval. When you ask AllScience to write about a topic, the system queries your source library using sentence-transformer embeddings. The top-k most relevant passages are retrieved with their source IDs and section offsets.
- Generation. Our fine-tuned Qwen3-8B produces text conditioned on those retrieved passages. The model is trained to paraphrase from the retrieved context, not to fill in from its pretraining corpus.
- Attribution. Each sentence in the generated output is cross-referenced against the retrieval pool. The sentence-to-source mapping is stored alongside the text.
- Rendering. In the editor, every sentence with an attribution renders with a clickable marker. Clicking the marker opens the original source at the passage the sentence was derived from.
If the retrieval pool does not contain enough relevant context, AllScience refuses to generate rather than hallucinate. You get an empty response and a message telling you to import more sources, not a fabricated paragraph.
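The retrieve-attribute-refuse loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AllScience's implementation: the `embed` stub stands in for the sentence-transformer model, and the passage schema, `min_score` threshold, and function names are all invented for this example.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a sentence-transformer: a normalized
    # character-frequency vector, good enough to demo the flow.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, library: list[dict], k: int = 3,
             min_score: float = 0.5) -> list[dict]:
    """Top-k passages above min_score; an empty list signals refusal."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(p["text"])), p) for p in library),
        key=lambda s: s[0], reverse=True,
    )
    return [p for score, p in scored[:k] if score >= min_score]

def generate_with_attribution(query: str, library: list[dict]) -> dict:
    pool = retrieve(query, library)
    if not pool:
        # Refusal path: no fabricated paragraph, just a message.
        return {"text": "", "message": "Import more sources for this topic."}
    # Generation is conditioned on the pool; each output sentence keeps
    # a pointer back to the passage it was derived from.
    sentence = f"Paraphrase of: {pool[0]['text']}"
    return {"text": sentence,
            "sources": [(pool[0]["source_id"], pool[0]["offset"])]}
```

The key property is that the only path to emitted text runs through `pool`: an empty retrieval result short-circuits generation entirely, which is the refusal behavior described above.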
What Source Tracing Prevents
These are failure modes that other AI writing tools exhibit and that AllScience's architecture prevents by design:
| Failure mode | Other tools | AllScience |
|---|---|---|
| Fabricated citations (DOIs, journals, authors that do not exist) | Common | Prevented — retrieval pool contains only real papers |
| Plausible-sounding claims from training-data memorization | Routine | Prevented — generation is conditioned on retrieval, not on pretraining |
| "Sounds right but I can't find the source" | Constant | Prevented — every claim links to its specific source passage |
| Stale sources (citing retracted or superseded papers) | Common | Partially mitigated — retraction data from CrossRef is surfaced in search results, but the user's source library is ultimately their responsibility |
What Source Tracing Does Not Yet Prevent
We document our own failure modes. This is a real one that we found in our own testing, and we think you should know about it before you rely on AllScience for work that matters.
The Krakatoa Failure Mode
In April 2026, we generated a chapter on the 1883 Krakatoa eruption that scored well on every quantitative quality measure but contained three subtle factual misattributions:
1. The chapter stated "The Year Without a Summer in 1816, the year after Krakatoa's eruption." The Year Without a Summer was caused by Tambora (1815), not Krakatoa (1883). The model cross-wired two volcanoes separated by 68 years.
2. The chapter stated "Krakatoa shifted Earth's axis by 3 inches." The 3-inch axis shift is associated with the 2010 Chilean earthquake, not Krakatoa.
3. The chapter quoted "Dutch sailor Willem van den Berg on the ship Sibilla." No such person or ship is attested in any historical record we could find. The model invented a plausible-sounding primary source.
Each individual claim was plausible in isolation. The numbers, the dates, and the names all fell within believable ranges. The hallucination was in the attribution — connecting facts to the wrong events or fabricating people out of whole cloth. Source tracing caught nothing because the claims were not supported by any real source to begin with; they were generated outside the retrieval pool by a model that had been asked to fill in missing context.
We treat this as a known unsolved problem, not a reason to hide the claim. Here is what we are doing about it:
- Quality rubric v2. We have a 10-point quality rubric we grade every generated chapter against, with 9.0 as the publish threshold. The Krakatoa chapter scored well on the first-generation rubric despite its errors, so we have upgraded the rubric to include a fact-attribution check.
- Entity fact database. We are building a structured fact DB (dates, locations, named people, measurements) for every major topic in the catalog. Generated text is cross-checked against it before publication.
- Section-level polish. The polish and copy-edit layer operates per section, not on whole chapters. This means a single misattribution does not contaminate adjacent content and is caught earlier in the pipeline.
- Human editorial spot-check. Between the 9.0 and 9.5 score bands, we require a human pass on any claim that names a specific person, place, or measurement. This is the 9.0-to-9.5 gap we are still closing.
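To make the entity fact database idea concrete, here is a toy version of the kind of year check that would have flagged misattributions 1 and 2 above. The `FACTS` table, the regex, and the function name are hypothetical stand-ins for illustration, not our production schema:

```python
import re

# Hypothetical slice of the structured fact DB: event -> known facts.
FACTS = {
    "krakatoa": {"year": 1883},
    "tambora": {"year": 1815},
}

def check_event_years(text: str) -> list[str]:
    """Flag sentences that pair a known event with the wrong year."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Pull four-digit years (1600-2099) out of the sentence.
        years = [int(y) for y in re.findall(r"\b(1[6-9]\d\d|20\d\d)\b", sentence)]
        for name, facts in FACTS.items():
            if name in sentence.lower() and years and facts["year"] not in years:
                problems.append(f"{name}: expected {facts['year']}, saw {years}")
    return problems
```

A check like this catches cross-wired dates but not fabricated people such as the invented sailor in misattribution 3; that is why named-entity claims still get a human pass.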
The full technical diagnosis lives in docs/TRAINING_DATA_ROADMAP.md in our public repository. We will update this page as the fact-attribution work progresses.
Our Pledge
We publish our failure modes. We do not hide them, minimize them, or wait until we have a fix before we disclose them. Every AI research platform should do this. Most do not.
If you use AllScience for work that matters, you deserve to know both what the system does well and where it fails. Source tracing works. Factual attribution is a harder problem, and we are still working on it. Both statements are true, and we would rather you hear them from us than find out the hard way.
If you find a failure mode that is not documented here, email jerry@allscience.net. I read every email.