How AllScience Works
What happens when you search. What happens when you write. What happens to your work. Plus a technical deep-dive at the bottom for readers who want one.
When You Run A Search
You type your research question into the search bar and press Enter. Here is what actually happens in the next two seconds.
Your query gets sent to seventeen academic databases at the same time — PubMed, Semantic Scholar, OpenAlex, CORE, arXiv, Europe PMC, CrossRef, bioRxiv, DOAJ, IEEE Xplore, Springer Nature, and six more. Every one of those databases runs your search independently and returns whatever it has.
When the results come back, we do three things to them before you see them:
- Deduplicate. The same paper often lives in three or four different databases under slightly different metadata. We match them up (using the DOI when we can, and the title plus first author when the DOI is missing) so you do not see the same paper twice.
- Rank. Results are ordered by how relevant they are to your query, how recent they are, how often the paper has been cited, and how many of the seventeen databases indexed it.
- Tag. Every result shows a badge for each database it came from, so you can see at a glance which sources had it. If you only care about PubMed results, you can filter to just those.
This whole process typically finishes in one or two seconds. If a database is slow or unavailable, the search does not wait for it: you get results from the sources that did respond, with a note telling you which ones did not. The page never returns empty unless something has actually gone wrong on our end.
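The fan-out, timeout, and deduplication steps above can be sketched in a few lines. This is an illustrative simplification, not our production code: the `fetch` stub, the source names, and the two-second timeout value stand in for real HTTP clients and configuration.

```python
import asyncio

async def fetch(source: str, query: str) -> list[dict]:
    # Stand-in for a real HTTP call to one database's search API.
    await asyncio.sleep(0.01)
    return [{"doi": "10.1000/demo", "title": "Demo Paper", "source": source}]

async def search_all(query: str, sources: list[str], timeout: float = 2.0):
    """Query every source at once; drop stragglers; dedupe the rest."""
    tasks = {s: asyncio.create_task(fetch(s, query)) for s in sources}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    for task in pending:        # slow sources are cancelled, not waited on
        task.cancel()
    results, responded = [], []
    for name, task in tasks.items():
        if task in done and not task.exception():
            responded.append(name)
            results.extend(task.result())
    # Deduplicate: DOI when present, else normalized title + first author.
    seen, unique = set(), []
    for r in results:
        key = r.get("doi") or (r["title"].lower(), r.get("first_author", "").lower())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique, responded

papers, ok = asyncio.run(search_all("crispr", ["pubmed", "arxiv", "openalex"]))
```

Because all three toy sources return the same DOI, `papers` collapses to a single entry while `ok` records that all three responded, which is exactly the deduplicate-and-tag behavior described above.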
Why seventeen instead of one? Every database has gaps. PubMed is unmatched for biomedicine but misses physics and computer science. arXiv has the preprints you need but no peer-reviewed metadata. Semantic Scholar has the best citation graph but thin European coverage. Running all of them at once is the only way to find papers that only exist in one of them — and that is where the papers you are looking for usually live.
When You Write With The AI
Before you ever ask AllScience for a writing suggestion, you build up a source library by saving papers from search results, importing from Zotero or Mendeley, or adding sources by DOI. The AI cannot help you write until you have done this — and that is the whole point.
When you ask for a suggestion, here is what happens:
- We find the relevant sources. The system reads your draft around the cursor and figures out which papers in your library are most relevant to the sentence you are trying to write. It pulls the specific paragraphs from those papers, not just the titles.
- The AI writes from those paragraphs. The writing model is specifically trained to paraphrase from the sources it was given — not to fill in from what it remembers from training. If the sources you saved do not cover the claim you are trying to make, the AI returns an empty suggestion instead of inventing something.
- Every sentence gets a link back. When a suggestion lands in your editor, every sentence in it has a clickable marker that opens the exact passage in the exact paper it came from. You click, you read the source, you decide to keep the suggestion or rewrite it.
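The first step, finding the paragraphs in your library most relevant to the text at your cursor, is a standard similarity search. Here is a minimal sketch of that pattern; the `embed` function is a bag-of-words stand-in for a real sentence-embedding model, and the paper IDs and passages are invented for the example.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in for a real sentence-embedding model; bag-of-words
    # counts are enough to show the retrieval step.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_passages(draft_context: str, library: dict[str, str], k: int = 2) -> list[str]:
    """Rank saved-paper paragraphs by similarity to the text at the cursor."""
    query = embed(draft_context)
    ranked = sorted(library.items(), key=lambda kv: cosine(query, embed(kv[1])), reverse=True)
    return [passage_id for passage_id, _ in ranked[:k]]

library = {
    "smith2021:p3": "CRISPR base editing corrects point mutations in vivo",
    "lee2019:p7": "Survey responses were collected over six months",
}
matches = top_passages("base editing of point mutations", library, k=1)
```

The retrieved passage IDs are what make the per-sentence link-backs possible: each suggestion carries the ID of the passage it was paraphrased from, so the editor can open the exact source on click.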
The short version: AllScience's writing AI can only work with papers you have put in front of it. It cannot invent a citation because there is no mechanism in the pipeline for it to invent a citation. This is a design decision we made at the start, not a safety net we added at the end. Our accuracy page walks through exactly what this prevents and — just as important — what it still does not catch.
Where Your Work Lives
When you save a paper, draft a manuscript, or upload an existing document, here is what happens to it:
- It stays on AllScience servers. Your drafts, your source library, your citations, and your search history live on infrastructure we rent directly. They are never sent to OpenAI, Google, Anthropic, or any other AI company for processing. We do not have a partnership, a data-sharing agreement, or a "we use their API on the backend" arrangement with any third-party AI provider.
- It is not used to train anything. We do not train models on your drafts. We do not fine-tune our writing AI on the papers you import. Your work is your work.
- You can export it any time. Papers export to PDF, Word, LaTeX, Markdown, EPUB, or HTML. Citation libraries export to BibTeX or RIS. If you decide AllScience is not for you, your work moves with you. There is no lock-in.
- We cannot read it in marketing meetings. This matters more than it sounds. At most research AI tools, your drafts are visible to whichever company provides the underlying model — which means they are potentially visible to the people making pricing decisions about you. That is not possible here because the underlying model is one we trained and operate ourselves.
Why this matters to you: The tool you depend on does not break because a third party changes their API. Your unpublished results do not leave your workspace. Your pricing does not depend on how much an external AI provider charges us per query, because no external AI provider is in the loop. These are not marketing claims — they are decisions we made at the start that cost us convenience and bought us your trust.
Why The Grammar Checker Is Not Grammarly
Grammar checkers built for business emails and blog posts were not made for research papers. They flag passive voice as wrong, even though passive voice is the correct form in a methods section. They suggest "simpler" wording that strips the precision of technical terms. They do not know that a discussion section should hedge and a results section should not. Using one on your manuscript is like running spell-check with a general-purpose dictionary: it nags you about things that are not wrong and misses things that are.
AllScience's grammar checker was built specifically for academic and nonfiction prose from day one. It has more than 430 rules, organized around what a reviewer actually notices:
- Subject-verb agreement drift in the long, clause-heavy sentences research papers tend to produce
- Tense consistency within sections (past in methods, present in discussion)
- Parallel structure in lists and comparisons where academic prose loves to wander
- Hedging language that is too strong for your evidence or too weak for your claim
- Passive-voice patterns that are fine in methods but a red flag in the abstract
- Cliché detection across more than a thousand academic and nonfiction phrases
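To make the difference from a generic checker concrete, here is a toy version of a section-aware hedging rule. The phrase lists, section names, and messages are invented for this sketch and are far simpler than the product's actual rules.

```python
import re

# Toy phrase lists; the real rules are much larger and more nuanced.
STRONG = re.compile(r"\b(proves?|clearly demonstrates?|definitely)\b", re.I)
WEAK = re.compile(r"\b(might possibly|may perhaps|could potentially)\b", re.I)

def check_hedging(sentence: str, section: str) -> list[str]:
    """Flag claims that are too strong for a discussion section,
    and doubly hedged phrasing anywhere."""
    issues = []
    if section == "discussion" and STRONG.search(sentence):
        issues.append("overclaim: consider hedging ('suggests', 'indicates')")
    if WEAK.search(sentence):
        issues.append("double hedge: pick one qualifier")
    return issues

flags = check_hedging("This proves the drug is safe.", "discussion")
```

Note that the same sentence would pass in a methods section: the rule fires or stays quiet depending on where the sentence sits, which is the section-awareness a general-purpose checker lacks.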
The point is not that our checker is smarter. The point is that you can finally ignore your grammar tool — it will only speak up when it actually has something you need to hear.
Technical Details (For The Curious)
Everything above is what matters to your day. The rest of this page is what runs underneath it, for readers who want the technical answer.
The AI models we trained and run ourselves
| Model | What it does for you |
|---|---|
| Qwen3-8B, fine-tuned in-house | The writing AI that paraphrases from your source library |
| Sentence-Transformers | The similarity matching that figures out which of your saved papers are relevant to the sentence you're writing |
| FLUX.1 Dev + our LoRAs | Not used in the research workflow. Used by our sister brands BellerCreatives for cover art and WonderPress for illustration. |
| Kokoro TTS | Audio narration for preprints and any content that benefits from being read aloud (accessibility). |
Every model above was trained or fine-tuned by us on hardware we rent at a flat monthly rate. There are no per-query API calls to external AI providers anywhere in the user-facing path.
The infrastructure the platform runs on
| Layer | Component |
|---|---|
| Web server | OCI Always Free ARM (4 OCPU, 24 GB) |
| API framework | FastAPI + SQLAlchemy, 1,000+ endpoints across 134 routers |
| Database | PostgreSQL, persisted in a Docker volume, daily backup |
| Reverse proxy | nginx with Let's Encrypt SSL |
| Frontend | Vanilla JavaScript, 85 HTML pages, no build step |
| GPU for inference + training | Vast.ai RTX 5090 (32 GB VRAM), persistent rental |
| Email | Zoho Mail (noreply@allscience.net) |
| Payments | Stripe live mode |
The user-facing stack runs on a single OCI ARM server with approximately $3 per month in marginal cost. Training and inference run on a persistent GPU at roughly $250 per month flat. There are no pay-per-query dependencies, no surprise scaling costs, and no hidden AWS bills.
What we use Claude Code for (and do not)
We use Anthropic's Claude Code as an AI pair-programmer during active development — that is, to help write and debug the platform itself. No user query, draft, source library, or citation is ever sent to Anthropic's API in production. If you want to verify that, the api/routers/ and api/services/ directories in our codebase contain zero imports of any third-party AI SDK in the user-facing path. User-facing AI runs entirely on the models listed above.