Evidence control plane for scientific work
SaaS Syndicate Labs is an independent AI systems lab. We build a stack of custom scientific models — domain fine-tunes, small task LMs, embeddings, rerankers, graders, and routing policies — and the open-source harness that orchestrates them with provenance, evaluations, and human review. Every output is addressable by run id and source span.
What breaks
Scientific work breaks down when evidence, protocols, models, datasets, and decisions do not stay connected. Six bottlenecks shape every model we train and every product we build.
Citations shift across decks, reviews, and packages until no one can answer which source said what.
Methods sections compress steps. Lab execution diverges. The delta never makes it back.
Run-to-run drift in instruments and reagents goes unrecorded next to model-derived interpretations.
Frontier models read papers well and invent references confidently. Chat UX hides which is which.
Internal lab notes, private PDFs, and public literature sit in different stores with different access rules.
Scientific agents are evaluated on chatbot leaderboards. No standard for citation fidelity or dose extraction.
How we approach it
Frontier reasoning will keep getting better. That is not enough. Scientific work also needs custom models — extractors, classifiers, embedders, rerankers, graders — adapted to biomedical and chemical vocabulary, and an orchestration harness that produces an audit trail any reviewer can replay.
The model zoo and the harness are the two compounding assets. Both ship open-source where possible. Both versioned independently. Both inspectable end-to-end.
Domain fine-tunes, small task LMs, embedding and reranker models, grader models, routing policies — built and adapted for scientific work, not borrowed from chatbot benchmarks.
Public literature, private corpora, run artifacts, and lab notes — indexed independently with hybrid lexical, vector, and reranker stages.
Frontier reasoners for hard inference, our domain fine-tunes for extraction and classification, our evaluator models for grading other models. One policy, one decision log.
Long-horizon tool use for protocol checks, claim extraction, contradiction detection, and structured graph writes.
Discrete gates with four explicit states (pass, review, contradiction, insufficient evidence), applied before any claim leaves the run.
Every claim carries (claim_id, run_id) with source spans, model calls, tool traces, eval verdicts, and reviewer state. Human review is part of the loop, not a footnote.
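A minimal sketch of how a claim keyed by (claim_id, run_id) might carry source spans and pass through a gate with the four states listed above. The field names, the 0.8 threshold, and the order of checks in `evaluate_gate` are illustrative assumptions, not the lab's actual schema or policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Gate(Enum):
    # The four gate states named in the text.
    PASS = "pass"
    REVIEW = "review"
    CONTRADICTION = "contradiction"
    INSUFFICIENT_EVIDENCE = "insufficient evidence"


@dataclass(frozen=True)
class SourceSpan:
    doc_id: str  # hypothetical document identifier
    start: int   # character offsets into the source text
    end: int


@dataclass
class Claim:
    claim_id: str
    run_id: str
    text: str
    spans: list = field(default_factory=list)            # SourceSpan items
    contradicted_by: list = field(default_factory=list)  # other claim_ids
    grader_score: Optional[float] = None                 # assumed verdict in [0, 1]
    reviewer_state: str = "unreviewed"

    @property
    def key(self):
        # Every claim is addressable by (claim_id, run_id).
        return (self.claim_id, self.run_id)


def evaluate_gate(claim, pass_threshold=0.8):
    """Assumed ordering: missing evidence, then contradictions, then score."""
    if not claim.spans:
        return Gate.INSUFFICIENT_EVIDENCE
    if claim.contradicted_by:
        return Gate.CONTRADICTION
    if claim.grader_score is None or claim.grader_score < pass_threshold:
        return Gate.REVIEW
    return Gate.PASS
```

In this sketch a claim with no source span can never reach `PASS`, whatever its grader score, which is one way to make "applied before any claim leaves the run" concrete.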
Model layer
We do not ship a single fine-tune. We ship a stack: frontier reasoners we route, domain fine-tunes we adapt from open weights, small task LMs we train for the hot path, embedding and reranker models tuned to scientific phrasing, grader models that score other models, and routing policies that pick which model handles which step.
Frontier reasoners handle long-context reasoning across multi-document evidence bundles, mechanism plausibility, and contradiction synthesis. We orchestrate them through a router policy with full decision logs; we do not train models at this scale.
Open-weight base models adapted to scientific tasks where general models drop precision, miss biomedical entities, or fail on dose / mechanism / adverse-event reasoning. Trained with LoRA, QLoRA, or full SFT on curated golden sets.
Narrow, fast models for extraction and classification on the hot path of every run. They handle the high-volume structured work where frontier reasoners would be slow and expensive.
Domain-adapted embeddings and rerankers tuned to biomedical and chemistry vocabulary. Public embeddings under-recall on technical synonyms, abbreviations, and quantitative phrasing.
Model-as-judge with explicit rubrics. Graders are versioned independently of the systems they score, so eval changes are auditable. These run inside every Crosswalk benchmark and every Cartograph contradiction check.
Small policy networks and rules that decide which model handles which step under cost, latency, sensitivity, and quality budgets. Routing decisions are part of the audit trail, not hidden in a product layer.
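A rule-based sketch of the routing idea: filter models against cost, latency, sensitivity, and quality budgets, pick the cheapest survivor, and write the decision to a log. `ModelSpec`, `Budget`, `route`, and all numbers in the usage note are hypothetical; a learned policy network would replace the rules shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    cost_per_call: float  # arbitrary cost units
    latency_ms: int
    quality: float        # 0..1, e.g. from an offline eval
    private: bool         # True if it can run inside a private deployment


@dataclass(frozen=True)
class Budget:
    max_cost: float
    max_latency_ms: int
    min_quality: float
    sensitive: bool       # True if the input must stay on private models


def route(step, models, budget, log) -> Optional[ModelSpec]:
    """Pick the cheapest model that satisfies every budget; log the decision."""
    rejected = {}
    eligible = []
    for m in models:
        if budget.sensitive and not m.private:
            rejected[m.name] = "sensitivity"
        elif m.quality < budget.min_quality:
            rejected[m.name] = "quality"
        elif m.cost_per_call > budget.max_cost:
            rejected[m.name] = "cost"
        elif m.latency_ms > budget.max_latency_ms:
            rejected[m.name] = "latency"
        else:
            eligible.append(m)
    chosen = min(eligible, key=lambda m: m.cost_per_call, default=None)
    # The decision, including why each alternative was rejected, is part of the audit trail.
    log.append({
        "step": step,
        "chosen": chosen.name if chosen else None,
        "rejected": rejected,
    })
    return chosen
```

Called with a cheap extractor and an expensive frontier model, a tight cost budget selects the extractor and records the frontier model as rejected for "cost"; a high quality floor flips the choice. Returning `None` when nothing qualifies leaves the escalation decision to the caller.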
Curated datasets → supervised fine-tuning (LoRA, QLoRA, or full-parameter SFT, depending on the task) → quantization-aware compression → eval gate through Crosswalk → versioned release with model card. The same pipeline that produces our claim linker also produces our adverse-event classifier and our biomedical embeddings. The product wedge is the harness; the platform asset is the model zoo this pipeline builds.
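The last two stages of that pipeline, the eval gate and the versioned release, can be sketched as a single check that refuses to produce a model card unless every thresholded metric clears its floor. `ModelCard`, `gate_and_release`, and the metric names are assumptions for illustration; this is not Crosswalk's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    version: str
    method: str    # e.g. "lora", "qlora", "full-sft"
    metrics: dict  # eval results recorded at release time


def gate_and_release(name, version, method, metrics, thresholds):
    """Return a ModelCard only if every thresholded metric meets its floor."""
    failures = {
        metric: (metrics.get(metric), floor)
        for metric, floor in thresholds.items()
        if metrics.get(metric) is None or metrics[metric] < floor
    }
    if failures:
        # A missing metric counts as a failure, so a skipped eval cannot ship.
        raise ValueError(f"eval gate failed: {failures}")
    return ModelCard(name, version, method, dict(metrics))
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a release cannot pass the gate simply because an evaluation was never run.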
System
Ingestion, retrieval, model routing across our custom models and the frontier reasoners they sit alongside, tool orchestration, eval gates, a provenance graph, human review, and a typed export surface. Every run emits a manifest any consumer can replay.
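One way a replayable run manifest could work, sketched under assumptions: the run's inputs, model calls, and claims are serialized to canonical JSON and sealed with a SHA-256 digest, so any consumer can detect edits before replaying. The manifest fields and both function names are hypothetical.

```python
import hashlib
import json


def _canonical(body):
    # Sorted keys and fixed separators give a stable byte string to hash.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(run_id, inputs, model_calls, claims):
    """Assemble a manifest and seal it with a digest over its contents."""
    body = {
        "run_id": run_id,
        "inputs": inputs,            # e.g. document ids and versions
        "model_calls": model_calls,  # e.g. model name, version, step
        "claims": claims,            # e.g. claim ids emitted by the run
    }
    return {**body, "sha256": hashlib.sha256(_canonical(body)).hexdigest()}


def verify_manifest(manifest):
    """Recompute the digest over everything except the stored digest."""
    body = {k: v for k, v in manifest.items() if k != "sha256"}
    return hashlib.sha256(_canonical(body)).hexdigest() == manifest["sha256"]
```

A digest only shows that the manifest has not changed; actually replaying a run would also require pinned model versions and stored inputs, which this sketch does not cover.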
What we are building
Four user-facing modules sit on top of one model foundation. Each module is a real system with named users, declared maturity, and a durable artifact it writes back into the claim graph.
Scientific claim graph with provenance edges, confidence flags, contradictions, and review queues.
Protocol reproducibility harness — checklists, parameter maps, ambiguity flags, and execution deltas.
Open biomedical eval foundry — citation fidelity, dose extraction, mechanism reasoning, adverse-event triage, synthesis checks.
Private research agent runtime — tool-using agents over local corpora with auditable traces.
05 · Foundation
The custom-model foundry under everything else — adaptation pipelines, golden sets, and benchmark gates that produce the small models the rest of the stack runs on.
Where it goes
Horizons describe what we are building, what comes next, and what the platform becomes — not quarterly promises.
Operating posture
These are not slogans. They are the constraints every product, every fine-tune, every eval, and every deployment is checked against.
Company
SaaS Syndicate Labs is an independent AI systems lab focused on the evidence layer for scientific work — open-source core, custom domain models, and private deployment patterns for teams operating under GxP, GLP, or IRB constraints. Research and product, not consulting.