Ingestion & Retrieval
Feed documents, transcripts, and text into MindGraph. Content is automatically chunked, embedded, and processed through a multi-layer extraction pipeline that builds structured knowledge graphs from unstructured text.
Three ingestion modes cover different use cases: single chunks for real-time processing, documents for batch ingestion, and sessions for conversation transcripts. One retrieval endpoint combines semantic search with graph traversal to return rich context.
How It Works
The ingestion pipeline processes content through these stages:
- Chunking — Long content is split into overlapping chunks (~2000 tokens each) at paragraph/sentence boundaries.
- Embedding — Each chunk is embedded for semantic search via the configured embedding model.
- Extraction — An LLM analyzes each chunk across up to 6 cognitive layers, extracting typed nodes (entities, claims, goals, etc.) and edges.
- Deduplication — Extracted entities are resolved against existing graph nodes using alias matching and fuzzy search to prevent duplicates.
- Provenance — Every extracted node links back to its source chunk via `ExtractedFrom` edges, preserving the evidence chain.
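The chunking stage above can be sketched as follows. This is an illustrative approximation, not the pipeline's actual implementation: it measures size in characters rather than tokens (the real pipeline targets ~2000 tokens) and splits at paragraph boundaries, carrying a tail of each chunk forward as overlap.

```python
def chunk_text(text: str, target: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks near `target` size,
    breaking at paragraph boundaries where possible."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > target:
            chunks.append(current)
            # Carry the tail of the previous chunk forward as overlap,
            # so context spanning a chunk boundary is not lost.
            current = current[-overlap:] + "\n\n" + para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks
```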
Extraction Layers
Each layer extracts different types of knowledge. You can select which layers to run per request via the `layers` parameter:
| Layer | Extracts | Default For |
|---|---|---|
| Reality | Person, Organization, Nation, Event, Place, Concept, Entity, Observation | All modes |
| Epistemic | Claim, Evidence, Hypothesis, Assumption, Pattern, Analogy, Question | All modes |
| Intent | Goal, Decision, Constraint, Project | Session |
| Action | Affordance, RiskAssessment | Session |
| Memory | Preference, Summary | Chunk, Session |
| Agent | Task, Plan | On request only |
Documents also support a `content_type` field that adjusts these defaults: `meeting_notes` and `report` add Intent + Action; `journal` adds Intent + Memory.
Single Chunk Ingestion
Process a single text chunk synchronously. Best for real-time ingestion of short content (<8000 chars). Returns extraction results immediately.
```bash
curl -X POST https://api.mindgraph.cloud/ingest/chunk \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Einstein developed the theory of general relativity in 1915...",
    "layers": ["reality", "epistemic"]
  }'
```
The response includes the `chunk_uid`, counts of nodes created, nodes deduplicated, and edges created, the list of extracted node UIDs, and any extraction errors.
Document Ingestion
Process a full document asynchronously. The content is split into overlapping chunks and processed in the background. Returns a job ID for progress tracking.
```bash
# Start ingestion
curl -X POST https://api.mindgraph.cloud/ingest/document \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Full article text...",
    "title": "General Relativity Overview",
    "document_type": "article",
    "content_type": "article",
    "source_uri": "https://example.com/article"
  }'

# Poll job status
curl https://api.mindgraph.cloud/jobs/JOB_ID \
  -H "Authorization: Bearer mg_live_..."
```
Session/Transcript Ingestion
Process conversation transcripts asynchronously. Runs Reality + Epistemic + Intent + Action + Memory extraction by default (5 layers) since conversations naturally span those cognitive layers.
```bash
curl -X POST https://api.mindgraph.cloud/ingest/session \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User: What should we focus on?\nAssistant: Based on the data...",
    "session_uid": "ses_abc123"
  }'
```
Context Retrieval
Retrieve relevant context using semantic search over your knowledge graph. Returns wiki articles (synthesized summaries), graph nodes with source document provenance, and edges connecting them.
```bash
curl -X POST https://api.mindgraph.cloud/retrieve/context \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What did Einstein discover about gravity?",
    "node_limit": 10,
    "article_limit": 3
  }'
```
Each returned node includes a `source_documents` field showing which documents it was extracted from. Set `chunk_limit` > 0 to also include raw source text chunks in the response.
Job Tracking
Document and session ingestion return a job_id for tracking progress. Poll the job endpoint to monitor status:
```bash
curl https://api.mindgraph.cloud/jobs/JOB_ID \
  -H "Authorization: Bearer mg_live_..."
```
Job statuses: pending → processing → completed, failed, or cancelled. The progress object tracks total_chunks, processed_chunks, nodes_created, and edges_created.
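The polling workflow can be sketched as a small helper. This is an illustrative sketch, not SDK code: `fetch_job` stands in for whatever performs the authenticated HTTP GET against `/jobs/{job_id}`, and the terminal statuses are those listed above.

```python
import time

# Statuses after which the job will no longer change, per the docs above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_job(fetch_job, job_id: str,
                 interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is any callable returning the parsed job JSON,
    e.g. an HTTP GET against /jobs/{job_id} with your API key.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```

Because the fetcher is injected, the loop is easy to reuse with any HTTP client and to test without touching the network.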
Ingestion Management
Additional endpoints for managing ingestion jobs and documents:
- GET /jobs — List all ingestion jobs with their current status.
- POST /jobs/{id}/cancel — Cancel a pending or running job.
- POST /ingest/resume/{doc_uid} — Resume a stuck document ingestion (reprocesses incomplete chunks).
- POST /ingest/cleanup — Clean up orphaned chunks that have no parent document.
- DELETE /ingest/document/{uid} — Delete a document and cascade-remove all its chunks and extracted nodes.
- POST /ingest/embed-all — Generate embeddings for all nodes that don't have one yet.
Dashboard Upload
You can also ingest content directly from the Dashboard Ingest page. Upload text files, paste content, select extraction layers, and monitor job progress in real time.