Ingestion & Retrieval
Feed documents, transcripts, and text into MindGraph. Content is automatically chunked, embedded, and processed through a multi-layer extraction pipeline that builds structured knowledge graphs from unstructured text.
Three ingestion modes cover different use cases: single chunks for real-time processing, documents for batch ingestion, and sessions for conversation transcripts. One retrieval endpoint combines semantic search with graph traversal to return rich context.
How It Works
The ingestion pipeline processes content through these stages:
- Chunking — Long content is split into overlapping chunks (~2000 tokens each) at paragraph/sentence boundaries.
- Embedding — Each chunk is embedded for semantic search via the configured embedding model.
- Extraction — An LLM analyzes each chunk across up to 6 cognitive layers, extracting typed nodes (entities, claims, goals, etc.) and edges.
- Deduplication — Extracted entities are resolved against existing graph nodes using alias matching and fuzzy search to prevent duplicates.
- Provenance — Every extracted node links back to its source chunk via `ExtractedFrom` edges, preserving the evidence chain.
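The chunking stage above can be sketched as follows. This is an illustrative approximation, not the pipeline's actual implementation: it measures size in characters rather than tokens (the real pipeline targets ~2000 tokens) and splits at paragraph boundaries, carrying a tail of each chunk forward as overlap.

```python
def chunk_text(text: str, target: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks near `target` size,
    breaking at paragraph boundaries where possible."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > target:
            chunks.append(current)
            # Carry the tail of the previous chunk forward as overlap,
            # so context spanning a chunk boundary is not lost.
            current = current[-overlap:] + "\n\n" + para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks
```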
Extraction Layers
Each layer extracts different types of knowledge. You can select which layers to run per request via the `layers` parameter:
| Layer | Extracts | Default For |
|---|---|---|
| Reality | Person, Organization, Nation, Event, Place, Concept, Entity, Observation | All modes |
| Epistemic | Claim, Evidence, Hypothesis, Assumption, Pattern, Analogy, Question | All modes |
| Intent | Goal, Decision, Constraint, Project | Session |
| Action | Affordance, RiskAssessment | Session |
| Memory | Preference, Summary | Chunk, Session |
| Agent | Task, Plan | On request only |
Documents also support a `content_type` field that adjusts these defaults: `meeting_notes` and `report` add Intent + Action; `journal` adds Intent + Memory.
Single Chunk Ingestion
Process a single text chunk synchronously. Best for real-time ingestion of short content (<8000 chars). Returns extraction results immediately.
```bash
curl -X POST https://api.mindgraph.cloud/ingest/chunk \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Einstein developed the theory of general relativity in 1915...",
    "layers": ["reality", "epistemic"]
  }'
```
The response includes the `chunk_uid`, counts of nodes created, nodes deduplicated, and edges created, the list of extracted node UIDs, and any extraction errors.
Document Ingestion
Process a full document asynchronously. The content is split into overlapping chunks and processed in the background. Returns a job ID for progress tracking.
```bash
# Start ingestion
curl -X POST https://api.mindgraph.cloud/ingest/document \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Full article text...",
    "title": "General Relativity Overview",
    "document_type": "article",
    "content_type": "article",
    "source_uri": "https://example.com/article"
  }'

# Poll job status
curl https://api.mindgraph.cloud/jobs/JOB_ID \
  -H "Authorization: Bearer mg_live_..."
```
Session/Transcript Ingestion
Process conversation transcripts asynchronously. Runs Reality + Epistemic + Intent + Action + Memory extraction by default (5 layers) since conversations naturally span those cognitive layers.
```bash
curl -X POST https://api.mindgraph.cloud/ingest/session \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User: What should we focus on?\nAssistant: Based on the data...",
    "session_uid": "ses_abc123"
  }'
```
Context Retrieval
Retrieve relevant context using semantic search over your knowledge graph. Returns wiki articles (synthesized summaries), graph nodes with source document provenance, and edges connecting them.
```bash
curl -X POST https://api.mindgraph.cloud/retrieve/context \
  -H "Authorization: Bearer mg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What did Einstein discover about gravity?",
    "node_limit": 10,
    "article_limit": 3
  }'
```
Each returned node includes a `source_documents` field showing which documents it was extracted from. Set `chunk_limit` > 0 to also include raw source text chunks in the response.
Job Tracking
Document and session ingestion return a job_id for tracking progress. Poll the job endpoint to monitor status:
```bash
curl https://api.mindgraph.cloud/jobs/JOB_ID \
  -H "Authorization: Bearer mg_live_..."
```
Job statuses: pending → processing → completed, failed, or cancelled. The progress object tracks total_chunks, processed_chunks, nodes_created, and edges_created.
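The polling workflow can be sketched as a small helper. This is an illustrative sketch, not SDK code: `fetch_job` stands in for whatever performs the authenticated HTTP GET against `/jobs/{job_id}`, and the terminal statuses are those listed above.

```python
import time

# Statuses after which the job will no longer change, per the docs above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_job(fetch_job, job_id: str,
                 interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is any callable returning the parsed job JSON,
    e.g. an HTTP GET against /jobs/{job_id} with your API key.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```

Because the fetcher is injected, the loop is easy to reuse with any HTTP client and to test without touching the network.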
Ingestion Management
Additional endpoints for managing ingestion jobs and documents:
- GET /jobs — List all ingestion jobs with their current status.
- POST /jobs/{id}/cancel — Cancel a pending or running job.
- POST /ingest/resume/{doc_uid} — Resume a stuck document ingestion (reprocesses incomplete chunks).
- POST /ingest/cleanup — Clean up orphaned chunks that have no parent document.
- DELETE /ingest/document/{uid} — Delete a document and cascade-remove all its chunks and extracted nodes.
- POST /ingest/embed-all — Generate embeddings for all nodes that don't have one yet.
Dashboard Upload
You can also ingest content directly from the Dashboard Ingest page. Upload text files, paste content, select extraction layers, and monitor job progress in real time.