Sources
A Source represents the origin of information in your knowledge graph. Sources capture where facts came from—a chat message, email, document, API call, or any piece of content from which knowledge was extracted.
What is a Source?
In Nous, every fact must be traceable back to its origin. Sources provide this traceability, answering the critical question: “How do we know this?”
Key Characteristics
- Provenance Tracking: Every fact links back to a source
- Content Preservation: Stores the original text or data
- Temporal Context: Records when the source was created
- Audit Trail: Enables verification and debugging
Source Properties
| Property | Type | Required | Description |
|---|---|---|---|
| id | UUID | Auto | Unique system identifier |
| content | string | Yes | The original content/source text |
| timestamp | datetime | Auto | Real-world time at which the source was created |
Example Source Structure
```json
{
  "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "content": "Alice moved to Paris last month and started working at Acme Corp.",
  "timestamp": "2025-01-15T14:30:00Z"
}
```
Why Track Sources?
1. Auditability
When facts conflict or need verification, sources provide the evidence:
```
Fact: Alice lives in Paris
  ↓ DERIVED_FROM
Source (Jan 15): "Alice moved to Paris last month"

Fact: Alice lives in London
  ↓ DERIVED_FROM
Source (Dec 10): "Alice is settling into her new flat in London"
```
By comparing sources and timestamps, you can determine:
- Which information is more recent
- Which source is more authoritative
- Whether facts need updating or reconciliation
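The recency comparison can be sketched in a few lines of Python. This is a minimal illustration, not part of the Nous API: the `SourceRef` type and the `most_recent` helper are hypothetical, and the year of the Dec 10 source is assumed to be 2024.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SourceRef:
    """Hypothetical stand-in for a Source, holding only the fields this sketch needs."""
    content: str
    timestamp: str  # ISO 8601, e.g. "2025-01-15T14:30:00Z"


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def most_recent(candidates: list[tuple[str, SourceRef]]) -> tuple[str, SourceRef]:
    """Return the (fact, source) pair whose source has the latest event time."""
    return max(candidates, key=lambda pair: parse_ts(pair[1].timestamp))


candidates = [
    ("Alice lives in Paris",
     SourceRef("Alice moved to Paris last month", "2025-01-15T14:30:00Z")),
    ("Alice lives in London",
     SourceRef("Alice is settling into her new flat in London", "2024-12-10T09:00:00Z")),
]
fact, source = most_recent(candidates)
print(fact)  # Alice lives in Paris
```

Recency is only one signal. The authority question still needs its own check, because a newer but less reliable source should not automatically win.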
2. Trust and Confidence
Not all sources are equally reliable. Sources enable trust-based reasoning:
```
Source A: Official company announcement → High trust
Source B: Social media rumor → Lower trust
Source C: Direct message from the person → Highest trust
```
You can adjust fact confidence scores based on source reliability.
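A simple policy is to scale an extractor's confidence by a per-category trust weight. The category names and weights below are illustrative assumptions that follow the ordering above. Nous does not define them, and multiplication is only one of several reasonable choices.

```python
# Hypothetical trust weights per source category, ordered as in the list above.
TRUST_WEIGHTS = {
    "direct_message": 1.0,         # highest trust
    "official_announcement": 0.9,  # high trust
    "social_media": 0.5,           # lower trust
}


def adjusted_confidence(extraction_confidence: float, source_category: str) -> float:
    """Scale an extractor's confidence by how much we trust the source.

    Unknown categories fall back to a conservative weight of 0.5.
    """
    weight = TRUST_WEIGHTS.get(source_category, 0.5)
    return round(extraction_confidence * weight, 4)


print(adjusted_confidence(0.9, "official_announcement"))  # 0.81
print(adjusted_confidence(0.9, "social_media"))           # 0.45
```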
3. Debugging and Correction
When you discover incorrect information:
- Trace the fact back to its source
- Identify why the extraction was wrong
- Fix the root cause (extraction logic, source quality)
- Re-process or update the fact
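The first and last steps can be automated. Below is a minimal in-memory sketch that assumes DERIVED_FROM links are available as `(fact_id, source_id)` pairs. The data, `trace_sources`, `reextract`, and the toy extractor are all hypothetical. A real setup would query the graph and call your actual extraction logic.

```python
from typing import Callable

# Hypothetical data: DERIVED_FROM links (fact_id, source_id) and source contents.
links = [("Location:Paris", "src-1"), ("Company:Acme Corp", "src-1")]
sources = {"src-1": "Alice moved to Paris and started working at Acme Corp."}


def trace_sources(fact_id: str) -> list[str]:
    """First step: follow DERIVED_FROM edges from a fact to its source contents."""
    return [sources[sid] for fid, sid in links if fid == fact_id]


def reextract(fact_id: str, extract: Callable[[str], set[str]]) -> set[str]:
    """Last step: re-run a (fixed) extractor over the fact's original sources."""
    facts: set[str] = set()
    for content in trace_sources(fact_id):
        facts |= extract(content)
    return facts


def naive_extractor(text: str) -> set[str]:
    # Toy extractor standing in for real extraction logic.
    found = set()
    if "Paris" in text:
        found.add("Location:Paris")
    return found


print(trace_sources("Location:Paris"))
print(reextract("Location:Paris", naive_extractor))  # {'Location:Paris'}
```

This only works if the original source content was preserved, which is why the steps above depend on sources being kept.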
4. Compliance and Regulations
Many industries require data lineage:
- Healthcare: Track where patient information originated
- Finance: Audit trail for financial data
- Legal: Chain of custody for evidence
- GDPR: Know where personal data came from
The DERIVED_FROM Relationship
Facts connect to sources through the DERIVED_FROM relationship:
```
(Fact) -[DERIVED_FROM]-> (Source)
```
Relationship Properties
| Property | Type | Description |
|---|---|---|
| from_fact_id | string | The fact that was derived |
| to_source_id | UUID | The source where the fact originated |
Example Relationship
```json
{
  "from_fact_id": "Location:Paris",
  "to_source_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
}
```
This links the fact Location:Paris to the source containing “Alice moved to Paris last month.”
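The two property tables map naturally onto plain records. This sketch uses Python dataclasses that mirror those fields. `SourceRecord`, `DerivedFromEdge`, and `link` are hypothetical names and are not the Nous implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical mirror of the Source properties table."""
    content: str
    timestamp: str  # event time, ISO 8601
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass(frozen=True)
class DerivedFromEdge:
    """Hypothetical mirror of the DERIVED_FROM relationship properties."""
    from_fact_id: str  # e.g. "Location:Paris"
    to_source_id: str  # SourceRecord.id


def link(fact_id: str, source: SourceRecord) -> DerivedFromEdge:
    """Create the DERIVED_FROM edge from a fact to the source it came from."""
    return DerivedFromEdge(from_fact_id=fact_id, to_source_id=source.id)


src = SourceRecord(
    content="Alice moved to Paris last month.",
    timestamp="2025-01-15T14:30:00Z",
    id="7c9e6679-7425-40de-944b-e07fc1f90ae7",
)
edge = link("Location:Paris", src)
print(edge)
```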
Understanding Timestamps
Sources use a real-world timestamp (timestamp) that represents when the original event occurred, for example when a message was sent:
```json
{
  "timestamp": "2025-01-15T14:30:00Z"
}
```
This is different from system timestamps such as created_at on entities and relationships, which record when the data was added to Nous.
Event Time vs System Time
| Time Type | Field | Meaning |
|---|---|---|
| Event Time | timestamp | When the real-world event happened |
| System Time | created_at | When Nous recorded the information |
Example:
```
User sends a message on Jan 15 at 2:00 PM
  → Source.timestamp = "2025-01-15T14:00:00Z"
Message is processed by Nous on Jan 16 at 10:00 AM
  → Entity.created_at = "2025-01-16T10:00:00Z"
```
This separation enables:
- Temporal Queries: “What did we know about Alice in December?”
- Historical Analysis: Reconstruct the state of knowledge at any point in time
- Audit Trails: Distinguish when events occurred vs. when they were recorded
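The two clocks support two kinds of query. The sketch below contrasts them. A system-time query asks what had been recorded by a given moment, and an event-time query asks what happened in a given window. The `FactRecord` type is hypothetical, and the London timestamps are illustrative values.

```python
from dataclasses import dataclass
from datetime import datetime


def ts(value: str) -> datetime:
    # Accept a trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class FactRecord:
    """Hypothetical fact row carrying both clocks."""
    fact_id: str
    event_time: datetime  # Source.timestamp: when it happened
    created_at: datetime  # when Nous recorded it


facts = [
    FactRecord("Location:London", ts("2024-12-10T09:00:00Z"), ts("2024-12-11T08:00:00Z")),
    FactRecord("Location:Paris", ts("2025-01-15T14:00:00Z"), ts("2025-01-16T10:00:00Z")),
]


def known_as_of(as_of: datetime) -> list[str]:
    """System-time query: what had Nous recorded by `as_of`?"""
    return [f.fact_id for f in facts if f.created_at <= as_of]


def happened_between(start: datetime, end: datetime) -> list[str]:
    """Event-time query: which events occurred in [start, end)?"""
    return [f.fact_id for f in facts if start <= f.event_time < end]


print(known_as_of(ts("2024-12-31T23:59:59Z")))  # ['Location:London']
print(happened_between(ts("2025-01-01T00:00:00Z"), ts("2025-02-01T00:00:00Z")))  # ['Location:Paris']
```

On the evening of Jan 15, the Paris move has already happened (event time), but Nous does not know about it yet (system time). The two queries give different answers for that moment.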
Source Types and Metadata
While the source model is flexible, different sources have different characteristics. Consider using the entity metadata pattern for source categorization:
Common Source Types
Section titled “Common Source Types”// Chat message{ "content": "Alice: I just moved to Paris!", "timestamp": "2025-01-15T14:30:00Z", "metadata": { "type": "chat_message", "channel": "slack", "user_id": "U12345" }}
// Email{ "content": "Subject: New Address\n\nHi team, my new address is...", "timestamp": "2025-01-10T09:00:00Z", "metadata": { "type": "email", "from": "alice@example.com", "subject": "New Address" }}
// Document{ "content": "Employee record updated: Alice Smith, Location: Paris", "timestamp": "2025-01-15T16:00:00Z", "metadata": { "type": "document", "document_id": "doc-123", "file_type": "pdf" }}
// API Call{ "content": "{\"user\": \"alice\", \"location\": \"Paris\"}", "timestamp": "2025-01-15T14:35:00Z", "metadata": { "type": "api_response", "endpoint": "/users/alice", "source_system": "crm" }}Note: The current schema doesn’t include a metadata field on sources, but you can extend it or encode metadata in the content field.
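One way to encode metadata in the content field is a small JSON envelope. The sketch below is one possible convention under that assumption. The envelope shape and the `encode_with_metadata` and `decode_content` helpers are hypothetical, not a Nous feature.

```python
import json


def encode_with_metadata(text: str, metadata: dict) -> str:
    """Pack metadata and the original text into a single content string."""
    return json.dumps({"metadata": metadata, "text": text}, sort_keys=True)


def decode_content(content: str) -> tuple[str, dict]:
    """Recover (text, metadata). Content that is not an envelope yields empty metadata."""
    try:
        payload = json.loads(content)
    except json.JSONDecodeError:
        return content, {}
    if isinstance(payload, dict) and "text" in payload:
        return payload["text"], payload.get("metadata", {})
    return content, {}


content = encode_with_metadata(
    "Alice: I just moved to Paris!",
    {"type": "chat_message", "channel": "slack", "user_id": "U12345"},
)
text, meta = decode_content(content)
print(text)          # Alice: I just moved to Paris!
print(meta["type"])  # chat_message
```

The trade-off is that the extractor now sees JSON instead of raw text, and genuine JSON content with a `text` key would be misread as an envelope. Extending the schema with a real metadata field avoids both problems.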
Working with Sources
Creating Sources During Assimilation
Sources are typically created automatically during the assimilation process:
```http
POST /entities/assimilate

{
  "identifier": { "type": "email", "value": "alice@example.com" },
  "content": "Alice moved to Paris and started working at Acme Corp."
}
```
This will:
- Create a new source with the content
- Extract facts from the content
- Link facts to the source via DERIVED_FROM
- Associate facts with the entity
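The same call can be made from Python with only the standard library. This sketch builds the request without sending it. The server address in `BASE_URL` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical Nous server address


def build_assimilate_request(id_type: str, id_value: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST /entities/assimilate request."""
    body = {
        "identifier": {"type": id_type, "value": id_value},
        "content": content,
    }
    return urllib.request.Request(
        f"{BASE_URL}/entities/assimilate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed JSON API
        method="POST",
    )


req = build_assimilate_request(
    "email",
    "alice@example.com",
    "Alice moved to Paris and started working at Acme Corp.",
)
# To actually send it: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```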
Retrieving Sources for Facts
When you look up an entity, sources are included in the response:
```http
GET /entities/lookup?identifier_type=email&identifier_value=alice@example.com
```
Response includes:
- The entity
- All facts
- All sources for those facts
This provides complete transparency: “Here’s what we know about Alice and where we learned it.”
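When building the lookup URL in code, it is safer to escape the query parameters than to paste them in raw, because identifier values may contain characters such as `+` or `&`. `BASE_URL` is again a hypothetical address. The response format is not specified here, so the sketch stops at URL construction.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical Nous server address


def lookup_url(identifier_type: str, identifier_value: str) -> str:
    """Build the GET /entities/lookup URL with escaped query parameters."""
    query = urlencode({
        "identifier_type": identifier_type,
        "identifier_value": identifier_value,
    })
    return f"{BASE_URL}/entities/lookup?{query}"


print(lookup_url("email", "alice@example.com"))
# http://localhost:8000/entities/lookup?identifier_type=email&identifier_value=alice%40example.com
```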
Use Cases
1. Conversational AI Memory
Track conversation history:
```
Source 1 (Jan 5): "I love hiking in the mountains"
  → Fact: Hobby:Hiking
Source 2 (Jan 12): "I'm planning a trip to Colorado"
  → Fact: Location:Colorado (verb: planning_to_visit)
Source 3 (Jan 20): "Just got back from an amazing hike in Rocky Mountain National Park"
  → Fact: Location:Colorado (verb: visited)
```
Before Source 3 arrives, the AI can say: “Last time we talked, on January 12th, you were planning a trip to Colorado. How was it?”
2. Customer Support Context
Build a timeline of customer interactions:
```
Source A (Dec 1): "My account is locked"
  → Fact: Issue:Account Locked
Source B (Dec 2): Support ticket resolved
  → Fact: Status:Resolved
Source C (Jan 5): "Same issue again!"
  → Fact: Issue:Account Locked (second occurrence)
```
Support agents can see: “This is the second time the customer has reported this issue.”
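Detecting a recurring issue amounts to counting how many sources produced the same fact. In this minimal sketch, the timeline is hypothetical data, and the years for the Dec and Jan dates are assumed.

```python
from collections import Counter

# Hypothetical timeline: (source date, fact derived from that source)
timeline = [
    ("2024-12-01", "Issue:Account Locked"),
    ("2024-12-02", "Status:Resolved"),
    ("2025-01-05", "Issue:Account Locked"),
]


def occurrences(fact_id: str) -> list[str]:
    """Dates of every source that produced `fact_id`, oldest first."""
    return sorted(date for date, fid in timeline if fid == fact_id)


counts = Counter(fid for _, fid in timeline)
print(counts["Issue:Account Locked"])       # 2
print(occurrences("Issue:Account Locked"))  # ['2024-12-01', '2025-01-05']
```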
3. Research Knowledge Management
Track the lineage of research findings:
```
Paper A (2023): Claims X is true
  → Fact: Claim:X (confidence: 0.8)
Paper B (2024): Confirms X with additional evidence
  → Same Fact: Claim:X (confidence: 0.95)
Paper C (2025): Disputes X
  → Conflicting Fact: Claim:Not-X (confidence: 0.7)
```
Researchers can see: “Claim X has support from Papers A and B but is disputed in Paper C.”
4. Data Lineage for Compliance
Demonstrate where personal data came from:
```
Source: User registration form (2024-01-15)
  → Fact: Email:alice@example.com
  → Fact: Location:Paris
Source: Customer support chat (2024-03-20)
  → Fact: Phone:+1-555-0123
Source: Account settings update (2024-06-10)
  → Fact: Location:London (updated)
```
For GDPR requests, you can provide: “Here’s all data we collected about you and when we collected it.”
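A lineage report like this one can be produced by grouping facts under their source, with the oldest source first. The triples below restate the example above, and `data_report` is a hypothetical helper. A real report would pull the same data from the graph.

```python
from collections import defaultdict

# (source description, source date, fact) triples from the example above.
lineage = [
    ("User registration form", "2024-01-15", "Email:alice@example.com"),
    ("User registration form", "2024-01-15", "Location:Paris"),
    ("Customer support chat", "2024-03-20", "Phone:+1-555-0123"),
    ("Account settings update", "2024-06-10", "Location:London"),
]


def data_report(rows: list[tuple[str, str, str]]) -> list[str]:
    """Group facts under the source they came from, oldest source first."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for source, date, fact in rows:
        grouped[(date, source)].append(fact)
    return [
        f"{date} {source}: {', '.join(grouped[(date, source)])}"
        for date, source in sorted(grouped)
    ]


for line in data_report(lineage):
    print(line)
```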
Best Practices
Preserve Original Content
Always store the complete, original source text:
```jsonc
// Good
{ "content": "User: I just moved to Paris last week! Loving it so far." }

// Bad (information lost)
{ "content": "Moved to Paris" }
```
Original content enables:
- Re-processing with improved extraction logic
- Human review when facts conflict
- Context for ambiguous information
Use Accurate Timestamps
Set the timestamp to when the event occurred, not when you processed it:
```python
from datetime import datetime

# Good: when the user sent the message
source.timestamp = message.sent_at

# Bad: when your pipeline happens to process it
source.timestamp = datetime.now()
```
Don’t Delete Sources Prematurely
Even after facts are extracted, keep sources for:
- Audit trails
- Re-extraction with improved models
- Human verification
Only delete sources when:
- Legal requirements mandate it (GDPR deletion requests)
- Storage constraints absolutely require it
- Facts have been thoroughly verified through other means
Consider Source Authority
When facts conflict, source authority matters:
```python
# Illustrative confidence values by source authority
AUTHORITY_CONFIDENCE = {
    "official_document_source": 1.0,  # High authority
    "user_statement_source": 0.85,    # Medium authority
    "third_party_rumor_source": 0.5,  # Low authority
}
```
You can encode authority in:
- The fact’s confidence score
- Source metadata (if extended)
- Your fact extraction logic
Querying Sources
Find All Sources for an Entity
```http
GET /entities/lookup?identifier_type=email&identifier_value=alice@example.com
```
Returns the entity with all of its facts and their sources.
Trace a Specific Fact to Sources
```sql
-- Apache AGE query example: load the extension and expose ag_catalog
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Trace the fact Location:Paris to its sources, newest first
SELECT * FROM cypher('nous', $$
    MATCH (f:Fact {fact_id: 'Location:Paris'})-[d:DERIVED_FROM]->(s:Source)
    RETURN f, d, s
    ORDER BY s.timestamp DESC
$$) AS (fact agtype, relation agtype, source agtype);
```
Find Sources by Time Range
```sql
-- Sources whose event time falls in January 2025. The string comparison
-- assumes timestamps are stored as ISO 8601 UTC strings in one consistent format.
SELECT * FROM cypher('nous', $$
    MATCH (s:Source)
    WHERE s.timestamp >= '2025-01-01T00:00:00Z'
      AND s.timestamp < '2025-02-01T00:00:00Z'
    RETURN s
    ORDER BY s.timestamp
$$) AS (source agtype);
```
Source Validation
Sources validate their content to prevent errors:
Content Validation:
- Cannot be empty or whitespace-only
- Automatically trimmed of leading/trailing spaces
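Both rules fit into a dataclass's `__post_init__` hook. `ValidatedSource` is a hypothetical re-implementation for illustration, and the actual `Source` class may enforce these rules differently.

```python
from dataclasses import dataclass


@dataclass
class ValidatedSource:
    """Hypothetical re-implementation of the content rules listed above."""
    content: str

    def __post_init__(self) -> None:
        # Trim leading/trailing whitespace, then reject an empty result.
        self.content = self.content.strip()
        if not self.content:
            raise ValueError("Source content cannot be empty or whitespace-only")


print(repr(ValidatedSource("  Alice moved to Paris  ").content))  # 'Alice moved to Paris'
```

Trimming before the emptiness check is what makes whitespace-only input fail: `"   ".strip()` is the empty string.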
```python
# Valid
Source(content="Alice moved to Paris")

# Invalid (each raises ValueError)
Source(content="")
Source(content=" ")
```
Common Questions
Can multiple facts come from the same source?
Yes! A single source often produces multiple facts:
```
Source: "Alice moved to Paris and started working at Acme Corp"
  ↓ DERIVED_FROM
  ├── Fact: Location:Paris
  ├── Fact: Company:Acme Corp
  └── Fact: JobTitle:Employee
```
Can a fact have multiple sources?
Yes! The same fact can be confirmed by multiple sources:
```
Fact: Location:Paris
  ↓ DERIVED_FROM (from multiple sources)
  ├── Source A: "Alice lives in Paris"
  ├── Source B: "Sent from Paris, France"
  └── Source C: "Alice's Paris office"
```
This increases confidence in the fact.
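One common way to turn corroboration into a number is noisy-OR. It estimates the probability that at least one source is right, assuming each source is independently correct with its own confidence. This is a swapped-in technique for illustration, not something Nous is documented to do. In practice the independence assumption rarely holds exactly, since two sources may repeat the same upstream claim.

```python
import math


def combined_confidence(confidences: list[float]) -> float:
    """Noisy-OR: 1 minus the probability that every source is wrong,
    assuming the sources are independent."""
    all_wrong = math.prod(1.0 - c for c in confidences)
    return 1.0 - all_wrong


print(round(combined_confidence([0.8]), 3))            # 0.8
print(round(combined_confidence([0.8, 0.5]), 3))       # 0.9
print(round(combined_confidence([0.8, 0.5, 0.6]), 3))  # 0.96
```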
Should I create a source for manually entered facts?
Yes. Even for manual entries, create a source to maintain provenance:
```json
{
  "content": "Admin manually verified: Alice works at Acme Corp",
  "timestamp": "2025-01-15T10:00:00Z"
}
```
The content records who added the information, and the timestamp records when.
How do I handle sources that contain multiple entities?
One source can mention multiple entities:
```
Source: "Alice and Bob both moved to Paris"
  ↓ DERIVED_FROM
  ├── Entity (Alice) → Fact: Location:Paris
  └── Entity (Bob) → Fact: Location:Paris
```
The same source produces facts for different entities.