API Reference

Auto-generated from source docstrings.


Memory

lakehouse_memory.memory.Memory

Composition root that wires episodic, semantic, and working stores.

Each Memory instance holds a SQL client, one vector index per store, and a Scope that filters all reads and tags all writes to a specific user / session / agent combination.

The canonical way to create a Memory wired to real Databricks resources is Memory.from_databricks. After construction, call provision() once to create the underlying Unity Catalog tables and Vector Search indexes.

Example::

    mem = Memory.from_databricks(
        catalog="my_catalog",
        schema_name="agent_memory",
        workspace_url="https://my-workspace.azuredatabricks.net",
        access_token="dapi...",
        http_path="/sql/1.0/warehouses/abc123",
        vector_search_endpoint="my_vs_endpoint",
    )
    mem.provision()
    scoped = mem.with_scope(user_id="u1", session_id="s1")
    scoped.episodic.write(event_type="chat_message", payload={}, text="Hello")

__init__(config, client, *, episodic_index, semantic_index, scope=None)

Initialise Memory with explicit collaborators.

Prefer Memory.from_databricks for production use. This constructor is useful in tests where you supply stub clients and no-op indexes.

Parameters:

Name Type Description Default
config MemoryConfig

Catalog/schema/embedding settings for this Memory instance.

required
client DatabricksClient

SQL client used by all three stores for DDL and DML.

required
episodic_index VectorIndex

Vector index used by the episodic store for similarity search. Pass a no-op VectorIndex to skip vector search for episodic events.

required
semantic_index VectorIndex

Vector index used by the semantic store for similarity search. Pass a no-op VectorIndex to skip vector search for facts.

required
scope Scope | None

Optional identity scope applied to every read and write. Defaults to an empty Scope() (no filtering).

None

as_langchain_chat_history(limit=100)

Return a LangChain BaseChatMessageHistory wired to the episodic store.

Requires the [langchain] optional extra::

    pip install lakehouse-memory[langchain]

Parameters:

Name Type Description Default
limit int

Maximum number of recent chat messages to return when messages is accessed. Defaults to 100.

100

Returns:

Type Description
LakehouseChatHistory

A LakehouseChatHistory instance scoped to this Memory's scope.

as_langchain_retriever(k=5)

Return a LangChain BaseRetriever wired to the semantic store.

Requires the [langchain] optional extra::

    pip install lakehouse-memory[langchain]

Parameters:

Name Type Description Default
k int

Number of semantically-similar facts to return per query. Defaults to 5.

5

Returns:

Type Description
LakehouseSemanticRetriever

A LakehouseSemanticRetriever instance scoped to this Memory's scope.

from_databricks(*, catalog, schema_name, workspace_url, access_token, http_path, vector_search_endpoint, scope=None, embedding=None) classmethod

Build a Memory wired to real Databricks resources.

Constructs a SqlConnectorClient and two Delta Sync-backed DatabricksVectorIndex objects (one for episodic events, one for semantic facts). It also stashes the Vector Search credentials (workspace URL, access token, and endpoint name) so that a later call to provision() can create the indexes without repeating them.

Does not provision — call mem.provision() after construction to idempotently create the Unity Catalog tables and Vector Search indexes.

Parameters:

Name Type Description Default
catalog str

Unity Catalog catalog name (e.g. "my_catalog").

required
schema_name str

Schema inside catalog where memory tables live (e.g. "agent_memory").

required
workspace_url str

Full Databricks workspace URL, including scheme (e.g. "https://my-workspace.azuredatabricks.net").

required
access_token str

Databricks personal-access token or service-principal secret used for both SQL Warehouse and Vector Search API calls.

required
http_path str

SQL Warehouse HTTP path (e.g. "/sql/1.0/warehouses/abc123").

required
vector_search_endpoint str

Name of the existing Databricks Vector Search endpoint to back both indexes.

required
scope Scope | None

Optional identity scope to pre-apply to every store. Defaults to an empty Scope() (no filtering).

None
embedding EmbeddingConfig | None

Optional embedding endpoint configuration. Defaults to EmbeddingConfig() (databricks-gte-large-en, 1024 dims).

None

Returns:

Type Description
Memory

A fully-wired Memory instance. Call provision() before reading or writing to ensure the underlying tables and indexes exist.

provision(*, vector_search_endpoint=None, workspace_url=None, access_token=None)

Idempotently create the Unity Catalog schema and tables and, optionally, the Vector Search indexes.

Always creates the Unity Catalog schema (if absent) and the three memory tables (episodic, semantic, working). When a Vector Search endpoint is available — either supplied here or stashed by from_databricks — also creates the two Delta Sync indexes.

Safe to call multiple times; existing tables and indexes are left untouched.

Parameters:

Name Type Description Default
vector_search_endpoint str | None

Name of the Databricks Vector Search endpoint to use when creating indexes. Falls back to the value stashed by from_databricks, if any. Pass None (and provide no stashed value) to skip index creation entirely.

None
workspace_url str | None

Workspace URL needed for Vector Search API calls. Falls back to the value stashed by from_databricks.

None
access_token str | None

Databricks PAT or service-principal secret for Vector Search API calls. Falls back to the value stashed by from_databricks.

None

Raises:

Type Description
ValueError

If vector_search_endpoint is resolved but workspace_url or access_token cannot be determined.
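The fallback rules above can be sketched as a small resolver. This is a minimal illustration, not the library's code: the helper name `resolve_vector_search_settings` and the `stashed` dict are assumptions made up for this example.

```python
from __future__ import annotations


def resolve_vector_search_settings(
    *,
    vector_search_endpoint: str | None,
    workspace_url: str | None,
    access_token: str | None,
    stashed: dict[str, str],
) -> dict[str, str] | None:
    """Hypothetical helper mirroring the documented fallback behaviour."""

    def pick(name: str, explicit: str | None) -> str | None:
        # An explicit argument wins; otherwise fall back to the stashed value.
        return explicit if explicit is not None else stashed.get(name)

    endpoint = pick("vector_search_endpoint", vector_search_endpoint)
    if endpoint is None:
        # No endpoint supplied or stashed: index creation is skipped.
        return None

    url = pick("workspace_url", workspace_url)
    token = pick("access_token", access_token)
    if url is None or token is None:
        raise ValueError(
            "vector_search_endpoint is set but workspace_url or access_token is missing"
        )
    return {
        "vector_search_endpoint": endpoint,
        "workspace_url": url,
        "access_token": token,
    }


# Values as they might have been stashed by from_databricks (illustrative only).
stashed = {
    "vector_search_endpoint": "my_vs_endpoint",
    "workspace_url": "https://my-workspace.azuredatabricks.net",
    "access_token": "example-token",
}
no_args = dict(vector_search_endpoint=None, workspace_url=None, access_token=None)
from_stash = resolve_vector_search_settings(**no_args, stashed=stashed)
skipped = resolve_vector_search_settings(**no_args, stashed={})
```

With nothing supplied and nothing stashed the sketch returns None (tables only); with an endpoint but no credentials it raises ValueError, matching the Raises entry above.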

with_scope(*, user_id=None, session_id=None, agent_id=None)

Return a new Memory with scope fields merged from the given arguments.

Any field you pass overrides the corresponding field on the current scope; fields you omit (or pass as None) are inherited unchanged. The new instance shares the same SQL client and vector indexes as the original — no new connections are opened. Stashed VS credentials (workspace_url, access_token, endpoint) are forwarded so that provision() may still be called on the derived instance.

Parameters:

Name Type Description Default
user_id str | None

Override the user_id dimension of the scope.

None
session_id str | None

Override the session_id dimension of the scope.

None
agent_id str | None

Override the agent_id dimension of the scope.

None

Returns:

Type Description
Memory

A new Memory instance with the merged scope applied to all three stores.
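As a rough illustration of the behaviour described for with_scope, the sketch below derives scoped copies that share one client object. `MemorySketch` is a hypothetical stand-in written for this example (it models the scope as a plain dict) and is not the library's Memory class.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class MemorySketch:
    """Hypothetical stand-in: one shared client plus a scope mapping."""

    client: object
    scope: dict[str, str] = field(default_factory=dict)

    def with_scope(self, *, user_id=None, session_id=None, agent_id=None) -> "MemorySketch":
        overrides = {"user_id": user_id, "session_id": session_id, "agent_id": agent_id}
        # Omitted (None) arguments are dropped, so existing fields are inherited.
        merged = {**self.scope, **{k: v for k, v in overrides.items() if v is not None}}
        # replace() copies every other field, so the client object is shared.
        return replace(self, scope=merged)


client = object()  # stands in for the SQL client
root = MemorySketch(client=client)
per_user = root.with_scope(user_id="u1")
per_session = per_user.with_scope(session_id="s1")
```

Here `per_session` carries both user_id and session_id while `root` stays unscoped, and all three instances reference the same client.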


MemoryConfig

lakehouse_memory.config.MemoryConfig

Bases: BaseModel

Top-level configuration for a Memory instance.

Holds the Unity Catalog coordinates (catalog + schema) that determine where the three memory tables are stored, plus the embedding configuration used by Vector Search. Instances are frozen (immutable) after construction.

Attributes:

Name Type Description
catalog str

Unity Catalog catalog name. Must be non-empty.

schema_name str

Schema inside catalog where the episodic, semantic, and working tables reside. Must be non-empty.

embedding EmbeddingConfig

Embedding endpoint configuration used when creating and querying Vector Search indexes. Defaults to EmbeddingConfig().

fqn(table)

Return the fully-qualified Unity Catalog name for a table.

Parameters:

Name Type Description Default
table str

Unqualified table name (e.g. "episodic").

required

Returns:

Type Description
str

Three-part identifier <catalog>.<schema_name>.<table> suitable for use in SQL statements and Vector Search index names.
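A minimal sketch of the three-part name, assuming the parts are joined with dots exactly as shown in the return description. `ConfigSketch` is a hypothetical frozen dataclass used only for illustration; the real MemoryConfig is a BaseModel, and whether it quotes identifiers is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for MemoryConfig (catalog + schema only)."""

    catalog: str
    schema_name: str

    def fqn(self, table: str) -> str:
        # Join catalog, schema, and table into one dotted identifier.
        return f"{self.catalog}.{self.schema_name}.{table}"


cfg = ConfigSketch(catalog="my_catalog", schema_name="agent_memory")
episodic_table = cfg.fqn("episodic")  # "my_catalog.agent_memory.episodic"
```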


EmbeddingConfig

lakehouse_memory.config.EmbeddingConfig

Bases: BaseModel

Configuration for the embedding model used by Databricks Vector Search.

Specifies which Foundation Model API endpoint produces the embeddings that back the episodic and semantic vector indexes, along with the expected output dimensionality.

Attributes:

Name Type Description
endpoint_name str

Databricks Foundation Model API endpoint name that generates embeddings. Must match the endpoint used when the Vector Search index was created. Defaults to "databricks-gte-large-en".

dimensions int

Dimensionality of the embedding vectors produced by endpoint_name. Must be positive. Defaults to 1024.


Scope

lakehouse_memory.scope.Scope dataclass

Identity scope that constrains memory reads and tags memory writes.

A Scope represents a specific combination of identity dimensions: user_id, session_id, and/or agent_id. Any subset of these may be set; unset fields act as wildcards — they are absent from SQL WHERE clauses and vector metadata filters, so they match all values.

Scope is the single source of truth for scope-related SQL and vector filter construction. Every store applies these filters automatically to every read operation and includes all set fields as columns on every write.

Instances are frozen (immutable). Use merge to derive a new Scope with some fields overridden.

Attributes:

Name Type Description
user_id str | None

Identifies the end-user whose memory is being accessed.

session_id str | None

Identifies the conversation session.

agent_id str | None

Identifies the agent (or agent variant) operating on memory.

merge(other)

Return a new Scope with other's set fields overriding self's.

Fields that are None on other are inherited from self unchanged. This allows incremental narrowing of scope without losing previously set dimensions.

Parameters:

Name Type Description Default
other Scope

A Scope whose non-None fields will override the corresponding fields on self.

required

Returns:

Type Description
Scope

A new frozen Scope instance with the merged field values.
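The override-or-inherit rule can be sketched with a frozen dataclass. `ScopeSketch` below is a hypothetical stand-in that follows the description above; it is not the library's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScopeSketch:
    """Hypothetical stand-in for Scope."""

    user_id: str | None = None
    session_id: str | None = None
    agent_id: str | None = None

    def merge(self, other: "ScopeSketch") -> "ScopeSketch":
        merged = {}
        for f in fields(self):
            theirs = getattr(other, f.name)
            # Take other's value when it is set, otherwise keep self's value.
            merged[f.name] = theirs if theirs is not None else getattr(self, f.name)
        return ScopeSketch(**merged)


base = ScopeSketch(user_id="u1")
narrowed = base.merge(ScopeSketch(session_id="s1"))  # keeps user_id, adds session_id
replaced = narrowed.merge(ScopeSketch(user_id="u2"))  # overrides user_id only
```

Because the instances are frozen, `base` is unchanged after both merges.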

to_metadata_filter()

Build a metadata filter dict for Databricks Vector Search queries.

Returns:

Type Description
dict[str, str]

A {field_name: value} dict containing only the fields that are set on this scope. An empty dict is returned when no fields are set (i.e., no filtering is applied).
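To make the wildcard behaviour concrete, the sketch below builds such a filter and applies it to in-memory rows. The `metadata_filter` and `matches` helpers and the sample rows are assumptions for illustration; the real filtering happens inside Vector Search.

```python
from __future__ import annotations


def metadata_filter(
    *,
    user_id: str | None = None,
    session_id: str | None = None,
    agent_id: str | None = None,
) -> dict[str, str]:
    """Hypothetical helper: keep only the scope fields that are set."""
    candidates = {"user_id": user_id, "session_id": session_id, "agent_id": agent_id}
    return {name: value for name, value in candidates.items() if value is not None}


def matches(row: dict[str, str], flt: dict[str, str]) -> bool:
    # A row matches when every filtered field is equal; an empty filter matches all rows.
    return all(row.get(name) == value for name, value in flt.items())


rows = [
    {"user_id": "u1", "session_id": "s1", "text": "a"},
    {"user_id": "u1", "session_id": "s2", "text": "b"},
    {"user_id": "u2", "session_id": "s3", "text": "c"},
]
user_filter = metadata_filter(user_id="u1")
user_rows = [row["text"] for row in rows if matches(row, user_filter)]
all_rows = [row["text"] for row in rows if matches(row, metadata_filter())]
```

Filtering on user_id alone leaves session_id unconstrained, so both of that user's sessions match.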

to_where_clause()

Build a SQL WHERE-clause fragment and a bound-parameter map.

Clauses are AND-joined in stable alphabetical field order so that the generated SQL is deterministic across Python versions and runtimes.

Returns:

Type Description
tuple[str, dict[str, str]]

A 2-tuple (clause, params) where clause is a SQL fragment like "agent_id = :agent_id AND user_id = :user_id" and params is the corresponding {name: value} dict for parameterised queries. Both are empty ("" and {}) when no fields are set.
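A minimal sketch of the clause construction, assuming named `:param` placeholders as in the example fragment above. `build_where_clause` is a hypothetical function, not part of the library.

```python
from __future__ import annotations


def build_where_clause(
    scope_fields: dict[str, str | None],
) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: AND-join set fields in alphabetical order."""
    # Sorting by field name keeps the generated SQL deterministic.
    params = {
        name: scope_fields[name]
        for name in sorted(scope_fields)
        if scope_fields[name] is not None
    }
    clause = " AND ".join(f"{name} = :{name}" for name in params)
    return clause, params


clause, params = build_where_clause(
    {"user_id": "u1", "session_id": None, "agent_id": "a1"}
)
empty_clause, empty_params = build_where_clause(
    {"user_id": None, "session_id": None, "agent_id": None}
)
```

Values travel in `params` and never in the SQL text, so the fragment can be appended to a parameterised query as-is.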


LangChain Adapters

LakehouseChatHistory

lakehouse_memory.adapters.langchain.LakehouseChatHistory

Bases: BaseChatMessageHistory

BaseChatMessageHistory backed by the episodic memory store.

Bridges Memory.episodic to the LangChain BaseChatMessageHistory interface so that LakehouseChatHistory can be dropped directly into RunnableWithMessageHistory.

Each chat turn is persisted as an episodic event with event_type="chat_message" and payload={"role": "human"|"ai", "content": ...}.

Scope is inherited from the Memory instance supplied at construction time. To isolate history per session, call memory.with_scope(session_id="new-session-id") before constructing this object.

Note

clear() is an intentional no-op. Episodic memory is append-only by design; deleting history is not supported. To start a fresh conversation, change the session_id via memory.with_scope(session_id=...).
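The append-only behaviour can be sketched without LangChain. `ChatHistorySketch` below is a hypothetical stand-in that uses a shared Python list in place of the episodic store; the event shape follows the description above, and the `add` method name is an assumption for this example.

```python
from __future__ import annotations


class ChatHistorySketch:
    """Hypothetical stand-in for an append-only, session-scoped history."""

    def __init__(self, events: list[dict], session_id: str, limit: int = 100):
        self._events = events  # shared log standing in for the episodic store
        self.session_id = session_id
        self.limit = limit  # assumed to be a positive integer

    def add(self, role: str, content: str) -> None:
        self._events.append(
            {
                "event_type": "chat_message",
                "session_id": self.session_id,
                "payload": {"role": role, "content": content},
            }
        )

    @property
    def messages(self) -> list[dict]:
        mine = [
            event["payload"]
            for event in self._events
            if event["event_type"] == "chat_message"
            and event["session_id"] == self.session_id
        ]
        return mine[-self.limit:]  # most recent `limit` messages

    def clear(self) -> None:
        pass  # no-op: nothing is ever deleted from the log


log: list[dict] = []
first = ChatHistorySketch(log, session_id="s1", limit=2)
first.add("human", "Hello")
first.add("ai", "Hi, how can I help?")
first.add("human", "What is the weather?")
first.clear()  # history is still there
fresh = ChatHistorySketch(log, session_id="s2")  # a new session starts empty
```

Switching session_id gives an empty view while the earlier session's events remain in the log.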

__init__(memory, limit=100)

Initialise a LakehouseChatHistory.

Parameters:

Name Type Description Default
memory Memory

A Memory instance (typically already scoped to a session via memory.with_scope(session_id=...)).

required
limit int

Maximum number of most-recent messages to return when messages is accessed. Defaults to 100.

100

clear()

No-op: episodic memory is append-only and does not support deletion.

For a fresh conversation, derive a new scoped instance via memory.with_scope(session_id="<new-session-id>") and pass it to a new LakehouseChatHistory.

LakehouseSemanticRetriever

lakehouse_memory.adapters.langchain.LakehouseSemanticRetriever

Bases: BaseRetriever

BaseRetriever backed by the semantic memory store.

Bridges Memory.semantic to the LangChain BaseRetriever interface so that LakehouseSemanticRetriever can be dropped into any retrieval chain or used with create_retrieval_chain.

Each retrieved fact becomes a Document whose page_content is the fact text; all other fact columns (source, scope fields, timestamps, etc.) are placed in metadata (text is excluded since it is promoted to page_content).

Scope is inherited from the Memory instance supplied at construction time.
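The fact-to-Document mapping described above can be sketched with plain dicts. `fact_to_document_fields` and the sample fact are assumptions for illustration; the real retriever returns LangChain Document objects.

```python
def fact_to_document_fields(fact: dict) -> dict:
    """Hypothetical mapping from a fact row to Document-style fields."""
    # The fact text becomes page_content; every other column goes to metadata.
    metadata = {key: value for key, value in fact.items() if key != "text"}
    return {"page_content": fact["text"], "metadata": metadata}


fact = {"text": "The user prefers metric units.", "source": "chat", "user_id": "u1"}
doc_fields = fact_to_document_fields(fact)
```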

Attributes:

Name Type Description
memory Memory

The Memory instance whose semantic store is queried. Scope filtering (user / session / agent) is applied automatically from memory.scope.

k int

Number of semantically-similar facts to return per query. Defaults to 5.