API Reference
Auto-generated from source docstrings.
Memory
lakehouse_memory.memory.Memory
Composition root that wires episodic, semantic, and working stores.
Each Memory instance holds a SQL client, one vector index per store,
and a Scope that filters all reads and tags all writes to a specific
user / session / agent combination.
The canonical way to create a Memory wired to real Databricks resources is
Memory.from_databricks. After construction, call provision() once to
create the underlying Unity Catalog tables and Vector Search indexes.
Example::

    mem = Memory.from_databricks(
        catalog="my_catalog",
        schema_name="agent_memory",
        workspace_url="https://my-workspace.azuredatabricks.net",
        access_token="dapi...",
        http_path="/sql/1.0/warehouses/abc123",
        vector_search_endpoint="my_vs_endpoint",
    )
    mem.provision()
    scoped = mem.with_scope(user_id="u1", session_id="s1")
    scoped.episodic.write(event_type="chat_message", payload={}, text="Hello")
__init__(config, client, *, episodic_index, semantic_index, scope=None)
Initialise Memory with explicit collaborators.
Prefer Memory.from_databricks for production use. This constructor
is useful in tests where you supply stub clients and no-op indexes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | MemoryConfig | Catalog/schema/embedding settings for this Memory instance. | required |
| client | DatabricksClient | SQL client used by all three stores for DDL and DML. | required |
| episodic_index | VectorIndex | Vector index used by the episodic store for similarity search. Pass a no-op … | required |
| semantic_index | VectorIndex | Vector index used by the semantic store for similarity search. Pass a no-op … | required |
| scope | Scope \| None | Optional identity scope applied to every read and write. Defaults to an empty … | None |
as_langchain_chat_history(limit=100)
Return a LangChain BaseChatMessageHistory wired to the episodic store.
Requires the [langchain] optional extra::

    pip install lakehouse-memory[langchain]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of recent chat messages to return when … | 100 |

Returns:
| Type | Description |
|---|---|
| LakehouseChatHistory | A … |
as_langchain_retriever(k=5)
Return a LangChain BaseRetriever wired to the semantic store.
Requires the [langchain] optional extra::

    pip install lakehouse-memory[langchain]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int | Number of semantically-similar facts to return per query. Defaults to … | 5 |

Returns:
| Type | Description |
|---|---|
| LakehouseSemanticRetriever | A … scope. |
from_databricks(*, catalog, schema_name, workspace_url, access_token, http_path, vector_search_endpoint, scope=None, embedding=None)
classmethod
Build a Memory wired to real Databricks resources.
Constructs a SqlConnectorClient and two Delta Sync-backed
DatabricksVectorIndex objects (one for episodic events, one for
semantic facts) and stashes the VS credentials so that a later call to
provision() can create the indexes without repeating them.
Does not provision — call mem.provision() after construction to
idempotently create the Unity Catalog tables and Vector Search indexes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| catalog | str | Unity Catalog catalog name (e.g. …) | required |
| schema_name | str | Schema inside catalog where memory tables live (e.g. …) | required |
| workspace_url | str | Full Databricks workspace URL, including scheme (e.g. …) | required |
| access_token | str | Databricks personal-access token or service-principal secret used for both SQL Warehouse and Vector Search API calls. | required |
| http_path | str | SQL Warehouse HTTP path (e.g. …) | required |
| vector_search_endpoint | str | Name of the existing Databricks Vector Search endpoint to back both indexes. | required |
| scope | Scope \| None | Optional identity scope to pre-apply to every store. Defaults to an empty … | None |
| embedding | EmbeddingConfig \| None | Optional embedding endpoint configuration. Defaults to … | None |

Returns:
| Type | Description |
|---|---|
| Memory | A fully-wired … reading or writing to ensure the underlying tables and indexes exist. |
provision(*, vector_search_endpoint=None, workspace_url=None, access_token=None)
Idempotently create the Unity Catalog schema and tables and, optionally, the Vector Search indexes.
Always creates the Unity Catalog schema (if absent) and the three memory
tables (episodic, semantic, working). When a Vector Search
endpoint is available — either supplied here or stashed by
from_databricks — also creates the two Delta Sync indexes.
Safe to call multiple times; existing tables and indexes are left untouched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vector_search_endpoint | str \| None | Name of the Databricks Vector Search endpoint to use when creating indexes. Falls back to the value stashed by … | None |
| workspace_url | str \| None | Workspace URL needed for Vector Search API calls. Falls back to the value stashed by … | None |
| access_token | str \| None | Databricks PAT or service-principal secret for Vector Search API calls. Falls back to the value stashed by … | None |

Raises:
| Type | Description |
|---|---|
| ValueError | If vector_search_endpoint is resolved but workspace_url or access_token cannot be determined. |
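
The fallback and error rules for provision() can be sketched in isolation. This is a minimal illustration, not the library's implementation: `resolve_vs_settings` and the `stashed` dict are hypothetical names, and the sketch assumes that an explicit argument takes precedence over a stashed value.

```python
from typing import Dict, Optional


def resolve_vs_settings(
    stashed: Dict[str, str],
    vector_search_endpoint: Optional[str] = None,
    workspace_url: Optional[str] = None,
    access_token: Optional[str] = None,
) -> Optional[Dict[str, str]]:
    """Hypothetical helper: prefer explicit arguments, else use stashed values."""
    endpoint = vector_search_endpoint or stashed.get("vector_search_endpoint")
    if endpoint is None:
        # No endpoint anywhere: only the schema and tables would be created.
        return None

    url = workspace_url or stashed.get("workspace_url")
    token = access_token or stashed.get("access_token")
    if url is None or token is None:
        # Mirrors the documented ValueError condition.
        raise ValueError(
            "vector_search_endpoint is set but workspace_url or "
            "access_token could not be determined"
        )
    return {"endpoint": endpoint, "workspace_url": url, "access_token": token}
```

Returning None when no endpoint is available models the documented behaviour that index creation is optional while table creation always happens.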
with_scope(*, user_id=None, session_id=None, agent_id=None)
Return a new Memory with scope fields merged from the given arguments.
Any field you pass overrides the corresponding field on the current
scope; fields you omit (or pass as None) are inherited unchanged.
The new instance shares the same SQL client and vector indexes as the
original — no new connections are opened. Stashed VS credentials
(workspace_url, access_token, endpoint) are forwarded so that
provision() may still be called on the derived instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_id | str \| None | Override the … | None |
| session_id | str \| None | Override the … | None |
| agent_id | str \| None | Override the … | None |

Returns:
| Type | Description |
|---|---|
| Memory | A new … three stores. |
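
The sharing behaviour of with_scope can be illustrated with a small stand-in. `MemorySketch` and `ScopeSketch` below are hypothetical classes written only for this example; they model the documented contract (omitted fields are inherited, the client is shared) and are not the library's code.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScopeSketch:
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    agent_id: Optional[str] = None


class MemorySketch:
    """Hypothetical stand-in holding a shared client and a scope."""

    def __init__(self, client: object, scope: Optional[ScopeSketch] = None) -> None:
        self.client = client
        self.scope = scope if scope is not None else ScopeSketch()

    def with_scope(self, *, user_id=None, session_id=None, agent_id=None) -> "MemorySketch":
        given = {"user_id": user_id, "session_id": session_id, "agent_id": agent_id}
        # Only fields that were actually passed override the current scope.
        overrides = {name: value for name, value in given.items() if value is not None}
        # The same client object is reused; no new connection is created.
        return MemorySketch(self.client, replace(self.scope, **overrides))


base = MemorySketch(client=object()).with_scope(user_id="u1")
derived = base.with_scope(session_id="s1")
```

Here `derived` keeps `user_id="u1"` from `base`, gains `session_id="s1"`, and holds the same client object, while `base` itself is unchanged.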
MemoryConfig
lakehouse_memory.config.MemoryConfig
Bases: BaseModel
Top-level configuration for a Memory instance.
Holds the Unity Catalog coordinates (catalog + schema) that determine where the three memory tables are stored, plus the embedding configuration used by Vector Search. Instances are frozen (immutable) after construction.
Attributes:
| Name | Type | Description |
|---|---|---|
| catalog | str | Unity Catalog catalog name. Must be non-empty. |
| schema_name | str | Schema inside catalog where the … |
| embedding | EmbeddingConfig | Embedding endpoint configuration used when creating and querying Vector Search indexes. Defaults to … |
fqn(table)
Return the fully-qualified Unity Catalog name for a table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str | Unqualified table name (e.g. …) | required |

Returns:
| Type | Description |
|---|---|
| str | Three-part identifier … for use in SQL statements and Vector Search index names. |
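
As a rough illustration of a three-part Unity Catalog name, the sketch below joins catalog, schema, and table with dots. `fqn_sketch` is a hypothetical function, the table name `episodic` is only an example, and whether the real method quotes or escapes identifiers is not stated in this reference.

```python
def fqn_sketch(catalog: str, schema_name: str, table: str) -> str:
    """Hypothetical helper: build a catalog.schema.table identifier."""
    # Assumption: plain dotted form with no backtick quoting.
    return f"{catalog}.{schema_name}.{table}"


name = fqn_sketch("my_catalog", "agent_memory", "episodic")
```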
EmbeddingConfig
lakehouse_memory.config.EmbeddingConfig
Bases: BaseModel
Configuration for the embedding model used by Databricks Vector Search.
Specifies which Foundation Model API endpoint produces the embeddings that back the episodic and semantic vector indexes, along with the expected output dimensionality.
Attributes:
| Name | Type | Description |
|---|---|---|
| endpoint_name | str | Databricks Foundation Model API endpoint name that generates embeddings. Must match the endpoint used when the Vector Search index was created. Defaults to … |
| dimensions | int | Dimensionality of the embedding vectors produced by endpoint_name. Must be positive. Defaults to … |
Scope
lakehouse_memory.scope.Scope
dataclass
Identity scope that constrains memory reads and tags memory writes.
A Scope represents a specific combination of identity dimensions:
user_id, session_id, and/or agent_id. Any subset of these
may be set; unset fields act as wildcards — they are absent from SQL
WHERE clauses and vector metadata filters, so they match all values.
Scope is the single source of truth for scope-related SQL and vector
filter construction. Every store applies these filters automatically to
every read operation and includes all set fields as columns on every write.
Instances are frozen (immutable). Use merge to derive a new Scope
with some fields overridden.
Attributes:
| Name | Type | Description |
|---|---|---|
| user_id | str \| None | Identifies the end-user whose memory is being accessed. |
| session_id | str \| None | Identifies the conversation session. |
| agent_id | str \| None | Identifies the agent (or agent variant) operating on memory. |
merge(other)
Return a new Scope with other's set fields overriding self's.
Fields that are None on other are inherited from self
unchanged. This allows incremental narrowing of scope without losing
previously set dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Scope | A … | required |

Returns:
| Type | Description |
|---|---|
| Scope | A new frozen … |
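
The merge rule (set fields on other win, None fields are inherited) can be sketched with a frozen dataclass. `ScopeSketch` is a hypothetical stand-in for illustration and is not the library's Scope class.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ScopeSketch:
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    agent_id: Optional[str] = None

    def merge(self, other: "ScopeSketch") -> "ScopeSketch":
        merged = {}
        for field in fields(self):
            theirs = getattr(other, field.name)
            # A None on `other` means "not set", so keep our own value.
            merged[field.name] = theirs if theirs is not None else getattr(self, field.name)
        return ScopeSketch(**merged)


base = ScopeSketch(user_id="u1", session_id="s1")
narrowed = base.merge(ScopeSketch(session_id="s2", agent_id="a1"))
```

In this sketch `narrowed` keeps `user_id="u1"`, replaces the session with `"s2"`, and adds `agent_id="a1"`. Because the dataclass is frozen, `base` is never modified.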
to_metadata_filter()
Build a metadata filter dict for Databricks Vector Search queries.
Returns:
| Type | Description |
|---|---|
| dict[str, str] | A … set on this scope. An empty dict is returned when no fields are set (i.e., no filtering is applied). |
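
A minimal sketch of the filter construction, assuming the dict simply maps each set field name to its value. `metadata_filter_sketch` is a hypothetical function, and the exact key format expected by the real method is an assumption.

```python
from typing import Dict, Optional


def metadata_filter_sketch(
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: keep only the scope fields that are set."""
    candidates = {"user_id": user_id, "session_id": session_id, "agent_id": agent_id}
    # Unset fields are dropped, so they act as wildcards in the query.
    return {name: value for name, value in candidates.items() if value is not None}
```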
to_where_clause()
Build a SQL WHERE-clause fragment and a bound-parameter map.
Clauses are AND-joined in stable alphabetical field order so that the generated SQL is deterministic across Python versions and runtimes.
Returns:
| Type | Description |
|---|---|
| tuple[str, dict[str, str]] | A 2-tuple … fragment like … and params is the corresponding … parameterised queries. Both are empty ( … fields are set. |
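
The alphabetical AND-joining described above might look like the sketch below. `where_clause_sketch` is a hypothetical function, and the `:name` parameter-marker style is an assumption, since the fragment format is not shown in this reference.

```python
from typing import Dict, Optional, Tuple


def where_clause_sketch(
    scope_fields: Dict[str, Optional[str]],
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: build an AND-joined fragment plus bound parameters."""
    # Sort by field name so the generated SQL is deterministic.
    params = {
        name: value
        for name, value in sorted(scope_fields.items())
        if value is not None
    }
    # Values are never interpolated into the SQL text, only referenced by marker.
    fragment = " AND ".join(f"{name} = :{name}" for name in params)
    return fragment, params


fragment, params = where_clause_sketch(
    {"user_id": "u1", "session_id": None, "agent_id": "a1"}
)
```

With no fields set, the sketch returns an empty string and an empty dict, which matches the documented "both are empty" case.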
LangChain Adapters
LakehouseChatHistory
lakehouse_memory.adapters.langchain.LakehouseChatHistory
Bases: BaseChatMessageHistory
BaseChatMessageHistory backed by the episodic memory store.
Bridges Memory.episodic to the LangChain
BaseChatMessageHistory interface so that LakehouseChatHistory can
be dropped directly into RunnableWithMessageHistory.
Each chat turn is persisted as an episodic event with
event_type="chat_message" and
payload={"role": "human"|"ai", "content": ...}.
Scope is inherited from the Memory instance supplied at construction
time. To isolate history per session, call
memory.with_scope(session_id="new-session-id") before constructing
this object.
Note
clear() is an intentional no-op. Episodic memory is append-only
by design; deleting history is not supported. To start a fresh
conversation, change the session_id via
memory.with_scope(session_id=...).
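
To make the event shape concrete, the sketch below turns stored events into (role, content) pairs and applies a limit. `recent_chat_turns` is a hypothetical function, and it assumes events arrive as dicts ordered oldest first; the library's actual storage and ordering are not described here.

```python
from typing import Dict, List, Tuple


def recent_chat_turns(events: List[Dict], limit: int = 100) -> List[Tuple[str, str]]:
    """Hypothetical helper: extract the most recent chat turns from events."""
    chat = [e for e in events if e.get("event_type") == "chat_message"]
    # Guard against limit <= 0, since chat[-0:] would return everything.
    recent = chat[-limit:] if limit > 0 else []
    return [(e["payload"]["role"], e["payload"]["content"]) for e in recent]


events = [
    {"event_type": "chat_message", "payload": {"role": "human", "content": "Hello"}},
    {"event_type": "tool_call", "payload": {"name": "search"}},
    {"event_type": "chat_message", "payload": {"role": "ai", "content": "Hi there"}},
]
turns = recent_chat_turns(events, limit=2)
```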
__init__(memory, limit=100)
Initialise a LakehouseChatHistory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| memory | Memory | A … | required |
| limit | int | Maximum number of most-recent messages to return when … | 100 |
clear()
No-op: episodic memory is append-only and does not support deletion.
For a fresh conversation, derive a new scoped instance via
memory.with_scope(session_id="<new-session-id>") and pass it to a
new LakehouseChatHistory.
LakehouseSemanticRetriever
lakehouse_memory.adapters.langchain.LakehouseSemanticRetriever
Bases: BaseRetriever
BaseRetriever backed by the semantic memory store.
Bridges Memory.semantic to the LangChain BaseRetriever interface
so that LakehouseSemanticRetriever can be dropped into any retrieval
chain or used with create_retrieval_chain.
Each retrieved fact becomes a Document whose page_content is the
fact text; all other fact columns (source, scope fields, timestamps, etc.)
are placed in metadata (text is excluded since it is promoted to
page_content).
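
The fact-to-Document mapping can be sketched with plain dicts, avoiding a LangChain dependency. `fact_to_document_sketch` and the column names in the sample fact are hypothetical; only the rule that text becomes page_content and every other column goes to metadata comes from the description above.

```python
from typing import Any, Dict


def fact_to_document_sketch(fact: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: split a fact row into page_content and metadata."""
    # Every column except `text` is carried over as metadata.
    metadata = {key: value for key, value in fact.items() if key != "text"}
    return {"page_content": fact["text"], "metadata": metadata}


doc = fact_to_document_sketch(
    {"text": "User prefers dark mode", "source": "chat", "user_id": "u1"}
)
```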
Scope is inherited from the Memory instance supplied at construction
time.
Attributes:
| Name | Type | Description |
|---|---|---|
| memory | Memory | The … |
| k | int | Number of semantically-similar facts to return per query. Defaults to … |