Async API

We demonstrate the following functionalities supported by LanceDB using its asynchronous APIs:

- Automatic versioning
- Instant rollback
- Appends, updates, deletions
- Schema evolution

Let's first prepare the data. We will be using a CSV file with a bunch of quotes from Rick and Morty:

```
!wget http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv
!head rick_and_morty_quotes.csv
```

```
id,author,quote
1,Rick," Morty, you got to come on. You got to come with me."
2,Morty," Rick, what’s going on?"
3,Rick," I got a surprise for you, Morty."
4,Morty," It’s the middle of the night. What are you talking about?"
5,Rick," I got a surprise for you."
6,Morty," Ow! Ow! You’re tugging me too hard."
7,Rick," I got a surprise for you, Morty."
8,Rick," What do you think of this flying vehicle, Morty? I built it out of stuff I found in the garage."
9,Morty," Yeah, Rick, it’s great. Is this the surprise?"
```

Let's load this into a pandas dataframe. It has 3 columns: a quote id, the first name of the author of the quote, and the quote string:
```python
import pandas as pd

df = pd.read_csv("rick_and_morty_quotes.csv")
df.head()
```

```
   id  author  quote
0   1  Rick    Morty, you got to come on. You got to come wi...
1   2  Morty   Rick, what’s going on?
2   3  Rick    I got a surprise for you, Morty.
3   4  Morty   It’s the middle of the night. What are you ta...
4   5  Rick    I got a surprise for you.
```

Creating a LanceDB table from a pandas dataframe is straightforward using create_table. We'll start with a local LanceDB connection:

```
!pip install lancedb -q
```

```python
import lancedb

async_db = await lancedb.connect_async("~/.lancedb")
```

```python
# mode="overwrite" replaces any existing "rick_and_morty" table,
# so the cell also works on a fresh run where no table exists yet
async_table = await async_db.create_table("rick_and_morty", df, mode="overwrite")
await async_table.to_pandas()
```

```
[2024-12-17T23:58:46Z WARN lance::dataset::write::insert] No existing dataset at ~/.lancedb/rick_and_morty.lance, it will be created
```

```
   id  author  quote
0   1  Rick    Morty, you got to come on. You got to come wi...
1   2  Morty   Rick, what’s going on?
2   3  Rick    I got a surprise for you, Morty.
3   4  Morty   It’s the middle of the night. What are you ta...
4   5  Rick    I got a surprise for you.
5   6  Morty   Ow! Ow! You’re tugging me too hard.
6   7  Rick    I got a surprise for you, Morty.
7   8  Rick    What do you think of this flying vehicle, Mor...
8   9  Morty   Yeah, Rick, it’s great. Is this the surprise?
9  10  Rick    Morty, I had to I had to I had to I had to ma...
```

Updates

Now, since Rick is the smartest man in the multiverse, he deserves to have his quotes attributed to his full name: Richard Daniel Sanchez. This can be done via AsyncTable.update.
It takes two arguments here:

- A where string filter (SQL syntax) that determines the rows to update
- A dict of updates, where the keys are the column names to update and the values are the new values

```python
# Attribute every quote whose author is 'Rick' to his full name
await async_table.update(where="author='Rick'", updates={"author": "Richard Daniel Sanchez"})
await async_table.to_pandas()
```

```
   id  author  quote
0   1  Rick    Morty, you got to come on. You got to come wi...
1   3  Rick    I got a surprise for you, Morty.
2   5  Rick    I got a surprise for you.
3   7  Rick    I got a surprise for you, Morty.
4   8  Rick    What do you think of this flying vehicle, Mor...
5  10  Rick    Morty, I had to I had to I had to I had to ma...
6  12  Rick    We’re gonna drop it down there just get a who...
7  14  Rick    Come on, Morty. Just take it easy, Morty. It’...
8  16  Rick    When I drop the bomb you know, I want you to ...
9  18  Rick    And Jessica’s gonna be Eve,…
```

Schema evolution

Let's add a new_id column to the table, where each value is the original id plus 1.

```python
# add_columns maps each new column name to a SQL expression over existing columns
await async_table.add_columns({"new_id": "id + 1"})
await async_table.to_pandas()
```

```
   id  author  quote                                             new_id
0   1  Rick    Morty, you got to come on. You got to come wi...       2
1   3  Rick    I got a surprise for you, Morty.                       4
2   5  Rick    I got a surprise for you.                              6
3   7  Rick    I got a surprise for you, Morty.                       8
4   8  Rick    What do you think of this flying vehicle, Mor...       9
5  10  Rick    Morty, I had to I had to I had to I had to ma...      11
6  12  Rick    We’re gonna drop it down there just get a who...      13
7  14  Rick    Come on, Morty. Just take it easy, Morty. It’...      15
8  16  Rick    When I drop the bomb you know, I want you to ...      17
9  18  Rick    And Jessica’s gonna be Eve,…                          19
```

If we look at the schema, we see that a new int64 column was added:
```python
await async_table.schema()
```

```
id: int64
author: string
quote: string
new_id: int64
```

Rollback

Suppose we used the table and found that the new column should hold a different value. How do we replace it without losing the change history?

First, major operations are automatically versioned in LanceDB. Version 1 is the table creation, with the initial insertion of data. Version 2 is the update, and version 3 is adding the new column.

```python
# Point the table handle at the newest version, then list every version so far
await async_table.checkout_latest()
await async_table.list_versions()
```

```
[{'version': 1, 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259), 'metadata': {}},
 {'version': 2, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948), 'metadata': {}},
 {'version': 3, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165), 'metadata': {}}]
```

We can restore version 2, the state before we added the new_id column:

```python
# checkout(2) switches the handle to version 2; restore() then commits
# that version's data and schema as the new latest version
await async_table.checkout(2)
await async_table.restore()
await async_table.to_pandas()
```

```
   id  author  quote
0   1  Rick    Morty, you got to come on. You got to come wi...
1   3  Rick    I got a surprise for you, Morty.
2   5  Rick    I got a surprise for you.
3   7  Rick    I got a surprise for you, Morty.
4   8  Rick    What do you think of this flying vehicle, Mor...
5  10  Rick    Morty, I had to I had to I had to I had to ma...
6  12  Rick    We’re gonna drop it down there just get a who...
7  14  Rick    Come on, Morty. Just take it easy, Morty. It’...
8  16  Rick    When I drop the bomb you know, I want you to ...
9  18  Rick    And Jessica’s gonna be Eve,…
```

Notice that we now have one more version, not fewer. When we restore an old version, we're not deleting the version history; we're just creating a new version whose schema and data are equivalent to the restored old version.
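The restore behavior just described can be illustrated with a minimal sketch, assuming a toy in-memory model: ToyVersionLog and its methods are hypothetical names chosen to mirror the notebook's calls, not LanceDB's API. For simplicity the toy stores a full copy of every version, which is exactly the kind of snapshotting LanceDB spares us; only the version numbering is meant to match the notebook.

```python
import copy


class ToyVersionLog:
    """Hypothetical append-only version log; not LanceDB's real implementation."""

    def __init__(self):
        self._snapshots = []  # self._snapshots[i] is the state of version i + 1

    def commit(self, state):
        # Every write appends a snapshot and receives the next version number.
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots)

    def checkout(self, version):
        # Read an old version without modifying the history.
        return copy.deepcopy(self._snapshots[version - 1])

    def restore(self, version):
        # Restoring re-commits an old snapshot as a brand-new latest version.
        return self.commit(self.checkout(version))

    def list_versions(self):
        return [{"version": i + 1} for i in range(len(self._snapshots))]


log = ToyVersionLog()
log.commit({"columns": ["id", "author", "quote"]})                                  # version 1: create
log.commit({"columns": ["id", "author", "quote"], "rows_updated": True})            # version 2: update
log.commit({"columns": ["id", "author", "quote", "new_id"], "rows_updated": True})  # version 3: add column

restored = log.restore(2)
print(restored)                                    # 4
print(log.list_versions())                         # [{'version': 1}, {'version': 2}, {'version': 3}, {'version': 4}]
print(log.checkout(restored) == log.checkout(2))   # True
```

Versions 1 to 3 are still there after the restore; the only change is a fourth entry whose contents equal version 2.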
Because a restore only ever adds a version, we can keep track of all of the changes and always roll back to a previous state.

```python
await async_table.list_versions()
```

```
[{'version': 1, 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259), 'metadata': {}},
 {'version': 2, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948), 'metadata': {}},
 {'version': 3, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165), 'metadata': {}},
 {'version': 4, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 22, 800694), 'metadata': {}}]
```

Add another new column

Now we'll add the new_id column to the restored dataset again, this time with a different value:

```python
await async_table.add_columns({"new_id": "id + 10"})
```

```python
await async_table.schema()
```

```
id: int64
author: string
quote: string
new_id: int64
```

Deletion

What if the whole show was just Rick-isms? Let's delete any quote not said by Rick:

```python
# delete() removes every row that matches the SQL filter
await async_table.delete("author != 'Richard Daniel Sanchez'")
```

We can see that the number of rows has been reduced:

```python
await async_table.count_rows()
```

```
34
```

OK, we had our fun; let's get back to the full quote set. Version 5 is the state just before the deletion:

```python
await async_table.checkout(5)
await async_table.restore()
```

```python
await async_table.count_rows()
```

```
99
```

History

We now have 7 versions in the data. We can review the operation that corresponds to each version below:

```python
await async_table.version()
```

```
6
```

Versions:

1. Create
2. Update
3. Add a new column
4. Restore (2)
5. Add a new column
6. Delete
7. Restore (5)

Summary

We never had to explicitly manage the versioning.
And we never had to create expensive and slow snapshots. LanceDB automatically tracks the full history of the operations we performed and supports fast rollbacks. In production this is critical for debugging issues and for minimizing downtime by rolling back to a previously successful state in seconds.
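As a sanity check on the seven-version history above, the sketch below replays the same sequence of operations against a hypothetical four-row stand-in for the quotes table, using plain Python lists and dicts rather than LanceDB. The helpers commit, restore and latest are illustrative names, not LanceDB calls, and each version is stored as a full copy purely to keep the toy simple.

```python
import copy

# Hypothetical four-row stand-in for the quotes table (not the real CSV).
rows = [
    {"id": 1, "author": "Rick"},
    {"id": 2, "author": "Morty"},
    {"id": 3, "author": "Rick"},
    {"id": 4, "author": "Morty"},
]

history = []  # history[i] holds the full table state of version i + 1


def commit(state):
    """Append a new version and return its number."""
    history.append(copy.deepcopy(state))
    return len(history)


def restore(version):
    """Re-commit an old version as the new latest version."""
    return commit(history[version - 1])


def latest():
    """Return a private copy of the newest version's rows."""
    return copy.deepcopy(history[-1])


commit(rows)                                                     # 1: create
commit([dict(r, author="Richard Daniel Sanchez") if r["author"] == "Rick" else r
        for r in latest()])                                      # 2: update where author='Rick'
commit([dict(r, new_id=r["id"] + 1) for r in latest()])          # 3: add new_id = id + 1
restore(2)                                                       # 4: restore version 2
commit([dict(r, new_id=r["id"] + 10) for r in latest()])         # 5: add new_id = id + 10
commit([r for r in latest()
        if r["author"] == "Richard Daniel Sanchez"])             # 6: delete author != 'Richard Daniel Sanchez'
restore(5)                                                       # 7: restore version 5

for version, state in enumerate(history, start=1):
    print(f"version {version}: {len(state)} rows, columns {sorted(state[0])}")
```

In the replay, versions 4 and 7 come out identical to versions 2 and 5, and the deletion recorded as version 6 is undone without being erased from the history.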