Blog of Tomasz Tunguz

I’ve been a venture capitalist since 2008. Before that, I was a PM on the Ads team at Google and worked at Appian.

The Multimodal Lake House : Partnering with Lance

2025-06-24 08:00:00

Remember when you took a family photo & Ghibli-styled it?

Or that vibe coding session, when you pasted a screenshot of the browser so the AI could help you debug some JavaScript?

Today, we expect AI to be able to hear, see, & read. This is why multimodal is the future of AI.

Multimodal data means using text, images, video, sound, even three-dimensional shapes with AI.

These are magical user experiences. But they aren’t easy to build. Data pipelines must be built to manage larger files. Embeddings need to be extracted from these unstructured files in ways that don’t explode compute costs.

Multimodal data is orders of magnitude larger than text : the average PDF is 10x larger than a text file & a YouTube video is roughly a million times larger.

Plus, multimodal data doesn’t change just one part of the data pipeline : engineers must process the data at each step of the AI stack, from model training to real-time serving & downstream analysis at petabyte scale.

We kept hearing about these problems from builders & in the same breath, about a company that solves them.

Founded by Chang She, co-creator of the pandas library, & Lei Xu, a core contributor to HDFS, the Hadoop Distributed File System, LanceDB has a tremendous heritage within the data ecosystem.

RunwayML, Midjourney, WorldLabs, ByteDance, UBS, Harvey, & Hex use Lance. We admire the technology so much, we are using it internally at Theory as part of our AI stack & we’re excited to partner with Chang & Lei to bring multimodal AI to builders & users everywhere.
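To make that concrete, here is a minimal sketch of what a multimodal table might look like with LanceDB’s Python client. The bucket paths, the 512-dimension embeddings, & the random vectors are placeholders standing in for real assets & a real embedding model, not anything from Lance’s or Theory’s pipelines, & exact API details may vary by version.

    import lancedb
    import numpy as np

    # Connect to a local Lance dataset directory (placeholder path).
    db = lancedb.connect("./multimodal_demo")

    # Each row pairs an asset reference with a caption & an embedding vector.
    # Random 512-dim vectors stand in for real image or audio embeddings.
    records = [
        {
            "uri": f"s3://example-bucket/frames/{i}.jpg",  # placeholder object-store path
            "caption": f"frame {i}",
            "vector": np.random.rand(512).tolist(),
        }
        for i in range(1000)
    ]

    table = db.create_table("frames", data=records, mode="overwrite")

    # Nearest-neighbor search over the embeddings returns the closest assets.
    query = np.random.rand(512).tolist()
    for row in table.search(query).limit(5).to_list():
        print(row["uri"], row["caption"])

The point of the sketch : the large binary files stay in object storage while the table holds references & embeddings, so search & analytics run without shuttling the raw assets around.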

Read more about the multimodal lake house & the kinds of AI it can enable.

Fighting for Context

2025-06-20 08:00:00

Systems of record are recognizing they cannot “take their survival for granted.”

One strategy is to acquire : the rationale Salesforce gives for the Informatica acquisition.

Another strategy is more defensive - hampering access to the data within the systems of record (SOR).

Unlike the previous software era, when SORs built platforms on top of themselves to develop broader ecosystems (in Salesforce’s case, Veeva & Vlocity), the AI shift seems more defensive.

“[App builders] can no longer index, copy or store the data they access via the Slack application programming interface on a long-term basis.”

This isn’t the first time within the AI ecosystem that companies have shut off access to APIs. The DeepSeek launch shook the ecosystem’s assumptions.

Newer models employ distillation : asking an existing model many questions & training on its answers to achieve similar results at far lower cost. The result : tighter controls over AI API access. OpenAI, Anthropic, Cohere, & Google are just some of the foundation model companies whose terms now explicitly ban developing competitors via their APIs.
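For readers who want the mechanics : in its classic form, distillation trains a smaller student model to match a larger teacher’s output distribution. The sketch below shows that logit-level loss with toy tensors; API-based distillation works on the teacher’s generated text instead, since hosted models don’t expose logits.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
        """KL divergence between softened teacher & student distributions."""
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

    # Toy usage: a batch of 4 examples over a 10-token vocabulary.
    student = torch.randn(4, 10)
    teacher = torch.randn(4, 10)
    print(distillation_loss(student, teacher))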

Some startups have found terrific success by partnering with SORs : Abridge, an AI company for clinical documentation, has grown tremendously through its partnership with Epic. ServiceNow announced its intention to acquire MoveWorks for $2.85b.

AI has increased the stakes because the underlying workflows that have powered software are changing. BDRs no longer manage one email account; they manage five or ten or more. The growth rates of AI companies are faster than ever, & the context & data stored within systems of record are some of the most valuable assets in the market.

The combination of these three forces will drive more M&A, bigger outcomes, & greater defensive behavior as incumbents search for the right strategy for their business. For startups relying on big partners, the potential for platform risk has never been more acute.

DuckLake - Subsecond Latency on a Petabyte

2025-06-18 08:00:00

DuckLake is one of the most exciting technologies in data.

While data lakes are powerful, the formats that manage them have become notoriously difficult to work with.

“I think one of the things in DuckLake that we managed to do is to cut, I want to say like 15 technologies out of this stack.”

How does it achieve this? Instead of building a custom catalog server, DuckLake uses a simple, elegant idea: a standard database to manage metadata. It uses a database for what it’s good at. This clean architecture allows DuckLake to manage huge data lakes—with millions or even billions of files—across AWS S3 or Google Cloud Storage.

This simplicity also delivers incredible performance. In tests, DuckLake achieved sub-second query planning on a petabyte of data with 100 million snapshots—a scale that other systems can’t handle.

DuckLake speaks SQL, the lingua franca of data. Its architecture provides full ACID compliance, so concurrent reads and writes are handled seamlessly, allowing entire teams (and their AI agents) to work on the data lake simultaneously.
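A rough sketch of what that looks like from Python, using the DuckLake extension for DuckDB. The ATTACH string follows DuckLake’s published examples as I understand them, & the table & file names are placeholders, so treat the details as illustrative rather than definitive.

    import duckdb

    # A minimal local DuckLake: the lake's metadata lives in an ordinary
    # database file, while table data is written out as Parquet files
    # managed by the extension.
    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")
    con.execute("ATTACH 'ducklake:metadata.ducklake' AS lake")
    con.execute("USE lake")

    # From here it is plain SQL. Creates, inserts, & reads are transactional,
    # so concurrent writers & readers see consistent snapshots.
    con.execute("CREATE TABLE events (id INTEGER, payload VARCHAR)")
    con.execute("INSERT INTO events VALUES (1, 'hello'), (2, 'world')")
    print(con.execute("SELECT count(*) FROM events").fetchall())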

By returning to first principles, DuckLake delivers the power of a modern data lake without the complexity. Its simplicity and performance make it a vital part of the future of data.

The Data Decacorn Derby

2025-06-16 08:00:00

Databricks seems to be closing the gap on Snowflake faster than expected.

Last week, Databricks shared some important updates on its business, which allow us to compare the progress of the two companies.

Quarterly revenue for the two companies shows nearly identical slopes : two parallel lines. Snowflake recently exceeded the $1b quarterly revenue mark, while Databricks just touched $750m & is targeting $925m for the next quarter.

Snowflake’s revenue growth rate has been on a long glide path, nearly asymptoting at 25% year over year. Databricks saw a resurgence in its growth rate from mid-2023 to early 2025, from about 50% to 60% - exceptional for a company at this stage.

This growth rate was catalyzed by the resurgence of interest in AI workloads after the launch of GPT-3.5 (the dawn of the modern AI era) in late 2022 and a rebounding software economy in early 2023.

Comparing the number of large customers - those generating more than $1m per year - Snowflake has about 90 more than Databricks.

Databricks continues to operate at about 15% better gross margin, primarily because customers operate their own compute when using Databricks, whereas Snowflake bundles the compute and storage, compressing margins.

Databricks achieved profitability for the first time in the most recent quarter. Meanwhile, Snowflake tends to operate between -30% and -40% net income margin. In addition, Snowflake has heavy stock-based compensation expenses that create sawtooth patterns in its profitability.

The gross margin impact of bundling cloud services is a contributor, but without additional insight into the two companies’ relative sales & marketing and research & development budgets, it’s hard to ascertain the exact reason for the delta.

As the battle has continued, the distance between the two companies has narrowed, & the future promises more of these two data decacorns duking it out. Customers win when market leaders compete this intensely.

The Coming Wave of Acquihires

2025-06-13 08:00:00

The Seed Surge of 2021 will lead to a raft of acquihires.

In 2021, the total number of US software & AI seeds jumped from 2900 to 4300 - a 49% increase. Seed counts then fell to about 3000, creating a seed tabletop.

Series As moved in lockstep both on the way up and the way down - creating a squeeze.

These data are part of a longer-term trend : a greater number of seeds but a relatively fixed number of Series As.

The result?

An 86% reduction in the seed-to-Series A conversion rate. Part of this may be that companies are seed-strapping : raising a seed and continuing on profits, catalyzed by the efficiencies of AI.

AI is another component : pre-AI and post-AI companies are roughly demarcated by the launch of GPT-3.5 in late 2022, and retooling products with AI takes time.

Even with those factors, the number of “excess” seeds (more than the Series A market can absorb) continues to grow, resulting in the decreasing conversion rate.

This dynamic mirrors 2015 when a surfeit of seeds drove a steady increase in the number of acquihires.

Acquihires, acquisitions typically under $20m made for talent, may become de rigueur as incumbents seek to bolster their teams with AI talent & update existing products with new capabilities.

Partnering with Maze Security

2025-06-10 08:00:00

Doctors and security researchers have more in common than you might think.

Doctors defend human bodies against an ever-shifting landscape of viruses & infections. Security researchers do the same thing, but at massive scale—protecting thousands of servers instead of a single patient.

Each human body is slightly different, and the research around human health evolves constantly, as does the research around potential infections.

Doctors with AI are 10 percentage points more accurate at delivering diagnoses than those without AI.

Vulnerability management, the practice of identifying the security vulnerabilities that might be exploited, is exactly the same discipline, but at much larger scale : instead of a single patient, security researchers at large companies manage tens or hundreds of thousands of servers, computers, routers, and other kinds of infrastructure, each with its own quirks.

Prioritizing the most important issues to address is critical, and some critical-severity vulnerabilities might be relevant for one company but not for another, just as one patient might be genetically predisposed to a condition while another is not.

In security, as in medicine, the ability to respond quickly to the most important issues separates the strong from the compromised.

This is exactly the challenge Maze is solving. Founded by an experienced team from Tessian, Elastic, & Amazon, they’re building AI that thinks like a senior security researcher—considering your company’s unique topology to prioritize vulnerabilities that actually matter.

Maze has replaced rules-based systems with AI that considers the company’s unique topology & infrastructure to prioritize the most important vulnerabilities & understand the impact of potential breaches.

For each finding produced by Wiz, CrowdStrike, Orca, and other systems, Maze spins up a team of AI analysts that determine whether the issue is actually exploitable and how urgently it should be fixed.

Agentic security is the future of the category. We’ve seen tremendous results from security operations center automation with our portfolio company, DropZone.

Maze is seizing the opportunity to transform the $16b vulnerability management market. The company is hiring!