Dry Castles and Digital Moats: Defensibility in Generative AI (Part II)

From commodity to competitive edge: Crafting AI solutions that generic models can't touch

Essay Information:

Words: 5,697 | Est. Reading Time: 21 minutes

Author’s Note: This essay is part two of a three-part series on Defensibility Levers in Generative AI, employing Hamilton Helmer’s 7 Powers as a frame for our analysis. If you are unfamiliar with the 7 Powers, please refer to part one for a brief, but essential, introduction.  

When all the magic behind your startup is an API call away, where does your power lie? 

In Part I, we investigated how early-stage founders could build an edge into their products at the Application Layer. The solution was to apply a domain-specific approach to product sense, earning the right to sell a solution rather than software. 

The crux of this exploration was product sense. Too many founders start with capacity-forward thinking (what can I smash with the HammerLLM) rather than problem-forward thinking. As HBS professor Theodore Levitt put it, “People don’t buy a quarter-inch drill; they buy a quarter-inch hole.” It’s through this lens that we can evaluate why capacity-forward thinking, in the case of using an off-the-shelf LLM, won’t be the path to building power. 

This essay provides a more nuanced look at building power at the Model Layer. With product sense as the frame for our analysis, we will explore how model architecture can help founders build “digital moats”—blending model training, alignment, and tooling to confer power. 

First, our analysis will cover the drawbacks of relying on closed-source models for B2B products. We will examine how the indiscriminate training data for these large models produces “average” results, creating an opportunity for startups to find a wedge in specialization. 

Second, we will look at strategies at the Model Layer. Digging into three pillars of effective model refinement — training, alignment, and tooling — and how they can be leveraged to build meaningful workflow solutions. 

Lastly, we will review a case study. We will illustrate how Runway ML's product iteration and research led it to many of the powers enabled through the techniques discussed throughout this essay. 

Let’s start with the limitations of large, closed-source models.

The Open Secret: Why Closed-Source Models Fall Short 

For resource-strapped startups attempting to demonstrate the value of their solution early in their journey, off-the-shelf generic models like GPT or Claude represent critical starting points. At inception, something quick and cheap that works can be more important than power.

But what happens when startups rely too heavily on these models? They risk relinquishing power. 

The optimization goals of these generic large language models (LLMs) are inherently biased toward producing "surface-level" knowledge. As next-word-prediction engines, LLMs are trained to output sentences and phrases that mimic the patterns found in their vast training data. When that data encompasses the entire Internet corpus, it's no wonder essayists refuse to use them for anything beyond grammar checking.

Discerning readers have learned to distinguish between the depth and substance of prose crafted with human touch and the hollow, pattern-matched outputs of these machines. LLMs resort to lifeless, insight-poor phrases like "delving deep" or "in the realm of." 

Of course, this “average-by-design” approach is intentional, as it makes their closed-source models more useful across use cases and user types. The implication is clear: these closed-source models are at odds with the nuanced understanding required to integrate into and enhance existing workflows. This may be why prompt engineering is a desired skill for Fortune 2000 companies looking to integrate these models. Achieving the desired outputs requires considerable effort and familiarity with a model.

While off-the-shelf LLMs may offer ease of use and quick implementation, they come at the cost of long-term power and the ability to compete at the Model Layer. Fundamentally, these models' greatest strength—their generic applicability—is also their greatest weakness. 

Thus, the generic nature of off-the-shelf models presents a Counter-Positioning (Power 3) opportunity for startups attempting to specialize for their customers' sake. 

With the disadvantages of closed-source LLMs in mind, let's transition to exploring the impact of off-the-shelf LLMs on a startup's ability to offer product sense at the Model Layer. 

Generic LLMs Don’t Make (Product) Sense 

In the previous section, we discussed a core drawback of off-the-shelf LLMs from a structural perspective. Here, we will explore how these generic LLMs, by nature of their one-size-fits-all approach, fail to provide the advantages of product sense. 

Due to their indiscriminate training data, generic LLMs are best suited to narrowing the gap between low and average performers; they fail to deliver any meaningful return for high performers who already excel in their roles. The critical insight is to examine who benefits most from these generic LLMs and who remains underserved.

In a November 2023 paper, the National Bureau of Economic Research reported that access to Generative AI tools increased productivity and performance by 34% for novice and low-skilled workers within their sample but yielded little to no improvement for high-skilled ones. This disparity emphasizes the opportunity: while generic LLMs benefit less-experienced workers, they fail to provide significant value for top performers who need something specialized to move the needle. The study suggests that generic models pass on best practices of capable workers to newer ones, helping the latter group accelerate their learning curve. However, as with generic training, this comes with its own cost: failing to deliver any value to those whom a workflow solution is best suited to serve. 

High performers, seeing a tide raise all boats but their own, naturally demand specialized solutions. This explains why the “built to make you sound average” capabilities of an LLM are better suited to “rebuilding the middle class” than to serving those at the forefront of company performance.

According to a case study from McKinsey’s State of AI report, when supply chain companies attempted to leverage an off-the-shelf LLM to extract patterns and insights from their existing data (a value touted by these models’ vendors and by consultants), they found that the generic models could not deliver the precise predictive analytics, optimized inventory forecasts, or risk guidance that their highest performers could produce with ease. Revisiting counter-positioning, the failure of generic LLMs to deliver the nuanced, context-specific insights that top performers require presents an opportunity in itself: workflow-specific models.

This is where the shift from "agent-as-a-service" to "workflow-as-a-service" becomes meaningful. By focusing on an underserved niche, startups can create a powerful counter-position to generic LLMs and develop the specialized, transformative solutions that a generic LLM cannot—or chooses not to—provide. 

The logic follows: the key to competing against off-the-shelf models is to create bespoke, tailored models. 

The next five sections of this essay evaluate levers within model refinement that make this possible, particularly from the perspective of resource-constrained startups. 

Tailor-Made: Stitching Together Training, Alignment, and Tools 

Having established the limitations of generic, closed-source LLMs, we will now transition to the technical strategies within model refinement that serve to create power at the Model Layer. 

It’s worth noting that implementing these strategies is inherently hard, but there is a purpose to the difficulty: Perkins’ Law. 

Perkins’ Law, coined by one of the original partners behind Kleiner Perkins, states that market risk is inversely proportional to technical risk. In other words, solving a difficult technical problem reduces the chance for others to compete in the same market. 

By building a technical moat with Perkins’ Law in mind, combining product sense and technical acumen gives rise to Cornered Resources. Iterating through model refinement demonstrates your startup’s ability to meet the moment, refining what is merely “acceptable” off the shelf and transforming it into something “mission critical” for a customer. The process of overcoming technical challenges to achieve a cornered resource is what we can refer to as Process Power.

The path toward these Cornered Resources and Process Power requires insight into model refinement. 

We will soon analyze three core levers within this practice. The first covers the impact of training a model with synthetic data in pursuit of specialization, demonstrating that data quality matters more than data quantity. The second covers model alignment (i.e., getting outputs that match expectations) with newer techniques like Direct Preference Optimization (DPO). Lastly, we will cover tooling that can augment your model. Each lever serves to build upon a Cornered Resource in serving a workflow.

Before looking at each of these pillars in depth, it’s critical to understand the role data plays in achieving power at the Model Layer. In the next section, we will briefly explore why niche, high-quality data is crucial for startups looking to create powerful, specialized models. 

Niche is the New Black: Cornering the Market with Targeted Data 

Prevailing wisdom suggests that a transformer model’s utility increases proportionately with data volume and parameter count. 

We beg to differ. Contrary to this scaling narrative, the true power of models lies in data precision rather than data quantity.

Founders, here is your opportunity to enter the arena. Focusing on curating high-quality, niche data yields a Cornered Resource that provides a much-needed competitive edge at the workflow. 

Just as a student’s learning outcomes depend on the quality of their study materials, an LLM’s effectiveness is determined by the quality of its training data. As discussed, the limitations of one-size-fits-all closed-source solutions create an opportunity for open-source models tailored to bridge this gap. The rigid architectures of these closed models prevent the precise customizations startups require to deliver at the workflow. 

This essay posits that the path toward power lies in shifting reliance away from generic large language models and toward specialized, open-source alternatives enriched with niche data.

With this approach, a startup can leverage the transparency and full control needed to deliver a solution with commensurate product sense. 

The graph below, which gained attention through Andrew Ng’s analysis, demonstrates the impact of data quality on the performance of machine learning algorithms. The graph measures a model’s accuracy using Mean Average Precision (mAP), a common evaluation metric in machine learning that assesses the quality of a model’s predictions. The results are clear: a higher mAP score indicates better model performance.
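
For readers who want the formula behind the metric, average precision (AP) is the area under a class’s precision–recall curve, and mAP simply averages it across classes; a standard statement of the definition, included here as a general reference rather than anything specific to the cited graph:

```latex
% Average precision for class i: area under its precision-recall curve.
% mAP: the mean of those per-class scores over all N classes.
\mathrm{AP}_i = \int_0^1 p_i(r)\, dr, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i
```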

The data shows that models trained on noisy, indiscriminate data, represented by the orange line, require 3x the number of training examples to achieve the same accuracy as models trained on high-fidelity, targeted data, indicated by the green line. This disparity underscores that generic LLMs compensate for noisy data with sheer scale.

The takeaway for founders is to focus less on building ever-larger models and more on cultivating the precision and relevance of their training data.

Now that we’ve provided background on the importance of quality, focused data, we will return to the three imperative levers that can be used to refine the model architecture of open-source models—models that create power. 

Let’s begin by exploring defensibility, starting with synthetic data. 

Unreal Data, Real Results: The Power of Synthetic Data 

High-quality, curated data for training Gen AI models is both scarce and sought-after. The resource limitations early-stage startups inherently face make obtaining these curated datasets challenging. Synthetic data alleviates this challenge: founders have a new lever with which they can enter the arena not just well equipped but also expertly trained, even without access to vast amounts of real-world data. 

Synthetic data, which consists of artificially generated examples that simulate real-world scenarios, offers an alternative way to fine-tune language models when real data is sparse. Fine-tuning, in this context, means using synthetic data to adapt an existing pre-trained model so that it performs better on specific tasks.
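
To make this concrete, below is a minimal sketch of how a team might bootstrap such a dataset by prompting a stronger "teacher" model with workflow-specific seed tasks. The `call_teacher_model` helper, the seed tasks, and the output file name are hypothetical placeholders, not any particular vendor’s API.

```python
import json

# Hypothetical helper: wrap whichever teacher model you have access to
# (a locally served open-source model or a hosted API). Not a real library call.
def call_teacher_model(prompt: str) -> str:
    raise NotImplementedError("plug in your own model client here")

# Seed tasks drawn from the target workflow (illustrative examples only).
seed_tasks = [
    "Summarize a supplier delay notice into a one-line risk flag.",
    "Draft a reorder recommendation from a weekly inventory snapshot.",
]

TEMPLATE = (
    "You are generating training data for a supply-chain assistant.\n"
    "Task: {task}\n"
    "Write one realistic user input and the ideal expert response.\n"
    "Return JSON with keys 'input' and 'output'."
)

# Each teacher response becomes one training example in a JSONL file.
with open("synthetic_dataset.jsonl", "w") as f:
    for task in seed_tasks:
        raw = call_teacher_model(TEMPLATE.format(task=task))
        example = json.loads(raw)   # assumes the teacher returns valid JSON
        example["task"] = task      # keep provenance for later filtering
        f.write(json.dumps(example) + "\n")
```

In practice, the generated examples would be filtered and deduplicated before training; as the previous section argued, the quality bar matters more than the row count.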

Recognizing the challenges in data accumulation, platforms like LangChain and Hugging Face have developed tools and community resources that simplify the process of generating and refining synthetic datasets for model training. LangChain's Tuna (pictured below) offers a no-code path for generating synthetic, LLM-ready datasets; Hugging Face's TRL library supplies the training utilities, from supervised fine-tuning to preference optimization, that turn those datasets into fine-tuned models.

Hugging Face's tooling also lets lean teams format datasets into conversational or instruction-following layouts, enabling the creation of datasets better suited to specific workflows. With customization at the data layer, startups can develop high-quality models without expensive infrastructure or heavy compute requirements.
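
As one illustration of that fine-tuning step, here is a minimal sketch using TRL's `SFTTrainer` on the JSONL file from the previous sketch. The model name and file path are placeholders, and argument names vary across TRL releases, so treat this as the shape of the workflow rather than a drop-in script.

```python
from datasets import load_dataset
from trl import SFTTrainer

# Load the locally curated synthetic dataset (file name is a placeholder).
dataset = load_dataset("json", data_files="synthetic_dataset.jsonl", split="train")

# Collapse each example into a single text field for supervised fine-tuning.
def to_text(example):
    return {"text": f"### Input:\n{example['input']}\n\n### Response:\n{example['output']}"}

dataset = dataset.map(to_text)

# SFTTrainer wraps the standard Trainer loop for causal-LM fine-tuning.
# This follows the older-style API; newer TRL versions move several of these
# arguments into an SFTConfig object.
trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",   # any open-weights base model
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
)
trainer.train()
trainer.save_model("workflow-model-sft")
```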

To illustrate the effectiveness of synthetic data, we can look at a paper published by the Microsoft Research team on their Phi-3 model. The study highlighted that the Phi-3-mini model, with 3.8 billion parameters, trained on 3.3 trillion tokens of heavily filtered web data and synthetic data, outperformed larger models like Llama 3 in specific reasoning tasks. Despite its smaller size, Phi-3-mini achieved a 69% score on the MMLU benchmark and 8.38 on MT-bench, demonstrating the practical potential of tailored, synthetic data for compute-constrained startups. 

In essence, synthetic data serves as a crucial model refinement lever early in a startup's journey, enabling them to outmaneuver generic LLMs by focusing on specialized datasets tailored to their specific workflow. 

 But remember, synthetic data is just one piece of a much bigger puzzle. 

Synthetic data is a necessary step for securing power, but it's not enough on its own. When this strategy is combined with the empathy-driven product sense discussed in Part I of this essay series, startups can leverage their expertise to curate databases that are tuned to the unique needs of their target customers. The combination of domain expertise, customer empathy, and tailored synthetic data is critical in creating powerful, customized open-source models that can outperform larger, more compute-intensive systems in targeted tasks. 

Looking to the next section, the second pillar of powerful model refinement is composed of model alignment techniques that can improve model performance. 

Policy Matters: Directly Optimizing for Success 

If data is the concrete upon which a foundation (model) is built, model alignment is the mold that shapes the concrete into its final form. 

Despite having billions of parameters and petabytes of training data, generic large language models require extensive prompt engineering, evaluation, and familiarity with the model to generate the outputs users expect. 

As discussed in the first section of this essay, generic LLMs prioritize the quantity of concrete over the precision of the mold. 

Consequently, when it comes to specialized use-cases, as in the case of building a skate park as opposed to a parking lot, the molding, or specialization, tends to be of higher importance. As with construction materials, concrete is a commodity, and true value unfolds from the application of that commodity toward a defined end. In other words, generic LLMs force users to adapt their workflows to fit the model, rather than refining the model to adapt to the needs of the user. This gap in alignment presents another opportunity for counter-positioning. 

Innovations like Direct Preference Optimization (DPO) allow startups to develop Gen AI solutions that refine these "molds" over time, creating specialized "skate parks" rather than generic, one-size-fits-all structures.

DPO helps align a model by iteratively integrating user preferences into its behavior. In this context, the policy is the model itself: the mapping from a prompt to the responses it produces. The process presents a user with multiple responses to a prompt and lets them choose the output most aligned with their expectations. That feedback is incorporated into the model over time, steering its outputs toward aggregated user preferences as the model learns.

The novelty of this approach lies in removing requirements common in Reinforcement Learning from Human Feedback (RLHF), including a separate reward model, extensive sampling, and heavy hyperparameter tuning. Implementing DPO balances the trade-off between computational efficiency (simpler training) and product sense (personalization).
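
For readers curious about the mechanics, DPO reduces alignment to a single supervised-style loss over preference pairs: it raises the log-probability of the chosen response relative to a frozen reference model while lowering the rejected one. A minimal PyTorch sketch of that loss, assuming the per-response log-probabilities have already been computed elsewhere:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor holding, per example, the summed log-probability
    of the chosen or rejected response under the trainable policy or the
    frozen reference model.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps

    # Widen the gap between chosen and rejected responses, scaled by beta.
    logits = beta * (chosen_margin - rejected_margin)
    return -F.logsigmoid(logits).mean()

# Toy usage with random numbers standing in for real log-probabilities.
batch = torch.randn(4)
loss = dpo_loss(batch - 0.5, batch - 1.0, batch - 0.7, batch - 0.7)
```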

To illustrate how DPO works in practice, the following visual from the team at Conviction shows how the iterative feedback loop allows the model to align with user preferences: 

Unlike traditional model alignment mechanisms like RLHF, DPO enables models to grasp nuanced components of a response, including contextual appropriateness and domain-specific accuracy. For instance, in customer service workflows, a DPO-optimized LLM can provide more accurate responses to customer inquiries by refining the model based on feedback regarding its outputs with respect to industry-specific queries and resolutions. 
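
In practice, a lean team would rarely hand-roll this loop; libraries such as Hugging Face's TRL expose a `DPOTrainer` that consumes (prompt, chosen, rejected) triples collected from users. A rough sketch of that usage, with placeholder model and output names, illustrative rows, and argument names that shift between TRL versions:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "workflow-model-sft"  # placeholder: the SFT checkpoint from earlier
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference copy
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data collected from users: each row pairs a prompt with the
# response they picked and the one they passed over (illustrative rows only).
prefs = Dataset.from_dict({
    "prompt":   ["Customer asks about a delayed refund for order #123."],
    "chosen":   ["Apologize, confirm the refund was issued, and give a 3-5 day ETA."],
    "rejected": ["Refunds take time. Please wait."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,                # how tightly the policy stays anchored to the reference
    train_dataset=prefs,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="workflow-model-dpo",
        per_device_train_batch_size=1,
        remove_unused_columns=False,  # keep prompt/chosen/rejected for the trainer
    ),
)
trainer.train()
```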

The results are clear: shaping concrete requires the right mold. Purpose-built foundation models, using tools like DPO, help founders transform raw materials into powerful solutions by refining their "molds" with synthetic data. 

Having explored the impact of novel solutions like DPO for model alignment, we can now turn our attention to the third and final pillar of powerful model refinement: model tooling. The next section will examine how Retrieval-Augmented Generation can be combined with synthetic data and DPO to increase the likelihood of achieving power at the Model Layer. 

The Right RAG for the Job: Enhancing Models with Tooling 

Have you ever come across a PhD student in Artificial Intelligence who doesn't read relevant academic papers on ArXiv, attend and present their work at conferences, or find ways to connect with other researchers working on related work? 

Probably not.  

Their role demands that they stay informed, aware, and cognizant of the innovations impacting their field. In a similar vein, when it comes to workflow models, would you trust a tool that fails to incorporate the essential knowledge base, insights, and documents relevant to your work? 

Probably not.  

This line of thinking sheds light on why LLM tooling is so valuable. While LLMs can appear to be second brains, they are simply models that excel at predicting the most relevant next word in a sequence. Stretching an LLM beyond its strengths invites disappointment.

These models are trained on static datasets that represent a snapshot of knowledge at a specific point in time, limiting their ability to incorporate new information and adapt to changing contexts. 

Consequently, an LLM without tooling would be like a PhD student who chose never to expand their knowledge or contribute to their field after completing their coursework: potential impact and relevance are curbed. Likewise, LLMs that don't evolve with tooling end up generating outdated, irrelevant, or inconsistent outputs based on their snapshot-in-time knowledge.

As a tool, Retrieval-Augmented Generation (RAG) offers a solution to the inherent limitations of an LLM, giving these systems access to up-to-date, relevant information during the text generation process. Equipping an LLM with the ability to retrieve and utilize external knowledge, RAG gives startups a chance to create models that tackle real-world reasoning tasks demanded in the workflow. 

On the application side, RAG works by dividing a language model's task into two primary functions, as demonstrated in the illustration below: information retrieval and response generation. In the retrieval step, the system analyzes the user input and queries an external vector database to find relevant, context-specific information. Vector databases store company-specific knowledge as numerical representations called embeddings, which enable semantic search and retrieval. This retrieved context is then combined with the original prompt and fed into the LLM, enabling the model to generate its response from up-to-date information rather than relying solely on its pre-existing, static knowledge base.
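
To ground that two-step flow, here is a minimal retrieval-and-prompt-assembly sketch: sentence-transformers supplies the embeddings, a plain in-memory array stands in for the vector database, and the final LLM call is left as a hypothetical placeholder. The document snippets and model name are illustrative, not a production pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Company-specific knowledge that would normally live in a vector database
# (illustrative snippets only).
documents = [
    "Refund policy: refunds are issued within 5 business days of approval.",
    "Shipping SLA: standard orders ship within 48 hours of confirmation.",
    "Escalation: delays over 7 days are routed to a human agent.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings sit closest to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q               # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [documents[i] for i in top]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))

# The retrieved context is prepended to the user's question before the LLM call.
augmented_prompt = (
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
)
# response = generate(augmented_prompt)   # hypothetical LLM call, not a real API
```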

Model refinement, using tools like RAG, has considerable implications for the Model and Application Layer. Returning to our case study from Part I in our essay series, Perplexity AI and its application of RAG created the basis for it to “unbundle search” and compete against the likes of Google. 

Beyond acting as a router for both open and closed-source models, Perplexity differentiates itself through its advanced data indexing and RAG pipeline, enabling it to deliver actionable internet search responses. In this case, the laudable system behind Perplexity is doing more than “fetching” semantically relevant information; it implements RAG in production environments, balancing speed, relevance, and accuracy trade-offs based on a user’s search query. 

Perplexity counter-positioned itself as an alternative to Google by leveraging its cornered resource, the RAG pipeline, to adapt to users' search habits rather than forcing them to work within the limitations of a static model. In other words, tooling was a means to an end for making the user the hero of the story. Again, it’s about problem-forward thinking rather than HammerLLM! 

However, it is important to be aware of the drawbacks. While RAG can pull relevant context, it may not always be sufficient to reduce hallucinations. LLM evaluation still matters. Beware: users might rely too heavily on RAG-generated outputs and adopt a "trust-always, verify-never" posture rather than a more prudent "trust but verify" one, negating the purpose behind these copilots.

The difficulty of achieving RAG with unstructured data (e.g., non-JSON formats) has led to the emergence of startups focusing solely on developing the “middleware” between data and response generation, as represented in the previous diagram. 

But middleware isn't enough: any tool that touches an LLM also inherits these systems' lack of transparency and unpredictability. Probabilistic black-box models like LLMs deviate from traditional debugging; testing and model evaluation replace the deterministic logic familiar to software engineers, requiring new tools and approaches to LLMOps that introduce their own form of friction in delivering a workflow solution. 

To overcome these limitations, startups must build Process Power. These challenges are invitations: the spoils of model refinement are a mission-critical product rather than a nice-to-have. A PhD student's research demands access to information beyond the textbook; so too does a model demand adaptability and continual learning to break beyond the confines of a static knowledge base.

Model Refinement, A Summary 

In the last three sections, we covered three pillars of model refinement that lead to power: model fine-tuning with synthetic data, model alignment with DPO, and model tooling with RAG. 

Synthetic data provides the flexibility to train open-source models for specific use cases. It is thus a powerful tool for creating difficult-to-replicate resources when guided by a keen understanding of your target users' needs. 

Model alignment through DPO provides a lever for refining models iteratively, with the help of those who are best situated to benefit: users. With more precision in outputs, the weights that make up this bespoke model evolve into a cornered resource that few others would be able to replicate over time. 

Tooling, especially with examples like RAG, provides the enhancement needed to evolve a language model from a text-generation machine into a more robust copilot at the workflow. The challenges of implementing it hint at a Process Power for your startup.

To demonstrate what the thinking behind model architecture looks like in practice, we'll turn our attention to Runway ML and its application of Process Power, Cornered Resources, and Counter-Positioning. 

Runway ML: A Case Study 

Runway, recognized by Time Magazine as one of the 100 Most Influential Companies of 2023, is a creative platform that makes advanced, domain-specific AI capabilities accessible to anyone with Internet access. Founded in 2018 by Cristóbal Valenzuela, Alejandro Matamala, and Anastasis Germanidis, Runway originated at NYU's Tisch School of the Arts Interactive Telecommunications Program. Since then, it has become a leader in AI-driven creative tooling. 

While the examples below will highlight many of the power levers available at the Model Layer, the most important takeaway is the team's unwavering and unrelenting focus on product sense. The founders saw many of the innovations in AI as out of reach for their targeted market; the tools were optimized for researchers rather than users. The gap between technical expertise and those who would stand to benefit the most became the north star from which Runway's power emanated.

Process Power - Research and Model Development 

In 2021, Runway collaborated with the CompVis group at LMU Munich and Stability AI, introducing Latent Diffusion, a model that would change model-based image generation. 

Latent diffusion starts by compressing an image into a simplified, encoded form called a latent representation, similar to how an artist's initial sketch captures only the essential outlines of a picture. The process then introduces random noise into this latent representation before systematically removing it, just as an artist might add and then erase smudges and blur while refining the final picture. Through iterative refinement, in which a neural network guides the noise removal, latent diffusion reconstructs an image with added details, textures, and forms, transforming the "sketch" into a photorealistic image. The introduction of noise and its iterative removal in a compressed latent space addressed the diversity and image-quality challenges, and the trade-offs, inherent in previous image generation models.

Building upon their contributions to Latent Diffusion, Runway collaborated with CompVis, Stability AI, and LAION to produce Stable Diffusion in 2022. The model added a text encoder that conditions generation on natural-language prompts, bringing high-quality text-to-image generation to a broad audience. The text encoder translates a prompt into visual guidance, steering the image's evolution in the latent space and producing a semantically faithful representation of the text input.
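
To give a sense of what this lineage looks like from the application side, here is a minimal text-to-image sketch using Hugging Face's diffusers library and the openly released Stable Diffusion v1.5 weights; it assumes a CUDA-capable GPU and that the model id is still available to download.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the openly released Stable Diffusion v1.5 weights (model id assumed available).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# The text encoder turns the prompt into conditioning vectors that steer the
# denoising steps in latent space; the VAE decoder maps the result back to pixels.
image = pipe("a storyboard sketch of a rainy neon street, cinematic lighting").images[0]
image.save("storyboard.png")
```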

Runway's background in contributing to the development of Latent and Stable Diffusion provided the necessary expertise and foundation to launch their flagship products, Gen-1 and Gen-2, in 2023. 

Gen-1 introduced "Video-to-Video." The product brought stylization to the platform, allowing any video to be reconfigured into the style of an input image or storyboard. It turns mockups into stylized renders and masks, and it allows objects in an image to be isolated and changed using text prompts.

Gen-2 built upon the foundation of Gen-1 with "Text-to-Video." It gave users the ability to generate videos from just a text prompt and refined the capabilities that defined Gen-1.

In June 2024, Runway released Gen-3 Alpha, creating a new standard for high-fidelity, controllable video generation. Gen-3 Alpha is the first in a series of models trained on a new infrastructure built for large-scale multimodal training. It represents a major improvement in fidelity, consistency, and motion over Gen-2, and is a step towards building General World Models. The model's highly descriptive, temporally dense captions enable imaginative transitions and precise key-framing of elements in the scene, offering unprecedented control and creativity.

The implications of Gen-1, Gen-2, and now Gen-3 Alpha demonstrate why model refinement matters: Runway's advantage stemmed from its ability to integrate cutting-edge model refinement into production-ready solutions for its users.

They drove the innovation of the model rather than allowing the terms of an API call to drive them. As Perkins’ Law suggests, overcoming these technical hurdles offered Runway a strategic advantage when going to market.

By refining open-source models they helped develop for their specific use case and workflow, Runway cultivated Process Power by building with their end user in mind.

Cornered Resource - User-Defined Model Alignment 

Runway’s approach to user feedback and model iteration helped the company achieve a Cornered Resource. By refining its models and platforms to fine-tune its product for filmmakers, designers, artists, and hobbyists, Runway has been able to build upon its technical advantages. 

Runway's commitment to empathy-driven product sense comes from its founders. One of them, Cristóbal Valenzuela, had experimented with making machine learning models more accessible to end users as early as 2016, popularizing tools like ml5.js for creatives. The library was among the first to let users run machine learning models in a web browser on their own GPUs, and it demonstrated early signs of market pull among creatives looking to push the boundaries of these models within their own workflows.

Because the team and product leadership understood the importance of alignment with users, many of their products channeled the contemporary moment's pulse. 

For example, in 2020, Runway introduced "Mask," a feature that automatically propagated a mask or matte from one video frame to the next, all within a browser. This innovation created the backdrop for introducing Green Screen, a tool that would revolutionize rotoscoping. (Rotoscoping is a technique in video editing where editors manually trace over footage, frame by frame, to create a matte or mask for isolating specific elements.) Green Screen turned a task that typically took hours into an operation requiring a couple of clicks.

Runway has turned user alignment and model refinement into a flywheel. As they improve their models based on user feedback, they deliver products that, in turn, invite more users, providing more data and insights that refine their platform even further. 

This resource is, therefore, cornered. Few firms can leverage the symbiotic relationship between user feedback and product development at this scale. 

Counter-Positioning - Meeting Designers Where They Are 

Part I of this essay series discussed one of the biggest barriers to Gen AI adoption: the chasm between the technology’s potential and a user’s ability to use it. 

What is true today mirrors what Runway's founding team saw back in 2016: open-source models like char-rnn and image classifiers trained on ImageNet improved workflows for designers in ways few thought possible. However, the target audience still lacked the technical skills needed to fully utilize these tools. So, the founders had to adapt the tools to fit users' needs, effectively making the "glove" fit the "hand."

To bridge this gap, the founders saw an opportunity to create user-friendly, AI-driven platforms that empowered users, even if they lacked technical know-how. Leveraging empathy and product sense, they identified the need to meet their users where they were, rather than forcing users to conform to their platform. 

The crux of Runway's DNA is empathy: the team focused on making their software as accessible as possible for their users. One example was meeting users on their own virtual turf: the browser. Deviating from the convention set by players like Adobe Premiere Pro and Final Cut Pro, Runway did away with the native desktop app and adopted a web-browser architecture.

The move made Runway's platform easier to access and enabled them to push updates and feature rollouts faster to users. This transition was made possible by advancements in web video codecs and standards, which enable the compression, transmission, and playback of video content on the web. With these improvements, Runway delivered a seamless, browser-based experience to their users, eliminating the need for a desktop application, thus lowering the barrier to entry for creators. 

Just as generic LLMs don't make (product) sense, the ability to counter-position a platform to meet the needs of an underserved user base is the driving force behind Runway's continued prowess.  

Case Study Recap 

Runway's success in AI-driven video editing demonstrates many of the principles we mentioned in achieving power at the Model Layer. While the case study may not have always drawn explicit one-to-one connections between Runway's approach and the technical intricacies of model architecture, the underlying principles of process power, cornered resources, and counter-positioning remain at the heart of their success. 

Process power, as demonstrated by Runway's contributions to Latent Diffusion and Stable Diffusion, laid the groundwork for their innovative products. 

Cornered resources, achieved through user alignment and the development of a unique flywheel, allowed them to create a defensible position in the market. 

Finally, counter-positioning, exemplified by their empathy-driven approach and strategic transition to a web-based architecture, enabled them to meet users where they were and provide a more accessible, user-friendly experience. 

Conclusion 

When the magic of creation is just an API call away, the low barriers to entry and the ability to iterate at the surface level can be false signals of survival. 

However, these limitations of generic, closed-source LLMs, hamstrung by their "average-by-design" nature, present a unique opportunity for startups. By leveraging specialization and counter-positioning, startups can overcome such limitations and create tailored solutions that cater to their target audience's needs.

By employing methods such as synthetic data generation, Direct Preference Optimization (DPO), and Retrieval-Augmented Generation (RAG), startups can map out a path to power at the Model Layer. These techniques enable the creation of powerful, customized solutions that not only outperform generic LLMs but also delight customers at the workflow.

Thus, true power, the kind that withstands the test of time in carved-out, defensible niches, stems from the artful refinement and nuanced understanding of the Model Layer.

Here, we pull forward the crux of Part I of this essay series: product sense. 

Without product sense, any attempt toward model refinement would be “capacity-forward” rather than “problem-forward.” Without product sense, we’d fail to have the user-centric focus that gets us closer to the specialized workflow solutions that confer power.  

The Runway ML case study serves as a prime example of how these principles of model refinement and the strength of product sense can be applied in practice. The company's journey exemplifies the path to defensibility that we have outlined: by leveraging product sense, iterating on user feedback, and employing specialized model refinement techniques, Runway has created a platform that puts AI innovations in the hands of those who stand to benefit the most: filmmakers, designers, artists, and hobbyists.

Startups that choose to move beyond an API call are startups that empower themselves. While the specifics of model refinement are important, it is the vision, technical acumen, and problem-forward thinking that really separate those who dabble from those who thrive.

The reward of power awaits: enter the arena and build your own "digital castles."

The final part of this essay series will explore power at the Execution Layer. 

Acknowledgments

As I reflect on the journey of crafting "Dry Castles, Digital Moats: Defensibility in Generative AI – Part II," I am profoundly grateful for the collaborative spirit and intellectual rigor brought to the table by a remarkable few who answered the call for commentary. Their diverse expertise and insights have been instrumental in shaping a nuanced and comprehensive exploration of strategic frameworks in Generative AI for startups. 

In-Depth Technical Insights: 

  • Zhengyuan Zhou, Ph.D., Assistant Professor at NYU Stern School of Business, whose expertise lies in machine learning, stochastic optimization, and game theory. With a PhD in Electrical Engineering from Stanford University and prior experience as a Goldstine Research Fellow at IBM, Professor Zhou's insights significantly enriched Part II of this essay.

  • Etan Green, Ph.D., visiting scholar at Arena AI and Assistant Professor of Operations, Information, and Decisions at The Wharton School, applies deep reinforcement learning to real-world economic problems. With a PhD from Stanford University and a background as a Postdoctoral Researcher at Microsoft Research, Etan's contributions provided valuable depth to Part II of this essay.

  • Rakesh Gidwani, CTO and partner at Protagonist Fund and with a background at Two Sigma, generously shared his comprehensive technological and business strategy expertise across all three parts. 

  • Brandon Cui, a Machine Learning Engineer at Mosaic ML (acq. Databricks) with a rich background at Meta AI (FAIR), significantly enhanced Part I and Part II of my essay—his deep understanding of machine learning from a pragmatic perspective enriched the analysis.

Editorial:

  • My profound gratitude to Angela Black for her contribution to the cohesion, logic, and clarity of this essay.

These individuals' collective wisdom and experience have shaped my essay. I am deeply grateful for their invaluable contributions and the unique perspectives they brought to this work. 

