My guest this week is Fixie founder Matt Welsh, someone I’ve known for over a decade. We met at Google, where we both worked on making the internet easier for people in emerging markets to use and navigate.
Through our work at Google, we discovered that making the internet easy to use is a very difficult problem to solve. Introducing even a single tiny change to a website or platform can have a big impact on users and engagement, both positively and negatively.
Today, Matt is still attempting to solve this same problem of making online platforms easier to use, but by way of a very different medium. Currently, he’s working on introducing chat interfaces into various platforms through an open-source framework that he created alongside his company, Fixie.
As a former Harvard professor of computer science, Matt has always focused on large distributed systems. So it makes sense that he tends to take a pretty big-picture view of how LLMs might eventually transform the online landscape.
Matt and I talked about what users are building with Fixie, the pros (and cons) of using smaller LLMs, and his prediction that LLMs might disrupt not only the internet and its interfaces, but entire operating systems.
But first, a quick rundown on what Fixie does
Fixie makes it easier for developers to build chatbots, or “sidekicks.” Whereas earlier chatbots were built on older tech stacks, the chatbots of the future will be built on LLMs.
With LLM-based chatbots, users can communicate their needs directly without navigating through multiple pages or executing actions themselves. The chatbot provides immediate assistance by pulling up relevant product documentation or guiding users through necessary processes. It might even interact with the website’s interface directly to expedite user requests. Fixie’s conversational AI platform is designed to simplify the process of creating and managing these customized chatbots.
What sets Fixie apart is its core technology: AI.JSX, an open-source programming abstraction that Fixie created to give developers a user-friendly framework for constructing AI applications.
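To give a flavor of what that looks like, here’s a minimal AI.JSX example, adapted from the project’s public docs (the exact imports and APIs may have evolved since this was written). Components compose like React components, but render to LLM calls instead of DOM nodes:

```tsx
/** @jsxImportSource ai-jsx */
import * as AI from 'ai-jsx';
import { ChatCompletion, SystemMessage, UserMessage } from 'ai-jsx/core/completion';

// An AI.JSX component describing a simple support sidekick.
function SupportSidekick({ question }: { question: string }) {
  return (
    <ChatCompletion>
      <SystemMessage>
        You are a customer-support sidekick. Answer using the product docs.
      </SystemMessage>
      <UserMessage>{question}</UserMessage>
    </ChatCompletion>
  );
}

// Rendering the component executes the underlying model call.
const response = await AI.createRenderContext().render(
  <SupportSidekick question="How do I reset my password?" />
);
console.log(response);
```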
Here’s my conversation with Matt, condensed and edited for clarity
A big part of Fixie’s mission is making the user interface easier to use. What is Fixie’s overall vision? Do you see its ultimate goal as making UI a better experience? Or do you see it as building these sidekicks to offer better customer support?
It's both. It seems obvious to me that we’re gradually adapting to a reality in which conversational agents are embedded within the UI surface. This happened before, back when the search engine became the interface for the web browser. The first thing you do when you’re using a browser is immediately go in and do a web search, whether you’re opening a tab or navigating to a new page. We're going to see a similar shift happen as soon as these conversational agents get really good. When this happens, the UI surface probably won’t need to be nearly as complex as it is now.
The other thing we're exploring is the idea of the AI generating UI. Something we do in Fixie is, if you're chatting with the bug-tracking tool Jira, for instance, you can say, “Show me a list of issues that are assigned to me.”
What Fixie is going to do with this command is actually call an API, get the list, and then the agent is going to say, “Hold on a minute. This is a list that’s got these fields. Let me render a UI element that's task specific for the thing that person just asked me about, which in this case was the command ‘Show me this list.’”
Now, that's a simple example. But imagine that Fixie could work with whatever type of task the user is attempting to perform and generate a range of options along with different ways to play with that data.
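As a rough sketch of that call-the-API-then-render flow (all of the identifiers below are hypothetical stand-ins, not Fixie’s or Jira’s actual APIs):

```typescript
interface Issue {
  id: string;
  title: string;
  status: string;
}

// Step 1: the agent maps "show me the issues assigned to me" to an API call.
// The endpoint here is a made-up stand-in for a real Jira instance.
async function fetchAssignedIssues(user: string): Promise<Issue[]> {
  const res = await fetch(`https://jira.example.com/api/issues?assignee=${user}`);
  return res.json();
}

// Step 2: instead of dumping raw JSON into the chat, the agent looks at the
// fields it got back and renders a task-specific UI element for them. The
// component choice is hard-coded here; in the flow Matt describes, the model
// would pick among components (table, board, chart) based on the data's shape.
function renderIssueTable(issues: Issue[]): string {
  const header = '| ID | Title | Status |\n| --- | --- | --- |';
  const rows = issues.map((i) => `| ${i.id} | ${i.title} | ${i.status} |`);
  return [header, ...rows].join('\n');
}
```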
What would that actually look like? Can you explain that to me in greater detail?
It’s an extension of the promise of things like Chrome. The idea with Chrome is that users could tailor and customize their browser, UI, and capabilities for their specific needs. Today, I have about a dozen browser extensions that I’d be totally lost without because they’re woven into how I work.
What’s really underlying this technology is a data model that represents data and material that I have unique access to, whether it’s my inbox or calendar or bank account. This means that, in the future, the way I interface with my information universe could be synthesized specifically for me. And it would work in a highly tailored way that doesn’t require someone else telling me, ‘Here’s the way that you’re going to use this data.’
Right now, the problem is that all of this data and information is scattered across so many different surfaces. The best place we’ve found to unify them thus far is the notification bar, which isn’t a great place either. So what if, concretely, you could create or synthesize your own personal dashboard that draws from all these sources and enables a highly tailored UI for each type of task?
I love that you’re thinking about Fixie in terms of taking it to the browser level and the OS level.
Actually, I think there’s an even bigger opportunity beyond those two things. The really interesting place to take this is less about synthesizing new UIs for existing systems and more about finding a way to build new systems in the first place.
For instance, if I wanted to build the bug-tracking tool Jira from scratch today, how would I do it? Would I hire a huge army of programmers to write a bunch of custom Python or Java code to build up all those capabilities? Or could I potentially use a language model as the computational engine for some of them?
My broader thesis is around the idea that these language models ultimately replace conventional software because you can teach them how to do things. The model is the computer.
So from where you’re starting with Fixie today, how do you get to this point that you’re talking about? It sounds like these are pretty different things.
It may look different, but it’s still the same underlying principle. My answer here goes back to the market. Right now, Fixie is about bringing value to companies in actionable ways.
If I told the world, “Come to us so you can fire 80% of your software engineers and replace them with AI,” I don't think it would be very successful at this very moment. But I predict that in five to 10 years, this will be a thing.
What we’re doing today with Fixie is what people are trying to use AI for right now, which is building these conversational experiences that access your data and your APIs. That's a microcosm of what I'm describing broadly in terms of the ability to actually build software in a more general sense, but it is still software. It's software that's LLM-driven.
I’m interested to learn what your journey building Fixie has been like, especially when it comes to working with LLMs and handling non-deterministic results.
In addition to turning the temperature down, we’re seeing more control over results with the advent of open-source models like Llama 2. For the longest time, everything has been behind the black box of OpenAI. But with Llama 2, we can now dial in what the models do and how they work, and decide how they’re going to respond in different situations.
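For context, “turning the temperature down” refers to the sampling temperature, which controls how random the model’s token choices are; setting it to zero is the usual first step toward repeatable outputs. With the OpenAI Node SDK, for example:

```typescript
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// temperature: 0 makes decoding near-greedy, so the same prompt is far more
// likely (though still not guaranteed) to produce the same answer every time.
const completion = await client.chat.completions.create({
  model: 'gpt-4',
  temperature: 0,
  messages: [
    { role: 'user', content: 'Summarize this support ticket in one sentence: ...' },
  ],
});

console.log(completion.choices[0].message.content);
```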
How do you do that?
Fine-tuning is obviously one of the first things we do. But Llama 2 also gives you other levers: you can control the way it processes inputs, and you can put parameters on the type of outputs you’re willing to accept.
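One common way to put parameters on the outputs you accept (a sketch of the general technique, not necessarily Fixie’s implementation) is to validate every response against a schema and retry on failure. Here, using zod and assuming an OpenAI-compatible endpoint in front of a self-hosted Llama 2:

```typescript
import { z } from 'zod';

// The shape of output we're willing to accept from the model.
const Answer = z.object({
  summary: z.string(),
  confidence: z.number().min(0).max(1),
});

// Stand-in for however you serve Llama 2; this assumes an OpenAI-compatible
// completions endpoint (e.g. vLLM or llama.cpp's server) on localhost.
async function callModel(prompt: string): Promise<string> {
  const res = await fetch('http://localhost:8000/v1/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama-2-13b', prompt, max_tokens: 256 }),
  });
  const data = await res.json();
  return data.choices[0].text;
}

// Reject any response that doesn't parse against the schema, and retry.
async function completeWithSchema(prompt: string, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return Answer.parse(JSON.parse(await callModel(prompt)));
    } catch {
      // Malformed or off-schema output: ask again.
    }
  }
  throw new Error('Model never produced output matching the schema');
}
```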
If you want to go even deeper, we’ve been seeing some interesting work where people are looking at ways of doing what’s effectively “brain surgery” on the model. It’s not fine-tuning in the conventional sense of updating the weights through gradients. It’s actually going in and directly modifying the structure of the model, or its weights, in order to constrain it or guide it in certain ways. It’s very hard to do, and the methodologies for this are still very nascent, but it could be a huge, huge accelerant.
Do you let your customers decide which model to use, or is the model you're using part of the product?
We allow people to use whatever models they want. They can bring their own model in if they like or they can use the models we've built in. Llama 2 just came out recently and so we're still doing a ton of experimentation to see how we can take advantage of it.
It’s still not clear if this turns into something where each customer potentially has their own very fine-tuned, highly customized variants of the model for their use case or whether this is totally unnecessary. The jury's still out on this one. I’m a big believer in fine-tuning and specialization of the models. But I know other people think that as the models get bigger and bigger it’s going to be harder to beat them in terms of generalization.
Why are you such a big believer in fine-tuning and specialization of models over bigger, more generalized models?
From a fundamental perspective, if I’m asking or answering questions about a specific type of software, the model doesn’t need to know anything about World War II or the Byzantine Empire. Those things are still useful to know, but they’re not necessarily domain-relevant.
What I do think is really interesting is the idea of taking highly distilled or compressed versions of the model that are only good at very specific tasks. They’re much faster and cheaper to operate, which gets you lower latency and lower costs.