Brief summary
Anthropic Mythos garnered significant attention when it launched in mid-April 2026. Yet despite it apparently presenting an unprecedented threat to global software, you don't have to look too closely to see that this was as much an effective product launch as a story about the grave security risks of today's AI models.
But this isn't to say there aren't important implications for software developers, security professionals and other technologists. In this episode of the Technology Podcast, one of our new hosts, Nate Schutta, is joined by Chris Kramer to discuss Anthropic Mythos and Project Glasswing, unpacking what's hype and what really matters.
A few links for this episode:
- Some more information about Project Glasswing.
- A story about how a small Discord group briefly had access to Mythos.
- How Mozilla used Mythos to discover Firefox bugs.
Nate Schutta: All right, hello, and welcome to the Thoughtworks Technology Podcast. I am one of your hosts, Nate Schutta. Best way to describe me as architect as a service. I am here with my good friend, my colleague, my sometime co-conspirator, Chris Kramer.
Chris Kramer: Hey, thanks so much, Nate. I'm Chris Kramer. I'm an AI and machine learning leader at Thoughtworks.
Nate: Outstanding. Thank you for taking some time to chat with me here today. I've been wanting to pick your brain about this since the announcement came out. Let's just dive right into it. What do you think about Mythos? Is this finally going to be the AI that destroys humanity? Is it just a stepwise change? We've got a lot of things I want to dig into, but let's just start there by setting the table on Mythos, your thoughts, and then we'll see where that gets us.
Chris: The community has been very interested in what's going on with Mythos, obviously. It is not the doomsday AI, in my opinion. It is a stepping stone and very much a representation of the scaling limits we are now reaching with the LLM topology.
Nate: It feels a little bit to me like things have slowed down a little bit in that regard, and that there was this period where a new model would come out, and it's like, wow, this just absolutely blows away everything we had before. As is always the case in technology, that pace has definitely slowed. From what I can tell, there's definitely an improvement.
Now, for those that maybe haven't been paying as close attention, when Mythos was announced, Anthropic came out and said, whoa, this is so dangerous that we need to contain it. Then they announced this Project Glasswing, which was basically a collection of technology companies that were given access to it so that they could explore their code base, make sure there weren't very glaring zero-day CVE-type things.
We have started to see some announcements come out of that. I know Mozilla just announced, I think it was 271 bugs that they fixed. It is clearly finding things that have been there in some cases for 10 years, 20 years. Although I'm very curious as to if that's something unique to Mythos or if that's something that older models with similar prompts also would have found.
Chris: Going back to one thing you said, this recent article I saw that a Discord chat, I believe, somehow had access to Mythos.
Nate: Yes.
Chris: They obviously didn't run Mythos on itself. I would say that, given enough time and the right agentic harness, these are probably faults that another model could find. But there is a lot of ability tied up in the hidden state of these large language models, and Anthropic has clearly unlocked some of that potential without needing a massive upgrade in chips. We can certainly dive into what we suspect is going on behind the scenes with Mythos.
Nate: There's a couple things I want to pull on there, but I do want to start with the CVEs. It is clear that Mythos is finding bugs, and I think that's not disputed. The question I would have is if Mythos found, let's say, 10 bugs in your code and half of them are real legitimate bugs that need to be fixed, not the, well, gee, if you prop your front door open, turns out anybody can walk in. We knew that. You leave all your ports open, that's not good. The weird, interesting ones where, if you pass in this wild card and it goes through this path, then all of a sudden you've got root access. That's the kind of stuff that we obviously want these things to find.
What isn't as clear to me, or at least some of the anecdotal evidence that I've pulled from talking to people, it seems like older models would have found most of them, maybe not all of them. When I think about this as a stepwise improvement, that feels more right to me than we've suddenly hockey-sticked in a new and interesting way. Is that what you're hearing and seeing?
Chris: It is. I think maybe the real finding with Mythos is that there's a lot more duct tape holding together enterprise software than the public perhaps thinks there is.
Nate: No one could have predicted this.
Chris: I know. Now we have tools that can do this at speed. I think maybe that's the distinction we're seeing with Mythos: not a fundamentally new ability, but the ability to do it in a reasonable time frame.
Nate: Do you think that this portends a future where part of our pipelines will be, and then AI scans for security vulnerabilities, and is that a dramatic departure from some of the existing tools and techniques we've been using?
Chris: It is. To touch on this paper I mentioned just before we started speaking: I have seen some of what we call agent-first code bases out there. These are code bases where commits are expected to be made by an agent, and then through GitHub Actions or whatever CI/CD pipeline you're using, there's also an agentic security scan, with patches and bug fixes all done agentically. Side note: I think there's a really interesting question there about IP rights and ownership, which you've brought up before, Nate. Regardless, that seems to work best on small to medium-sized code bases.
I think this is basically going to continue to be a problem with LLM-based agentic architecture because of something called document poisoning. That's what this new Microsoft paper touched on. I'm going to butcher the thesis — we can share it as part of the links — but essentially: the longer-running the task, the more likely a corrupt or bad statement in some document is to send the whole thing off the rails. That's exactly what I'm seeing in these agentic code bases: the longer the agent sits there by itself, the more fragmented its thinking gets and the more those poisoned assumptions compound over time.
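To make that failure mode concrete, here's a toy back-of-the-envelope sketch — our illustration, not something from the Microsoft paper — of why long-running tasks are so exposed: if each agent step has even a small chance of absorbing a poisoned statement from its context, the odds the run stays clean decay geometrically with task length. The per-step probability below is an invented number.

```python
# Toy model of document poisoning over long agent runs.
# The 0.5% per-step poisoning probability is invustrative, not measured.
def p_run_stays_clean(p_poison_per_step: float, n_steps: int) -> float:
    """Chance that no step of an n-step run absorbs a poisoned statement."""
    return (1.0 - p_poison_per_step) ** n_steps

# Even a tiny per-step risk compounds badly over long tasks.
for n_steps in (10, 100, 1000):
    print(f"{n_steps:>5} steps -> stays clean with p = "
          f"{p_run_stays_clean(0.005, n_steps):.3f}")
```

With these made-up numbers, a 10-step task almost always survives, while a 1,000-step run is nearly guaranteed to pick up a poisoned assumption somewhere — matching the "longer task, more likely to go off the rails" thesis.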
Nate: Interesting. I do think that's a fascinating part of this, that it feels like many of the things that we've tried to do for the ways our own brain works, like break the problem down, decompose the problem, short tasks, a lot of those same tips, techniques, tricks apply using AI effectively. It gets back to some of the conversations that we've had about how do we apply this to our software development life cycle? It seems to me, self-servingly maybe, but that the fundamentals of software engineering are pretty darn important today, even with these tools that allow us to potentially move a lot faster.
Chris: Absolutely. I think we're still in a place where I, at this moment, would not trust an agent to write a full feature by itself. Granularity still dictates a human breaking that into chunks or doing a massive code review.
Nate: I think that speaks to the fact that as more code is produced, almost by definition there are more bugs, there are more defects, and that's true whether a human writes it, whether AI writes it. Do you think that with AI potentially writing more code with code bases potentially getting larger, are we opening ourselves up to more of these zero-day CVE-type problems? Is AI going to find many of them anyway, so it's really just a wash, we're basically where we are today? What do you see as the impacts of that on the software we're producing?
Chris: I do very much think we are in a weird transitory phase where we are experiencing the growing pains of a technology that is absolutely going to radically change the job of a software engineer. I make no claims that it will never get there. I think it will probably be there in as little as a year from now. For now, we're in this place where development velocity is not seeing the uplift just yet, relative to the number of lines of code being output.
Nate: I do think that's an interesting way to look at this. A lot of organizations don't even know what their current baseline velocity is — how long it takes them to create a feature and put it out there. If you don't know how fast you're going now, you can't really say whether AI is making things better or worse. One of the things I've been thinking a lot about over the last few weeks, through various conversations with folks like you, is speed: the faster you go, the riskier things get.
If you think about driving through your neighborhood at 20 miles an hour versus driving through your neighborhood at 70 miles an hour, there's a huge uplift in risk when you're driving faster. The analogy I've thought about is you, and I have driven at highway speeds for many, many years. Clearly, we've done it safely for many years. That doesn't mean you can put us behind the wheel of an F1 car and expect us to get around the track without crashing or hurting ourselves or hurting someone else.
I do feel like that's a potential issue as we introduce more of this into the software development world. Without the proper training, without the proper guardrails and harnesses in place, how do you move fast safely? I'm curious if you have any thoughts on that, how you see companies dealing with that.
Chris: I have a bet — and this is not an original bet; I think the industry is very much coalescing here — which is that agent harnesses, meaning the wrappers and tooling we put around these agentic systems and LLMs, are the new breed of prompt engineering, context engineering, whatever you want to call it. What that points at is that this is where the IP of doing this safely is starting to live. Meaning you either have to A, build it yourself, or B, really trust the framework — the skills, the ways of thinking, the ways of working — that these agents rely on.
That only addresses half the equation — the agentic side of things. The other half is the human side: how are people actually interacting with these agents? I think that's where very few organizations have landed. It's still a special snowflake at every organization we go to: what is the right fit for that organization to enable developers safely with this software?
Nate: That's a good point. A lot of it is, where are they on their own software development journey? You've got some very, very mature companies that have been doing agile and whatnot at scale. They've got those practices and principles in place. If you've got someplace that's more chaotic, that maybe doesn't have all of that, it's a rockier road for sure. The way the Firefox team put it is basically, we're in for a rocky transition here as we start adapting to and adopting these tools. I'm going to be very curious to see how that all plays out.
Anecdotally, I've heard a couple stories recently, one where an organization is now seeing 50,000-line diffs from AI. They're theoretically supposed to review all those, but what developer is going to look at a 50,000-line diff? If you and I were on a project and a junior engineer checked in something with a 50,000-line diff, we'd have a conversation with them, and we'd talk about why you shouldn't do that, and then they'd learn, and they wouldn't do that again, hopefully, or we'd have to continue coaching them until they learn that lesson. AI can't really learn that. We have to put these harnesses, these guardrails in place. It's like, no, do not do a 50,000-line diff unless you're, I guess, reformatting or something silly.
Chris: I might be going a little tangential to your point just now, but in researching Mythos, I found a quote. It was that intelligence and reaching a goal are not necessarily the same thing. I bring that back to how these LLMs and agents are trained. They are very much optimization techniques that we're using to get these agents to do what we want. That's what we see quite often in these agents and LLMs acting as yes men. I think that very much also goes back to A, perhaps more code is better. Maybe there's a perception of that in the models, and B, this disconnect between writing code and thinking about code.
Nate: You do bring up a really interesting point. These models, almost any time you ask anything, it's like, "Oh, that's very insightful. Wow, Chris, you are so smart for thinking of that. That's a great question. That was so well phrased." I do appreciate the ego boost, but it makes me wonder, should you really be doing that? Is there a way for me to tune it so that you just give me the straight advice? Sometimes you've got to tell me what's really going on here. Don't soothe my ego necessarily.
Chris: I know. It is nice to get a pat on the back every once in a while, though.
Nate: Oh, totally. I worry that some developers are going to think that their LLM always says I'm an amazing developer, I'm the best developer they've ever met, so obviously I must be really good at this.
Chris: I've seen a whole slew of-- I don't know if I want to call them pre-seed AI slop startups and what would be radical, transformative changes to computer science that are very much just an echo chamber of bad code that doesn't do what it says it does, but reinforced by pats on the back and pseudo-intellectual white papers. I don't know what the impact of that is on society, but I'll be curious to find out.
Nate: That's an interesting point, Chris, because I think a lot of the chatter or discourse around AI, especially when it comes to writing and creating code, seems to come out of startup land. I think there is a segment of the population that thinks all software is created by startups. You and I have spent our entire careers working with legacy organizations, older organizations, companies that have code that's been around for 30 years, 50 years. They have existing business practices where you can't move fast and break things. Breaking things results in outages that cost you millions and millions of dollars.
How do you see these tools? How do we safely apply these tools in those environments? I think maybe another way I'd ask this is maybe being on the bleeding edge isn't the best place to be in some cases.
Chris: I think we see a paradox, really. There's been research showing that the companies really leveraging AI to realize value are organizations that enable teams to run like little startups — which is exactly the opposite of the constraint you're describing: move fast, break things, ship with bugs, and if someone sees one, pull it back and fix it. A lot of organizations are just not set up that way. There is almost an orthogonal shift that needs to occur in a lot of large organizations to truly benefit from agentic AI. "Startup mentality" sounds like such jargony phrasing, but—
Where I'm seeing that mentality really work is where these guardrails, policies, et cetera, are baked into the platform developers use to enable themselves with these agents. As an organization, you really need to create a seamless developer experience where they can go grab one of these agents, grab a key for Claude Code or whatever, and start developing — with all the safety guidelines needed to prevent these million-dollar bugs sitting behind the scenes, rather than being enforced through process or big code reviews. There's a balance that needs to occur in that shift in large organizations.
Nate: I would say one of the things we've been trying to do for a long time as part of shift left is let's make the right thing to do, the easy thing to do-
Chris: Correct, yes.
Nate: -so that you don't have to think about it. Let's make it so when developers are doing their day-in, day-out work, the harnesses and the validations and whatnot are baked in so that we don't have to rely on, "Chris will do the right thing. Chris is good. He knows not to leave the front door unlocked." Yes, maybe. Maybe he forgets one night. If you bake it in, it's a lot harder to skip a step.
A few other threads I'm interested in pulling on here. You did mention this earlier: it does feel like we've hit some interesting limitations in terms of how we train these models. Again, we're not seeing that exponential jump where a new model comes out and just blows away what we had in the past. I think it's when we hit these constraints that some creativity starts happening, and we see new ways of breaking through. We saw this with DeepSeek: here's another way of training these models; it doesn't always have to be on the most expensive, fastest, baddest chips.
What do you think is coming in that regard? What's the next step in how we train these models? What do you think is going to, as Gary Marcus would say, break through that limitation on scaling?
Chris: We have seen that with what Mythos is. To talk about that, I'll go back in time just a little bit. Before LLMs — I should say before transformers and this paper I'm sure most people have heard of by now, called Attention Is All You Need — the prevalent model architecture really had to do with recurrence. We know that idea from software engineering as recursive thinking: within a model, there's a unit that unfolds itself as the task dictates. It can get dynamically more complex depending on what it's doing.
That idea was lost when transformers overtook recurrent neural networks as the pervasive model architecture. What I mean is that since then it's been gas pedal to the floor: let's see how far we can get with transformers by just exploding the reasoning section of the model.
Bringing us back to today, what we're seeing with Mythos and other recent model advances — and this isn't just my opinion; papers suggest this is exactly what's happening, and there's a code base called Open Mythos you can go check out that implements it — is that we're starting to inject loops back into the reasoning portions of these models. As a sentence is going through, in the simplest case, the logic can actually pass back to itself to continue thinking about it.
That makes these models really hard to train, though, because before, we only had an N-layer model to worry about optimizing. Now the permutations of looping back — how many times do you loop? — become exponentially more complex. That's why, until recently, it really wasn't worth productionizing that architecture at scale. With scaling limits, I think we're at the point where companies like Anthropic are starting to do it.
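As a rough intuition for what "injecting loops back into the reasoning portion" could look like — and this is a toy sketch, not Mythos's actual architecture — imagine one reasoning block that gets re-applied until its output stops changing, instead of a fixed stack of layers. The update rule, the halting check, and all names here are invented for illustration:

```python
# Toy "looped reasoning block": re-apply one block until the state
# stabilizes or a loop cap is hit, instead of a fixed N-layer pass.
def reasoning_block(state: float) -> float:
    # Stand-in for a transformer sub-stack: a contractive update
    # that nudges the state toward a fixed point at 1.0.
    return state + 0.5 * (1.0 - state)

def run_with_loops(state: float, tol: float = 1e-3, max_loops: int = 20):
    """Loop the block until the change falls below tol ("done thinking")."""
    for n in range(1, max_loops + 1):
        new_state = reasoning_block(state)
        if abs(new_state - state) < tol:
            return new_state, n
        state = new_state
    return state, max_loops

out, loops = run_with_loops(0.0)
print(f"converged to {out:.4f} after {loops} loops")
```

The training difficulty Chris mentions shows up here too: the number of loops taken varies with the input, so the effective depth of the model is no longer a fixed constant you can optimize against.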
Nate: It is always interesting in our industry to see when we've run into the limit on something, what do we do to get around it. I feel like that's often where we do get these really interesting jumps and steps forward that we couldn't have before. I think it was Ethan Mollick who wrote a piece about the jagged frontier of AI where it seems like it's so good at some things. It really has gotten very good at coding. I think that's very clear that the code it's generating today versus even just a few months ago is pretty remarkable. Yet it still screws up other things.
We all have our favorite example. I mentioned this to my wife the other day. I said, "You can't lick a badger twice." She's like, "What?" I said, "Oh, you don't know that reference?" For a while there, you could put any idiom you wanted into Google and it would say, "Oh, yes. That's a well-known idiom. It means blah, blah, blah." You can't lick a badger twice. Although, I guess, in fairness, all idioms are made up. I'm curious to see, and I'd love your thoughts: what do you expect in the future, whether that's six months or a year? Where do you feel the next frontier is going to be, where we're going to look and go, "Wow, it really did get good at this now"?
Chris: That's a good question — a big question. As an industry, or as a society, I don't know what the right topology is at this point. But certainly we've drawn a circle, as it were, a boundary in the sand, where some sort of semantic-based embedding of knowledge, plus an agent harness, plus an LLM, creates a really good proxy for a lot of knowledge-work tasks. When we see the next radical transformation, I really think it will come from one of those three things, via a transformative paper that is completely orthogonal to how we're currently thinking about them. I don't want to diss the ingenuity of humans here, but I think at some point some of these transformations are going to be AI-driven.
Nate: For sure.
Chris: I have a hypothesis, and maybe this is from too many years of reading science fiction, that there's some sort of knowledge boundary where all of our AI advances to date are very much tracking against a biological way of designing intelligence, meaning this concept of neurons and brain-like structures. At some point, I think AI is going to come up with something that we couldn't even have imagined. That's what's going to lead to some of these transformations coming down the road.
Nate: That's a really interesting point. I remember years ago, and I guess this would be more machine learning than what we truly consider AI today, but reading about how do we create the most efficient antenna for a satellite or something having to do with a space probe kind of thing. As humans, we're drawn to symmetry. Some of the designs that we come up with are not actually the best approach in terms of giving you the best antenna, but they look good. We naturally gravitate towards that, whereas these learning models don't have that constraint.
They'll actually generate the most efficient antenna, even though a human looks at it and goes, "Ugh, that's ugly" — but it's actually the best way to do it. To your point, we'll likely see something along those lines, where a human would not have made that jump or that combination because "obviously you can't combine these three things together; everybody knows that." It's like, "What if we just try it? Oh, look, it works." Then God only knows where that leads. Of course, maybe that gets us into the book that came out — I think it was fall of '25 — If Anyone Builds It, We All Die.
Chris: I'll kick it to you to dive into that one. That's on my list. I haven't read it yet, though.
Nate: I think they make some interesting points about how you cannot possibly know the motivations of what is essentially an alien mind. We think we can bake into it, "You like humans. Keep us as pets at worst." The reality is we can't understand it. We can't even really understand how it makes some of these connections. It's just these giant strings of numbers, and you're like, I don't know how it got there. As someone who has spent their life writing deterministic code, it's a little unsettling to have something where we don't know how it arrived at this answer, but it did.
Chris: An article on a Hugging Face blog really caught my attention recently, which very much has to do with this whole Mythos situation, albeit coincidentally. It's about a researcher in his basement with two consumer-grade GPUs. He wanted to see if he could make a model that could top the Hugging Face leaderboards with just those two GPUs.
What led him to this hypothesis — which eventually turned out to be true — is that he was playing with different LLMs and passing in base 64 as the input, nothing else. The LLMs understood it and could work with it perfectly, which doesn't really make sense if you think about how these models are trained: on human-readable language. The fact that you could put in base 64 and get a comprehensible output led him to believe there is some combination of reasoning layers under the hood of these models doing different things — like translating base 64 into more representative meanings the model could then work with.
You can find this blog post, by the way, by looking up RYS — Repeat Yourself — which was the name of his model. He started doing, basically, brain scans on these models: taking different permutations of layers, then pointing at a layer and having it repeat itself or go back two steps. Using this, he could actually find reasoning-related layers within these models — layers doing similar things. By repeating them, he could extend the model's thinking cycle, and therefore its quality, without actually training a new model or significantly changing its size.
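A tiny sketch of that "repeat yourself" idea: treat a model as an ordered list of layer functions and splice in repeats of the layers identified as reasoning layers, with no retraining at all. The layers, their roles, and the numbers below are invented stand-ins, not the actual RYS layers:

```python
# Stand-in layers: embed tokens, do a "reasoning" transform, project out.
def embed(tokens):
    return [float(t) for t in tokens]

def reason(hidden):
    # Invented reasoning transform; repeating it extends "thinking".
    return [v * 0.9 + 0.1 for v in hidden]

def output(hidden):
    return sum(hidden)

base_model = [embed, reason, output]

def repeat_layers(model, layer, times):
    """Splice `times` copies of `layer` wherever it appears, RYS-style."""
    expanded = []
    for f in model:
        expanded.extend([f] * times if f is layer else [f])
    return expanded

def forward(model, tokens):
    h = tokens
    for f in model:
        h = f(h)
    return h

deep_model = repeat_layers(base_model, reason, 4)  # "repeat yourself"
print(forward(base_model, [1, 2]), forward(deep_model, [1, 2]))
```

The point of the sketch is only structural: the repeated model runs the same already-trained weights more times, so you change the compute spent per input without touching training.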
He basically just took Qwen2, 75 billion, and added some of those loops we were talking about earlier with Mythos. That model, I think, was on top of the Hugging Face leaderboards for a year, if not more. If you go into the blog post, he almost has MRI scans of the model showing how he went about figuring this out. It's a really interesting read.
Nate: That's fascinating. Because of where a lot of the attention has come from, where a lot of the money has come from, I think we've put a lot of attention on the big companies — the OpenAIs, the Anthropics, the Microsofts, the Googles, the Amazons, et cetera. It is possible to do these things in your proverbial basement. I've got a friend who's got a three-GPU setup at home.
In addition to using it to keep his office warm in the winter, he does some pretty fascinating things on there. It does make you wonder: is that where the next breakthrough is really going to come from? Someone essentially tinkering in their garage, who doesn't have those constraints — the "we have to go down this path because the lead researcher says that's what we're going to do." I'm just going to go over here and play and see what happens.
Chris: Just like we saw with these scaling constraints, humans really get the most creative under constraint situations, similar to the transformation we saw with the Qwen models in their optimization technique. I think we will see similar transformations in the optimization space when these large model providers stop subsidizing tokens, and people in organizations start to see the real cost of these models they're running.
Nate: I'm really glad you brought that up, because that's one of the things I'm starting to hear from folks: we've been on these subsidized, all-you-can-eat plans. I've read some reports where developers on the $200-a-month plan are costing the tool vendor $50,000 a month. I don't have an MBA, I'm just a techie, but I don't think it's a viable business model to charge $200 for something that costs you $50,000. That doesn't seem like a good way to make a profit.
Right now, this whole industry has largely been subsidized by a lot of investment and a lot of money chasing this. You can start to see the edges of that where companies are going, "Hey, we've poured an obscene amount of money into this, show me the profits." What do you think happens when that switch flips and we're no longer using subsidized tools that are using subsidized models and everyone says, "Actually, I'm going to have to charge you what you're really using here"?
Chris: One of two things is going to happen. We're already seeing early signs at organizations where boards are starting to say, "Where's my business value?" essentially. The first thing that might happen, I think, is that some organizations that haven't seen tremendous business value from AI are probably just going to throw their hands up and say, "We're putting caps on everything. We're going to limit our spend here. We tried it; it wasn't for us."
Alternatively, I think a lot of organizations actually have an opportunity to maintain their value while cost-optimizing through self-managed model stacks — their own GPUs, their own foundational models, whatever that looks like depending on the use case. I'm stealing this anecdotal data point from Andy Nolan, another Thoughtworker: it's really use cases around the $500,000-and-up price point where self-hosting becomes the best option. I think we're going to see a lot of large organizations just go internal, but only for the most complex use cases.
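The arithmetic behind a break-even point like that can be sketched as follows. Every number here is a made-up assumption for illustration — none of it is a quoted price or the actual analysis behind the $500,000 figure:

```python
# Hypothetical break-even sketch: API spend vs. self-hosted GPU spend.
# All prices and volumes below are invented assumptions.
def annual_api_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Yearly cost of paying a provider per million tokens."""
    return 12 * (tokens_per_month / 1e6) * usd_per_million_tokens

def annual_self_host_cost(gpu_count: int, usd_per_gpu_per_year: float,
                          ops_usd_per_year: float) -> float:
    """Yearly cost of amortized GPUs plus operations staff/overhead."""
    return gpu_count * usd_per_gpu_per_year + ops_usd_per_year

api = annual_api_cost(tokens_per_month=5e9, usd_per_million_tokens=10.0)
hosted = annual_self_host_cost(gpu_count=8, usd_per_gpu_per_year=40_000.0,
                               ops_usd_per_year=250_000.0)
print(f"API: ${api:,.0f}/yr  self-hosted: ${hosted:,.0f}/yr")
print("self-hosting wins" if hosted < api else "API wins")
```

The structural point matters more than the invented numbers: self-hosting is mostly fixed cost, API spend scales with usage, so there is always some volume beyond which self-hosting wins — the question is whether your use case reaches it.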
Nate: Oh, that's interesting. It'll be curious to see how organizations respond to that. What's interesting to me is that some companies are actually tracking people's token usage as a way of ranking you. I know some companies have leaderboards. I was talking to a friend of mine — whether it was explicit or implicit, the message was, "If you're not using enough tokens, we're going to wonder why." As opposed to, "You're using a lot of tokens. We need you to cut back."
I feel like this is yet another example of just because you can measure something doesn't mean it's a useful metric. I would like to think the right way to look at this is, tell me how much business value you're generating out of those tokens. You and I have had that conversation before where, "I need lots of tokens." Why? "Because I need lots of tokens." Right, but how much business value are you generating with all those tokens? It'd be curious to see how that plays out.
Chris: It very much feels like a hype metric — the token contest, I guess we'll call it. It very much smells like an NFT-bro or Bitcoin-bro attitude: YOLO, let's just throw coins at the problem until it fixes itself. I totally agree, though, that this is another tricky governance situation. Ideally, token use needs to be tied to an articulable business case with clear business value. If you don't have that, either you're not measuring the right thing, or you don't have the dashboards, KPIs, whatever it is, in place to measure the right thing — because token count on its own is an arbitrary signal of nothing.
Nate: It's the same as judging a developer by lines of code written, modified, or deleted. Yes, it is measurable. That doesn't mean it's actually valuable or measuring what you think it's measuring. Maybe that's another constraint that will get us to a better place, instead of "burn as many tokens as you can." I shared with you something someone shared with me on Instagram: the hypothetical CEO saying, "I need you to spend a trillion tokens a month. How many tokens are you spending?" "My agents are making other agents feel good." Oh, wow, that doesn't seem like a good use of money.
Chris: My poetry writing agents.
Nate: Yes. My agents are writing poetry, and then I'm judging it. Maybe that's part of it too — we're experimenting, and there are a lot of experiments, because this is a brand-new tool. I think one of the challenges in software from the beginning is that we are often building things no one has ever built before, using tools we literally just invented, and so it takes some time.
I suspect when we started building buildings of any significant size, we made some mistakes — walls caved in, people lost limbs or died — and we learned: this is not the right way to do it; this is the right way; these are the properties of the material; this is the math behind it. I think we'll get there, but we need to have a little patience and understand that we're going to have to try some things before we zero in on the best way.
Chris: Historically, I think we also have seen, with ML, a lot longer runway before models actually reach the public consciousness, if they ever even do. What I'm thinking of is something like what Meta might build, like the Instagram algorithm. They're refining those algorithms behind the scenes and training against those for years before. Maybe that algorithm was a bad example, but for years before the public was really ever conscious of these things.
With LLMs, it's been such a fast ride up the hype cycle that we have multiple compounding things that make this a very complex space, which is A, user behavior, B, actually getting the underlying LLMs to behave like we want. Then C, getting the agent harnesses on top of that to behave like we want. It's one of these situations where you turn one knob, and the other thing goes crazy, and vice versa. Bringing it back to earlier, I think we're very much in this weird transitory, growing-pain space.
Nate: I definitely feel that as well. I swing between some existential dread (as I mentioned earlier, If Anyone Builds It, We All Die, though I don't really think that's a likely outcome, honestly) and some very optimistic things. I think about the kinds of applications, the kinds of software that these tools will enable us to build that we could not have built in the past. Compare learning to code with a literal text editor to the IDEs we have today, and I think what an exponential increase in what we can do, the complexity of problems we can solve.
I think the most positive outcome of this is the kinds of problems that we cannot solve without these tools, and what is the benefit to all of us for that. Whether it's much better weather forecasts. Think about how much better that's gotten in our lifetimes. Wasn't that long ago that you'd have to look at the back page of the newspaper, and we're like, it might rain today. Now my phone can tell me that rain is stopping in 15 minutes. It's usually right within plus or minus a few minutes. Is there anything you see coming down the pike here that you're like, "man, this is really going to be pretty cool, or can't wait to see how this impacts us"?
Chris: I'm excited to see the stabilization of these models and what I think will be their exit back behind the curtain, again, where machine learning usually lives. Not that they're going to go away, but I think LLM-based agentic AI is very much going to become a sidecar to workflows and tasks that once we've solved a lot of these guardrail issues, harnesses, task attention, staying on task, we're not going to be as interactive with these things.
I think that'll do a couple of things: a), we start to move away from these metrics like number of tokens spent, and get back to focusing on accomplishing actual tasks; b), I think it gets out of the way for whatever comes next, and it's very hard to predict what that is. This is even pre-LLM, just regular old transformer-based. We've seen a lot of cool stuff come out of this space in terms of how it gets applied in medicine with protein folding and drug discovery. I think we might see something like that, which going a little bit back to what we were talking about earlier, but some transformative way of developing these models or representing knowledge that is not token-based, spitting out tokens, I really think that's what's coming next, maybe in the next two years or so.
Nate: You mentioned this earlier, and I do want to circle back to it, but so Anthropic announces Mythos and says, "Whoa, this is so scary, we can't release it." Then announces Glasswing, and we're only going to release it to certain partners to play with it. I know at least anecdotally that the companies that had access to it, the NDAs, were strict to the point where you couldn't even really admit to your coworker that you had access to it. Then we discovered that if you were on the right Discord server and you went to basically the right location on the web, you could find it. Any thoughts on if it's that scary and that dangerous, shouldn't it have been harder to find?
Chris: Certainly, a few things with that. I'm pretty sure we had the same perception when OpenAI was doing the exact same thing for GPT-3 or maybe GPT-4, which was this model is so radically different and intelligent that we cannot release it to the public. Then three weeks later, there it was. Today, we sit here, and I would call GPT-3 trash compared to what else is out there. This is a terrible analogy in terms of how it represents what I think of humanity, but handing a child a gun for the first time without teaching them about safety or whatever else is always going to be a dangerous thing, so maybe that's why they slow-drip these models.
At the same time, we know that Anthropic is pouring billions of dollars into this stuff. This could be a really clever marketing play, even. This all came to be through a leaked press release that someone found buried in some code. We don't know if their product team put that there. I doubt that. I guess what I'm saying is that there are a million reasons that they could be keeping this model under wraps, and it could very much simply have to do with capacity. They don't want the public knowing that they're a little short on GPUs right now, and so they're going to tell everyone this is too dangerous. I guess it's hard to speculate, is what I'm getting at.
Nate: I think it's always wise to have some salt when you listen to some of these releases. I think we always have to ask, is there some other motivation behind this? Whether that's to pump up a stock price, whether that's to, to your point, we don't actually have enough capacity to let everybody play with this, so we're just going to say it's too dangerous or we're just going to constrain that to our partners or trusted friends to go ahead and look at. I always think, how does the phrase go, "Extraordinary claims require extraordinary evidence?" We'll see.
I'd like to wrap up, Chris, with the fact that there's so much that goes on in this space. One of the things that I love about my job is I get to pick your brain on a regular basis. How are you keeping up with all the changes in models and papers? Are there any resources you recommend? Anything that you would say, "If you really want to try to keep your fingers firmly on the pulse, here's how to do it"?
Chris: We're in an age where that's a difficult question. You very much fall into AI slop quite quickly, and not necessarily the content, but how it's presented to you. I want to call it a legacy feature at this point, but on my Android phone, and I'm sure you can get to this through just the Google page too, they send me a curated stream of articles, and they do a great job of finding recent papers and stuff that I would find interesting.
LinkedIn, but you have to make sure you're following trusted sources, people you trust. I'm certainly guilty of it myself. I've seen claims on LinkedIn that don't always hold up when you read the underlying paper. Make sure you're following trusted people on LinkedIn, Reddit, Twitter. We've always had this problem, though. This is media literacy. Don't believe everything you see.
Nate: It's getting harder.
Chris: Absolutely. What about you? Where are your go-to sources?
Nate: Same thing. I've got this curated set of feeds that I follow, and that usually does a pretty good job of surfacing these things as they occur. I think that's it, is you just have to have those lines in the water, and then react accordingly, and then adjust. There are some very good people. I do read a lot of Gary Marcus. I've been following a lot of Ed Zitron as well. Although Ed writes a lot, you know what I mean? Seemingly everything he puts out is 12,000 words, 16,000 words. I'm like, "I really need the executive summary of this," and it's great.
There's some amazing stuff in there, but it's like, "I've got to strap in, this is going to be a chunk of time I have to devote to it." I think that's it. The other part of it too is just having these kinds of conversations. I was at Arc of AI recently. Just talking to other presenters and attendees, and just having this back and forth is really valuable. Just like, "What's working for you? What's not working for you? What are you struggling with?" We have to work together to figure this stuff out. We'll get there faster as a community than we will each of us trying to do this on our own, and so I would encourage you to engage in these conversations and see where that leads us.
Chris: That's a nice conclusion to land with, which is, I think AI companions still cannot replace good old human interaction.
Nate: 100%. I think that is the perfect message to end on, Chris. I want to thank you for hanging out with me, brother, pleasure as always.
Chris: Thanks for having me.
Nate: I want to thank all of you for hanging out with us as well. I hope this was useful. See you on the next one. Cheers.
Chris: Bye.