Posted 2026-06-16 • 8 min read
Something is happening. In a single week we got a cap on the intelligence you can rent depending on passport color, Microsoft is throwing in the towel on building large models and now urges enterprises to write loops, and Cursor is aiming to replace GitHub and starting to build its own general model and product.
All of this is extremely in sync with how I personally work. How is it possible that the stuff in my head and in Satya's is the same? Surely I don't have the same visibility into the market. But I started writing loops and environments instead of coding manually. "Go figure," I tell my agent, and it almost always one-shots the investigation and most of the time one-shots the solution. I was annoyed that Fable rejected questions like "find all vulnerabilities when we assume this and that", but that's only a question of time. Writing loops that build and then launch a huge model to evaluate whether the result clears a bar: this already seems to be the short-term future. Many nice tasks are loop-able: UI performance, UI consistency, backend performance, raw algorithms, spec conformance. It seems next-gen models would simply make fewer mistakes, meaning the loop would end faster, with fewer feedback iterations.
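A minimal sketch of such a loop, assuming two hypothetical callables: `build` (the agent that produces or revises a candidate from feedback) and `evaluate` (the big model or benchmark that returns a score plus a critique). The names, the score scale and the toy stand-ins are all illustrative assumptions, not any real agent API.

```python
from typing import Callable, Optional


def run_loop(
    build: Callable[[Optional[str]], str],
    evaluate: Callable[[str], tuple[float, str]],
    bar: float,
    max_iterations: int = 5,
) -> tuple[Optional[str], int]:
    """Build, evaluate, feed the critique back, repeat until the bar is cleared."""
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        candidate = build(feedback)            # agent produces or revises a candidate
        score, feedback = evaluate(candidate)  # big model or benchmark judges it
        if score >= bar:
            return candidate, iteration        # bar cleared, stop early
    return None, max_iterations                # gave up without clearing the bar


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def fake_build(feedback: Optional[str]) -> str:
    return "v1" if feedback is None else feedback.removeprefix("try ")


def fake_evaluate(candidate: str) -> tuple[float, str]:
    version = int(candidate[1:])
    return version / 3, f"try v{version + 1}"


result, iterations = run_loop(fake_build, fake_evaluate, bar=1.0)
```

With a better builder the same loop just exits on an earlier iteration, which is the "fewer feedback iterations" point above.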
I even tried writing a VALUES.md where I put what I personally value in a specific project. Do I value performance or code clarity? What kind of algorithms do I like more? What kind of libraries? What would I like to use for inspiration? It is really high-level stuff, much higher than before, and it seems to work really well. Like "hey, I like how sync works in Telegram with this and that" or "the app must have absolute maximum startup performance, with full data shown immediately, without loaders". And it works fine: it can benchmark and evaluate itself and pick a good approach that I like.
I don't like the result, though. It is often not fun to maintain, and to read the code you go back to the AI to understand how it works, since it summarizes faster than you can read. So it is slop after slop. It is like a CEO isolating themselves from everyone behind execs who now act as a buffer, and the org often starts to rot. What if the same happens in organizations when they use AI heavily to replace middle management?
It seems the next step is just the next cycle of psychosis, but among executives. I would even think this psychosis will be cyclical: different enterprises move at different speeds and will hit psychosis at different times. In coding this cycle has already repeated multiple times. It started with AutoGPT 3+ years ago, then computer use was supposed to fix everything. Frankly, it barely worked, and any better model solved most of the problems, just like we saw Fable solve all the frustration of using Opus. No harness helped with the frustrations of the model. No amount of self-improving loops helped GPT-3.5 Turbo ship working code. Building a personal knowledge base somehow didn't work well until Anthropic decided to get Opus to learn it, and today it just one-shots it.
I am trying even simple things with Opus-level models, and they still get a lot of stuff wrong. I am experimenting with Pydantic Monty, and I have to write a 20k-character system prompt to explain to the model how this environment differs from normal Python, and it still tries to import packages that it was explicitly told do not exist. They still hallucinate functions that don't exist, trying to imagine what could be in there. GPT seems to handle this much better, but it is much less human. So now Opus must be trained on the Monty dialect, or what? If models fail at this, how would they not fail at middle-manager tasks?
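One cheap check that can sit in a loop next to the prompt: statically scan the generated code for imports outside an allowlist and feed the violations back to the model. This is a rough sketch using only the standard `ast` module. The allowlist and the function name are hypothetical, it is not Monty's API, and it catches neither hallucinated functions nor dynamic imports via `importlib` or `__import__`.

```python
import ast


def disallowed_imports(source: str, allowed: set[str]) -> list[str]:
    """Return top-level module names imported by `source` that are not in `allowed`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` -> top-level module "a"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` -> "a"; relative imports are skipped
            found.add(node.module.split(".")[0])
    return sorted(found - allowed)


# Hypothetical model output and a hypothetical allowlist for the sandbox.
generated = "import json\nimport numpy as np\nfrom os.path import join\n"
violations = disallowed_imports(generated, allowed={"json", "math"})
```

The list of violations can go straight back into the next iteration as feedback instead of yet another paragraph in the system prompt.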
Cutting-edge engineers are trying to be agent managers now, which is essentially a middle-management-like role. This is very tedious work. Everyone learned that you can really only manage 3-5 complicated tasks, maybe scale it to 10, but that's it. So now engineers are trying to scale management (which is the same task as replacing real middle managers at orgs), and this still fails miserably.
And this is the current cap on intelligence that we may have for a while! Composer would be similar to Opus/GPT; Chinese models would probably be even better at some stuff but would still be on the low end of the taste spectrum. Meanwhile Opus and GPT are now worse than previous models, so everything will probably converge. Anthropic's idea is that solving coding means solving AGI, so if Composer solved coding, it would be on the same level as Claude according to the Anthropic thesis.
Honestly, writing all this, I feel that an LLM is still a text processor: text in, text out. Even simply repeating the prompt twice still yields better results because of how causal attention works. Even Claude Code has repetitions in its system prompts. Emergent capabilities so far seem to come from mixing different things together, like taking something from algebra and something from geometry and producing interesting results, while those are usually studied by separate mathematicians and never tried together. But will the bio capability help with running a tire shop? Definitely better than 3.5 Turbo, and it gets better, so I guess yes? Is it there now? Probably not. Can you edge it? Very hard: harnesses became barely useful after models got trained more for coding. AGENTS.md was always stale until Codex learned to maintain it after a few releases.
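The repetition trick is as dumb as it sounds. With causal attention each token only attends to tokens before it, so the tokens of the second copy can look at the whole first copy, while the first copy never sees its own ending. A tiny sketch, where the helper name and the separator are my own assumptions rather than anything a real harness ships:

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `times` times, joined by `separator`."""
    if times < 1:
        raise ValueError("times must be at least 1")
    # Strip once so the copies are identical and the separator stays predictable.
    return separator.join([prompt.strip()] * times)


doubled = repeat_prompt("Never import packages that are not listed above.")
```

Whether it helps for a given model is something to measure, not assume.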
Have you guys tried to fix something offline with GPT? Like asking it about your car: you have to push back on half of its sentences to make it work, it constantly mixes up the generations of the car, hallucinates part failure modes, and sources knowledge from random websites. A few guys got one part and it worked for them; I got a similar one and got fucked on track with some risk of dying, but GPT/Claude would tell me it is a good part based on a few comments on Rennlist. So to solve that you must build some kind of verified knowledge base? But how? It barely exists in digital form. You had to learn from the internet and from talking to people, build an intuition for who is good enough to listen to, and look at reputation when possible. Reddit data is weird: which is better, r/sf or r/sfcirclejerk? Quite often the parody becomes more real than the original. Who is going to decide? Clearly the answer is different for different people.
But that's what labs were doing a year ago, trying to get private data. I guess they figured out that such data won't exist, so they apparently started to buy RL environments (but I barely see people who do that). If it takes 10B to train a model for coding, which has an insane amount of data, how much time would it take to train on the hidden knowledge of car repairs? So are we going to get a '78 Porsche 911 and tweak everything and test until ASI learns things? Or is the idea that ASI would manage everything and then learn that? But why would anyone give this data to the labs, since it would kill their business instantly? Trad businesses know that well.
So some of my predictions:
We neglected Happy for a while, mostly because we anticipated some dramatic change after which we couldn't be relevant anymore. But what if there won't be any? It feels like we are in perpetual motion: just one more prompt, one more guardrail, just 6 more months and models will one-shot that. Yeah, sure, but should we simply stop doing anything for a couple of years? We did at Happy and it was a mistake: nothing new has come since then.
@ex3ndr @founders