2025-07-15 00:59:24
Claude Code logs detailed usage information to the ~/.claude/ directory. ccusage is a neat little Node.js tool which reads that information and shows you a readable summary of your usage patterns, including the estimated cost in USD per day.
You can run it using npx like this:
npx ccusage@latest
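ccusage offers more than the default report. Here are a few hedged examples - the exact subcommand names and flags are assumptions based on the tool's documentation, and the tool is under active development, so check npx ccusage@latest --help if they've changed:
# Daily breakdown (the default report)
npx ccusage@latest daily
# Monthly totals instead of daily
npx ccusage@latest monthly
# JSON output for piping into other tools
npx ccusage@latest --json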
Tags: javascript, nodejs, anthropic, claude-code
2025-07-14 02:47:13
Today is the 20th anniversary of the first commit to the public Django repository!
Ten years ago we threw a multi-day 10th birthday party for Django back in its birthtown of Lawrence, Kansas. As a personal celebration of the 20th, I'm revisiting the talk I gave at that event and writing it up here.
Here's the YouTube video. Below is a full transcript, plus my slides and some present-day annotations.
Presented 11th July 2015 at Django Birthday in Lawrence, Kansas
My original talk title, as you'll see on your programs, was "Some Things I've Built with Django." But then I realized that we're here in the birthplace of Django, celebrating the 10th birthday of the framework, and nobody's told the origin story yet. So, I've switched things around a little bit. I'm going to talk about the origin story of Django, and then if I have time, I'll do the self-indulgent bit and talk about some of the projects I've shipped since then.
I think Jacob's introduction hit on something I've never really realized about myself. I do love shipping things. The follow-up and the long-term thing I'm not quite so strong on. And that came into focus when I was putting together this talk and realized that basically every project I'm going to show you, I had to dig out of the Internet Archive.
Ten years on from writing this talk, I'm proud that I've managed to overcome my weakness in following up - I'm now actively maintaining a bewildering array of projects, having finally figured out how to maintain things in addition to creating them!
But that said, I will tell you the origin story of Django.
I put this annotated version of my 10-year-old talk together using a few different tools.
I fetched the audio from YouTube using yt-dlp:
yt-dlp -x --audio-format mp3 \
"https://youtube.com/watch?v=wqii_iX0RTs"
I then ran the mp3 through MacWhisper to generate an initial transcript. I cleaned that up by pasting it into Claude Opus 4 with this prompt:
Take this audio transcript of a talk and clean it up very slightly - I want paragraph breaks and tiny edits like removing ums or "sort of" or things like that, but other than that the content should be exactly as presented.
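That was a manual copy-and-paste into Claude. If you wanted to script the same step, something like this sketch using the llm CLI could work - the claude-4-opus model alias (from the llm-anthropic plugin) and the transcript.txt filename are assumptions here, not part of the original workflow:
# Hypothetical scripted version of the cleanup step: pipe the raw
# MacWhisper transcript to Claude Opus 4 with the same instructions
cat transcript.txt | llm -m claude-4-opus \
  -s "Take this audio transcript of a talk and clean it up very slightly - I want paragraph breaks and tiny edits like removing ums, but otherwise the content should be exactly as presented."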
I converted a PDF of the slides into a JPEG per page using this command (found with the llm-cmd plugin):
pdftoppm -jpeg -jpegopt quality=70 django-birthday.pdf django-birthday
Then I used my annotated presentations tool (described here) to combine the slides and transcript, making minor edits and adding links using Markdown in that interface.
Tags: adrian-holovaty, devfort, django, history, jacob-kaplan-moss, lawrence, lawrence-com, lawrence-journal-world, python, my-talks, the-guardian, annotated-talks
2025-07-13 02:12:23
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
METR - for Model Evaluation & Threat Research - are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They've previously contributed to system cards for OpenAI and Anthropic, but this new research represents a slightly different direction for them:
We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.
The full paper (PDF) has a lot of details that are missing from the linked summary.
METR recruited 16 experienced open source developers for their study, with varying levels of exposure to LLM tools. They then assigned them tasks from their own open source projects, randomly assigning whether AI was allowed or not allowed for each of those tasks.
They found a surprising difference between developer estimates and actual completion times:
After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down.
I shared my initial intuition about this paper on Hacker News the other day:
My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.
This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.
They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.
So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.
A quarter of the participants saw increased performance; three quarters saw reduced performance.
One of the top performers with AI was also the developer with the most previous Cursor experience. The paper acknowledges that here:
However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.
My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is steep enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.
I got an insightful reply there from Nate Rush, one of the authors of the study, which included these notes:
- Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.
- Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, the only concern most external reviewers had about experience was about prompting -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.
- Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (but not because AI was better, just because without AI is much worse). In other words, we're sorta in between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!
- We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.
- As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.
In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).
Their table of the most likely factors is on page 11 of the paper.
I think Nate's right that jumping straight to a conclusion about a single factor is a shallow and unproductive way to think about this report.
That said, I can't resist the temptation to do exactly that! The factor that stands out most to me is that these developers were all working in repositories they have a deep understanding of already, presumably on non-trivial issues since any trivial issues are likely to have been resolved in the past.
I think this is a really interesting paper. Measuring developer productivity is notoriously difficult. I hope this paper inspires more work that applies the same level of detail to analyzing how professional programmers spend their time:
To compare how developers spend their time with and without AI assistance, we manually label a subset of 128 screen recordings with fine-grained activity labels, totaling 143 hours of video.
Via Hacker News
Tags: open-source, productivity, ai, generative-ai, llms, ai-assisted-programming, paper-review
2025-07-13 01:07:15
Grok 4 Heavy won't reveal its system prompt
Grok 4 Heavy is the "think much harder" version of Grok 4 that's currently only available on their $300/month plan. Jeremy Howard relays a report from a Grok 4 Heavy user who wishes to remain anonymous: it turns out that Heavy, unlike regular Grok 4, has measures in place to prevent it from sharing its system prompt:
Sometimes it will start to spit out parts of the prompt before some other mechanism kicks in to prevent it from continuing.
This is notable because Grok have previously indicated that system prompt transparency is a desirable trait of their models, including in this now deleted tweet from Grok's Igor Babuschkin (screenshot captured by Jeremy):
In related prompt transparency news, Grok's retrospective on why Grok started spitting out antisemitic tropes last week included the text "You tell it like it is and you are not afraid to offend people who are politically correct" as part of the system prompt blamed for the problem. That text isn't present in the history of their previous published system prompts.
Given the past week of mishaps I think xAI would be wise to reaffirm their dedication to prompt transparency and set things up so the xai-org/grok-prompts repository updates automatically when new prompts are deployed - their current manual process for that is clearly not adequate for the job!
Tags: ai, generative-ai, llms, grok, ai-ethics
2025-07-13 00:12:18
crates.io is the Rust ecosystem's equivalent of PyPI. Inspired by PyPI's GitHub integration (see my TIL; I use this for dozens of my packages now) they've added a similar feature:
Trusted Publishing eliminates the need for GitHub Actions secrets when publishing crates from your CI/CD pipeline. Instead of managing API tokens, you can now configure which GitHub repository you trust directly on crates.io.
They're missing one feature that PyPI has: on PyPI you can create a "pending publisher" for your first release. crates.io currently requires the first release to be manual:
To get started with Trusted Publishing, you'll need to publish your first release manually. After that, you can set up trusted publishing for future releases.
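To make that concrete, here's a hedged sketch of the overall flow. The token-exchange detail reflects my understanding of how Trusted Publishing works (GitHub's OIDC token is swapped for a short-lived crates.io token), so verify it against the crates.io docs:
# First release: manual, using a classic API token from crates.io
cargo login    # paste a token created at https://crates.io/settings/tokens
cargo publish
# Subsequent releases: after adding the trusted GitHub repository in the
# crate's settings on crates.io, the CI job authenticates via OIDC and
# receives a temporary token - no long-lived secret stored in GitHub
# Actions - then publishes as usual:
cargo publish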
Via @charliermarsh
2025-07-12 23:41:22
On the morning of July 8, 2025, we observed undesired responses and immediately began investigating.
To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We identified the operative lines responsible for the undesired behavior as:
- “You tell it like it is and you are not afraid to offend people who are politically correct.”
- “Understand the tone, context and language of the post. Reflect that in your response.”
- “Reply to the post just like a human, keep it engaging, dont repeat the information which is already present in the original post.”
These operative lines had the following undesired results:
- They undesirably steered the @grok functionality to ignore its core values in certain circumstances in order to make the response engaging to the user. Specifically, certain user prompts might end up producing responses containing unethical or controversial opinions to engage the user.
- They undesirably caused @grok functionality to reinforce any previously user-triggered leanings, including any hate speech in the same X thread.
- In particular, the instruction to “follow the tone and context” of the X user undesirably caused the @grok functionality to prioritize adhering to prior posts in the thread, including any unsavory posts, as opposed to responding responsibly or refusing to respond to unsavory requests.
— @grok, presumably trying to explain Mecha-Hitler
Tags: ai-ethics, prompt-engineering, grok, ai-personality, generative-ai, ai, llms