Carrotly

15 April 2026 · 6 min read · by Michael Cutler

How TooLong decides what's worth keeping in a summary

Most AI summaries are a sea of generic bullet points. We built TooLong to do the opposite — keep only what earned its place, and tell you honestly when there isn't much. Notes on the editorial choices behind a short summary.

TooLong turns a YouTube link into a short summary. That is the whole product. The interesting part isn't the model that writes the summary; it's the rules we wrap around it to decide what should be in there at all. A summary is an editorial act, not a compression one, and most of our work has been on that distinction.

Most summaries are compression, not editing.

If you paste a transcript into a generic model and ask for a summary, what you get back is a smaller version of the same thing. The throat-clearing is shorter. The sponsorship read is paraphrased. The host's social-media plug is mentioned in passing instead of at length. It reads like a video, just faster. That isn't what most people want when they ask for a summary; it's what they're trying to escape.

An editor does the opposite job. They read the whole thing, decide what was actually worth saying, and write that down. Half the words on the page get cut because they were filler in the first place. The remaining half are reordered so the argument is clearer than it was in the original. The reader gets the thing they would have gotten if they'd watched, minus the time and the patience.

We wanted TooLong to behave like an editor. That sounds like a prompt-engineering problem and it is partly that, but mostly it's a problem of being concrete about what gets kept and what gets cut. The rules below are the ones we've written down so far.

What earns its place.

A line earns its place in a TooLong summary if it is one of the following:

  • A number. "Costs dropped 40 percent in eighteen months" is worth keeping. "Costs dropped a lot" is not.
  • A date. When something happened anchors the rest of the claim.
  • A name. A person, a company, a paper, a study. Anything the reader could go and look up themselves.
  • A method. The actual technique someone is describing — not that they're describing one.
  • A counter-intuitive claim. If the speaker says something the reader probably wouldn't have guessed, that's the part most worth surfacing.
  • A stated source. "According to the WHO" or "in the original paper" — the attribution lets the reader check.
  • A turn in the argument. The moment the speaker changes their mind, concedes a point, or pivots. These are the load-bearing beams of a good video.

What we cut, by default: the recap of the previous video, the request to like and subscribe, the sponsor read, the restated question ("so the question is, why does X..."), the "I'll get to that in a moment", the meta-commentary about the channel, the throat-clearing intro, and anything the timestamp itself already says. If a section header is "Three reasons it failed", we don't need a bullet that says "the speaker gives three reasons it failed". We need the three reasons.

The test we use internally is: could a reader act on this line, remember it, or look it up? If none of the three, it shouldn't be there.
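To make the test concrete, here is a minimal Python sketch of how a few of these signals could be checked mechanically. It is an illustration under stated assumptions, not TooLong's actual pipeline: the regular expressions, the `SIGNALS` and `CUT` names, and the `signals` function are all hypothetical. It covers only the signals a pattern can plausibly spot (numbers, years, stated sources). Names, methods, counter-intuitive claims, and turns in the argument need judgement a regex can't supply.

```python
import re

# Hypothetical patterns for illustration only; none come from TooLong itself.
SIGNALS = {
    # Any digit at all. Note that a year also counts as a number.
    "number": re.compile(r"\d"),
    # Four-digit years from 1900 to 2099; a crude stand-in for "a date".
    "date": re.compile(r"\b(?:19|20)\d{2}\b"),
    # The two attribution phrases quoted in the list above.
    "stated_source": re.compile(
        r"\b(?:according to|in the original paper)\b", re.IGNORECASE
    ),
}

# A few of the default cuts, again as assumed phrasings.
CUT = re.compile(
    r"like and subscribe|sponsored by|i['’]ll get to that in a moment",
    re.IGNORECASE,
)


def signals(line: str) -> list[str]:
    """Return the names of the signals found in a line, or [] if it should be cut."""
    if CUT.search(line):
        return []
    return [name for name, pattern in SIGNALS.items() if pattern.search(line)]


print(signals("Costs dropped 40 percent in eighteen months"))  # ['number']
print(signals("Costs dropped a lot"))                          # []
```

A line with no signals is not automatically filler, so a check like this could at most flag candidates for a closer look. It could not make the editorial call.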

A timestamp is a promise.

Most timestamped summaries are useless because the timestamps point at section headers. "12:34 — economic discussion" is a label, not a promise. The reader has to watch from 12:34 to find out whether the bit they care about is there.

A TooLong timestamp tries to be a promise instead. "12:34 — argues GDP per capita misleads when comparing Singapore and Norway because it ignores cost of living and state-provided services" tells you what's at 12:34 specifically enough that you can decide whether to click. If the claim is wrong or you disagree, you know in a sentence; if it's interesting, you have the context before you press play.

This is harder than it sounds because the model has to commit. A vague timestamp is safe — it's hard to be wrong about "economic discussion" — and a specific one is exposed. We bias toward the exposed version. When the model can't write a specific timestamp because the section genuinely is just a meander, we'd rather drop the timestamp than fake one. A consequence: a 90-minute interview might end up with eight timestamps in our summary, not forty. Eight specific promises beat forty vague ones.
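The "drop it rather than fake it" rule can be sketched as a filter over candidate timestamps. This is a rough Python illustration, not how TooLong decides: the `Stamp` type, the six-word threshold, and the list of vague endings are assumptions made up for the example, and word count is only a weak proxy for whether a label commits to a claim.

```python
from dataclasses import dataclass


@dataclass
class Stamp:
    seconds: int  # offset into the video
    label: str    # what the summary claims is at that offset


# Assumed heuristics: a label must be reasonably long and must not
# end in a generic section word such as "discussion".
MIN_WORDS = 6
VAGUE_ENDINGS = {"discussion", "overview", "introduction", "thoughts"}


def is_promise(label: str) -> bool:
    """True if the label looks specific enough to keep."""
    words = label.lower().split()
    if len(words) < MIN_WORDS:
        return False
    return words[-1].strip(".,;:") not in VAGUE_ENDINGS


def fmt(seconds: int) -> str:
    """Format an offset as m:ss, or h:mm:ss past the hour."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def keep_promises(stamps: list[Stamp]) -> list[str]:
    """Drop vague stamps entirely instead of keeping a label for them."""
    return [f"{fmt(s.seconds)} - {s.label}" for s in stamps if is_promise(s.label)]


candidates = [
    Stamp(754, "economic discussion"),
    Stamp(754, "argues GDP per capita misleads when comparing Singapore and Norway"),
]
print(keep_promises(candidates))
```

Run on the two labels from the example above, the filter keeps only the second. The design choice that matters is that a rejected stamp produces no output at all, rather than a fallback label.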

A related rule: when a creator provides their own chapter markers, we read them but don't trust them. A creator's chapter list is a marketing artefact — written to put keywords near the top and convince the algorithm the video has structure. A chapter titled "The shocking truth about X" is usually where the speaker walks back the strong version of the claim. We treat chapters as a hint about what the creator thought was important, and form our own view of what actually was.

Sometimes the answer is "there isn't much here".

The honest failure mode of a summary tool is the one where there isn't enough content to summarise. A 14-minute product review might have one useful sentence ("the battery lasts about six hours under normal use") buried in twelve minutes of unboxing and chat. A 40-minute podcast might be two people agreeing with each other warmly for the whole runtime without making a specific claim either way.

When that happens, the right summary is short. One line, sometimes two. Not a padded paragraph that pretends there was more. We'd rather a user get a summary that says "this video has one claim worth surfacing: X" and feel slightly short-changed by the length than read a four-paragraph summary, watch the video, and discover we made most of it up.

This is the bit that's hardest to get right, and it's the bit we'll have more to say about later this year. The model wants to fill the page. We're still working out how to let it leave the page mostly empty when that's the truthful answer.
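The rendering side of that rule, at least, is easy to state. Below is a minimal Python sketch, assuming some earlier step has already produced the list of lines worth keeping. The `render_summary` function, its `max_lines` parameter, and the wording of the empty case are hypothetical. The hard part, getting a model to return a short list in the first place, is not shown.

```python
def render_summary(kept_lines: list[str], max_lines: int = 8) -> str:
    """Render only what was kept; never pad to a target length."""
    if not kept_lines:
        # Assumed wording for the empty case.
        return "There isn't much here: no claim worth surfacing."
    if len(kept_lines) == 1:
        return f"This video has one claim worth surfacing: {kept_lines[0]}"
    # The cap trims long summaries; there is deliberately no minimum.
    return "\n".join(f"• {line}" for line in kept_lines[:max_lines])


print(render_summary(["the battery lasts about six hours under normal use"]))
```

There is an upper bound on length but no lower one, so a thin video yields a thin summary.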

What we'd still like to get better at.

A short, honest list of things our summaries don't do well yet:

  • Visual content. A summary of a video that's mostly a diagram or a screen recording leaves out the diagram.
  • Sarcasm and irony. The model occasionally reports the speaker's joke as their actual position. We catch this in review more than we'd like.
  • Long, slow arguments. A two-hour philosophy lecture that builds gradually is harder to summarise than a punchy ten-minute explainer, because the load-bearing claims are spread out and reference each other.
  • Non-English videos. Our summaries are in English; we don't yet do justice to the original language's idioms when we translate them inline.

A summary is a small object. Most of the work of making it good is invisible — it's in the lines we decided not to write. That's true of most of the things we make, and it's especially true here. If you've used TooLong and noticed somewhere we got the editing wrong — kept a line we should have cut, or cut one we should have kept — email us. Those notes are how the rules above got written in the first place.

