We deserve a better streams API for JavaScript

(blog.cloudflare.com)

85 points | by nnx 1 hour ago

11 comments

  • conartist6 39 minutes ago
    As it happens, I have an even better API than this article proposes!

    They propose just using an async iterator of Uint8Array. I almost like this idea, but it's not quite all the way there.

    They propose this:

      type Stream = {
        next(): Promise<{ done: boolean, value: Uint8Array }>
      }
    
    I propose this, which I call a stream iterator!

      type Stream<T> = {
        next(): { done: boolean, value: T } | Promise<{ done: boolean, value: T }>
      }
    
    Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:

    - I can easily make mine from theirs

    - In theirs the conceptual "stream" is defined by an iterator of iterators, meaning you need a for loop of for loops to step through it. In mine it's just one iterator and it can be consumed with one for loop.

    - I'm not limited to streams of integers; they are.

    - My way, if I define a sync transform over a sync input, the whole iteration can be sync, making it possible to get and use the result in sync functions. This is huge, as otherwise you have to write all the code twice: once with sync iterators and for loops, and once with async iterators and for await loops.

    - The problem with thrashing Promises when splitting input up into words goes away. With async iterators, creating two words means creating two promises. With stream iterators if you have the data available there's no need for promises at all, you just yield it.

    - Stream iterators can help you manage concurrency, which is a huge thing that async iterators cannot do. Async iterators can't do this because if they see a promise they will always wait for it. That's the same as saying "if there is any concurrency, it will always be eliminated."
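    A minimal TypeScript sketch of the "stream iterator" idea, under stated assumptions: the names `Result`, `StreamIterator`, `fromArray`, `fromChunks`, `map` and `drain` are hypothetical (not from the article or any library), and only native Promises are treated as "async". `next()` returns a plain result when data is already available and a Promise only when it must wait; `drain` only switches to promise continuations when it actually receives one.

```typescript
// Hypothetical "stream iterator": next() answers synchronously when it can,
// and returns a Promise only when it genuinely has to wait.
type Result<T> = { done: false; value: T } | { done: true; value: undefined };

type StreamIterator<T> = {
  next(): Result<T> | Promise<Result<T>>;
};

// Simplification: only native Promises are treated as "async".
function isPromise<T>(x: T | Promise<T>): x is Promise<T> {
  return x instanceof Promise;
}

// A fully synchronous source.
function fromArray<T>(items: T[]): StreamIterator<T> {
  let i = 0;
  return {
    next(): Result<T> {
      return i < items.length
        ? { done: false, value: items[i++] }
        : { done: true, value: undefined };
    },
  };
}

// Adapt "their" shape (async iterator of byte chunks) into a stream iterator of bytes.
// Bytes already in the current chunk are returned without creating a Promise.
function fromChunks(chunks: AsyncIterator<Uint8Array>): StreamIterator<number> {
  let chunk: Uint8Array = new Uint8Array(0);
  let index = 0;
  const self: StreamIterator<number> = {
    next(): Result<number> | Promise<Result<number>> {
      if (index < chunk.length) {
        return { done: false, value: chunk[index++] };
      }
      // Only when the current chunk is exhausted do we have to wait.
      return chunks.next().then((r): Result<number> | Promise<Result<number>> => {
        if (r.done) return { done: true, value: undefined };
        chunk = r.value;
        index = 0;
        return self.next();
      });
    },
  };
  return self;
}

// A transform that stays synchronous whenever its input is synchronous.
function map<T, U>(source: StreamIterator<T>, fn: (x: T) => U): StreamIterator<U> {
  const apply = (r: Result<T>): Result<U> =>
    r.done ? r : { done: false, value: fn(r.value) };
  return {
    next() {
      const r = source.next();
      return isPromise(r) ? r.then(apply) : apply(r);
    },
  };
}

// Consume every item. Returns undefined if everything was synchronous, otherwise a Promise.
function drain<T>(source: StreamIterator<T>, onItem: (x: T) => void): void | Promise<void> {
  for (;;) {
    const r = source.next();
    if (isPromise(r)) {
      // Switch to async only from this point on.
      return r.then((res) => {
        if (res.done) return undefined;
        onItem(res.value);
        return drain(source, onItem);
      });
    }
    if (r.done) return;
    onItem(r.value);
  }
}
```

    With this shape, `drain(map(fromArray([1, 2, 3]), (x) => x * 10), ...)` completes synchronously, so it can be called from an ordinary sync function, while `fromChunks` handles the async case with Promises per chunk rather than per byte.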

    • Joker_vD 4 minutes ago
      > Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:

      > - I can easily make mine from theirs

      That... doesn't make it superior? On the contrary, theirs can't be easily made out of yours, except by either returning trivial 1-byte chunks, or by arbitrary buffering. So their proposal is a superior primitive.

      On the whole, I/O-oriented iterators probably should return chunks of T; otherwise you get buffer bloat for free. readv/writev were introduced for a reason, you know.

    • paxys 3 minutes ago
      And how do you ensure T is serializable over the wire?
    • conartist6 24 minutes ago
      There's one more interesting consequence: you rid yourself of the feedback problem.

      To see the problem, let's create a stream with feedback. Let's say we have an assembly line that produces muffins from ingredients, and the recipe says that every third muffin we produce must be mushed up and used as an ingredient for further muffins. This works OK until someone adds a final stage to the assembly line which puts muffins in boxes of 12. Now the line gets completely stuck! It can't get a muffin to use at the start of the line because it hasn't made a full box of muffins yet, and it can't make a full box of muffins because it's starved for ingredients after 3.

      If we're mandated to clump the items together we're implicitly assuming that there's no feedback, yet there's also no reason that feedback shouldn't be a first-class ability of streams.
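      A toy simulation of the muffin line, with made-up numbers: the line starts with 3 ingredients, each muffin uses one, and every third muffin that reaches the end of the line is mushed back into 3 ingredients. The function name `simulate` and its parameters are hypothetical. A `boxSize` of 1 models unclumped items; a `boxSize` of 12 models the boxing stage.

```typescript
// Toy model of the muffin line. All quantities are made up for illustration.
// boxSize = 1 means muffins leave the line one at a time (no clumping).
function simulate(boxSize: number, target: number): number {
  let ingredients = 3; // starting stock
  let made = 0;        // muffins baked so far
  let delivered = 0;   // muffins that have reached the end of the line
  let inBox = 0;       // muffins sitting in the current, not-yet-full box

  while (made < target) {
    if (ingredients === 0) break; // starved: the line is stuck
    ingredients--;
    made++;
    inBox++;
    if (inBox === boxSize) {
      // Only a full box is released downstream, where the feedback happens.
      for (let i = 0; i < inBox; i++) {
        delivered++;
        // Every third delivered muffin is mushed into 3 new ingredients.
        if (delivered % 3 === 0) ingredients += 3;
      }
      inBox = 0;
    }
  }
  return made;
}
```

      Under these assumptions `simulate(1, 100)` returns 100, while `simulate(12, 100)` returns 3: the first box never fills, so the muffin needed for feedback never arrives.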

  • ai-christianson 56 minutes ago
    The point about BYOB reads is spot on. It's frustrating that such a critical feature for performance and reducing GC pressure ended up being so difficult to implement correctly in the WHATWG standard. A simpler, more ergonomic approach to buffer management would go a long way for those of us building high-performance data processing tools in JS.
    • slowcache 35 minutes ago
      > high-performance data processing tools in JS

      I may be naive in asking this, but what leads someone to build high-perf data tools in JS? JS doesn't seem to me like the tool of choice for such things.

      • n_e 18 minutes ago
        I have a SaaS project where the backend is in JS. I also have some data processing to do on large files (several TB). Doing it in JS is more convenient, as I can reuse code from the backend, and it is also the language I know best.

        Performance-wise, I get about half the throughput I had with the same processing done in Rust, which doesn't change anything for my use case.

        However, that's not really relevant to the context of the post, as I'm using Node.js streams, which are both saner and fast. I'm guessing the post is relevant to people using server-side runtimes that only implement web streams.

      • moron4hire 5 minutes ago
        You don't always have a choice on where you deliver your software. It'd be nice to have good tools wherever you are forced to work.
      • thadt 20 minutes ago
        Browsers
        • speed_spread 9 minutes ago
          Since when are browsers themselves built in JavaScript? Mainstream, fast ones?
  • shevy-java 55 minutes ago
    We deserve a better language than JavaScript.

    Sadly it will never happen. WebAssembly failed to keep some of its promises here.

    • gejose 34 minutes ago
      There's always a comment like this in most discussions about javascript.
    • postalrat 36 minutes ago
      Where can I find these not kept promises?
  • murmansk 31 minutes ago
    For God's sake, finally, somebody has said this!
  • ralusek 23 minutes ago
    I tinkered with an alternative to stream interfaces:

    https://github.com/ralusek/streamie

    It allows you to do things like:

        infiniteRecords
          .map(item => doSomeAsyncThing(item), { concurrency: 5 });
    
    And then, because I often want to switch between batches of items and single items:

        infiniteRecords
          .map(item => doSomeAsyncSingularThing(item), { concurrency: 5 })
          .map(groupOf10 => doSomeBatchThing(groupOf10), { batchSize: 10 })
          // Can flatten back to single items
          .map(item => backToSingleItem(item), { flatten: true });
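    For readers who haven't used streamie, here is a rough TypeScript sketch of what `{ concurrency }` and `{ batchSize }` conceptually do over an async iterable. This is not streamie's implementation; `mapConcurrent` and `batch` are hypothetical names. It assumes results should come out in input order (an unordered variant would yield whichever call finishes first).

```typescript
// Hypothetical helper: run fn on up to `concurrency` items at once,
// yielding results in input order.
async function* mapConcurrent<T, U>(
  source: AsyncIterable<T>,
  fn: (item: T) => Promise<U>,
  concurrency: number,
): AsyncGenerator<U> {
  const inFlight: Promise<U>[] = [];
  for await (const item of source) {
    const pending = fn(item);
    // Avoid an "unhandled rejection" while an earlier item is still being awaited;
    // the error is still re-thrown when this promise is awaited below.
    pending.catch(() => {});
    inFlight.push(pending);
    if (inFlight.length >= concurrency) {
      yield await inFlight.shift()!;
    }
  }
  // Source exhausted: flush whatever is still running, in order.
  while (inFlight.length > 0) {
    yield await inFlight.shift()!;
  }
}

// Hypothetical helper: group items into arrays of `size` (the last group may be shorter).
async function* batch<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let group: T[] = [];
  for await (const item of source) {
    group.push(item);
    if (group.length === size) {
      yield group;
      group = [];
    }
  }
  if (group.length > 0) yield group;
}
```

    Composing them, `batch(mapConcurrent(records, work, 5), 10)` mirrors the chained `.map(...)` calls above; flattening back is just a `for...of` over each group.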
  • dilap 55 minutes ago
    > The problems aren't bugs; they're consequences of design decisions that may have made sense a decade ago, but don't align with how JavaScript developers write code today.

    > I'm not here to disparage the work that came before — I'm here to start a conversation about what can potentially come next.

    Terrible LLM-slop style. Is Mr Snell letting an LLM write the article for him or has he just appropriated the style?

    • jasnell 15 minutes ago
      Heh, I was using em dashes and tricolons long before LLMs appropriated the style, but I did let the agent handle some of the details on this. Honestly, it really is just easier sometimes... Especially for blog posts like this when I've also got a book I'm writing, code to maintain, etc. Use the tools available to make life easier.
      • eis 3 minutes ago
        People are understandably a bit sensitized and sceptical after the last AI-generated blog post (and code slop!) by Cloudflare blew up. Personally, I'm fine with using AI to help write stuff as long as everything is proofread and actually represents the author's thoughts. If I were working at Cloudflare, though, I would have opted to be a bit more careful and not use AI for a few blog posts after the last incident...
    • azangru 20 minutes ago
      What was it specifically about the style that stood out as incongruous, or that hindered comprehension? What was it that made you stumble and start paying close attention to the style rather than to the message? I am looking at the two examples, and I can't see anything wrong with them, especially in the context of the article. They both employ the same rhetorical technique of antithesis, a juxtaposition of contrasting ideas. Surely people wrote like this before? Surely no-one complained?
    • nebezb 14 minutes ago
      The idea is well articulated and comes across clearly. What’s the issue? Taking a magnifying glass to the whole article to find sentence structure you think is “LLM-slop” is an odd way to dismiss the article entirely.

      I’ve read my fair share of LLM slop. This doesn’t qualify.

    • lapcat 50 minutes ago
      You’ve got it backwards: LLMs were trained on human writing and appropriated our style.
      • have_faith 20 minutes ago
        Partially true. They've been trained and then aligned towards a preferred style. They don't use em-dashes because they are over-represented in the training material (the majority of people don't use them).
    • jitl 40 minutes ago
      Cloudflare does seem to love AI-written everything.
  • kg 1 hour ago
    It's a real shame that BYOB (bring your own buffer) reads are so complex and such a pain in the neck because for large reads they make a huge difference in terms of GC traffic (for allocating temporary buffers) and CPU time (for the copies).

    In an ideal world you could just ask the host to stream 100MB of stuff into a byte array or slice of the wasm heap. Alas.
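    For reference, a BYOB read loop with the current WHATWG API (`ReadableStream.getReader({ mode: "byob" })` and `ReadableStreamBYOBReader.read(view)`) looks roughly like the sketch below, assuming a runtime that supports byte streams (e.g. recent browsers or Node 18+). The helper name `readAllInto` and the fixed `capacity` are assumptions for illustration. It shows part of why this is fiddly: each `read(view)` transfers (detaches) the buffer you passed in, so the loop has to continue with `value.buffer` instead.

```typescript
// Read a byte stream into a single preallocated buffer of `capacity` bytes using a BYOB reader.
// Stops when the stream ends or the buffer is full; returns a view of the bytes actually read.
async function readAllInto(
  stream: ReadableStream<Uint8Array>,
  capacity: number,
): Promise<Uint8Array> {
  const reader = stream.getReader({ mode: "byob" });
  let buffer = new ArrayBuffer(capacity);
  let offset = 0;
  try {
    while (offset < capacity) {
      const { done, value } = await reader.read(
        new Uint8Array(buffer, offset, capacity - offset),
      );
      // value is undefined only if the stream was cancelled; the buffer is then lost.
      if (!value) throw new Error("stream was cancelled; buffer not returned");
      // The buffer we passed in is now detached; the bytes live in value.buffer.
      buffer = value.buffer as ArrayBuffer;
      if (done) break;
      offset += value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  return new Uint8Array(buffer, 0, offset);
}
```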

    • amluto 34 minutes ago
      I wonder if you can get most of the benefit of BYOB with a much simpler API:

          for await (const chunk of stream) {
              // process the chunk
              stream.returnChunk(chunk);
          }
      
      This would be entirely optional. If you don’t return the chunk and instead let GC free it, you get the normal behavior. If you do return it, then the stream is permitted to return it again later.

      (Lately I’ve been thinking that a really nice stream or receive API would return an object with a linear type so that you must consume it and possibly even return it. This would make it impossible to write code where task cancellation causes you to lose received data. Sadly, mainstream languages can’t do this directly.)
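      A minimal sketch of how a source could honour the optional `returnChunk` call, assuming a hypothetical `PooledSource` class (not part of any existing streams API): returned chunks go into a free list and are reused before any new buffer is allocated, while chunks that are never returned are simply garbage-collected. The `allocations` counter exists only to make the effect visible; the consumer must not touch a chunk after returning it.

```typescript
// Hypothetical producer that fills fixed-size chunks and reuses chunks handed back to it.
class PooledSource {
  private readonly chunkSize: number;
  private remaining: number;
  private free: Uint8Array[] = [];
  allocations = 0; // counts fresh buffers, to show the effect of returning chunks

  constructor(chunkSize: number, totalBytes: number) {
    this.chunkSize = chunkSize;
    this.remaining = totalBytes;
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<Uint8Array> {
    while (this.remaining > 0) {
      const size = Math.min(this.chunkSize, this.remaining);
      const chunk = this.take();
      // Stand-in for real I/O: fill the chunk with some bytes.
      chunk.fill(0x61, 0, size);
      this.remaining -= size;
      yield chunk.subarray(0, size);
    }
  }

  // Optional: give a chunk back so a later read can reuse its memory.
  returnChunk(chunk: Uint8Array): void {
    // Ignore chunks that did not come from this pool (wrong underlying size).
    if (chunk.buffer.byteLength !== this.chunkSize) return;
    // Recover the full-size chunk behind a possibly shorter view.
    this.free.push(new Uint8Array(chunk.buffer, 0, this.chunkSize));
  }

  private take(): Uint8Array {
    const reused = this.free.pop();
    if (reused) return reused;
    this.allocations++;
    return new Uint8Array(this.chunkSize);
  }
}
```

      Used exactly like the loop above (`for await (const chunk of stream) { ...; stream.returnChunk(chunk); }`), a 10-byte stream with 4-byte chunks allocates once instead of three times.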

  • user3939382 1 hour ago
    “The Streams Standard was developed between 2014 and 2016 with an ambitious goal to provide "APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives." Before Web streams, the web platform had no standard way to work with streaming data.”

    This is what UDP is for. Everything actually has to be async all the way down and since it’s not, we’ll just completely reimplement the OS and network on top of itself and hey maybe when we’re done with that we can do it a third time to have the cloud of clouds.

    The entire stack we’re using right down to the hardware is not fit for purpose and we’re burning our talent and money building these ever more brittle towering abstractions.

    • afavour 1 hour ago
      UDP is a protocol, not an API
    • delaminator 39 minutes ago
      We're too busy building products while waiting for the perfect system to arrive.