I've built several programmatic video generation systems for work over the past few years (ads, social videos, automated clips, etc.), and I kept running into the same frustration:
> Every project ended up reinventing a slightly different DSL for timelines, layers, animations, and transitions.

Despite very different use cases, the code always converged into the same patterns:
- layout + timing
- repeated elements over time
- imperative glue code to manage state and sequencing
Meanwhile, web developers already have decades of experience solving similar problems with HTML, CSS, and the DOM — just not over time.
So as an experiment, I started building htmlv:
> an HTML-inspired markup language for video, where the DOM exists along a timeline instead of an infinite vertical scroll.
# Core idea
- Time-based layout instead of vertical layout
- A temporal DOM where repeating elements extend time, not height
- Reuse familiar concepts: HTML structure, CSS styling, JavaScript-driven DOM updates
- Fixed viewport (aspect-ratio aware), closer to video than documents
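
As a rough illustration of the "repeating elements extend time, not height" idea, here is a minimal Python sketch of a flow layout along the time axis. It is not htmlv's implementation; the `Clip` class and `flow_in_time` function are hypothetical names invented for this example, and it assumes elements simply play back to back with no overlap.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A hypothetical timeline element with a fixed duration in seconds."""
    name: str
    duration: float


def flow_in_time(clips):
    """Place clips back to back, the way block layout stacks boxes vertically.

    Returns a list of (name, start, end) tuples plus the total duration.
    """
    placed = []
    cursor = 0.0
    for clip in clips:
        placed.append((clip.name, cursor, cursor + clip.duration))
        cursor += clip.duration
    return placed, cursor


# Repeating an element three times adds to the running time, not to the height.
clips = [Clip("intro", 2.0)] + [Clip(f"item-{i}", 1.5) for i in range(3)]
placed, total = flow_in_time(clips)
for name, start, end in placed:
    print(f"{name}: {start:.1f}s to {end:.1f}s")
print(f"total: {total:.1f}s")
```

In a document, adding list items makes the page taller; here each extra item pushes the end of the video later while the viewport stays fixed.
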

This is not meant to replace video editors or After Effects.
The target is code-first video generation where:
- content is data-driven
- layouts are reusable
- engineers (not motion designers) own the pipeline
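
To make "data-driven content with reusable layouts" concrete, here is a small Python sketch that renders one scene per data row from a shared template function. The `<scene>` and `<text>` tags and the `time-length` property are taken from the example structure in this post; the `title_scene` helper and the product data are hypothetical, and whether htmlv accepts exactly this output is an assumption.

```python
from html import escape


def title_scene(text, seconds):
    """Reusable layout: one scene holding a single title line."""
    return (
        f'<scene style="time-length: {seconds}s;">'
        f'<text class="title">{escape(text)}</text>'
        f"</scene>"
    )


# Hypothetical input data, e.g. rows from a product feed.
products = [
    {"name": "Trail Shoes", "seconds": 3},
    {"name": "Rain Jacket <New>", "seconds": 4},
]

# One scene per row; escape() keeps user data from breaking the markup.
body = "\n".join(title_scene(p["name"], p["seconds"]) for p in products)
print(body)
```

The layout lives in one function and the content lives in plain data, which is the split an engineer-owned pipeline needs.
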
# Why I'm posting
Before investing more time, I'd really like feedback from people who've:
- built video pipelines
- designed DSLs
- worked on media tooling
- or have strong opinions about why this is a terrible idea

Questions I’m wrestling with:
- Is HTML a fundamentally bad mental model for time-based media?
- Does this become unmaintainable at scale?
- Am I underestimating how different “time” is from “layout”?
- Are there existing tools or standards I should study more closely?

I’m not looking for validation; criticism is very welcome.
If this is doomed, I’d much rather know why early.
Thanks in advance for any thoughts, advice, or brutal feedback.
GitHub: https://github.com/xxatsushixx/htmlv
# Example Structure

```html
<!DOCTYPE htmlv>
<html>
<head>
  <title>Sample Video</title>
  <link rel="stylesheet" href="styles.css">
  <script src="script.js"></script>
  <meta name="seed" content="12345">
  <meta name="framerate" content="30fps">
  <meta name="compile-mode" content="precompile">
</head>
<body>
  <scene style="time-length: 10s; scene-transition: fade 2s;">
    <text class="title">Welcome to htmlv</text>
  </scene>
  <scene style="time-length: 15s;">
    <video src="background-loop.mp4"></video>
    <scene>
      <text class="subtitle">Creating videos with code</text>
    </scene>
  </scene>
</body>
</html>
```
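
One way to read the example above is as a sequence of frame ranges. The Python sketch below parses a trimmed copy of that markup with the standard library's `html.parser` and converts each top-level scene's `time-length` into frames at the declared framerate. The semantics are assumptions, not htmlv's documented behavior: top-level scenes are assumed to play back to back, `scene-transition` is ignored, and the nested scene is assumed to live inside its parent's duration. `SceneCollector` is a hypothetical name.

```python
import re
from html.parser import HTMLParser

# Trimmed copy of the example markup: only what the timing needs.
MARKUP = """
<head><meta name="framerate" content="30fps"></head>
<body>
  <scene style="time-length: 10s; scene-transition: fade 2s;"></scene>
  <scene style="time-length: 15s;"><scene></scene></scene>
</body>
"""


class SceneCollector(HTMLParser):
    """Collects the framerate and the durations of top-level scenes."""

    def __init__(self):
        super().__init__()
        self.fps = None
        self.durations = []  # seconds, one entry per top-level scene
        self.depth = 0       # current <scene> nesting depth

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "framerate":
            self.fps = int(attrs["content"].replace("fps", ""))
        elif tag == "scene":
            self.depth += 1
            style = attrs.get("style") or ""
            match = re.search(r"time-length:\s*([\d.]+)s", style)
            if self.depth == 1 and match:
                self.durations.append(float(match.group(1)))

    def handle_endtag(self, tag):
        if tag == "scene":
            self.depth -= 1


parser = SceneCollector()
parser.feed(MARKUP)

# Assumption: scenes are laid out sequentially, with no transition overlap.
start_frame = 0
frame_ranges = []
for seconds in parser.durations:
    frames = round(seconds * parser.fps)
    frame_ranges.append((start_frame, start_frame + frames))
    start_frame += frames

print(frame_ranges)  # [(0, 300), (300, 750)]
```

If the 2s fade were meant to overlap adjacent scenes, the ranges would need to overlap too, which is exactly the kind of place where "time" stops behaving like vertical layout.
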