Kefir: Solo-developed full C17/C23 compiler with extensive validation

(kefir.protopopov.lv)

59 points | by jprotopopov 4 days ago

3 comments

renehsz 1 day ago
Wow, hats off to you! This is one of the most impressive solo projects I've seen in a while!
Making a toy C compiler isn't rocket science, but developing one that's complete and production-ready is a whole other story. AFAICT, Kefir fits into the latter category:
- C17/C23 compliance
- x86_64 codegen
- debug info gen
- SSA-based optimization passes (the most important ones)
- has a widely-compatible cc cli
- is extensively tested/fuzzed
Some advantages compared to the big three (GCC, Clang, MSVC):
- It's fairly small and simple
- That means it's understandable and predictable - no surprises in terms of what it can and cannot do (regarding optimizations in particular)
- Compilation is probably very fast (although I haven't done any benchmarking)
- It could be modified or extended fairly easily
There might be real interest in this compiler from people/companies who
- value predictability and stability very highly
- want to be in control of their entire software supply chain (including the build tools)
- simply want a faster compiler for their debug builds
Think security/high assurance people, even the suckless or handmade community might be interested.
So it's time to market this thing! Get some momentum going! It would be too sad to see this project fade away in silence. Announce it in lots of places, maybe get it on Compiler Explorer, etc. (I'm not saying that you have to do this, of course. But some people could genuinely benefit from Kefir.)
P.S. Seems like JKU has earned its reputation as one of the best CS schools in Austria ;-)
- jprotopopov 1 day ago
  (I thought that the announcement had completely faded, so I hadn't even checked the replies.)
  I'll immediately reveal some issues with the project. As for compilation speed, it is unfortunately atrocious. There are multiple reasons for that:
  1. Initially the compiler was much less ambitious and basically generated stack-based threaded code, so everything was much simpler. I have managed to make it more reasonable in terms of code generation (it now has a real code generator and an optimization pipeline), but there is still a huge legacy in the code base. There is a whole layer of stack-based IR which is translated from the AST and then transformed into an equivalent SSA-based IR. Removing that means rewriting the whole translator part, for which I am not ready.
  2. You've outlined some appealing points (standard compliance, debug info, optimization passes), but again, this came at the expense of being over-engineered and bloated inside. Whenever I have to implement some new feature, I hedge and over-abstract to keep it manageable and avoid hitting unanticipated problems in the future. This has served quite well in terms of development velocity and extending the scope (many parts have not seen a complete refactoring since the initial implementation in 2020/2021, I just build on top of them), but the efficiency of the compiler itself has suffered.
  3. I have not particularly prioritized this issue. Basically, I start optimizing the compiler itself only when something gets too unreasonable (in terms of either run time or memory). There are all kinds of inefficiencies, O(n^2) algorithms and such, simply because I knew that I would be able to swap that part out should that be necessary, but never actually did. I think compiler efficiency has been the most de-prioritized concern for me.
  Basically, if one is concerned with compilation speed, it is literally better to pick gcc, not to mention something like tcc. Kefir is abysmal in that respect. In terms of code base size, it is 144k (sans tests, 260k in total), which is again not exactly small. It's manageable for me, but not hacker-friendly.
  With respect to marketing, I am kind of torn. I cannot work on this thing full time unless somebody is ready to provide sufficient full-time funding for myself and for other expenses (machines for tests, etc.). Otherwise, I'll just increase the workload on myself and reduce the amount of time I can spend actually working on the code, so it'll probably be a net loss for the project. Either way, for now I treat it closer to an art project than a production tool.
  As for compiled code performance, I have addressed it here: https://lobste.rs/s/fxyvwf (it's better than, say, tcc, but nowhere near well-established compilers). I think this is reasonable to expect, and the exact ways to improve that a bit are also clear to me; it's only a question of development effort.
  P.S. JKU is a great school, although by the time I enrolled there the project was already on the verge of bootstrapping itself.
  EDIT: formatting
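
For readers unfamiliar with the stack-IR-to-SSA step mentioned in point 1 above, here is a minimal sketch of the general technique, not Kefir's actual code. It handles only straight-line code, and the instruction set and every name in it (`stack_insn`, `ssa_insn`, `stack_to_ssa`) are hypothetical. The idea is to simulate the operand stack at translation time, keeping value numbers on it instead of runtime values, so that each stack instruction becomes one SSA value whose operands refer to earlier values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stack IR for illustration; not Kefir's instruction set. */
enum stack_op { OP_PUSH, OP_ADD, OP_MUL };

struct stack_insn {
    enum stack_op op;
    long imm; /* used by OP_PUSH only */
};

/* One SSA value: defined exactly once; operands name earlier values by index. */
struct ssa_insn {
    enum stack_op op;
    long imm;
    int lhs, rhs; /* -1 when unused */
};

#define MAX_DEPTH 64

/* Translate straight-line stack code into SSA form by simulating the
 * operand stack with value numbers. `out` must have room for `len` entries.
 * Returns the number of SSA values written, or -1 on malformed input. */
int stack_to_ssa(const struct stack_insn *code, size_t len, struct ssa_insn *out) {
    int stack[MAX_DEPTH]; /* holds SSA value numbers, not runtime values */
    int sp = 0;
    int next = 0;

    assert(code != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        struct ssa_insn v = { code[i].op, 0, -1, -1 };
        if (code[i].op == OP_PUSH) {
            if (sp >= MAX_DEPTH) return -1; /* simulated stack overflow */
            v.imm = code[i].imm;
        } else {
            if (sp < 2) return -1;          /* binary op needs two operands */
            v.rhs = stack[--sp];
            v.lhs = stack[--sp];
        }
        out[next] = v;
        stack[sp++] = next++; /* the result becomes the new top of stack */
    }
    return next;
}
```

Feeding it `PUSH 2; PUSH 3; ADD; PUSH 4; MUL` yields five values, where the `ADD` refers to values 0 and 1 and the `MUL` refers to values 2 and 3. Control flow, which is where phi nodes come in, is deliberately left out of this sketch.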
Western0 7 hours ago
I need only one small thing.
A package system like Rust's. I need a normal C library that is easy to download and use. Nothing more. I like C. I can produce nice code, but libraries are still a problem.
- jprotopopov 6 hours ago
  I am going to hijack this comment for a little rant about libc and independent compilers. Existing libc implementations have varying levels of hostility towards non-gcc/clang compilers. Glibc is probably the worst offender here; musl is the most compliant (but even there, some assumptions remain).
  I am in no way accusing any of the libc developers; they have much higher-priority things to do than supporting obscure compilers, and this is my problem as a compiler developer, but nevertheless. For instance, glibc may simply override the __attribute__ keyword with an empty macro, or omit packed attributes for non-GNU C compilers (breaking the epoll ABI along the way). Of course, a strictly standard-compliant compiler may not have attributes at all, but in my opinion a #warning/#error directive would have been more appropriate than silently producing ABI breakage.
  That said, I have not engaged with the glibc developers on these topics, and have mostly documented the challenges I encountered as patches in my own test suite.
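
To make the ABI breakage above concrete, here is a minimal sketch, assuming a GCC/Clang-compatible compiler on x86_64 Linux. The struct names are hypothetical stand-ins that mirror the shape of an epoll event (a 32-bit mask followed by a 64-bit payload); they are not glibc's declarations. The second struct is what a compiler effectively sees once the packed attribute has been erased, and the static assertions show one way to fail loudly instead of silently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a packed event record:
 * a 32-bit mask immediately followed by a 64-bit payload. */
struct event_packed {
    uint32_t events;
    uint64_t data;
} __attribute__((packed));

/* The same fields as they are laid out once the attribute is dropped,
 * e.g. after `#define __attribute__(x)` has erased it. */
struct event_unpacked {
    uint32_t events;
    uint64_t data;
};

/* Fail the build if the attribute was not applied, rather than
 * compiling against the wrong layout. */
_Static_assert(sizeof(struct event_packed) == 12,
               "packed attribute was not applied");
_Static_assert(offsetof(struct event_packed, data) == 4,
               "payload is not adjacent to the mask");

/* Returns 1 when dropping the attribute changes the struct size,
 * i.e. when the two declarations are ABI-incompatible on this target. */
int layouts_differ(void) {
    return sizeof(struct event_unpacked) != sizeof(struct event_packed);
}
```

On x86_64 the unpacked variant is padded to 16 bytes with the payload at offset 8, so code built against it disagrees with the 12-byte records expected on the other side of the interface.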
oguz-ismail 1 day ago
Cool project. Unlike tcc and cproc, though, kefir doesn't seem very good at handling big arrays. This
```
    $ kefir -c - <<x
    int a[] = {
    $(seq 10000000 | tr '\n' ,)
    };
    x
```
allocates gigabytes of memory and eventually crashes WSL on my machine.
- jprotopopov 1 day ago
  I have addressed compiler inefficiency in the sibling comment. This is indeed a problem. Empty arrays of such size should be compilable (there is a sparse representation for arrays). However, I would say that this use case is not particularly practical; at least, it has not been an issue in any of the projects from my test suite.
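
As an illustration of what a sparse representation for array initializers can look like, here is a minimal sketch. It is an assumption about the general idea, not Kefir's actual data structure, and all names (`sparse_array`, `sparse_append`, `sparse_get`) are hypothetical. Only explicitly initialized elements are stored, so a huge but mostly empty array costs memory proportional to its initializers rather than its length. An array with ten million explicit initializers, as in the example above, gains nothing from this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sparse initializer: only explicitly initialized elements
 * are stored; every other index implicitly reads as zero. */
struct sparse_entry {
    size_t index;
    int value;
};

struct sparse_array {
    size_t length;                /* logical number of elements */
    struct sparse_entry *entries; /* kept sorted by index */
    size_t count, capacity;
};

/* Append an entry. Indices must be strictly increasing and in bounds.
 * Returns 0 on success, -1 on error. */
int sparse_append(struct sparse_array *a, size_t index, int value) {
    assert(a != NULL);
    if (index >= a->length) return -1;
    if (a->count > 0 && a->entries[a->count - 1].index >= index) return -1;
    if (a->count == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 8;
        struct sparse_entry *p = realloc(a->entries, cap * sizeof *p);
        if (p == NULL) return -1;
        a->entries = p;
        a->capacity = cap;
    }
    a->entries[a->count].index = index;
    a->entries[a->count].value = value;
    a->count++;
    return 0;
}

/* Binary search over the stored entries; missing indices read as zero. */
int sparse_get(const struct sparse_array *a, size_t index) {
    size_t lo = 0, hi = a->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a->entries[mid].index < index) lo = mid + 1;
        else hi = mid;
    }
    if (lo < a->count && a->entries[lo].index == index)
        return a->entries[lo].value;
    return 0;
}
```

Start from a zeroed struct with `length` set; the caller releases `entries` with `free` when done.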
  - oguz-ismail 1 day ago
    Good work either way. Congrats!