Parameterized types in C using the new tag compatibility rule

(nullprogram.com)

91 points | by ingve 11 hours ago

11 comments

  • Arnavion 10 minutes ago
    Neat similarity to Zig's approach to generic types. The generic type is defined as a type constructor, a function that returns a type. Every instantiation of that generic type is an invocation of that function. So the generic growable list type is `fn ArrayList(comptime T: type) type` and a function that takes two lists of i32 and returns a third is `fn foo(a: ArrayList(i32), b: ArrayList(i32)) ArrayList(i32)`
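For comparison, here is a rough C sketch of the same type-constructor idea. The names `List`, `DECLARE_LIST`, `list_int_concat` and `list_demo` are hypothetical and not taken from the article. The sketch instantiates the struct explicitly so that it also builds on pre-C23 compilers; with C23 tag compatibility the struct body could instead be repeated at every use, which is what the `vec(T)` macro further down the thread relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "type constructor": List(T) names the list type for T.
   T must be a single identifier (int, double, a typedef name, ...). */
#define List(T) struct List_##T

/* Explicit instantiation, needed once per T before C23. */
#define DECLARE_LIST(T) List(T) { T *data; ptrdiff_t len; }

DECLARE_LIST(int);

/* Rough counterpart of
   fn foo(a: ArrayList(i32), b: ArrayList(i32)) ArrayList(i32):
   returns a newly allocated list holding a's elements followed by b's.
   Assumes a.len + b.len does not overflow. */
static List(int) list_int_concat(List(int) a, List(int) b)
{
    List(int) out = {0};
    ptrdiff_t total = a.len + b.len;
    if (total == 0) return out;
    out.data = malloc(sizeof(*out.data) * (size_t)total);
    if (!out.data) return out;   /* allocation failed: empty list */
    if (a.len) memcpy(out.data, a.data, sizeof(*a.data) * (size_t)a.len);
    if (b.len) memcpy(out.data + a.len, b.data, sizeof(*b.data) * (size_t)b.len);
    out.len = total;
    return out;
}

/* Small demo: concatenates {1, 2} and {3} and returns the resulting length. */
static ptrdiff_t list_demo(void)
{
    int xs[] = {1, 2}, ys[] = {3};
    List(int) a = {xs, 2}, b = {ys, 1};
    List(int) c = list_int_concat(a, b);
    ptrdiff_t n = c.len;
    assert(n == 0 || (c.data[0] == 1 && c.data[2] == 3));
    free(c.data);
    return n;
}
```

Unlike Zig's `ArrayList(T)`, the macro only pastes tokens, so `List(unsigned int)` or `List(char *)` would need a typedef name first.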
  • fuhsnn 6 hours ago
    The recent #def #enddef proposal[1] would eliminate the need for backslashes to define readable macros, making this pattern much more pleasant. Fingers crossed for its inclusion in C2Y!

    [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt

    • cb321 6 hours ago
      While long defs might be nice, even back in ANSI C89 you can get rid of the backslash pattern (or the need to run cc -E and pipe the result through GNU indent or similar) by "flipping the script" and defining whole files "parameterized" by their macro environment, as in https://github.com/c-blake/bst or https://github.com/glouw/ctl/

      Add a namespacing macro and you have a whole generics system, unlike that in TFA.

      So, it might add more value to have the C std add an `#include "file.c" name1=val1 name2=val2` preprocessor syntax where name1, name2 would be on a "stack" and be popped after processing the file. This would let you do types/functions/whatever "generic modules" with manual instantiation which kind of fits with C (manual management of memory, bounds checking, etc.) but preprocessor-assisted "macro scoping" for nested generics. Perhaps an idea to play with in your slimcc fork?

      • glouwbug 23 minutes ago
        I've been thinking of maybe doing CTL2 with this. Maybe if #def makes it in.
    • hyperbolablabla 3 hours ago
      I really don't think the backslashes are that annoying? Seems unnecessary to complicate the spec with stuff like this.
  • JonChesterfield 4 hours ago
    Not personally interested in this hack, but https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf means struct foo {} defined multiple times with the same fields in the same TU now refers to the same thing instead of to UB and that is a good bugfix.
  • IAmLiterallyAB 2 hours ago
    If you're reaching for that hack, just use C++? You don't have to go all in on C++-isms, you can always write C-style C++ and only use the features you need.
    • pton_xd 30 minutes ago
      Yeah, as someone who writes C in C++, every time I see posts bending over backwards trying to fit parameterized types into C I just cringe a little. I understand the appeal of sticking to "pure" C, but... why do that to yourself? Come on over, we've got lambdas, operator overloading for those special circumstances... the water's fine!
      • pjmlp 25 minutes ago
        Some people will do as much as they can to hurt themselves, only to avoid using C++.

        Note that the newer versions of C are basically a "C++ without classes" kind of thing.

        • glouwbug 5 minutes ago
          I think the main appeal is subset lock-down and compile times. ~5000 lines in C gets me sub second iteration times, while ~5000 lines in C++ hits the 10 second mark. Including both iostream and format in C++ gets any projects up into the ~1.5 second mark which kills my iteration interests.

          Second to that I'd say the appeal is just watching something you've known for a long time grow slowly and steadily.

    • waynecochran 2 hours ago
      Not always a viable option -- especially for embedded and systems programming.
  • unwind 8 hours ago
    I think this is an interesting change. Even though I (as someone who has loved C for 30+ years and uses it daily in a professional capacity) don't immediately see a lot of use cases, I'm sure they can be found, as the author demonstrates. Cool, and a good post!
    • glouwbug 1 hour ago
      Combined with C23's auto (see vec_for) you can technically backport the entirety of C++'s STL (of course with skeeto's limitation in his last paragraph in mind). gcc -std=c23. It is a _very_ useful feature for even the mundane, like resizable arrays:

        #include <stdlib.h>
        #include <stdio.h>
        
        #define vec(T) struct vec##T { T* val; int size; int cap; }
        
        #define vec_push(self, x) do {                                              \
            if((self).size == (self).cap) {                                         \
                (self).cap = (self).cap == 0 ? 1 : 2 * (self).cap;                  \
                (self).val = realloc((self).val, sizeof(*(self).val) * (self).cap); \
            }                                                                       \
            (self).val[(self).size++] = (x);                                        \
        } while(0)
        
        #define vec_for(self, at, ...)             \
            for(int i = 0; i < (self).size; i++) { \
                auto at = &(self).val[i];          \
                __VA_ARGS__                        \
            }
        
        typedef vec(char) string;
        
        void string_push(string* self, char* chars)
        {
            if(self->size > 0)
            {
                self->size -= 1;
            }
            while(*chars)
            {
                vec_push(*self, *chars++);
            }
            vec_push(*self, '\0');
        }
        
        int main()
        {
            vec(int) a = {};
            vec_push(a, 1);
            vec_push(a, 2);
            vec_push(a, 3);
            vec_for(a, at, {
                printf("%d\n", *at);
            });
            vec(double) b = {};
            vec_push(b, 1.0);
            vec_push(b, 2.0);
            vec_push(b, 3.0);
            vec_for(b, at, {
                printf("%f\n", *at);
            });
            string c = {};
            string_push(&c, "this is a test");
            string_push(&c, " ");
            string_push(&c, "for c23");
            printf("%s\n", c.val);
        }
  • uecker 1 hour ago
    Here is my experimental library for generic types with some godbolt links to try: https://github.com/uecker/noplate
  • rwmj 7 hours ago
    Slightly off-topic, why is he using ptrdiff_t (instead of size_t) for the cap & len types?
    • r1chardnl 6 hours ago
      From one of his other blogposts. "Guidelines for computing sizes and subscripts"

        Never mix unsigned and signed operands. Prefer signed. If you need to convert an operand, see (2).
      
      https://nullprogram.com/blog/2024/05/24/

      https://www.youtube.com/watch?v=wvtFGa6XJDU
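To make the "never mix unsigned and signed operands" guideline concrete, here is a small sketch of what goes wrong in a mixed comparison. The function names are hypothetical, and the behavior described assumes a typical platform where `ptrdiff_t` and `size_t` have the same width.

```c
#include <assert.h>
#include <stddef.h>

/* Mixed comparison: the signed operand is converted to size_t first,
   so a negative value turns into a huge positive one. */
static int mixed_less(ptrdiff_t a, size_t b)
{
    return a < b;               /* GCC and Clang flag this with -Wsign-compare */
}

/* Same comparison done entirely in signed arithmetic.
   Assumes b fits in ptrdiff_t, as object sizes normally do. */
static int signed_less(ptrdiff_t a, size_t b)
{
    return a < (ptrdiff_t)b;
}

static void compare_demo(void)
{
    assert(!mixed_less(-1, 1));   /* -1 was converted to SIZE_MAX */
    assert(signed_less(-1, 1));   /* the mathematically expected answer */
}
```

Keeping every operand signed, as the quoted guideline prefers, avoids the implicit conversion altogether.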

      • poly2it 5 hours ago
        I still don't understand how these arguments make sense for new code. Naturally, sizes should be unsigned because they represent values which cannot be negative. If you do pointer/size arithmetic, the only solution to avoid overflows is to overflow-check and range-check before computation.

        You cannot even check the sign of a signed size to detect an overflow, because signed overflow is undefined!

        The remaining argument, from what I can tell, is that comparisons between signed and unsigned sizes are bug-prone. There is, however, a dedicated warning (`-Wsign-compare` in GCC and Clang) to resolve this instantly.

        It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

        Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)
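The "check before computing" approach described in this comment could look roughly like the sketch below, once for unsigned and once for signed sizes. The helper names are hypothetical; the point is only that in both cases the test happens before the arithmetic, so neither wraparound nor undefined behavior is ever triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned: test count * size before computing it. Returns false if the
   product would wrap around SIZE_MAX. */
static bool checked_mul_size(size_t count, size_t size, size_t *out)
{
    if (size != 0 && count > SIZE_MAX / size) return false;
    *out = count * size;
    return true;
}

/* Signed: the test must come before the addition, because the overflowing
   addition itself would be undefined. Assumes a >= 0 and b >= 0. */
static bool checked_add_ssize(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *out)
{
    if (a > PTRDIFF_MAX - b) return false;
    *out = a + b;
    return true;
}

static void overflow_demo(void)
{
    size_t bytes;
    ptrdiff_t total;
    assert(checked_mul_size(4, 8, &bytes) && bytes == 32);
    assert(!checked_mul_size(SIZE_MAX, 2, &bytes));
    assert(checked_add_ssize(1, 2, &total) && total == 3);
    assert(!checked_add_ssize(PTRDIFF_MAX, 1, &total));
}
```

Either signedness works with this style; the difference is what happens when a check is forgotten (silent wraparound versus undefined behavior).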

        • sparkie 3 hours ago
          C offers a different solution to the problem in Annex K of the standard. It provides a type `rsize_t`, which like `size_t` is unsigned, and has the same bit width, but where `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or smaller. You perform bounds checking as `<= RSIZE_MAX` to ensure that a value used for indexing is not in the range that would be considered negative if converted to a signed integer. A negative value provided where `rsize_t` is expected would fail the check `<= RSIZE_MAX`.

          IMO, this is a better approach than using signed types for indexing, but AFAIK, it's not included in GCC/glibc or gnulib. It's an optional extension and you're supposed to define `__STDC_WANT_LIB_EXT1__` to use it.

          I don't know if any compiler actually supports it. It came from Microsoft and was submitted for standardization, but ISO made some changes from Microsoft's own implementation.

          https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...

          https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf
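Since `rsize_t` is not available in glibc, the check described here can only be sketched with a stand-in. `MY_RSIZE_MAX` and `size_ok` are hypothetical names; the limit uses the `SIZE_MAX >> 1` value that the comment gives as the recommended upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Annex K's RSIZE_MAX. */
#define MY_RSIZE_MAX (SIZE_MAX >> 1)

/* Returns true if n is a plausible size or index. A negative value that
   was converted to size_t lands above MY_RSIZE_MAX and is rejected. */
static bool size_ok(size_t n)
{
    return n <= MY_RSIZE_MAX;
}

static void rsize_demo(void)
{
    ptrdiff_t negative = -5;
    assert(size_ok(100));
    assert(!size_ok((size_t)negative));  /* wrapped to a huge value */
    assert(!size_ok(SIZE_MAX));
}
```

The type stays unsigned, and half of its range is treated as "obviously wrong" at the point of use.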

        • windward 2 hours ago
          Pointer arithmetic that could overflow would probably involve a heap and therefore be less likely to require a relative, negative offset. Just use the addresses and errors you get from allocation.
        • sim7c00 5 hours ago
          I don't know either.

          int somearray[10];

          new_ptr = somearray + signed_value;

          or

          element = somearray[signedvalue];

          this seems almost criminal to how my brain does logic/C code.

          The only thing i could think of is this:

          somearray+=11; somearray[-1] // index set to somearray[10] ??

          if i'd see my CPU execute that i'd want it to please stop. I'd want my compiler to shout at me like a little child, and be mean until i do better.

          -Wall -Wextra -Wpedantic <-- that should flag, i think, any of these weird practices.

          As you stated tho, i'd be keen to learn why i am wrong!
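For what it's worth, a signed or even negative subscript is ordinary, well-defined C as long as the resulting element lies inside the array. A minimal sketch, with hypothetical names, of the case the comment is puzzling over:

```c
#include <assert.h>

/* Returns the element just before p. p[-1] means *(p - 1), which is valid
   as long as p - 1 still points inside the same array. */
static int previous(const int *p)
{
    return p[-1];
}

static void negative_index_demo(void)
{
    int somearray[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
    const int *mid = somearray + 5;    /* points at somearray[5] */
    const int *end = somearray + 10;   /* one past the end: may be formed,
                                          must not be dereferenced */
    assert(previous(mid) == somearray[4]);
    assert(mid[-5] == somearray[0]);
    assert(end[-1] == somearray[9]);   /* last element via a negative index */
    /* somearray + 11 would be out of range, and an array name itself
       cannot be assigned to, so the += in the comment needs a pointer. */
}
```

The offset is relative to wherever the pointer currently points, which is why it can legitimately be negative.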

          • windward 2 hours ago
            In the implementation of something like a deque or merge sort, you could have a variable that represents offsets from pointers but which could sensibly be negative. C developers culturally aren't as particular about theoretical correctness of types as developers in some other languages - there's a lot of implicit casting being used - so you'll typically see an `int` used for this. If you do wish to bring some rigidity to your type system, you may argue that this value is distinct from a general integer which could be used for any arithmetic and definitely not just a pointer. So it should be a signed pointer difference.

            Arrays aren't the best example, since they are inherently about linear, scalar offsets, but you might see a negative offset from the start of a (decayed) array in the implementation of an allocator with clobber canaries before and after the data.
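The allocator case mentioned at the end can be sketched as follows. This is a toy under stated assumptions, not a real allocator: `CANARY`, `guarded_alloc`, `guards_intact` and `guarded_free` are hypothetical names, and the element type is fixed to `unsigned` to sidestep alignment questions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CANARY 0xC0FFEEu   /* hypothetical guard value */

/* Allocates n unsigned ints with one guard word before and one after;
   returns a pointer to the first user element, or NULL on failure. */
static unsigned *guarded_alloc(ptrdiff_t n)
{
    if (n < 0 || (size_t)n > SIZE_MAX / sizeof(unsigned) - 2) return NULL;
    unsigned *raw = malloc(sizeof(*raw) * ((size_t)n + 2));
    if (!raw) return NULL;
    raw[0] = CANARY;
    raw[n + 1] = CANARY;
    return raw + 1;
}

/* True if neither guard word has been clobbered. data[-1] reaches the
   guard in front of the user data through a negative offset. */
static int guards_intact(const unsigned *data, ptrdiff_t n)
{
    return data[-1] == CANARY && data[n] == CANARY;
}

static void guarded_free(unsigned *data)
{
    if (data) free(data - 1);   /* step back to the pointer malloc returned */
}

static void guard_demo(void)
{
    unsigned *p = guarded_alloc(4);
    if (!p) return;
    assert(guards_intact(p, 4));
    p[4] = 0;                   /* off-by-one write hits the rear guard */
    assert(!guards_intact(p, 4));
    guarded_free(p);
}
```

Both `data[-1]` and `data - 1` are offsets from the user-visible pointer, which is the situation where a signed offset type is the natural fit.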

        • ncruces 4 hours ago
          > It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

          Why?

          By the definition of ptrdiff_t, ISTM the size of any object allocated by malloc cannot be out of bounds of ptrdiff_t, so I'm not sure how you can have a useful size_t that uses the sign bit?
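One way to make that relationship explicit is sketched below. `alloc_bounded` and `span_len` are hypothetical helpers: the first refuses any request above `PTRDIFF_MAX` bytes rather than relying on the allocator to do so, and the second shows that subtracting two pointers into one object already yields a signed `ptrdiff_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: refuses requests larger than PTRDIFF_MAX so that
   the distance between any two pointers into the block fits in ptrdiff_t. */
static void *alloc_bounded(size_t nbytes)
{
    if (nbytes > (size_t)PTRDIFF_MAX) return NULL;
    return malloc(nbytes);
}

/* Length of the range [begin, end) as a signed count; negative if the
   arguments are passed in reverse order. */
static ptrdiff_t span_len(const char *begin, const char *end)
{
    return end - begin;
}

static void span_demo(void)
{
    char *p = alloc_bounded(16);
    if (!p) return;
    assert(span_len(p, p + 16) == 16);
    assert(span_len(p + 16, p) == -16);
    free(p);
}
```

With sizes capped this way, the upper half of the `size_t` range is never a valid object size.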

        • foldr 4 hours ago
          Stroustrup believes that signed should be preferred to unsigned even for values that can’t be less than zero: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...
    • rurban 4 hours ago
      Skeeto and Stroustrup are a bit confused about valid index types. They prefer signed, which will lead to overflows on negative values but has the advantage of using only half of the valid range, so there's more heap for the rest. Very confused.
  • o11c 1 hour ago
    Are we getting a non-broken `_Generic` yet? Because that's the thing that made me give up with disgust the last project I tried to write in C. Manually having to do `extern template` a few times is nothing in comparison.
  • tialaramex 6 hours ago
    It seems as though this makes it impossible to do the new-type paradigm in C23 ? If Goose and Beaver differ only in their name, C now thinks they're the same type so too bad we can tell a Beaver to fly even though we deliberately required a Goose ?
    • yorwba 6 hours ago
      "Tag compatibility" means that the name has to be the same. The issue the proposal is trying to address is that "struct Goose { float weight; }" and "struct Goose { float weight; }" are different types if declared in different locations of the same translation unit, but the same if declared in different translation units. With tag compatibility, they would always be treated as being the same.

      "struct Goose { float weight; }" and "struct Beaver { float weight; }" would remain incompatible, as would "struct { float weight; }" and "struct { float weight; }" (since they're declared without tags.)

      • tialaramex 6 hours ago
        Ah, thanks, that makes sense.
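The Goose/Beaver point can be checked with plain C11, without relying on the new rule at all. In this sketch (`fly`, `ANIMAL_ID` and `newtype_demo` are hypothetical names), `_Generic` is used as a compile-time test: it may not list two compatible types, so the macro compiling at all shows that the two structs stay distinct even though their members are identical.

```c
#include <assert.h>

struct Goose  { float weight; };
struct Beaver { float weight; };

/* Only a Goose may fly; passing a struct Beaver here is a compile error. */
static float fly(struct Goose g)
{
    return g.weight;
}

/* _Generic rejects two associations with compatible types, so the fact
   that this expands without error shows the structs are distinct types. */
#define ANIMAL_ID(x) _Generic((x), struct Goose: 1, struct Beaver: 2)

static void newtype_demo(void)
{
    struct Goose g = {3.5f};
    struct Beaver b = {20.0f};
    assert(ANIMAL_ID(g) == 1);
    assert(ANIMAL_ID(b) == 2);
    assert(fly(g) == 3.5f);
    /* fly(b); would not compile: struct Beaver is not struct Goose. */
}
```

The new-type pattern therefore keeps working as long as the tags differ.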
  • Surac 7 hours ago
    I fear this will make sloppy code compile OK more often.
    • poly2it 5 hours ago
      Dear God I hope nobody is committing unreviewed LLM output in C codebases.
      • pjmlp 24 minutes ago
        Eventually they will generate executables directly.
      • pests 1 hour ago
        No worries, the LLM commits it for you.
    • ioasuncvinvaer 7 hours ago
      Can you give an example?