A leap year check in three instructions

(hueffner.de)

434 points | by gnabgib 50 days ago

30 comments

_heimdall 50 days ago
> return ((y * 1073750999) & 3221352463) <= 126976;
> How does this work? The answer is surprisingly complex.
I don't think anyone is surprised in the complexity of any explanation for that algorithm :D
divbzero 50 days ago
> Note that modern compilers like gcc or clang will produce something like is_leap_year2 from is_leap_year1, so there is not much point in doing this in C source, but it might be useful in other programming languages.
The optimizations that compilers can achieve kind of amaze me.
Indeed, the latest version of cal from util-linux keeps it simple in the C source:
```
  return ( !(year % 4) && (year % 100) ) || !(year % 400);
```
https://github.com/util-linux/util-linux/blob/v2.41/misc-uti...
[-]
- cookiengineer 50 days ago
  But this is wrong and can only represent dates after the specific year when they switched from the Julian to Gregorian calendar!
  For more on this, I recommend reading and implementing a function that calculates the day of the week [1]. Then you can join me in the special insanity hell of people that were trying to deal with human calendars.
  And then you should implement a test case for the dates between Thursday 4 October 1582 and Friday 15 October 1582 :)
  [1] https://en.m.wikipedia.org/wiki/Determination_of_the_day_of_...
  [-]
  - LegionMammal978 50 days ago
    > the specific year
    The problem is, which "specific" year? The English were using "old-style" dates long after 1582. Better not to try to solve this intractable problem in software, but instead annotate every old date you receive with its correct calendar, which may even be a proleptic Gregorian calendar in some fields of study.
    (How do you determine the correct calendar? Through careful inspection of context! Alas, people writing the dates rarely indicated this, and later readers tend to get the calendars hopelessly mangled up. Not to mention the changes in the start of the year. At least the day of week can act as an indicator, when available.)
  - voxic11 50 days ago
    The full code is
```
   static int leap_year(const struct cal_control *ctl, int32_t year)
   {
    if (year <= ctl->reform_year)
     return !(year % 4);

    return ( !(year % 4) && (year % 100) ) || !(year % 400);
   }
```
    Where reform_year is the year the Gregorian calendar was adopted in the specific context specified (defaults to 1752 which is the year it was adopted by GB and therefore also the US).
    So it does account for Julian dates.
    [-]
    - divbzero 50 days ago
      cal doesn’t offer an option to use the 1582 reform date, but looks like it does handle the 1752 adoption in Great Britain correctly:
      $ cal 9 1752 September 1752 Su Mo Tu We Th Fr Sa 1 2 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
      [-]
      - madcaptenor 50 days ago
        But `ncal` does offer that option. Here's October 1582 in "Italy", which didn't exist back then:
        $ ncal -sIT 10 1582 October 1582 Mo 1 18 25 Tu 2 19 26 We 3 20 27 Th 4 21 28 Fr 15 22 29 Sa 16 23 30 Su 17 24 31
        France apparently took a couple months to get on board (or maybe just to find out):
        $ncal -sFR 12 1582 December 1582 Mo 3 20 27 Tu 4 21 28 We 5 22 29 Th 6 23 30 Fr 7 24 31 Sa 1 8 25 Su 2 9 26
        `ncal -p` gives a list of the country codes it accepts. (These are current countries so it's a bit ahistorical for, say, Germany.)
        Sadly they don't implement the weird thing Sweden did in the early 18th century: https://en.wikipedia.org/wiki/Swedish_calendar
- sigmoid10 50 days ago
  I like how the linux one is also easier to understand because it doesn't perform three sequential checks which actually invert the last two conditions plus a default return. That's the kind of stuff that can make you crazy if you ever have to debug it.
  [-]
  - darkwater 50 days ago
    I wondered 3 minutes "this is not right" til I realized that
```
  if ((y % 25) != 0) return true;
```
    was actually checking for different from 0 (which in hindsight makes also sense because the century years by default are not leap unless they divide by 400)
qingcharles 50 days ago
I love these incomprehensible magic number optimizations. Every time I see one I wonder how many optimizations like this we missed back in the old days when we were writing all our inner loops in assembly?
Does anyone have a collection of these things?
[-]
- ryao 50 days ago
  Here is a short list:
  https://graphics.stanford.edu/~seander/bithacks.html
  It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic comparisons for things that want UNIX-style comparators. If you compare the output against 0 to check for some form of greater than, less than or equality, the compiler should automatically simplify it. For example, CMP(X, Y) > 0 is simplified to (X > Y) by a compiler.
  The signum(x) function that is equivalent to CMP(X, 0) can be done in 3 or 4 instructions depending on your architecture without any comparison operations:
  https://www.cs.cornell.edu/courses/cs6120/2022sp/blog/supero...
  It is such a famous example, that compilers probably optimize CMP(X, 0) to that, but I have not checked. Coincidentally, the expansion of CMP(X, 0) is on the bit hacks list.
  There are a few more superoptimized mathematical operations listed here:
  https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Reso...
  Note that the assembly code appears to be for the Motorola 68000 processor and it makes use of flags that are set in edge cases to work.
  Finally, there is a list of helpful macros for bit operations that originated in OpenSolaris (as far as I know) here:
  https://github.com/freebsd/freebsd-src/blob/master/sys/cddl/...
  There used to be an Open Solaris blog post on them, but Oracle has taken it down.
  Enjoy!
  [-]
  - JdeBP 50 days ago
    For an entire book on this stuff, see Henry S. Warren Jr's Hackers Delight. The "three valued compare function" is in chapter 2, for example.
  - eru 50 days ago
    > It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic comparisons for things that want UNIX-style comparators. If you compare the output against 0 to check for some form of greater than, less than or equality, the compiler should automatically simplify it. For example, CMP(X, Y) > 0 is simplified to (X > Y) by a compiler.
    I guess this only applies when the compiler knows what version of > you are using?
    Eg it might not work in C++ when < and > are overloaded for eg strings?
    [-]
    - ryao 50 days ago
      My comment had been meant for C, but it should apply to C++ too even when operator overloading is used, provided the comparisons are simple and inlined. If you add overloads for the > and < operators in your string example to a place where they would inline, and the overload compares .length(), this should simplify. For example, godbolt shows that CMP(X, Y) == 0 is optimized to one mov instruction and one cmp instruction despite operator overloads when I implement your string example:
      https://godbolt.org/z/nGbPhz86q
      If you did not inline the operator overloads and had them in another compilation unit, do not expect this to simplify (unless you use LTO).
      If you have compound comparators in the operator overloads (such that on equality in one field, it considers a second for a tie breaker), I would not expect it to simplify, although the compiler could surprise me.
      [-]
      - eru 49 days ago
        I was more thinking of eg lexicographic comparisons of strings, not just comparing by length.
        Yes, if you have a smart enough compiler, or a simple enough comparison, this will simplify.
        [-]
        ryao 48 days ago
        You could use CMP(A, B) as part of your lexographic comparison and then have it output the result of the first non-zero result (unless you find both strings are equal, in which case, you would output zero) when comparing characters.
        If you implement the operators, you can use CMP(A, B) to turn it into a three value output, since it works solely using Boolean logic, but I would be surprised if it simplified. I am half prepared to be surprised since the compiler might do some CSE after inlining and then do some other transformation. That said, you really only want to use CMP(A, B) for numerical comparisons.
        [-]
        eru 47 days ago
        Yes, you can definitely manually define it. I was talking about what we can reasonably expect the compiler to figure out on its own.
    - trollbridge 50 days ago
      The compiler would resolve that before the optimiser.
      [-]
      - eru 49 days ago
        I'd like to see that resolved for eg lexicographic comparison of strings.
  - kmoser 50 days ago
    There's also this classic: https://en.wikipedia.org/wiki/Fast_inverse_square_root
    [-]
    - ryao 50 days ago
      That is an approximation. If approximations are acceptable, then here is a trick you might like. In loops that call cosf(i * C) and/or sinf(i * C), where i is incremented by 1 on each iteration and C is some constant expression, you can call cosf() and sinf() once (or twice if i starts at something other than 0 or 1) outside of the loop and use the angle addition formula to do accumulation via multiplication and addition inside the loop. The loop will run significantly faster.
      Even if you only need one of cosf() or sinf(), many CPUs calculate both values at the same time, so taking the other is free. If you only need single precision values, you can do this in double precision to avoid much of the errors you would get by doing this in single precision.
      This trick can be used to accelerate the RoPE relative positional encoding calculations used in inference for llama 3 and likely others. I have done this and seen a measurable speed up, although these calculations are such a small part of inference that it was a small improvement.
- owl_vision 50 days ago
  there is "Hacker's Delight" by Henry S. Warren, Jr.
  https://en.wikipedia.org/wiki/Hacker's_Delight
  [-]
  - qingcharles 50 days ago
    Looks awesome, thank you :)
- tylerhou 50 days ago
  You should look at supercompilation.
  [-]
  - mshockwave 50 days ago
    sometimes also known as superoptimization, which many of them also use SMT solvers like Z3 mentioned in the article
    [-]
    - tylerhou 50 days ago
      Yes, sorry, superoptimization is the correct term.
- masfuerte 50 days ago
  We didn't miss them. In those days they weren't optimizations. Multiplications were really expensive.
  [-]
  - JdeBP 50 days ago
    Multiplications of this word length, one should clarify. It's not that multiplication was an inherently more expensive or different operation back then (assuming from context here that the "old days" of coding inner loops in assembly language pre-date even the 32-bit ALU era). Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
    It was that generally the fast hardware multiplication operations in ALUs didn't have very many bits in the register word length, so multiplications of wider words had to be done with library functions that did long multiplication in (say) base 256.
    So this code in the headlined article would not be "three instructions" but three calls to internal helper library functions used by the compiler for long-word multiplication, comparison, and bitwise AND; not markedly more optimal than three internal helper function calls for the three original modulo operations, and in fact less optimal than the bit-twiddled modulo-powers-of-2 version found halfway down the headlined article, which would only need check the least significant byte and not call library functions for two of the 32-bit modulo operations.
    Bonus points to anyone who remembers the helper function names in Microsoft BASIC's runtime library straight off the top of xyr head. It is probably a good thing that I finally seem to have forgotten them. (-: They all began with "B$" as I recall.
    [-]
    - kruador 50 days ago
      Most 8-bit CPUs didn't even have a hardware multiply instruction. To multiply on a 6502, for example, or a Z80, you have to add repeatedly. You can multiply by a power of 2 by shifting left, so you can get a bigger result by switching between shifting and adding or subtracting. Although, again, on these earlier CPUs you can only shift by one bit at a time, rather than by a variable number of bits.
      There's also the difference between multiplying by a hard-coded value, which can be implemented with shifts and adds, and multiplying two variables, which has to be done with an algorithm.
      The 8086 did have multiply instructions, but they were implemented as a loop in the microcode, adding the multiplicand, or not, once for each bit in the multiplier. More at https://www.righto.com/2023/03/8086-multiplication-microcode.... Multiplying by a fixed value using shifts and adds could be faster.
      The prototype ARM1 did not have a multiply instruction. The architecture does have a barrel shifter which can shift one of the operands by any number of bits. For a fixed multiplication, it's possible to compute multiplying by a power of two, by (power of two plus 1), or by (power of two minus 1) in a single instruction. The latter is why ARM has both a SUB (subtract) instruction, computing rd := rs1 - Operand2, and a RSB (Reverse SuBtract) instruction, computing rd := Operand2 - rs1. The second operand goes through the barrel shifter, allowing you to write an instruction like 'RSB R0, R1, R1, #4' meaning 'R0 := (R1 << 4) - R1', or in other words '(R1 * 16) - R1', or R1 * 15.
      ARMv2 added in MUL and MLA (MuLtiply and Accumulate) instructions. The hardware ARM2 implementation uses a Booth's encoder to multiply 2 bits at a time, taking up to 16 cycles for 32 bits. It can exit early if the remaining bits are all 0s.
      Later ARM cores implemented an optional wider multiplier (that's the 'M' in 'ARM7TDMI', for example) that could multiply more bits at a time, therefore executing in fewer cycles. I believe ARM7TDMI was 8-bit, completing in up to 4 cycles (again, offering early exit). Modern ARM cores can do 64-bit multiplies in a single cycle.
      [-]
      - cbm-vic-20 50 days ago
        The base RISC-V instruction set does not include hardware multiply instructions. Most implementations do include the M (or related) extensions that provide them, but if you are building a processor that doesn't need it, you don't need to include it.
    - eru 50 days ago
      > Multiplications of this word length, one should clarify. It's not that multiplication was an inherently more expensive or different operation back then (assuming from context here that the "old days" of coding inner loops in assembly language pre-date even the 32-bit ALU era). Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
      Well, we can actually multiply long binary numbers asymptotically faster than Ancient Egyptians.
      See eg https://en.wikipedia.org/wiki/Karatsuba_algorithm
    - kens 50 days ago
      > Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
      It turns out that multiplication in modern ALUs is very different. The Pentium, for instance, does multiplication using base-8, not base-2, cutting the number of additions by a factor of 3. It also uses Booth's algorithm, so much of the time it is subtracting, not adding.
  - godelski 50 days ago
    Related, Computerphile had a video a few months ago where they try to put compute time relative to human time, similar to the way one might visualize an atom by making the proton the size of a golfball. I think it can help put some costs into perspective and really show why branching maters as well as the great engineering done to hide some of the slowdowns. But definitely some things are being marked simply by the sheer speed of the clock (like how the small size of a proton hides how empty an atom is)
```
  https://youtube.com/watch?v=PpaQrzoDW2I
```
  - kurthr 50 days ago
    and divides were worse. (1 cycle add, 10 cycle mult, 60 cycle div)
    [-]
    - genewitch 50 days ago
      That's fair but mod is division, or no? So realistically the new magic number version would be faster. Assuming there is 32 bit int support. Sorry, this is above my paygrade.
      [-]
      - bobmcnamara 50 days ago
        Many compiles will compute div-by-a-constant using the invert, multiply, and shift off the remainder trick. Once you have that, you can do mod-by-a-constant as a derivative and usually still beat 1-bit or 2-bit division.
    - ryao 50 days ago
      Division still is worse:
      https://github.com/ridiculousfish/libdivide
    - qingcharles 50 days ago
      Yeah, I'm thinking more of ones that remove all the divs from some crazy math functions for graphics rendering and replace them all with bit shifts or boolean ops.
  - Someone 50 days ago
    And branches were cheaper without pipelining
  - TacticalCoder 50 days ago
    [dead]
22c 50 days ago
Part-way through the section on bit-twiddling, I thought to myself "Oh I wonder if we could use a solver here". Lo and behold, I was pleasantly surprised to see the author then take that exact approach. Love the attention to detail in this post!
dndn1 50 days ago
If you need to know a leap year and it's before the year 6000, I made an interactive calculator and visualization [1].
It's >3 machine instructions (and I admire the mathematical tricks included in the post), but it does do thousands of calculations fairly quickly :)
[1] https://calculang.dev/examples-viewer?id=leap-year
[-]
- ctxc 50 days ago
  Please make stuff fit into viewport width, hard to use on mobile as it stands :)
  [-]
  - dndn1 50 days ago
    Thanks for the prompt! I made a new gallery that's mobile friendly and I almost forgot why I need to land it - adding it to the list!
dahart 50 days ago
Looks like gcc & clang use some of the bit-twiddling tricks when you compile the original function with -O3: https://godbolt.org/z/eshd9axod
```
    is_leap_year(unsigned int):
            xor     eax, eax
            test    dil, 3
            jne     .L1
            imul    edi, edi, -1030792151
            mov     eax, 1
            mov     edx, edi
            ror     edx, 2
            cmp     edx, 42949672
            ja      .L1
            ror     edi, 4
            cmp     edi, 10737418
            setbe   al
    .L1:
            ret
```
[-]
- ryao 50 days ago
  They are sometimes very good at using mathematical identities to do simplifications. The following commit was actually inspired by the output of GCC:
  https://github.com/openzfs/spl/commit/8fc851b7b5315c9cae9255...
  Jason had noticed that GCC’s assembly output did not match the original macro when looking for a solution to the unsigned integer overflow warning that a PaX GCC plugin had output (erroneously in my opinion). He had conjectured we could safely adopt GCC’s version as a workaround. I gave him the proof of correctness for the commit message and it was accepted into ZFS. As you can see from the proof, deriving that from the original required 4 steps. I assume that GCC had gone through a similar process to derive its output.
nullc 50 days ago
There are many cute binary/logic tricks, if you like them be sure to read Hackers Delight and https://graphics.stanford.edu/~seander/bithacks.html . Once you've studied enough of them you'll find yourself easily coming up with more.
Warning: This may increase or decrease your popularity with fellow programmers, depending on how lucky you are in encountering problems where they make an important performance difference rather than a readability problem for people who have not deeply internalized bit twiddling.
Multiply and mask for varrious purposes is a thing I commonly use in my own code-- it's much more attractive now that it was decades ago because almost all computers we target these days have extremely fast multipliers.
These full-with logic operations and multipliers give you kind of a very parallel computer packed into a single instruction. The only problem is that it's a little tricky to program. :)
At least this one was easy to explain mechanically. Some bit hacks require p-adic numbers and other elements of number theory to explain.
fuzunoglu 50 days ago
Taking a look at numbers in binary reveals some interesting patterns. Although seems obvious, it was interesting to me when I realized that all prime numbers except 2 end with 1.
[-]
- silisili 50 days ago
  Not trying to be a jerk, but why is that interesting? Am I missing something more than all odd numbers end in 1, and primes by their nature cannot be even(except 2, as you mentioned).
  [-]
  - lkirkwood 50 days ago
    Just nitpick but all odd numbers end in an odd number, not 1, and all even numbers end in an even number i.e. a multiple of 2.
    [-]
    - silisili 50 days ago
      Sure, but OP was talking about binary representation.
      [-]
      - dmichulke 50 days ago
        GP's statement is true for all number systems with an even base (so also for binary and decimal)
    - Etheryte 50 days ago
      All odd numbers end in 1 in binary.
npendleton 50 days ago
This is so cool!
Terrible nitpick, but this is actually 3 operations, not instructions. On x86 you get 4:
```
  is_leap_year_fast:
        imul    eax, edi, 1073750999
        and     eax, -1073614833
        cmp     eax, 126977
        setb    al
        ret
```
On ARM you get a bit more due to instruction encoding:
```
  is_leap_year_fast:
        ldr     r1, .LCPI0_0
        mul     r0, r0, r1
        ldr     r1, .LCPI0_1
        and     r1, r0, r1
        mov     r0, #0
        cmp     r1, #126976
        movwls  r0, #1
        bx      lr
  .LCPI0_0:
        .long   1073750999
  .LCPI0_1:
        .long   3221352463
```
Compiler explorer reference: https://godbolt.org/z/7ajYqbT9z
[-]
- gpderetta 50 days ago
  You could argue that the setb and ret are not part of the leap year check itself. For example if the compiled inlined the call into a caller doing:
```
    if(is_leap_year_fast()) {...}
```
  Then the ret would obviously go away and the setb wouldn't be necessary as it could generate directly a conditional jmp from the result of the cmp.
  [-]
  - npendleton 50 days ago
    Hah, great point!
ReptileMan 50 days ago
Somewhat relevant and related.
>“So, it’s a bug in Lotus 123?”
>“Yeah, but probably an intentional one. Lotus had to fit in 640K. That’s not a lot of memory. If you ignore 1900, you can figure out if a given year is a leap year just by looking to see if the rightmost two bits are zero. That’s really fast and easy. The Lotus guys probably figured it didn’t matter to be wrong for those two months way in the past. It looks like the Basic guys wanted to be anal about those two months, so they moved the epoch one day back.”
https://www.joelonsoftware.com/2006/06/16/my-first-billg-rev...
[-]
- xrisk 50 days ago
  That was a great read. Thanks for sharing!
usr1106 50 days ago
Interesting. In one place the author argues: 0 is missing, but we already know...
The is no year 0, it goes 1 BC, 1 AD. So testing whether 0 is a leap year is moot.
[-]
- skissane 50 days ago
  > The is no year 0, it goes 1 BC, 1 AD. So testing whether 0 is a leap year is moot.
  Not true if you use astronomical year numbering: https://en.m.wikipedia.org/wiki/Astronomical_year_numbering
  Which is arguably the right thing to do outside of specific domains (such as history) in which BCE is entrenched
  If your software really has to display years in BCE, I think the cleanest way is store it as astronomical year numbering internally, then convert to CE/BCE on output
  [-]
  - rf15 50 days ago
    > Astronomers use the Julian calendar for years before 1582, including the year 0, and the Gregorian calendar for years after 1582
    So what happens when it's 1582? (sorry, currently no time to articulate a good wiki fix)
    [-]
    - skissane 50 days ago
      I think they use the original Gregorian cutover, in which 1582-10-04 is followed by 1582-10-15, and the dates 1582-10-05 through 1582-10-14 don’t exist.
      However, in general, I think proleptic Gregorian is simpler. But in astronomy do what the astronomers do. And in history, dates between 1582 and 1923 (inclusive), you really need to explicitly mark the date as Gregorian or Julian, or have contextual information (such as the country) to determine which one to use.
      1923 because that was when Greece switched from Julian to Gregorian, the last country to officially do so. Although various other countries in the Middle East and Asia adopted the Gregorian calendar more recently than 1923 - e.g. Saudi Arabia switched from the Islamic calendar to the Gregorian for most commercial purposes in 2016, and for most government purposes in 2023 - those later adoptions aren’t relevant to Julian-Gregorian cutover since they weren’t moving from Julian to Gregorian, they were moving from something non-Western to Gregorian
      Large chunks of the Eastern Orthodox Church still use the Julian calendar for religious purposes; other parts theoretically use a calendar called “Revised Julian” which is identical to Gregorian until 2800 and different thereafter - although I wonder if humanity (and those churches) are still around in 2800, will they actually deviate from Gregorian at that point, or will they decide not to after all, or forget that they were officially supposed to
      [-]
      - LegionMammal978 50 days ago
        It's frustrating when people just pick an arbitrary interpretation of old dates instead of actually looking at the context. Even beyond Julian vs. Gregorian leap-year rules, we also see historical variation in the start of the numbered year (January 1 vs. different dates in March). I'm not aware of any software that can represent all such dates as they appear in the primary sources: they always have to be translated through someone's manual effort, which can easily introduce misinterpretations. At least the day of week, when available, can help double-check a calendar assignment.
- JdeBP 50 days ago
  Go back to the start of the article, and you'll find that using the proleptic Gregorian calendar with astronomical year numbering is a premise for the algorithm.
  Without that design constraint, testing for leap years becomes locale-dependent and very complex indeed.
- timewizard 50 days ago
  ISO8601 accepts year 0. It is 1 BC in astronomical calendars. All the BC years gain a -1 offset as a result.
  [-]
  - usr1106 50 days ago
    Interesting, how standards just ignore reality.
    At work we had discussions what date format to use in our product. It's for trained users only (but not IT people), English UI only, but used on several continents. Our regulatory expert propsed ISO8601. I did not agree, because that is not used anywhere in daily life except by 8 millions Swedes. I voted 15-Apr-2025 is much less prone to human error. (None of us "won". Different formats in different places still...)
    [-]
    - deredede 50 days ago
      > that is not used anywhere in daily life
      Does it matter? MM-DD-YYYY is used in America and makes DD-MM-YYYY ambiguous, but as far as I know nobody uses YYYY-DD-MM, so ISO8601 should be perfectly fine, especially if users are trained. Besides, if you're not used to it, starting with the year forces you to think, which is desirable if you want to avoid human error.
      [-]
      - dahart 50 days ago
        I couldn’t have named the standard and never read it before today, but I’ve used YYYY-MM-DD for naming my own folders & files for a couple of decades, for the simple reason that it sorts correctly in chronological order.
        [-]
        usr1106 49 days ago
        You do it, I do it, probably many programmers or other systematically thinking people do it. How big share we are of the general public? Even many programs don't do it in their default naming when saving files. I find myself renaming often enough.
- nmehner 50 days ago
  https://listverse.com/2019/05/19/10-bizarre-calendar-fixes-t...
  Everything before the introduction of the gregorian calendar is moot:
  "In 1582, the pope suggested that the whole of Europe skip ten days to be in sync with the new calendar. Several religious European kingdoms obeyed and jumped from October 4 to October 15."
  So you cannot use any date recorded before that time for calculations.
  And before that it gets even more random:
  "The priests’ observations of the lunar cycles were not accurate. They also deliberately avoided leap years over superstitions. Things got worse when they started receiving bribes to declare a year longer or shorter than necessary. Some years were so long that an extra month called Intercalaris or Mercedonius was added."
  [-]
  - usr1106 50 days ago
    Before 1582 the rule is just simpler. If it is divisible by 4 it's a leap year. So the difference is relevant for years 300, 500, 600, 700, 900 etc. For ranges spanning those years the Gregorian algorithm would result in results not matching reality.
    When the Julian calendar was really adopted I don't know. Certainly not 0001-01-01. And of course it varies by country like Gregorian.
    [-]
    - pbhjpbhj 50 days ago
      From Wikipedia:
      >The Julian calendar was proposed in 46 BC by (and takes its name from) Julius Caesar, as a reform of the earlier Roman calendar, which was largely a lunisolar one.[2] It took effect on 1 January 45 BC, by his edict.
      Not knowing the year seems unhinged somehow.
      [-]
      - AdhemarVandamme 50 days ago
        It was already known to scholars that the length of a (tropical) year is close to 365-and-a-quarter days since at least 238 BC (when Ptolemy III tried to fix the length of the year in the Egyptian calender to 365-and-a-quarter days in the Canopus Decree).
        However, due to a mistranslation the Roman pontifices got it wrong at the introduction of the Julian calendar. The Romans counted inclusively, which means: counting with both the start and end included. (That is why Christians say in a literal translation from Latin that Jesus has risen on the third day, even though he died on a Friday and is said to have risen two days later, on the next Sunday.)
        In the first years of the Julian calendar, the Roman pontifices inserted a leap day “every fourth year”, which in their way of counting means: every 3 years. Authors differ on exactly which years were leap years. The error got corrected under Augustus by skipping a few leap years and then following the “every 4 years” rule since either AD 4 or AD 8. See the explanation and the table in https://en.wikipedia.org/wiki/Julian_calendar#Leap_year_erro...
        Also note that at the time, years were mostly identified by the names of the consuls rather than by a number. Historians might use numbers, counting from when they thought Rome was founded (Ab urbe condita), but of course they differed among each other on when that was. The chronology by Atticus and Varro, which placed the founding of the city on 21 April 753 BC in the proleptic Julian calendar, was not the only one.
NelsonMinar 50 days ago
It's rare to read code that makes me literally laugh out loud. What a delight.
olq 50 days ago
This is gold! HN Kino! Taking the hardest problem in the world, date checking, and casually bitflipping the hell out of it. Hats off man :D
andrepd 50 days ago
Interesting cppcon talk related to this (author is cited in TFA): https://www.youtube.com/watch?v=0s9F4QWAl-E
croisillon 50 days ago
They should add that to https://github.com/alexmacarthur/current-time-api
nickysielicki 50 days ago
Knowing how to use z3 for stuff like this is a superpower that not a lot of people have, but is definitely worth knowing if you work with code that needs to be optimized at this level. I have an mcp script that interfaces with z3, and this comment is a reminder to myself to find some time to expand it in the future for this specific flow.
It’s also worth calling out angr as an interface between capstone and z3, which can take this to another level.
JdeBP 50 days ago
One of these days, someone will get creative and do this sort of thing for the leap year algorithm of the Revised Julian Calendar instead of the Gregorian one.
charcircuit 50 days ago
The original function is likely only going to be 3 instructions. xor, test, jne and only 1 of these is dependent on a previous instruction. In the "fast" version from the article there are 4 instructions with each depending on the previous instruction. I'm not surprised it lost in the benchmark.
[-]
- Dylan16807 50 days ago
  A branch that triggers 3/4 of the time will not perform well.
  Whether that matters comes down to how this function integrates into the rest of the program.
  [-]
  - charcircuit 50 days ago
    I don't think the years tested will be random. I think practically it will see long strings of the same value.
- degamad 50 days ago
  3 instructions and a branch prediction.
userbinator 50 days ago
At first I thought it would be using the BCD instructions in some way, as shown near the bottom of this article: https://news.ycombinator.com/item?id=8477254
[-]
- kragen 50 days ago
  Yeah, I was thinking AAM, but that wouldn't get you to 3.
high_pathetic 50 days ago
https://news.ycombinator.com/item?id=42915723#42924485
JdeBP 50 days ago
If the author ever reads this: The hyperlink to Jacob Pratt's article has the hyperlink URL and text swapped.
[-]
- hairtuq 50 days ago
  Thanks, fixed.
lame-lexem 50 days ago
tha page seems to have problems with layout overflow in equation blocks on mobile. It seems that because spans are inline elements they won't overflow. I think you can make them block elements and enable some form of overflow to solve it.
AGivant 49 days ago
Code is written not just for computer to execute, but for other people to read and understand. If you put such code readability will suffer tremendously. Beside that, what are you doing in your code that you need to optimize leap year check?! This clever tricks it remind me phrase that lead to many accidents- 'Hold my beer and see what I can do!'
esafak 50 days ago
I'd be impressed if an LLM derived this independently.
[-]
- 1_08iu 50 days ago
  Perhaps compilers that use AI to generate machine code are the future... :^)
- m3kw9 50 days ago
  Wouldn’t be surprised if they could now
  [-]
  - shric 50 days ago
    I would, an LLM has never produced an original idea.
smeej 50 days ago
> whether a year 0 ≤ y ≤ 102499
or equal to 0?
[-]
- cestith 50 days ago
  Yes, on the calendar being discussed.
crossroadsguy 50 days ago
This reminds me of once when I was giving an algo/ds interview (in Java) and the interviewer started asking me questions where one of the answers usually was “to be memorised” shit like this and he started pestering me to give him those answers (even though I said I don’t know and definitely don’t recall) and that too in C. As per him “everyone who coded knew C.. at least in college” and started becoming a bit more hostile. I think it was the first interview that I had ended as an interviewee.
And I call this thing “bit gymnastics”.
[-]
- abrookewood 50 days ago
  Sounds like someone desperate to prove their superiority - now imagine working with them every day.
  [-]
  - andrewmcwatters 50 days ago
    God, everyone has one of these stupid stories, don't they? For me, it was some moron at a Wordpress sweatshop who pointed out that the way to find the number of parameters of a function was through (a now deprecated and unsupported!) arity property.
    OK, great. 16 years into my career, and I've never needed to count the number of parameters of a JavaScript function. And if you needed to, that wasn't going to be the property you'd read anyway.
  - pastage 50 days ago
    It is not they way todo it agreed on that, but it can be one of the only way of getting hand wavy persons to get technical. Interviewing can be frustrating when you do not see what you expected. So no need to believe it was done toxicly.
    [-]
    - crossroadsguy 50 days ago
      > getting hand wavy persons to get technical
      Interesting.
      And then I literally said “hostile” in my comment and you didn’t find it toxic? Strange but it’s okay :)
      [-]
      - pastage 48 days ago
        I sympathize, your experiences are of course different from mine. Hostile is such a useless tactic at interviews that I just find it mind boggling that someone would do it on purpose.
Quenby 50 days ago
Never thought a leap year check could be this interesting. Maybe low-level programmers had already discovered tricks like this long ago，they just never got written down? Feels like there’s still so much like this, hidden in old code, waiting to be rediscovered. If anyone has a collection of these kinds of techniques, I’d really love to dig into it.
[-]
- arp242 50 days ago
  Lotus 1-2-3 famously considers 1900 a leap year (carried over in Excel for compatibility), presumably because it just does the "(y & 3) != 0" check (arguably a reasonable optimisation given the hardware of the time).
  I'd be surprised if someone found this solution before, as it seems both relatively difficult to find and a small optimisation.
- nickdothutton 50 days ago
  There are things I had to learn at home in the 80s on the z80 that I have mostly forgotten. Very occasionally I wheel something out to “show the kids” (those in their 20s) and it feels like performing a magic trick.
drewg123 50 days ago
I tend to be of the opinion that for modern general purpose CPUs in this era, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.
But do you know what's not free? Memory accesses[1]. So when I'm optimizing things, I focus on making things more cache friendly.
[1] http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
[-]
- andrepd 50 days ago
  > I tend to be of the opinion that for modern general purpose CPUs in this era, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.
  What does this mean? Free? Optimisations are totally unnecessary because... instructions are free?
  The implementation in TFA is probably on the order of 5x more efficient than a naive approach. This is time and energy as well. I don't understand what "free" means in this context.
  Calendar operations are performed probably trillions of times every second across all types of computers. If you can make them more time- and energy-efficient, why wouldn't you?
  If there's a problem with modern software it's too much bloat, not too much optimisation.
  [-]
  - jdlshore 50 days ago
    GP made an important point that you seemed to have missed: in modern architectures, it’s much more important to minimize memory access than to minimize instructions. They weren’t saying optimization isn’t important, they were describing how to optimize on modern systems.
  - drewg123 50 days ago
    If this is indeed done trillions of times a second, which I frankly have a hard time believing, then sure, it might be worth it. But on a modern CPU, focusing on an optimization like this is a poor use of developer resources. There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.
    Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.
    [-]
    - wtetzner 50 days ago
      > But on a modern CPU, focusing on an optimization like this is a poor use of developer resources.
      How many people are rolling their own datetime code? This seems like a totally fine optimization to put into popular datetime libraries.
    - andrepd 50 days ago
      > There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.
      How is cache / memory access relevant in a subroutine that performs a check on a 16bit number?
      > Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.
      1: comments are your friend
      2: a unit test can assert that this function is equivalent to the naive one in about half a millisecond.
- achierius 50 days ago
  Just because CPU performance is increasing faster than DRAM speeds doesn't mean that CPU performance is "free" while memory is "expensive". One thing that you're ignoring is the impact of caches and prefetching logic which have significantly improved the performance of memory-bound workloads in the last 5-10 years. DRAM might be slow, but if you avoid going out to DRAM...
  More broadly, it 100% depends on your workload. You'd be surprised at how many workloads are compute-bound, even today: LLM inference might be memory bound (thus how it's possible to get remotely good performance on a CPU), but training, esp. prefill, is very much not. And once you get out of LLMs, I'd say that most applications of ML tend to be compute bound.
- drewg123 50 days ago
  To temper this slightly, these sorts of optimizations are useful on embedded CPUs for device firmware, IOT, etc. I've worked on smart NIC CPUs where cycles were so precious we'd do all kinds of crazy unreadable things.
  [-]
  - bigiain 50 days ago
    I suspect most IOT device manufacturers expect/design their device to be landfill before worrying about leap year math. (In my least optimistic moments, I suspect some of them may intentionally implement known broken algorithms that make their eWaste stop working correctly at some point in the near future that's statistically likely to bear beyond the warranty period.)
    [-]
    - crote 50 days ago
      "Is year divisible by four" will work perfectly fine for the next 75 years. Random consumer devices are definitely not going to survive that long, so being capable of dealing with it adds exactly zero value to the product.
      It's hard to imagine being in a situation where the cost of the additional "year divisible by 100" check, or even the "year divisible by 400" check is too much to bear, and it's trivial enough that the developer overhead is negligible, but you never know when you need those extra handful of bytes of memory I guess...
      [-]
      - bigiain 49 days ago
        > but you never know when you need those extra handful of bytes of memory I guess...
        I guess on all those crappiest of 32 bit devices with 2.6 based kernels (which are practically unupgradeable due to vendor patches[1]) which will never make it to 2100 to the "year divisible by 100" check, because their time_t overflows in 2038.
        On the plus side, that overflow turns the date into 1901, so perhaps they won't actually blow up due to leapyear bugs for another couple of years? That'll be a fun vintage computing archeological bug fix for someone.
        [1] I am at least partly responsible for one of these products from a failed hardware startup I work at in 2012/13.
  - sitzkrieg 50 days ago
    on the flip side of the topic, trying to do any datetime handling on the edge of embedded compute is going to be wrong 100% of the time anyway
    [-]
    - RaoulP 50 days ago
      Would you mind elaborating?
- PaulKeeble 50 days ago
  The thing is about these optimisations (assuming they test as higher performance) is that they can get applied in a library and then everyone benefits from the speedup that took some hard graft to work out. Very few people bake their own date API nowadays if they can avoid it since it already exists and techniques like this just speed up every programme whether its on the critical path or not.
  [-]
  - codexb 50 days ago
    That's basically compilers these days. It used to be that you could try and optimize your code, inline things here and there, but these days, you're not going to beat the compiler optimization.
    [-]
    - kragen 50 days ago
      That is a meme that people repeat a lot, but it turns out to be wrong:
      https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf (though see https://blog.regehr.org/archives/1515: "This piece (...) explains why Daniel J. Bernstein’s talk, The death of optimizing compilers (audio [http://cr.yp.to/talks/2015.04.16/audio.ogg]) is wrong", citing https://news.ycombinator.com/item?id=9397169)
      https://blog.royalsloth.eu/posts/the-compiler-will-optimize-...
      http://lua-users.org/lists/lua-l/2011-02/msg00742.html
      https://web.archive.org/web/20150213004932/http://x264dev.mu...
      [-]
      - kragen 50 days ago
        I guess I should mention that https://blog.regehr.org/archives/1515 doesn't dispute that people can pretty much always beat the shit out of optimizing compilers; Regehr explicitly says, "of course there’s plenty of hot code that wants to be optimized by hand." Rather, where he disagrees is whether it's worthwhile to use optimizing compilers for other code.
        Daniel Berlin's https://news.ycombinator.com/item?id=9397169 does kind of disagree, saying, "If GCC didn't beat an expert at optimizing interpreter loops, it was because they didn't file a bug and give us code to optimize," but his actual example is the CPython interpreter loop, which is light-years from the kind of hand-optimized assembly interpreter Mike Pall's post is talking about, and moreover it wasn't feeding an interpreter loop to GCC but rather replacing interpretation with run-time compilation. Mostly what he disagrees about is the same thing Regehr disagrees about: whether there's enough code in the category of "not worth hand-optimizing but still runs often enough to matter", not whether you can beat a compiler by hand-optimizing your code. On the contrary, he brings up whole categories of code where compilers can't hope to compete with hand-optimization, such as numerical algorithms where optimization requires sacrificing numerical stability. mpweiher's comment in response discusses other scenarios where compilers can't hope to compete, like systems-level optimization.
        It's worth reading the comments by haberman and Mike Pall in the HN thread there where they correct Berlin about LuaJIT, and kjksf also points out a number of widely-used libraries that got 2–4× speedups over optimized C by hand-optimizing the assembly: libjpeg-turbo, Skia, and ffmpeg. It'd be interesting to see if the intervening 10 years have changed the situation, because GCC and LLVM have improved in that time, but I doubt they've improved by even 50%, much less 300%.
    - ryao 50 days ago
      Meanwhile, GCC will happily implement bsearch() without cmov instructions and the result will be slower than a custom implementation on which it emits cmov instructions. I do not believe anyone has filed a bug report specifically about the inefficient bsearch(), but the bug report I filed a few years ago on inefficient code generation for binary search functions is still open, so I see no point in bothering:
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001
      Binary searches on OpenZFS B-Tree nodes are faster in part because we did not wait for the compiler:
      https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...
      Eliminating comparator function overhead via inlining is also a part of the improvement, which we would not have had because the OpenZFS code is not built with LTO, so even if the compiler fixes that bug, the patch will still have been useful.
    - godelski 50 days ago
      Definitely not true.
      You aren't going to beat the compiler if you have to consider a wide range of inputs and outputs but that isn't a typical setting and you can actually beat them. Even in general settings this can be true because it's still a really hard problem for the compiler to infer things you might know. That's why C++ has all those compiler hints and why people optimize with gcc flags other than -O.
      It's often easy to beat Blas in matrix multiplication if you know some conditions on that matrix. Because Blas will check to find the best algo first but might not know (you could call directly of course and there you likely won't win, but you're competing against a person not the compiler).
      Never over estimate the compiler. The work the PL people do is unquestionably useful but they'll also be the first to say you can beat it.
      You should always do what Knuth suggested (the often misunderstood "premature optimization" quote) and get the profile.
    - nerdralph 50 days ago
      I was disappointed with how bad avr-gcc was at optimizing code. Here's one of several examples I found. https://nerdralph.blogspot.com/2015/03/fastest-avr-software-...
    - matheusmoreira 50 days ago
      These days optimizing compilers are your number one enemy.
      They'll "optimize" your code by deleting it. They'll "prove" your null/overflow checks are useless and just delete them. Then they'll "prove" your entire function is useless or undefined and just "optimize" it to a no-op or something. Make enough things undefined and maybe they'll turn the main function into a no-op.
      In languages like C, people are well advised to disable some problematic optimizations and explicitly force the compiler to assume some implementation details to make things sane.
      [-]
      - ryao 50 days ago
        If they prove a NULL check is always false, it means you have dead code.
        For example:
        if (p == NULL) return; if (p == NULL) doSomething();
        It is safe to delete the second one. Even if it is not deleted, it will never be executed.
        What is problematic is when they remove something like memset() right before a free operation, when the memset() is needed to sanitize sensitive data like encryption keys. There are ways of forcing compilers to retain the memset(), such as using functions designed not to be optimized out, such as explicit_bzero(). You can see how we took care of this problem in OpenZFS here:
        https://github.com/openzfs/zfs/pull/14544
        [-]
        matheusmoreira 50 days ago
        They just think you have lots of dead code because of silly undefined behavior nonsense.
        char *allocate_a_string_please(int n) { if (n + 1 < n) return 0; // overflow return malloc(n + 1); // space for the NUL }
        This code seems okay at first glance, it's a simple integer overflow check that makes sense to anyone who reads it. The addition will overflow when n equals INT_MAX, it's going to wrap around and the function will return NULL. Reasonable.
        Unfortunately, we cannot have nice things because of optimizing compilers and the holy C standard.
        The compiler "knows" that signed integer overflow is undefined. In practice, it just assumes that integer overflow cannot ever happen and uses this "fact" to "optimize" this program. Since signed integers "cannot" overflow, it "proves" that the condition always evaluates to false. This leads it to conclude that both the condition and the consequent are dead code.
        Then it just deletes the safety check and introduces potential security vulnerabilities into the software.
        They had to add literal compiler builtins to let people detect overflow conditions and make the compiler actually generate the code they want it to generate.
        Fighting the compiler's assumptions and axioms gets annoying at some point and people eventually discover the mercy of compiler flags such as -fwrapv and -fno-strict-aliasing. Anyone doing systems programming with strict aliasing enabled is probably doing it wrong. Can't even cast pointers without the compiler screwing things up.
        [-]
        ryao 50 days ago
        I would consider the use of signed integers for sizes to be wrong, but if you insist on using them in this example, just test for (n == INT_MAX). malloc itself uses size_t, which is unsigned.
        I have been known to write patches converting signed integers to unsigned integers in places where signed arithmetic makes no sense.
        [-]
        matheusmoreira 49 days ago
        The real problem is the fact compilers constantly screw up perfectly good code by optimizing it based on unreasonable "letter of the law" assumptions.
        The issue of signed versus unsigned is tangential at best. There are good arguments in favor of the use of signed and unsigned integers. Neither type should cause compilers to screw code up beyond recognition.
        The fact is signed integer overflow makes sense to programmers and works just fine on the machines people care to write code for. Nobody really cares that the C standard says signed integer overflow is undefined. That's just an excuse. If it's undefined, then simply define it.
        Of course you can test for INT_MAX. The problem is you have to somehow know that you must do it that way instead of trying to observe the actual overflow. People tend to learn that sort of knowledge by being burned by optimizing compilers. I'd very much rather not have adversarial compilers instead.
        [-]
        ryao 49 days ago
        I have been known to fix bugs involving signed integer overflow in C code others wrote.
        In any case, send your feedback to the C standard committee. They can change the standard. It would not be a bad change to make.
        oguz-ismail 50 days ago
        >if (n + 1 < n)
        No one does this
        [-]
        matheusmoreira 50 days ago
        Oh people absolutely do this.
        Here's a 2018 example.
        https://github.com/mruby/mruby/commit/180f39bf4c5246ff77ef71...
        https://github.com/mruby/mruby/issues/4062
        while (l >= bsiz - blen) { bsiz *= 2; if (bsiz < 0) mrb_raise(mrb, E_ARGUMENT_ERROR, "too big specifier"); }
        > bsiz*=2 can become negative.
        > However with -O2 the mrb_raise is never triggered, since bsiz is a signed integer.
        > Signed integer overflows are undefined behaviour and thus gcc removes the check.
        People have even categorized this as a compiler vulnerability.
        https://www.kb.cert.org/vuls/id/162289
        > C compilers may silently discard some wraparound checks
        And they aren't wrong.
        The programmer wrote reasonable code that makes sense and perfectly aligns with their mental model of the machine.
        The compiler took this code and screwed it up because it violates compiler assumptions about some abstract C machine nobody really cares about.
        [-]
        Joker_vD 50 days ago
        I propose we rewrite everything in my yet-unnamed new low-level language:
        loop while l >= bsize - blen; bsiz, ovf := bsiz * 2; if ovf <> 0 then mrb_raise(mrb, E_ARGUMENT_ERROR, ("too big specifier\x00")); end end
        [-]
        ryao 50 days ago
        Just stop using signed integers to hold sizes. malloc itself takes size_t, which is unsigned.
        Dylan16807 50 days ago
        If they prove a check is always false, it means you have dead code or you made a mistake.
        It is very very hard to write C without mistakes.
        When not-actually-dead code gets removed, the consequences of many mistakes get orders of magnitudes worse.
- godelski 50 days ago
```
  > modern CPUs are so fast that instructions are almost free.
```
  Please don't.
  These things compound. You especially need to consider typical computer usage involves using more than one application at a time. There's a tragedy of the commons issue that's often ignored. It can be if you're optimizing your code (you're minimizing your share!) but it can't be if you're not.
  I guarantee you we'd have a lot of faster things if people invested even a little time (these also compound :). Two great examples might be Llama.cpp and FlashAttention. Both of these have had a huge impact of people (among a number of other works) but don't get nearly the same attention as other stuff. These are popular instances but I promise you that there's a million problems like these waiting to be solved. It's just not flashy, but hey plumbers and garbagemen are pretty critical jobs too
  [-]
  - EnPissant 50 days ago
    You haven't refuted the parent comment at all. They asserted that instructions are insignificant, and other things, such as memory accesses, dominate.
    [-]
    - godelski 50 days ago
      >> You especially need to consider typical computer usage involves using more than one application at a time. There's a tragedy of the commons issue
      These resources include:
      - disk/ssd/long term memory - RAM/System memory - Cache BUT ALSO - Registers - CPU Cores - Busses/Lanes/Bandwith - Locks - Network
      My point is that I/O only dominates when you're actually acting efficiently. This is dominating in the case of measuring a single operating program.
      You're forgetting that when multiple programs are running that there's a lot more going on. There's a lot more communication going on too. The caches are super tiny and in high competition. To handle interlacing all those instructions. Even a program's niceness can dramatically change total performance. This is especially true when we're talking about unoptimized programs because all those little things that the OS has to manage pile up.
      Get out your computer architecture book and do a skim to refresh. Even Knuth's Book (s)[1] discuss much of this because to write good programs you gotta understand the environment they're running in. Otherwise I'd be like trying to build a car but not knowing if you're building it for the city, Antarctica, or even the moon. The environment is critical to the assumptions you can make.
      [0] https://en.wikipedia.org/wiki/Nice_(Unix)
      [1] https://www-cs-faculty.stanford.edu/~knuth/taocp.html
      [-]
      - godelski 49 days ago
        Actually had a good real world example today. My TV had an update a month ago and since then most of the apps don't actually work. Netflix plays with like 3 pixels, Hulu just hangs. Luckily I use the TV as a monitor 99% of the time. But I think we all know how slow a lot of these systems get. Things getting slower over time... it works fine when things ship but like I suggested, issues build over time. One app here, another there, and before you know it you're buying a new TV, computer, phone, whatever
    - windward 50 days ago
      They do, until you have a tough problem that's still too slow after it's cache efficient
- kreco 50 days ago
  > such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.
  I'm amazed by the fact there is always someone who will say that such optimization are totally unnecessary.
  [-]
  - godelski 50 days ago
    I'm amazed by the fact there is always someone who misinterprets Knuth's "premature optimization", reading as "don't optimize" instead of "pull out the profiler"
  - recursive 50 days ago
    Some people have significant positions on CPU manufacturers, so there will always be at least a few.
- kragen 50 days ago
  It's true that this code was optimized from 2.6ns down to 0.9ns, a saving of 1.7ns, while an L2 cache miss might be 80ns. But 1.7ns is still about 2% of the 80ns, and it's about 70% of the 2.6ns. You don't want to start optimizing by reducing things that are 2% of your cost, but 2% isn't insignificant.
  The bigger issue is that probably you don't need to do leap-year checks very often so probably your leap-year check isn't the place to focus unless it's, like, sending a SQL query across a data center or something.
- timewizard 50 days ago
  They're not free at all. They're unlikely to create noticeable latency; however, CPU clock speeds and thus power consumption haven't been constant for years now. You are paying for that lack of optimization and mobile users would thank you to do it.
  There is no silver bullet.
- __turbobrew__ 50 days ago
  This is why linear arrays are the fastest datastructure unless proven otherwise.
- croes 50 days ago
  Related
  > The world could run on older hardware if software optimization was a priority
  https://news.ycombinator.com/item?id=43971464
  [-]
  - Dylan16807 50 days ago
    I disagree.
    This is the kind of optimization that makes you need a 1.5MHz CPU instead of a 1MHz CPU, but saves devs weeks of effort. It's the kind of thing you give up when you move optimization from priority 1 to 2, or from 2 to 3. It would still run blazingly fast on a 30 year old computer. It's a perfectly good tradeoff.
    The stuff that bogs down modern hardware is optimization being priority 8 or not even on the list of considerations.
- GuB-42 50 days ago
  Integer division (and modulo) is not cheap on most CPUs. Along with memory access and branch prediction, it is something worth optimizing for.
  And since you are talking about memory. Code also goes in memory. Shorter code is more cache friendly.
  I don't see a use case where it matters for this particular application (it doesn't mean there isn't) but well targeted micro-optimizations absolutely matter.
- adonovan 50 days ago
  Quite right. If you use a compiled language (e.g. Go) the difference between the two implementations is indeed negligible.
  https://go.dev/play/p/i72xCyhqRkC
- klysm 50 days ago
  I think about branches a lot too when optimizing
captaincrunch 50 days ago
This is fast, READABLE, and accurate:
bool is_leap_year(uint32_t y) { // Works for Gregorian years in range [0, 65535] return ((!(y & 3)) && ((y % 25 != 0) || !(y & 15))); }
[-]
- andrepd 50 days ago
  This impl is mentioned in TFA.. It's much slower and includes branches.
  [-]
  - hoten 50 days ago
    I'd expect even without optimizations on, there wouldn't be branches in the output for that code.
    [-]
    - kragen 50 days ago
      There are, even with optimizations on. You could have checked: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...
      I didn't find any way to get a compiler to generate a branchless version. I tried clang and GCC, both for amd64, with -O0, -O5, -Os, and for clang, -Oz.
      [-]
      - mmozeiko 50 days ago
        If you change logic and/or to bitwise and/or then it'll be branchless.
        [-]
        kragen 49 days ago
        True: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename... but I understood hoten to be saying that compilers would generally produce that version from the short-circuiting version, and they don't.
        [-]
        hoten 48 days ago
        Yeah I was wrong.
        Do we know why the compiler doesn't do it? Surely the output is the same and avoiding branches is clearly faster.
        Maybe short circuiting requires such an optimization not be made?
        [-]
        kragen 47 days ago
        There are cases where the optimization wouldn't be safe (like i < n && a[i] != k) but this is not one of them. Maybe the compiler is just dum. Or maybe avoiding branches is not clearly faster in cases like this? Have you measured this particular case?
- windward 50 days ago
  >READABLE
  Great. It will be useful for the exhaustive tests of the faster version.
- kragen 50 days ago
  You commented out your entire function body and the closing }. Also, on 32-bit platforms, it doesn't stop working at 65535.
  [-]
  - captaincrunch 50 days ago
    just a formatting issue on my side, there were \n.
    [-]
    - archargelod 50 days ago
      This website eats newlines, unless you double them (one of the annoying features of markdown). You can use codeblocks by putting 4 spaces before each line:
      int main() { // this should be properly formatted return 0; };
    - kragen 50 days ago
      If you fix it, other people can test your code without having to fix the syntax themselves first.