Git commands I run before reading any code

(piechowski.io)

560 points | by grepsedawk 4 hours ago

34 comments

  • pzmarzly 3 hours ago
    Jujutsu equivalents, if anyone is curious:

    What Changes the Most

        jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
          -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
          | sort | uniq -c | sort -nr | head -20
    
    Who Built This

        jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
          -T 'self.author().name() ++ "\n"' \
          | sort | uniq -c | sort -nr
    
    Where Do Bugs Cluster

        jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
          -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
          | sort | uniq -c | sort -nr | head -20
    
    Is This Project Accelerating or Dying

        jj log --no-graph -r 'ancestors(trunk())' \
          -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
          | sort | uniq -c
    
    How Often Is the Team Firefighting

        jj log --no-graph \
          -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
    
    Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
    • palata 3 hours ago
      To me, it makes jujutsu look like the Nix of VCSes.

      Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.

      • Jenk 2 hours ago
        Tbf you wouldn't switch to jj because of those kinds of commands; they are quite the outlier in the grand list of reasons to use jj. However, the option to use the revset language in that manner is a high-ranking reason to use jj in my opinion.

        The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)

            jj log -r 'mine() & ~signed()'
        
            # or if yolo mode...
        
            jj sign -r 'mine() & ~signed()'
        
        I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.
        • palata 2 hours ago
          Actually, signing was one of the annoying parts of jujutsu for me: I sign with a security key, and the way jujutsu handled signing was very painful to me (I know it can be configured and I tried a few different ways, but it felt inherent to how jujutsu handles commits (revisions?)).
    • stingraycharles 1 hour ago
      I don’t understand how people can remember all these custom scripting languages. I can’t even remember most git flags, I’m ecstatic when I remember how to iterate over arrays in “jq”, I can’t fathom how people remember these types of syntaxes.
      • Cthulhu_ 53 minutes ago
        I don't, I will google things and fiddle, then put it in a git alias (with a comment on what it does and / or where I got it from) and push it to my private dotfiles repo, taking it with me between computers and projects.
      • crispyambulance 26 minutes ago
        I am convinced that the vast majority of professionals simply don't bother to remember and, ESPECIALLY WITH GIT, just look stuff up every single time the workflow deviates from their daily usage.

        At this point perhaps a million person-years have been sacrificed to the semantically incoherent shit UX of git. I have loathed git from the beginning but there's effectively no other choice.

        That said, the OP's commands are useful, I am copying them (because obviously I won't ever memorize them).

      • mgfist 1 hour ago
        Same, but now with AI I don't have to remember that anymore
    • faangguyindia 2 hours ago
      I can't remember all of this, does anyone know of any LLM model trained on CLI which can be run locally?
      • lamasery 1 hour ago
        If you copy those commands into a file and use that file to prompt the “sh” LLM.
        • stingraycharles 1 hour ago
          That works until you need a small variation of any of these commands and you’re lost.
      • esafak 45 minutes ago
        Not a model, but a product: warp.dev
    • gib444 1 hour ago
      Hah someone really looked at jq (?) and thought: "yes, more of this everywhere". I feel jq is like marmite
  • bsuvc 1 hour ago
    I love how the author thinks developers write commit messages.

    All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".

    It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.

    AI-generated commit messages help with this a lot, if developers would actually use them (I hope they will).

    • mikepurvis 43 minutes ago
      In codebases where PRs are squashed on merge, the commit messages on the main branch end up being the PR body description text, and that's actually reviewed so tends to be much better I find.
    • ramijames 31 minutes ago
      This is a team lead/CTO problem. A good leader will be explicit in their expectations that developers write good commit messages. I've certainly had good leaders that expect this.
    • grepsedawk 28 minutes ago
      Only two of the five depend on commit messages. Churn, authorship, and velocity work regardless. Even teams with terrible hygiene write "fix" when something breaks.
    • 8cvor6j844qw_d6 47 minutes ago
      > AI generated commit messages

      git log --oneline and a sprinkle of your personal sauce on .claude goes a long way :)

    • sigmoid10 59 minutes ago
      Only two of the five insights are based on commit messages and the author acknowledges that they won't work in projects without message discipline. But the remaining ones will give you valuable insights even into the most lazy project department.
    • itmitica 53 minutes ago
      I love how the commentator thinks a developer makes decisions based on commit messages.

      Random, subjective, or written in a state of mental exhaustion commit messages.

      I also love the switcheroo the author made: git not logs. But hey :)

  • mattrighetti 2 hours ago
    I have a summary alias that kind of does similar things

      # summary: print a helpful summary of some typical metrics
      summary = "!f() { \
        printf \"Summary of this branch...\n\"; \
        printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
        printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
        printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
        printf \"%d commit count\n\" $(git rev-list --count HEAD); \
        printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d tag count\n\" $(git tag | wc -l); \
        printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
        printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
        printf \"\nSummary of this directory...\n\"; \
        printf \"%s\n\" $(pwd); \
        printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
        printf \"%d file count via find command\n\" $(find . | wc -l); \
        printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
        printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
        printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
        printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
      }; f"
    
    EDIT: props to https://github.com/GitAlias/gitalias
    • duskdozer 2 hours ago
      Curious - why write it as a function in presumably .gitconfig and not just a git-summary script in your path? Just seems like a lot of extra escapes and quotes and stuff
      • mattrighetti 2 hours ago
        It's a very old config that I copied from someone many years ago, agree that it's a bit hard to parse visually.
      • Cthulhu_ 51 minutes ago
        Not the poster, but one theory: so you only need to copy one file. Portability.
        • mr_mitm 24 minutes ago
          Looks like the above assumes a POSIX shell, so one could argue a dedicated script would actually be more portable.
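          For comparison, a sketch of what the standalone-script route could look like, assuming a POSIX sh and an executable file named git-summary somewhere on $PATH (git runs any git-<name> executable it finds there as `git <name>`). It reproduces only a handful of the alias's metrics, and the helper name count_unique is made up for this example.

```shell
#!/bin/sh
# git-summary: trimmed-down standalone sketch of the alias above.
# Only a few of the alias's metrics are reproduced here.

# Count distinct non-empty lines on stdin (hypothetical helper).
count_unique() {
  grep -v '^$' | sort -u | wc -l | tr -d ' '
}

summary() {
  printf 'branch:  %s\n' "$(git rev-parse --abbrev-ref HEAD)"
  printf 'commits: %s\n' "$(git rev-list --count HEAD)"
  printf 'authors: %s\n' "$(git log --format='%aE' | count_unique)"
  printf 'days:    %s\n' "$(git log --format='%ad' --date=short | count_unique)"
  printf 'files:   %s\n' "$(git ls-files | wc -l | tr -d ' ')"
}

# Print the summary only when invoked as git-summary, so the file can
# also be sourced without side effects.
case "${0##*/}" in
  git-summary) summary ;;
esac
```

          No backslash-escaped quotes are needed, which is most of the readability win over the .gitconfig version.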
    • TonyStr 1 hour ago
      Looks nice. Unfortunately I don't have log-of-count-and-email, log-of-count-and-day or churn
    • ape4 1 hour ago
      You could make a local `man` page.
  • ramon156 3 hours ago
    > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

    The most changed file is the one people are afraid of touching?

    • rbonvall 2 hours ago
      Just like that place that's so crowded nobody goes there anymore.
    • dewey 3 hours ago
      I've just tried this, and the most touched files are also the most irrelevant or boring files (auto generated, entry-point of the service etc.) in my tests.
      • nulltrace 2 hours ago
        Yeah same thing happens with lockfiles and CI configs. You end up filtering out half the list before it tells you anything useful.
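        A minimal sketch of that filtering step, assuming the noise is lockfiles and CI config. The helper names and the exclusion pattern are illustrative and not from the article; adjust the pattern to whatever dominates your own list.

```shell
# Hypothetical helper: drop blank lines, lockfiles and CI config from a
# list of paths, then count how often each remaining path appears.
churn_filter() {
  grep -v -E '^$|(^|/)(package-lock\.json|yarn\.lock|Cargo\.lock)$|^\.github/' \
    | sort | uniq -c | sort -nr | head -20
}

# Most-changed files in the last year, minus the noise.
churn() {
  git log --since="1 year ago" --name-only --format='' | churn_filter
}
```

        Run `churn` inside the repo. Git pathspecs such as `':(exclude)package-lock.json'` could do the filtering on the git side instead; the grep version is just easier to tweak.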
      • pydry 1 hour ago
        I just tried it too and it basically just flagged a handful of 1500+ line files which probably ought to be broken up eventually but aren't causing any serious problems.
        • Cthulhu_ 50 minutes ago
          If it's (like in my case) dependency management, localization or config files, breaking them up will likely only cause more issues. Make sure that it's an actual improvement before breaking things up.
    • jbjbjbjb 1 hour ago
      This command needs a warning. Using this command and drawing too many conclusions from it, especially if you’re new, will make you look stupid in front of your team mates.

      I ran this on the repo I have open and after I filtered out the non code files it really can only tell me which features we worked on in the last year. It says more about how we decided to split up the features into increments than anything to do with bugs and “churn”.

      • Pay08 1 hour ago
        Good thing that the article contains that warning, then.
        • jbjbjbjb 1 hour ago
          Not really strong enough in a post about what to do in a codebase you’re not familiar with. In that situation you’re probably new to the team and organisation and likely to get off on the wrong foot with people if you assume their code “hurts”.
      • Eldt 24 minutes ago
        Better for people to know they're just blindly copying tools and parroting their output as if it's automatically meaningful. Any warning against that should be built into the individual, for their own sake.
    • mememememememo 3 hours ago
      Yes. Because the fear is buttressed with necessity. You have to edit the file, and so does everyone else, and that is a recipe for a lot of mess. I can think back over years of files like this. Usually kilolines of impossible-to-reason-about do-everything code.
    • jollyllama 35 minutes ago
      Yeah, the truth is going to be a lot more subtle than this.
    • mchaver 2 hours ago
      Definitely not in my experience. The most changed are the change logs, files with version numbers and readmes. I don't think anyone is afraid of keeping those up to date.
    • KptMarchewa 1 hour ago
      In my case, it's .github/CODEOWNERS.

      Nobody is afraid of changing it.

    • szszrk 2 hours ago
      Could be also that a frequently edited file had most opportunity to be broken. And it was edited by the most random crowd.
  • JetSetIlly 3 hours ago
    Some nice ideas but the regexes should include word boundaries. For example:

    git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20

    I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.

    • grepsedawk 38 minutes ago
      Good catch, that's better
  • icedchai 16 minutes ago
    I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.
  • whstl 1 hour ago
    > One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

    In my experience, when the team doesn't squash, this will reflect the messiest members of the team.

    The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.

    Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.

  • croemer 2 hours ago
    Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
    • markus_zhang 1 hour ago
      I also feel this reads like AI slop, but at least I learned 5 commands. Not too bad.
  • Cthulhu_ 54 minutes ago
    For "what changes the most", in my project it's package.json / lock (because of automatic dependency updates) and translation / localization files; I'd argue that's pretty normal and healthy.

    For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.

  • pscanf 50 minutes ago
    I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at its git history (plus some GitHub activity data).

    The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.

    For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.

    (The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)

    ¹ Nothing is ever finished.

  • fzaninotto 1 hour ago
    Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1].

    [1]: https://github.com/marmelab/ArcheoloGit

  • bullen 1 hour ago
    Dying or stabilizing?

    Most good projects end up solving a problem permanently and if there is no salary to protect with bogus new features it is then to be considered final?

  • niedbalski 1 hour ago
    Ages ago, google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots
  • seba_dos1 3 hours ago
    > If the team squashes every PR into a single commit, this output reflects who merged, not who wrote.

    Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway) and only useful as a workaround for people not knowing how to use git, but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers, but that's harder to use in such oneliners).
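    A rough sketch of counting authors and co-authors together, assuming a Git recent enough to support the `%(trailers:key=...,valueonly)` format placeholder. The helper names are hypothetical.

```shell
# Strip a trailing " <email>" from "Name <email>" lines, then count names.
count_names() {
  sed -e 's/ *<[^>]*> *$//' | grep -v '^$' | sort | uniq -c | sort -nr
}

# Commit authors plus anyone credited in a Co-authored-by trailer,
# ignoring merge commits.
contributors() {
  git log --no-merges \
    --format='%an%n%(trailers:key=Co-authored-by,valueonly)' | count_names
}
```

    The same person can still show up twice if the trailer spells their name differently from their commits; a .mailmap only normalises the `%an` part here.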

    • theshrike79 2 hours ago
      Can you explain to me (an avid squash-merger) what extra information do you gain by having commits that say "argh, let's see if this works", "crap, the CI is failing again, small fix to see if it works", "pushing before leaving for vacation" in the main git history?

      With a squash merge one PR is one commit, simple, clean and easy to roll back or cherry-pick to another branch.

      • seba_dos1 2 hours ago
        These commits reaching the reviewer are a sign of either not knowing how to use git or not respecting their time. You clean things up and split into logical chunks when you get ready to push into a shared place.
        • theshrike79 1 hour ago
          Why would the reviewer look at the commit messages instead of the code?

          1. Open PR page in whatever tool you're using

          2. Read title + description to see what's up

          3. Swap to diff and start reading through the changes

          4. Comment and/or approve

          I've never heard anyone bothering to read the previous commit messages for a second, why would they care?

          • seba_dos1 1 hour ago
            Because it's a useful abstraction. If you only look at PRs and don't ever care about commits, why are they even being sent to reviewer in the first place? Just send a diff file.

            Having atomic commits lets you actually benefit from having them. Suddenly you don't have to perform weird dances with interconnected PRs with dependencies as "PR too big" is not such a problem anymore as long as commits are digestible; you can have things properly bisectable; you can preserve shared authorship; you can range-diff and have a better view on what and how changed between review passes, and so on...

            The unit of change is commit, and PRs group commits you want someone to pull. If you don't want or need any of that, you're just sending a patch file in a needlessly elaborate way.

            • Anon1096 52 minutes ago
              > If you only look at PRs and don't ever care about commits, why are they even being sent to reviewer in the first place? Just send a diff file.

              This is in fact what hg does with amending changesets and yes it works far better. Keep PRs small and atomic and you never need to worry about what happens intra-pr. If you need bigger units of work that's what stacking is for.

          • ipsento606 54 minutes ago
            >Swap to diff and start reading through the changes

            this forces the reviewer to view the entire diff at once, which can greatly increase the cognitive load vs. being able to view diffs of logical units of work

            for tiny PRs it may not matter, but for substantial PRs it can matter a lot

        • croemer 2 hours ago
          What if the shared place is the place where you run a bunch of CI? Then you push your work early to a branch to see the results, fix them etc.
          • seba_dos1 2 hours ago
            You can do whatever you want with stuff nobody else looks at. I do too.

            I meant "shared place" as an open review request or a shared branch rather than shared underlying infrastructure. Shared by people's minds.

          • mr_mitm 2 hours ago
            You can always force-push a cleaned up version of your branch when you are ready for review, or start a new one and delete the WIP one.
            • croemer 1 hour ago
              You can, but you can also just squash-merge in one click. And you avoid people merging their dozens of fix-up commits, which is what happens if you allow anything but squash merge.
            • theshrike79 1 hour ago
              I hate (and fear) force-pushing and "cleaning up" git history as much as other people dislike squash-merging =)

              It just feels wrong to force push, destroying stuff that used to be there.

              And I don't have the time or energy to bisect through my shitty PR commits and combine them into something clean looking - I can just squash instead.

              • seba_dos1 1 hour ago
                Nothing is destroyed by a force push. It just overwrites a single pointer, and even keeps its old value in reflog.

                Things that aren't referenced by anything anymore will eventually get garbage collected and actually destroyed, but you can just keep a reference somewhere to prevent that from happening if you need. Or even disable garbage collection completely.

                Looks like people's fears about git come just from not knowing what it does.
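                A small sketch of the "keep a reference somewhere" idea, assuming you are on the feature branch you are about to rewrite. The function name and the backup/ prefix are made up for the example.

```shell
# Create a timestamped backup branch at the current tip, so the commits
# stay reachable (and safe from garbage collection) after a rewrite.
keep_old_tip() {
  branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  git branch "backup/$branch-$(date +%Y%m%d-%H%M%S)"
}

# Typical use (left commented out because these rewrite history):
#   keep_old_tip
#   git rebase -i origin/main
#   git push --force-with-lease
#   git reflog show HEAD    # the old tips are listed here as well
```

                `--force-with-lease` refuses the push if the remote branch moved since you last fetched it, which covers the other common fear: overwriting someone else's work.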

        • zaphirplane 2 hours ago
          What are examples of better ones. I don’t get the let me show the world my work and I’m not a fan of large PR
          • duskdozer 2 hours ago
            if you mean better messages, it's not really that. those junk messages should be rewritten and if the commits don't stand alone, merged together with rebase. it's the "logical chunks" the parent mentioned.

            it's hard to say fully, but unless a changeset is quite small or otherwise is basically 0% or 100%, there are usually smaller steps.

            like kind of contrived but say you have one function that uses a helper. if there's a bug in the function, and it turns out that to fix it, it makes a lot more sense to change the return type of the helper, you would make commit 1 to change the return type, then commit 2 to fix the bug. would these be separate PRs? probably not to me but I guess it depends on your project workflow. keeping them in separate commits even if they're small lets you bisect more easily later on in case there was some unforeseen or untested problem that was introduced, leading you to smaller chunks of code to check for the cause.

            • orsorna 1 hour ago
              If the code base is idempotent, I don't think showing commit history is helpful. It also makes rebases more complex than needed down the line. Thus I'd rather squash on merge.

              I've never considered how an engineer approaches a problem. As long as I can understand the fundamental change and it passes preflights/CI I don't care if it was scried from a crystal ball.

              This does mean the onus is on the engineer to explain their change in natural language. In their own words of course.

              • seba_dos1 1 hour ago
                Commits don't show "how an engineer approaches a problem". Commits are the unit of change that are supposed to go into the final repository, purposefully prepared by the engineer and presented for review. The only thing you do by squashing on merge is to artificially limit the review unit to a single commit to optimize the workflow towards people who don't know how to use git. Personally I don't think it's a good thing to optimize for.
        • yokoprime 2 hours ago
          Haha, good luck working with a team with more than 2 people. A good reviewer looks at the end-state and does not care about individual commits. If I'm curious about a specific change I just look at the blame.
          • tasuki 2 hours ago
            > A good reviewer looks at the end-state and does not care about individual commits.

            Then I must be a bad reviewer. In a past job, I had a colleague who meticulously crafted his commits - his PRs were a joy to review because I could go commit by commit in logical chunks, rather than wading through a single 3k line diff. I tried to do the same for him and hope I succeeded.

            • theshrike79 1 hour ago
              And then someone comments on a thing, they change it and force-push another "clean" history on top and all of your work is wasted because the PR is now completely different =)
            • mgfist 1 hour ago
              Why are those not just separate PRs? Or if they really needed to be merged at once - they should still be separate PRs but on a feature branch
              • seba_dos1 49 minutes ago
                Why have PRs - groups of commits to pull - then if all you need is a single patch file?
            • KptMarchewa 1 hour ago
              Split the PR rather than force me to wade through your commit history. Use graphite or something else that allows you to stack PRs.
          • jfengel 1 hour ago
            Sometimes I have to go back and fix a bug that appeared during another branch. Having the original commits helps me bisect it.

            Not often, but given that it costs me nothing to have it all in my tree, I'd rather have it than not.

          • hhjinks 2 hours ago
            You review code not to verify the actual output of the code, but the code itself. For bugs, for maintainability. Commit hygiene is part of that.
          • seba_dos1 2 hours ago
            I have no troubles working on big FLOSS projects where reviews usually happen at the commit level :)
      • Aachen 2 hours ago
        If someone uses git commits like the save function of their editor and doesn't write messages intended for reading by anyone else, it makes sense to want to hide them

        For other cases, you lose the information about why things are this way. It's too verbose to //comment on every line with how it came to be this way but on (non-rare in total, but rare per line) occasion it's useful to see what the change was that made the line be like this, or even just who to potentially ask for help (when >1 person worked on a feature branch, which I'd say is common)

        • seba_dos1 2 hours ago
          > If someone uses git commits like the save function of their editor

          I use it like that too and yet the reviewers don't get to see these commits. Git has very powerful tools for manipulating the commit graph that many people just don't bother to learn. Imagine if I sent a patchset to the Linux Kernel Mailing List containing such "fix typo", "please work now", "wtf" patches - my shamelessness has its limits!

          • Aachen 1 hour ago
            Seems like a lot of extra effort (save, add, commit, come up with some message even if it's a prayer to work now) only to undo it again later and create a patch or alternate history out of the final version. Why bother with the intermediate commits if you're not planning for it to be part of the history?
            • seba_dos1 30 minutes ago
              Git is a version control system. It does not care about what it versions.

              When I work on something, I commit often and use the commit graph as a undo tool on steroids. I can see what I tried, I can cherry-pick or revert stuff while experimenting, I can leave promising but unfinished stuff to look at later, or I can just commit to have a simple way to send stuff to CI.

              Once I'm done working on something, it's time to take a step back, look at the big picture, see how many changes my experiments have actually yielded, separate them, describe and decide whether they go to review together or split in some way, as sometimes working on a single thing requires multiple distinct changes (one PR with multiple commits), but sometimes working in a single session yields fixes for multiple unrelated issues (several PRs). Only then it gets presented to the reviewer.

              It just happens that I can do both these distinct jobs with a single tool.

            • thi2 31 minutes ago
              Because I might want to go back to this current messy state but I don't want to commit it like this (hardcoded test strings, debug logs, cut corners to see if something works, you name it).

              I simply commit something like "WIP: testing xy" and if it's working and properly implemented I can squash/rebase/edit the commit message and force push it to my feature branch. Using a Git client like Gitkraken makes this incredibly easy, takes seconds.

              This way I can leverage version control without committing bogus states to the final PR.

            • skydhash 1 hour ago
              If the team is using a PR workflow, the PR is a working place to produce one single commit. The individual commits are just timestamped changes and comments. Think of it as the equivalent of annotated diff in mailing list conversation.
      • tasuki 2 hours ago
        You gain the extra information by having reasonable commit messages rather than the ones you mentioned. To fix CI you force push.

        Can you explain to me what an avid squash-merger puts into the commit message of the squashed commit composed of commits "argh, let's see if this works", "crap, the CI is failing again, small fix to see if it works", and "pushing before leaving for vacation" ?

        • theshrike79 1 hour ago
          The squashed commit from the PR -> main will have a clean title + description that says what was added.

          Usually pretty close to what the PR title + description are actually, just without the videos and screenshots.

          Example:

          feat(ui): Add support for tagging users

          * Users can be tagged via the user page
          * User tags visible in search results (configurable)

          etc..

          I don't need to spend extra time cleaning up my git commits and force-pushing on the PR branch, losing context for code reviews etc. Nor does anyone have to see my shitty angry commits when I tried to figure out why Playwright tests ran on my machine and failed in the CI for 10 commits.

      • thi2 29 minutes ago
        Why are those commits ending in the PR? Just unprofessional to work like that.
    • LinXitoW 12 minutes ago
      How does not squash merging deal with the fact that branches disappear when merging? What I mean is that the information "this commit happened in the context of this PR or this overarching goal" goes missing. When you squash, you use the one central unit of information management in Git: the commit.
    • mcpherrinm 1 hour ago
      Squash merge is the only reasonable way to use GitHub:

      If you update a PR with review feedback, you shouldn’t change existing commits because GitHub’s tools for showing you what has changed since your last review assume you are pushing new commits.

      But then you don’t want those multiple commits addressing PR feedback to merge as they’re noise.

      So sure, there are workflows with Git that don’t need squashing. But they’re incompatible with GitHub, which is at least where I keep my code today.

      Is it perfect? No. But neither is git, and I live in the world I am given.

      • mgfist 47 minutes ago
        Yes, I think people who are anti squash merge are those who don't work in Github and use a patch based system or something different. If you're sending a patch for linux, yes it makes sense that you want to send one complete, well described patch. But Github's tooling is based around the squash merge. It works well and I don't know anyone in real life who has issues with it.

        And to counter some specific points:

        * In a github PR, you write the main commit msg and description once per PR, then you tack on as many commits as you want, and everyone knows they're all just pieces of work towards the main goal of the eventually squashed commit

        * Forcing a clean up every time you make a new commit is not only annoying extra work, but it also overwrites history that might be important for the review of that PR (but not important for what ends up in main branch).

        * When follow up is requested, you can just tack on new commits, and reviewers can easily see what new code was added since their last review. If you had to force overwrite your whole commit chain for the PR, this becomes very annoying and not useful to reviewers.

        * In the end, squash merge means you clean up things once, instead of potentially many times

    • arnorhs 2 hours ago
      The author is talking about the case where you have coherent commits, probably from multiple PRs/merges, that get merged into a main branch as a single commit.

      Yeah, I can imagine it being annoying that squashing in that case wipes the author attribution, when not everybody is doing PRs against the main branch.

      However, calling all squash-merge workflows "stupid" without any nuance.. well that's "stupid" :)

      • seba_dos1 2 hours ago
        I don't think there's much nuance in the "I don't know --first-parent exists" workflow. Yes, you may sometimes squash-merge a contribution coming from someone who can't use git well when you realize that it will just be simpler for everyone to do that than to demand them to clean their stuff up, but that's pretty much the only time you actually have a good reason to do that.
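
        For anyone who hasn't met the flag, a minimal sketch of the first-parent view, assuming merges land on a branch called main. The function names are made up for illustration.

        ```shell
        # One entry per merge (or direct commit) on the mainline; commits that
        # arrived through merged branches are hidden from the listing.
        mainline_log() {
          git log --first-parent --oneline "${1:-main}"
        }

        # Number of mainline entries, as opposed to every commit reachable from the branch.
        mainline_count() {
          git rev-list --first-parent --count "${1:-main}"
        }
        ```

        Comparing `mainline_count main` with `git rev-list --count main` gives a rough idea of how much detail the first-parent view hides.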
        • skydhash 58 minutes ago
          Do people actually share PR as in different people contributing to the same branch?

          Also I can understand not squashing if the contribution comes from outside the organization. But in that case, I would expect a cleaned up history. But if every contribution is from members of the team, who can merge their own PR, squash merge is an easy way to get a clean history. Especially when most PR should be a single commit.

      • duskdozer 2 hours ago
        I think the point is that if you have to squash, the PR-maker was already gitting wrong. They should have "squashed" on their end to one or more smaller, logically coherent commits, and then submitted that result.
        • skydhash 1 hour ago
          It’s not “having to squash”. The intent was already for a PR to be a single commit. I could squash it on my end and merge by rebasing, but any alteration would then need to be force-pushed. So I don’t bother. I squash-merge when it’s ready and delete the branch.
    • lamasery 1 hour ago
      Squash-merge is entirely fine for small PRs. Cleaning up the commits in advance (probably to just squash them to one or two anyway) is extra work, and anything that discourages people from pushing often (to get the code off their local machine) needs to be well-justified. Just review the (smallish!) total outcome of all the commits and squash after review. A few well-placed messages on the commit, attached to relevant lines, are more helpful and less work than cleaning up the commit history of a smallish PR.

      For really large PRs, I’m more inclined to agree with you, but those should probably have their own small-PR-and-squash-merge flow that naturally cleans up their git history, anyway.

      I categorically disagree that squash-merge is “stupid” but agree there are many ways to skin this cat.

    • filcuk 2 hours ago
      Having the tree easy to filter doesn't matter if it returns hundreds of commits you have to sift through for no reason.
      • seba_dos1 2 hours ago
        Having the commit graph easy to filter means exactly that you don't have to sift through hundreds of commits for no reason. What else did you think it would mean?
  • nola-a 1 hour ago
    For more insights on Git, check out https://github.com/nolasoft/okgit
  • alkonaut 2 hours ago
    Trusting the messages to contain specific keywords seems optimistic. I don't think I've ever used "emergency" or "hotfix". "Revert" is sometimes automatically created by some tools (e.g. un-merging a PR).
  • gherkinnn 3 hours ago
    These are some helpful heuristics, thanks.

    This list is also one of many arguments for maintaining good Git discipline.

  • therealdeal2020 32 minutes ago
    superficial. If I have to unfuck the backend 10 times a week in our API adapter, then these commands will show me constantly changing the API adapter, although it's the backend team constantly fixing their own bugs
  • baquero 1 hour ago
  • TacticalCoder 9 minutes ago
    > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

    I've got my Emacs set up to display next to every file that is versioned the number of commits that file has been modified in (for the curious: using a modified all-the-icons-ivy-rich + custom elisp code + custom Bash scripts I wrote and it's trickier than it seems to do in a way that doesn't slow everything down). For example in the menu to open a file or open a recently visited file etc.: basically in every file list, in addition to its size, owner, permissions, etc. I also add the number of commits if it's a versioned file.
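
    The per-file count itself can be had from plain git. A minimal sketch follows; the function names are hypothetical, and this is not the custom elisp/Bash setup described above.

    ```shell
    # Number of commits reachable from HEAD that touched the given path.
    commit_count() {
      git rev-list --count HEAD -- "$1"
    }

    # Print "<count> <path>" for every tracked file.
    # Walks history once per file, so it gets slow on large repositories.
    commit_counts_all() {
      git ls-files | while IFS= read -r f; do
        printf '%s %s\n' "$(commit_count "$f")" "$f"
      done
    }
    ```

    Running history once per file is presumably where the slowness comes from, so any editor integration would likely want to cache these numbers.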

    I like the fix/bug/broken search in TFA to see where the bugs gather.

  • tom-blk 34 minutes ago
    Nice! Will probably adopt this, seems to give a great overview!
  • traceroute66 3 hours ago
    > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about.

    What a weird check and assumption.

    I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?

    So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.

    • grepsedawk 17 minutes ago
      Fair point. I skip lockfiles, changelogs, and generated code. The first application file on the list is the one that matters. Should have been explicit about that in the post.
    • theshrike79 2 hours ago
      Why would you touch the README file hundreds of times a year?

      You're right about package.json, pnpm-lock etc though, but those are easy to filter out if the project in question uses them.

      • traceroute66 2 hours ago
        > Why would you touch the README file hundreds of times a year?

        You're right, perhaps I should have said CHANGELOG etc.

        Although some projects e.g. bump version numbers in README or add extra one-liner examples ....

      • raxxorraxor 2 hours ago
        Some readme files include changelogs. But aside from that I think this can still net some useful information. I like to look at the most recently changed files in a repo as well.
    • jbjbjbjb 1 hour ago
      It’s easy enough to filter those out with grep. It still is relatively meaningless. If the team incrementally adds things then it’s just going to show what additions were made. It isn’t churn at all.
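
      A sketch of that filter, assuming a churn pipeline built on `git log --name-only`. The exclusion pattern and the function name are only examples and would need adjusting per project.

      ```shell
      # Most-changed files in the last year, skipping common lockfiles,
      # changelogs and READMEs. The pattern is illustrative, not exhaustive.
      churn_filtered() {
        git log --format=format: --name-only --since="1 year ago" \
          | grep -v '^$' \
          | grep -Ev '(^|/)(package-lock\.json|pnpm-lock\.yaml|yarn\.lock|Cargo\.lock|CHANGELOG[^/]*|README[^/]*)$' \
          | sort | uniq -c | sort -nr | head -20
      }
      ```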
  • boxed 2 hours ago
    Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
    • grepsedawk 16 minutes ago
      Sure, normalizing by size would be more precise. But this is a quick gut check to know which files to look at first, not a metric.
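
      For anyone who does want the normalized version, a rough sketch: commits per 100 lines of the file's current size. It assumes you run it from the repository root, and the function name and output layout are made up.

      ```shell
      # For each file changed in the last year, print:
      #   <commits per 100 lines> <commits> <current line count> <path>
      # Files that no longer exist, or are empty, are skipped.
      churn_per_line() {
        git log --format=format: --name-only --since="1 year ago" \
          | grep -v '^$' | sort | uniq -c \
          | while read -r count path; do
              [ -f "$path" ] || continue
              lines=$(wc -l < "$path" | tr -d ' ')
              [ "$lines" -gt 0 ] || continue
              awk -v c="$count" -v l="$lines" -v p="$path" \
                'BEGIN { printf "%.2f %d %d %s\n", 100 * c / l, c, l, p }'
            done \
          | sort -nr | head -20
      }
      ```

      Dividing by the current size is itself a simplification, since a file that shrank or grew a lot over the year will be skewed.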
  • aa-jv 2 hours ago
    Great tips, added to notes.txt for future use ..

    Another one I do, is:

        $ alias gss='git for-each-ref --sort=-committerdate'
    
        $ gss
    
        ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/heads/project-feature-development
        ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/remotes/origin/project-feature-development
        1ef272ea1d3552b59c3d22478afa9819d90dfb39 commit refs/remotes/origin/feature/feature-removal-from-good-state
        c30b4c67298a5fa944d0b387119c1e5ddaf551f1 commit refs/remotes/origin/feature/feature-removal
        eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/HEAD
        eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/main
        3f874b24fd49c1011e6866c8ec0f259991a24c94 commit refs/heads/project-bugfix-emergency
        ...
    
    
    This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference ... in fact I use the 'gss' alias to find out what's going on, regularly, i.e. "git fetch --all && gss" - doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.
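
    A possible variation, if dates are easier to scan than full hashes: the same listing with a `--format` string, limited to branches. This is only a sketch, and `recent_refs` is a made-up name.

    ```shell
    # Local and remote branches, most recently committed first,
    # shown as "<date> <short hash> <ref name>".
    recent_refs() {
      git for-each-ref --sort=-committerdate \
        --format='%(committerdate:short) %(objectname:short) %(refname:short)' \
        refs/heads refs/remotes
    }
    ```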
  • user20251219 1 hour ago
    thank you - these are useful
  • lpribis 1 hour ago
    I was curious what information I could glean from these for some popular repos. Caveat: I'm primarily a low-level embedded developer so I don't interface with large open source projects at the source level very often (other than occasionally the linux kernel). I chose some projects at random that I use.

    *Mainline linux*

    Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes -- bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to sheer LoCs in those files (26K and 14K respectively). An intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c) and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).

        186 kernel/bpf/verifier.c
        174 fs/namespace.c
        162 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
        161 kernel/sched/ext.c
        159 fs/f2fs/f2fs.h
    
    Bus factor: obviously none. The top 4:

        10399 Christoph Hellwig -> I only know his name because of drama last year regarding rust bindings to DMA subsystem
         8481 Mauro Carvalho Chehab -> I also know his name from the classic "Mauro, shut the fuck up!" Linus rant
         8413 Takashi Iwai -> Listed as maintainer for sound subsystem, I think he manages ALSA
         8072 Al Viro -> His name is all over bunch of filesystem code
    
    Buggy files: Intel comes out on top of GPU drivers this time (twice). Along with KVM for x86(64), the main allocator, and BTRFS.

        1477 drivers/gpu/drm/i915/intel_display.c
        1406 MAINTAINERS
        1390 sound/pci/hda/patch_realtek.c
        1102 drivers/gpu/drm/i915/i915_drv.h
         943 arch/x86/kvm/x86.c
         928 mm/page_alloc.c
         871 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
         862 drivers/gpu/drm/i915/i915_reg.h
         840 fs/btrfs/inode.c
    
    *GCC*

    Most changed files: IR autovectorization code, riscv heuristics tables, and C++ template handling (pt.c is "parameterized types").

        152 gcc/tree-vect-stmts.cc
        145 gcc/config/riscv/riscv.cc
        131 gcc/tree-vect-loop.cc
        116 gcc/cp/pt.cc
    
    Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristic tables. I had to look up RS6000, it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file, it seems to be the main C(++) AST data structures.

       1017 gcc/dwarf2out.c
        885 gcc/config/i386/i386.c
        796 gcc/cp/cp-tree.h
        740 gcc/config/rs6000/rs6000.c
        720 gcc/cp/pt.c
    
    *xfwm4*

    Most changed files: the list is dominated by *.po localizations. I filtered these out. Even after this, I discovered there is very little active development in the last few years. If I extend to 4 years ago, I get:

    1. src/client.c - Realizing this project is too "small" to glean much from this. client.c is just the core X client management code. Makes sense.

    2. src/placement.c - Other core window management code.

    This has not told me much other than where most of the functionality of this project lies.

    Bus factor: Pretty huge. Not really an issue in this case due to lack of development I guess.

        3298  Olivier Fourdan
         530  Anonymous
         319  Xfce Bot
         121  Jasper Huijsmans
    
    
    Files with bug commits: Very similar distribution to most changed files. Not enough datapoints in this one to draw any big conclusions.

    I think these massive open projects (excl xfwm) generally have pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. linux, but I have of course seen that situation many times in large proprietary codebases.

    • grepsedawk 29 minutes ago
      Big projects tend to self-correct. These commands hit differently on private codebases with 3-10 contributors, where high-churn usually means one person patching the same thing repeatedly.