SQL style guide by Simon Holywell

(sqlstyle.guide)

56 points | by thunderbong 231 days ago

19 comments

  • SPBS 230 days ago
    > Spaces should be used to line up the code so that the root keywords all end on the same character boundary.

      SELECT file_hash
        FROM file_system
       WHERE file_name = '.vimrc';
    
    This style is annoying and I wish it gained less traction. It looks neat but it puts so much burden on the query writer, especially when you modify the query and all of the sudden you need to indent multiple lines just to make them all align. You know what's neat and still easy to modify/diff? Just indent a new line for each row.

        SELECT
            file_hash
        FROM
            file_system
        WHERE
            file_name = '.vimrc';
    • abraae 230 days ago
      IMO in the modern day there is no place for any indentation styling that can't be achieved automatically via a pretty printer such as golang has.
      • snorremd 230 days ago
        This. Relying on developers manually trying to follow a style guide is a recipe for not having a consistent style. Instead something like pgFormatter should be used. I'm not sure what the state of SQL formatters and IDE support is these days. Not sure how many command based options there are.

        And people who use things like Datagrip or other IDEs will probably format with their IDE's preferences unless there is a plugin for things like pgFormatter. This works well if there is a company mandated editor/IDE, but not so well when you have developers across various editors and IDEs.

      • kmoser 230 days ago
        Automatic formatters and pretty printers never seem to be able to make the exceptions necessary for me to use them. For example, I want the contents of all my HTML tags to be formatted as one long line (think of <p> tags), except when they happen to contain a SQL statement which I want to remain formatted exactly as written.
    • emmelaich 230 days ago
      Also, could uppercase go away and never come back? Please?
      • y42 230 days ago
        but why? it's a quick and easy way to distinguish commands from arguments
        • dagss 230 days ago
          Why do you not make that argument for "if" and "else" in Go/Java/...?

          Editors highlight syntax.

          • thiht 230 days ago
            Editors don’t syntax highlight SQL queries written as strings. That’s the main reason I write my queries with uppercase keywords in my Go programs
            • wcrossbow 230 days ago
              Editors can highlight SQL queries embedded as strings. Neovim can do it, and I'm pretty confident it's not going to be alone in that respect.

              edit: Not the editor I use but thought it might be helpful. Here is an extension, which I haven't tested, to do this in VSCode: https://marketplace.visualstudio.com/items?itemName=iuyoy.hi...

              • mijamo 230 days ago
                I have never seen a syntax highlighter for SQL that actually covers the real deal from Postgres dialect. Basic stuff is covered and then suddenly you use a combination that isn't covered and the colors are all wrong. This is even true for pgadmin, which is ironic. Unlike most programming languages, SQL built in syntax is huuuuuge and it is very hard to cover it all, especially as it varies with the dialect.
                • dagss 224 days ago
                  I use Jetbrains and there is at least full coverage for MSSQL in my experience, which is a huge dialect -- not only syntax highlighting but full IDE features like autocompletion and target name refactoring etc.

                  And 10 other dialects are listed..

                • emmelaich 230 days ago
                  Maybe it's time for programming languages for use something like markdown in strings for embedded sql and dsl. e.g.

                  ```sql

                  ```

                • eddd-ddde 230 days ago
                  Any jetbrains IDE with sql tools will work perfectly in my experience.
          • vlvdus 230 days ago
            That would be a nice change too. Also THEN, BEGIN, END to replace varous brackets.
    • harterrt 230 days ago
      Agreed. Fwiw, Mozilla’s style guide prohibits rivers like this.

      https://docs.telemetry.mozilla.org/concepts/sql_style

    • Izkata 230 days ago
      I find splitting out over lines like that harder to read because the table-like columns now overlap with each other and aren't aligned with the keyword they belong to.
  • bob1029 230 days ago
    I think my #1 rule for SQL these days is to abuse common table expressions as much as possible. No amount of whitespace cleanliness can compensate for a poorly organized problem. There is (in my mind) no longer an excuse for trying to join 10+ tables all at once in a single heroic attempt. Decompose the problem and let the query planner figure that shit out for you, just as you would with a compiler and code.

    With CTEs you can offload sophisticated joins and constraints in such a way that less experienced developers can follow behind more easily.

    Once you find multiple queries using the same WITH clauses, you can create more permanent views that further centralize and optimize these concerns.

    • jd3 230 days ago
      A week before being laid off last month, I solved a decade+ old open problem at our company which first occurred since Django doesn't natively support CTE's, leading to years of technical debt from the ersatz sql/query plans produced by our fragile queries.

      I ended up manually overloading get_extra_restriction on a custom ForeignKey class (we couldn't use FilteredRelation b/c we were still on django 1.11), which ensured that the JOIN ON ... clause limited the tables being joined to their correct partition/schema while being accessed through a view

      The view thing is a long story — it was a legacy PAC codebase from the '90s which used 13+ schemas in a mysql db that was then being synced to our postgres db through Amazon DMS. All of the tables on each view contain identical source_schema/CompanyID columns, hence the

          '%(remote_alias)s. "source_schema" = %(fk_alias)s. "source_schema" AND '
          '%(remote_alias)s. "CompanyID" = %(fk_alias)s. "CompanyID"
      
      etc. approach

      before/after query plan in depesz: https://imgur.com/a/HQbNSIL

      • tankenmate 230 days ago
        Good call.

        As an aside why not use postgres's mysql foreign data wrapper instead of syncing with the mysql database?

        • jd3 229 days ago
          I was not tech lead on that part of the project and did not contribute to any of its initial research/implementation, sadly; I was brought in post-hoc to help with perf issues/cleanup.

          To directly answer your question, though, without going into too much detail, there were pci compliance issues with some tables/columns in the mysql db, so if I had to guess, perhaps DMS had the capability to assuage that concern more elegantly than mysql_fdw[0], though the mysql_fwd EXCEPT parameter seems to also do the same thing, so honestly, I have no idea.

              By default, all views and tables existing in a particular database on the MySQL host are imported. Using this option, you can exclude specified foreign tables from the import.[1]
          
          [0]: https://github.com/EnterpriseDB/mysql_fdw

          [1]: https://www.enterprisedb.com/docs/mysql_data_adapter/latest/...

    • prudentpomelo 230 days ago
      CTEs would be such a blessing. I am stuck using mysql 5.6. So many queries would just get simpler.
  • agubelu 230 days ago
    > Try to only use standard SQL functions instead of vendor-specific functions for reasons of portability.

    Hard disagree here. "Let's do/not do this, in case we decide to change databases in the future" is one of the greatest lies we tell ourselves. You're just making your life harder now and in the near future, for the nebulous promise of "seamlessly replacing your database backend if needed".

    In 95% of cases, it's not needed, and you're getting all of the downsides for no benefit. And if it's needed in a late stage of your application's life, changing a bunch of SQL functions will be just one tiny problem among many bigger ones.

  • harterrt 230 days ago
    For comparison, here’s Mozilla’s SQL style guide: https://docs.telemetry.mozilla.org/concepts/sql_style
    • hnlmorg 230 days ago
      This is a much better style guide in my opinion.

      It’s still highly readable but also much much easier to write and modify.

      Though I am biased because it’s also how I used to write SQL back when PL/SQL was my day job. Albeit I fell into this design because it proved to be the easiest way to write and maintain readable code.

    • yen223 230 days ago
      Thanks for sharing this!

      It looks so much cleaner in my eyes.

      Plus it uses constant-sized indents, which means less futzing about with spaces and all that.

      Also means you can comment out the first select item, something you can't do with the article's approach.

      • harterrt 230 days ago
        Yes! I can’t see the point of enforcing fussing with indents to get a river.
    • cwbriscoe 230 days ago
      I am definitely not a fan of that style. Wastes too much vertical space without much benefit.
      • mjevans 230 days ago
        The benefit is how quickly an experienced programmer can accurately isolate portions of logic and understand / mutate them.

        It also achieves that in monochrome, which is likely to be the case when an SQL query is in a shell script's <<< HEREDOC or in a string blob in a log file or source code for another language's compiler.

      • harterrt 230 days ago
        What would you change?
    • seer 230 days ago
      I think this guide misses the point that “JOIN” is not a root keyword but a modification on “FROM”. It is more akin to logical “AND”, “OR”, etc.

      And this stacks much better once you start doing complex joins especially when you can add parentheses to change where you actually join

          FROM a JOIN b JOIN c
      
      Can be different than

          FROM a JOIN (b JOIN C)
      
      Apart from that I think I came up independently to the exact same rules when building the prettier extension for SQL a few years back.
      • steve_gh 230 days ago
        SQLis based on set theory, which is asdociative. So (a JOIN b) JOIN c = a JOIN (b JOIN c)

        Your DB's query planner should optimise given the available indices.

  • gnabgib 231 days ago
    Page title: SQL Style Guide, discussions in:

    2018 (59 points, 16 comments) https://news.ycombinator.com/item?id=17924917

    2016 (257 points, 147 comments) https://news.ycombinator.com/item?id=12671667

    2015 (16 points, 10 comments) https://news.ycombinator.com/item?id=9941150

  • hcarvalhoalves 231 days ago
    I’m probably alone in this, but I dislike naming tables in plural.

    IMO, reading “SELECT employee.first_name” makes much more sense than “SELECT staff.first_name”.

    • turbojet1321 230 days ago
      Back In The Day at uni in the early 2000s, we were taught that table names should always be singular, and that's mostly what I've seen in the real world since.

      I also think the advice around join table naming is a bit silly. Calling a join table between `cars` and `mechanics` `services` (rather than `cars_mechanics`) does not make the relationship clearer, and only works when the relationships maps to a the real world where the relationship has a commonly used name.

      The more I read of this guide, the uglier the SQL is.

    • other_herbert 230 days ago
      You can always alias to a singular … like

      join users as user on user….

      Then do as you please without the that if you are dealing with a user or leave it plural if multiple…

      And if we’re talking personal preference I really dislike caps in reserved words in sql, even before highlighting was everywhere it still just feels archaic for no good reason

    • ammojamo 230 days ago
      You are not alone at all, I also prefer singular names for the same reason. I reserve plural names for the rare cases where the single row of a table actually contains information about more than one item, which is usually when I'm doing something denormalized or non-relational e.g. CREATE TABLE user_settings ( user_id INT, settings_data JSON)
    • mnsc 230 days ago
      You left out the where.

      SELECT employee.name where role = 'developer'

      Vs

      SELECT staff.name” where role = 'developer'

      Then the plural one reads better

      • turbojet1321 230 days ago
        I don't think it does, because `role` is an attribute of an employee.

           SELECT employee.Name
           FROM employee
           WHERE employee.Role = 'developer' 
        
        
        reads much better to me than

           SELECT employees.Name
           FROM employees
           WHERE employees.Role = 'developer'
    • croes 230 days ago
      According to the guide it would be e.first_name or s.first_name.
    • hackernewds 231 days ago
      Yes, you are indeed alone in this
  • ivanb 230 days ago
    SQL is around 40 and there is still no reliable tool for SQL formatting comparable to Prettier or gofmt. At least not for Postgres. There are several formatters but they break on advanced features like stored procedures. If someone is looking for an open source project to contribute to, a tool like this would be greatly appreciated.
  • prudentpomelo 230 days ago
    Please don't make me write uppercase keywords. They make my eyes bleed and hurt my hands. Why not let the syntax highlighter do the heavy lifting for you? Dress your keywords with mauve or a nice butterscotch. Don't shout at the database. Write queries as if you are texting your best friend—all lowercase. Your friend understands and so does the database.

      select      e.first_name,
                  e.last_name,
                  s.amount
      from        employee as e
      inner join  salary as s
                  on e.id = s.employee_id;
  • pqwEfkvjs 230 days ago
    > Where possible avoid simply using id as the primary identifier for the table.

    I've found the opposite true in my limited experience, at least when doing any sort of ORM, then having id implicitly as the primary key makes life so much easier.

    • switch007 230 days ago
      Old school SQL gurus will tell you the ORM is wrong and just to override it. It breaks USING
      • eddd-ddde 230 days ago
        Exactly, USING shines when IDs are employee_id or task_id in every table.
  • cwbriscoe 230 days ago
    This is really good advice and the coding style (alignment) matches what I came to without any real guidance when I was learning SQL 20+ years ago. The only thing I slack on, is uppercasing the keywords. I hate switching case so much. But, I will fit the coding style of the codebase I am working on when it comes to that.
    • bartvk 230 days ago
      IMHO, not all good. In this style guide, the person says to "SELECT, FROM, etc. are all right aligned while the actual column names and implementation-specific details are left aligned."

      Seems like a lot of work to me. And I don't know any formatter that'd do that work.

  • Sn0wCoder 231 days ago
    Not bad advice. The one about “where possible avoid simply using id as the primary identifier for the table” stood out to me. In the past with multiple ORMs (ya, ya, we all hate them) the default was to map to a column named id. Also when doing joins its cleaner to use the table_name.id or alias.id then table_name.table_name_id or alias.table_name_id or whatever else besides id is used. The best is when multiple people have worked on the project over the years and the columns are a combo of camel, snake, camel_snake, all UPPER / lower. Must look at the table definitions or ERD every time you want to write some non-trivial query. So having a consistent style guide is better than having any one specific style guide. This would be a good starting point and adjust with your team as needed.
    • jpnc 230 days ago
      > Also when doing joins its cleaner to use the table_name.id or alias.id then table_name.table_name_id or alias.table_name_id or whatever else besides id is used However, using 'table_name.table_name_id' and then having another table with an FK that references it with the same name i.e. 'table_2.table_name_id' allows you to use a shorthand 'USING' clause instead of 'ON' in databases that support it.
      • Sn0wCoder 230 days ago
        Great point thanks for calling USING out. Since you end up putting table_name_id for FKs it totally makes sense to just use that in the main table. Seems I am just so accustomed to having id as the default PK over the years it become habit (my DB professor was an old time IBM-er who preached all tables will have an ID). With auto complete in just about every tool these days and ORM limitations improving will need to update my thinking on this reality. 95% of the time living in the MSSQL world so USING is not something that can even be used (I don’t think).
    • psadri 231 days ago
      I have found that naming ids as <thing>_id helps downstream code when trying to figure out which thing's id you are dealing with. It also helps with avoiding renaming fields when a structure contains multiple ids.

      I do agree it makes joins more verbose.

      • otteromkram 230 days ago
        This isn't a great idea. Maybe it works for you, but you can alias it, too. The main identification column of a table should just be id. Any foreign keys can have a table prefix.

        Please don't prefix the main table id with the table name.

        • psadri 230 days ago
          Most of the schemas I have worked with used “id”. I agree it’s the default. But aliasing was inconsistent and made it hard to figure out which table id is being referred to later in the code.

          I know it is a matter of discipline but any tool that encourages consistency helps. A database schema is definitely one of those tools.

          One unrelated idea is to include the entity in the id itself. I have never done this but I’d imagine it would help with things like logging / observability. It would not play nice with indices though.

        • thiht 230 days ago
          The fact the "USING" keyword exists would disagree with you.

          I also use "id" but I would say SQL was designed with the opinion that ids should be prefixed with the table name.

  • jkubicek 231 days ago
    I've stopped using aliases in the SQL I write and it's dramatically increased the clarify.

    From this style guide, the aliases section would look like this in my style guide:

        SELECT first_name
        FROM staff
        JOIN students
          ON students.mentor_id = staff.staff_num;
    • Sn0wCoder 231 days ago
      If the table names are all short one word like that, they are already basically aliases. What do you do when you end up on some legacy project (you did not make the schema) where_the_table_names_look_like_this? Seems some sort of alias might be more appropriate. Also, when you are writing longer / sub queries or using Common Table Expressions it’s impossible to not use an alias.
      • jkubicek 230 days ago
        Actually, what I should have said was, "I've stopped using alias by default for every table reference".

        I'm not opposed to aliases, it's just no longer something I do by default.

    • asdfasvea 230 days ago
      Agree. In my experience people use aliases to use aliases. They know them but don't understand them.

      Aliases are the SQL equivalent of ternary operators. Are you trying to write clean code or are you trying to express to other people you know things?

    • otteromkram 230 days ago
      How do you know which table first_name came from?

      Also aliasing can be as much for legibility as anything.

        SELECT
            ZZ.first_name
        FROM staff AS ZZ
        INNER JOIN students AS Q ON Q.mentor_id = ZZ.staff_num;
      • Izkata 230 days ago
        Without aliases makes each part understandable without having to jump around and hold extra context in your head:

          SELECT
              staff.first_name
          FROM staff
          INNER JOIN students ON student.mentor_id = staff.staff_num;
        
        Whether or not you agree on the exact number [0], humans have limited working memory and aliases that don't group up concepts use it up unnecessarily.

        [0] https://en.m.wikipedia.org/wiki/The_Magical_Number_Seven,_Pl...

  • calrain 230 days ago
    I think that API URI naming conventions have impacted SQL table names.

    It's common to have endpoints like `/customers` and `/customers/{id}` which has a tendency to move SQL table names to match the API endpoint.

    I tend to name tables in the singular form if writing applications that interact directly with SQL tables, but where standing up a API in front of a SQL database, I tend to move to the plural naming of the table.

    It helps to have the table 'customers' be associated with an API endpoint of the same name.

  • croes 230 days ago
    I‘m not a fan of upper case keywords especially when there is also syntax highlighting that gives them a unique color.

    Shifts my focus away from the rest of the query.

    • jbverschoor 230 days ago
      SQL keywords have been upper case for decades. I prefer it because it is faster to match visually.

      Just like I don’t like uppercase paragraphs because of the same reason

      • dopfk09320k 230 days ago
        typing them is such a pain though.
  • pragmatic 230 days ago
    The obsession with portabilty is a red herring.

    Are you writing C code that's portable to Java, C++ an C#?

    No that would be stupid. Same here. Any heavy loaded, heavily used db is going toneed optimization sober or later.

    The In vs Or is dumb bc ORs blow up query plans and your better off with unions instead.

  • hackernewds 231 days ago
    Great document to feed to GPT while ensuring it writes code :)
  • pragmatic 230 days ago
    Who is Simon Holywell and wtf do I care what they think about SQL formatting?
  • Dkuku 230 days ago
    What'the best formatter for sql?