11 comments

  • bigshik 53 minutes ago
    Nice work—this hits a real pain point with Parquet. My main use case is debugging partitioned datasets on S3 with schema drift and skew, where I care about: which files/partitions have schema mismatches, weird row-group stats (all-null, out-of-range, huge skew), and doing that via metadata only.

    Right now parqeye looks mainly single-file focused. Do you have plans for a “dataset mode” that takes a dir/S3 prefix and surfaces per-file/row-group summaries (row counts, min/max, null %, schema diffs vs a reference file) using just Parquet stats so it scales to tens of GB? Or do you see parqeye intentionally staying a single-file inspector?
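
    To make the ask concrete, here is a minimal sketch of the metadata-only checks I mean. It is not how parqeye works; it assumes pyarrow is installed and that the files are on a local path (an S3 prefix would need a filesystem object on top). The helper names `diff_schema`, `footer_schema`, and `row_group_summaries` are hypothetical. Only the Parquet footer is read, never the data pages.

```python
from typing import Any, Dict, List


def diff_schema(reference: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """Describe how a candidate column->type mapping drifts from a reference."""
    problems = []
    for name, ref_type in reference.items():
        if name not in candidate:
            problems.append(f"missing column: {name}")
        elif candidate[name] != ref_type:
            problems.append(f"type changed: {name} {ref_type} -> {candidate[name]}")
    for name in candidate:
        if name not in reference:
            problems.append(f"extra column: {name}")
    return problems


def footer_schema(path: str) -> Dict[str, str]:
    """Read only the footer and return top-level column names and types."""
    import pyarrow.parquet as pq  # assumption: pyarrow is available

    schema = pq.read_schema(path)
    return {name: str(typ) for name, typ in zip(schema.names, schema.types)}


def row_group_summaries(path: str) -> List[Dict[str, Any]]:
    """Collect per-row-group, per-column stats from the footer metadata."""
    import pyarrow.parquet as pq  # assumption: pyarrow is available

    meta = pq.read_metadata(path)
    rows = []
    for rg_index in range(meta.num_row_groups):
        rg = meta.row_group(rg_index)
        for col_index in range(rg.num_columns):
            col = rg.column(col_index)
            stats = col.statistics  # None when the writer stored no stats
            has_nulls = stats is not None and stats.has_null_count
            has_min_max = stats is not None and stats.has_min_max
            rows.append({
                "row_group": rg_index,
                "column": col.path_in_schema,
                "num_rows": rg.num_rows,
                "null_count": stats.null_count if has_nulls else None,
                "min": stats.min if has_min_max else None,
                "max": stats.max if has_min_max else None,
            })
    return rows
```

    A dataset mode could then loop over every file under a prefix, call `diff_schema(footer_schema(reference_path), footer_schema(path))`, and flag row groups where `null_count == num_rows` (all-null) or where `min`/`max` fall outside an expected range.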

  • kylebarron 1 hour ago
    Looks great!

    Another project that looks very similar was released in the last few days: https://github.com/raulcd/datanomy

  • papers1010 3 hours ago
    It’s crazy how long we’ve gone without a tool like this. This is huge. Thank you for finally building this!
    • 0cf8612b2e1e 2 hours ago
      It is really incredible how poor the parquet tooling has been for years. The cornerstone of data engineering, yet just inspecting a file is needlessly clunky.
  • lolive 3 hours ago
    Could DuckDB be built into the tool, so you can run queries directly from the UI? That would avoid opening DBeaver whenever you need that kind of feature.
  • jspanos2 1 hour ago
    This is very impressive. Looking forward to using this.
  • banga 3 hours ago
    Looks like a nice tool, but it failed for me when reading a GeoParquet file created with DuckDB.
  • swety101 48 minutes ago
    Such a cool idea!! So helpful
  • dionian 25 minutes ago
    tried it out. love it.
  • lolive 3 hours ago
    Apart from some visual glitches, this is an INSTANT BUY!

    Note: does the Windows binary really need to be 78 MB?

    • ch2026 1 hour ago
      CLIs are bulky
  • WorldPeas 5 hours ago
    thank you so much! this has been an annoyance of mine for so long. edit: any chance you could make a brew package? if you'd like, I'd be happy to PR it in.
    • kaushiksrini 5 hours ago
      yep! it’s available as a homebrew tap — you can install it with: `brew install kaushiksrini/parqeye/parqeye`
      • dacox 32 minutes ago
        awesome! i was just looking at a bucket full of parquet files from last year trying to recall some things about them.

        i tried to install with brew, but it told me my cli tools were "too out of date". Never seen that before, and i had just upgraded, too.

        Will try again tomorrow

      • WorldPeas 5 hours ago
        wondrous.