NAME
    App::CSVUtils - CLI utilities related to CSV

VERSION
    This document describes version 1.022 of App::CSVUtils (from Perl
    distribution App-CSVUtils), released on 2023-03-10.

DESCRIPTION
    This distribution contains the following CLI utilities:

    1. csv-add-fields
    2. csv-avg
    3. csv-check-cell-values
    4. csv-check-field-names
    5. csv-check-field-values
    6. csv-check-rows
    7. csv-cmp
    8. csv-concat
    9. csv-convert-to-hash
    10. csv-csv
    11. csv-delete-fields
    12. csv-dump
    13. csv-each-row
    14. csv-fill-template
    15. csv-find-values
    16. csv-freqtable
    17. csv-gen
    18. csv-get-cells
    19. csv-grep
    20. csv-info
    21. csv-intrange
    22. csv-list-field-names
    23. csv-lookup-fields
    24. csv-ltrim
    25. csv-map
    26. csv-munge-field
    27. csv-munge-row
    28. csv-pick
    29. csv-pick-fields
    30. csv-pick-rows
    31. csv-quote
    32. csv-replace-newline
    33. csv-rtrim
    34. csv-select-fields
    35. csv-select-rows
    36. csv-setop
    37. csv-shuf
    38. csv-shuf-fields
    39. csv-shuf-rows
    40. csv-sort
    41. csv-sort-fields
    42. csv-sort-rows
    43. csv-sorted
    44. csv-sorted-fields
    45. csv-sorted-rows
    46. csv-split
    47. csv-sum
    48. csv-transpose
    49. csv-trim
    50. csv-uniq
    51. csv-unquote
    52. csv2ltsv
    53. csv2paras
    54. csv2td
    55. csv2tsv
    56. csv2vcf
    57. paras2csv
    58. tsv2csv

FUNCTIONS
  gen_csv_util
    Usage:

     gen_csv_util(%args) -> bool

    Generate a CSV utility.

    This routine is used to generate a CSV utility in the form of a Rinci
    function (code and metadata). You can then produce a CLI from the Rinci
    function simply using Perinci::CmdLine::Gen or, if you use Dist::Zilla,
    Dist::Zilla::Plugin::GenPericmdScript or, if on the command-line,
    gen-pericmd-script.

    Using this routine, by providing just one or a few hooks and setting
    some parameters like a couple of extra arguments, you will get a
    complete CLI with decent POD/manpage, ability to read one or multiple
    CSV's and write one or multiple CSV's, some command-line options to
    customize how the input CSV's should be parsed and how the output CSV's
    should be formatted and named. Your CLI also has tab completion, usage
    and help message, and other features.

    To create a CSV utility, you specify a "name" (e.g. "csv_dump"; must be
    a valid unqualified Perl identifier/function name) and optionally
    "summary", "description", and other metadata like "links" or even
    "add_meta_props". Then you specify one or more of "on_*" or "before_*"
    or "after_*" arguments to supply handlers (coderefs) for your CSV
    utility at various hook points.

    *THE HOOKS*

    All code for hooks should accept a single argument "r". "r" is a stash
    (hashref) of various data, the keys of which will depend on which hook
    point being called. You can also add more keys to store data or for flow
    control (see hook documentation below for more details).

    The order of the hooks, in processing chronological order:

    *   on_begin

        Called when utility begins, before reading CSV. You can use this
        hook e.g. to process arguments, set output filenames (if you allow
        custom output filenames).

    *   before_read_input

        Called before opening any input CSV file. This hook is *still*
        called even if your utility sets "reads_csv" to false.

        At this point, the "input_filenames" stash key (as well as other
        keys like "input_filename", "input_filenum", etc) has not been set.
        You can use this hook e.g. to set a custom "input_filenames".

    *   before_open_input_files

        Called before an input CSV file is about to be opened, including for
        stdin ("-"). You can use this hook e.g. to check/preprocess input
        file. Flow control is available by setting "$r->{wants_skip_files}"
        to skip reading all the input file and go directly to the
        "after_read_input" hook.

    *   before_open_input_file

        Called before an input CSV file is about to be opened, including for
        stdin ("-"). For the first file, called after
        "before_open_input_file" hook. You can use this hook e.g. to
        check/preprocess input file. Flow control is available by setting
        "$r->{wants_skip_file}" to skip reading a single input file and go
        to the next file, or "$r->{wants_skip_files}" to skip reading the
        rest of the files and go directly to the "after_read_input" hook.

    *   on_input_header_row

        Called when receiving header row. Will be called for every input
        file, and called even when user specify "--no-input-header", in
        which case the header row will be the generated "["field1",
        "field2", ...]". You can use this hook e.g. to add/remove/rearrange
        fields.

        You can set "$r->{wants_fill_rows}" to a defined false if you do not
        want "$r->{input_rows}" to be filled with empty string elements when
        it contains less than the number of fields (in case of sparse values
        at the end). Normally you only want to do this when you want to do
        checking, e.g. in csv-check-rows.

    *   on_input_data_row

        Called when receiving each data row. You can use this hook e.g. to
        modify the row or print output (for line-by-line transformation or
        filtering).

    *   after_close_input_file

        Called after each input file is closed, including for stdin ("-")
        (although for stdin, the handle is not actually closed). Flow
        control is possible by setting "$r->{wants_skip_files}" to skip
        reading the rest of the files and go straight to the
        "after_close_input_files" hook.

    *   after_close_input_files

        Called after the last input file is closed, after the last
        "after_close_input_file" hook, including for stdin ("-") (although
        for stdin, the handle is not actually closed).

    *   after_read_input

        Called after the last row of the last CSV file is read and the last
        file is closed. This hook is *still* called, if you set "reads_csv"
        option to false. At this point the stash keys related to CSV reading
        have all been cleared, including "input_filenames",
        "input_filename", "input_fh", etc.

        You can use this hook e.g. to print output if you buffer the output.

    *   on_end

        Called when utility is about to exit. You can use this hook e.g. to
        return the final result.

    *THE STASH*

    The common keys that "r" will contain:

    *   "gen_args", hash. The arguments used to generate the CSV utility.

    *   "util_args", hash. The arguments that your CSV utility accepts.
        Parsed from command-line arguments (or configuration files, or
        environment variables).

    *   "name", str. The name of the CSV utility. Which can also be
        retrieved via "gen_args".

    *   "code_print", coderef. Routine provided for you to print something.
        Accepts a string. Takes care of opening the output files for you.

    *   "code_print_row", coderef. Routine provided for you to print a data
        row. You pass the row (either arrayref or hashref). Takes care of
        opening the output files for you, as well as printing header row the
        first time, if needed.

    *   "code_print_header_row", coderef. Routine provided for you to print
        header row. You don't need to pass any arguments. Will only print
        the header row once per output file if output header is enabled,
        even if called multiple times.

    If you are accepting CSV data ("reads_csv" gen argument set to true),
    the following keys will also be available (in "on_input_header_row" and
    "on_input_data_row" hooks):

    *   "input_parser", a Text::CSV_XS instance for input parsing.

    *   "input_filenames", array of str.

    *   "input_filename", str. The name of the current input file being read
        ("-" if reading from stdin).

    *   "input_filenum", uint. The number of the current input file, 1 being
        the first file, 2 for the second, and so on.

    *   "input_fh", the handle to the current file being read.

    *   "input_rownum", uint. The number of rows that have been read (reset
        after each input file). In "on_input_header_row" phase, this will be
        1 since header row (including the generated one) is the first row.
        Then in "on_input_data_row" phase (called the first time for a
        file), it will be 2 for the first data row, even if physically it is
        the first row for CSV file that does not have a header.

    *   "input_data_rownum", uint. The number of data rows that have been
        read (reset after each input file). This will be equal to
        "input_rownum" less 1 if input file has header.

    *   "input_row", aos (array of str). The current input CSV row as an
        arrayref.

    *   "input_row_as_hashref", hos (hash of str). The current input CSV row
        as a hashref, with field names as hash keys and field values as hash
        values. This will only be calculated if utility wants it. Utility
        can express so by setting "$r->{wants_input_row_as_hashref}" to
        true, e.g. in the "on_begin" hook.

    *   "input_header_row_count", uint. Contains the number of actual header
        rows that have been read. If CLI user specifies "--no-input-header",
        this will stay at zero. Will be reset for each CSV file.

    *   "input_data_row_count", int. Contains the number of actual data rows
        that have read. Will be reset for each CSV file.

    If you are outputting CSV ("writes_csv" gen argument set to true), the
    following keys will be available:

    *   "output_emitter", a Text::CSV_XS instance for output.

    *   "output_filenames", array of str.

    *   "output_filename", str, name of current output file.

    *   "output_filenum", uint, the number of the current output file, 1
        being the first file, 2 for the second, and so on.

    *   "output_fh", handle to the current output file.

    *   "output_rownum", uint. The number of rows that have been outputted
        (reset after each output file).

    *   "output_data_rownum", uint. The number of data rows that have been
        outputted (reset after each output file). This will be equal to
        "input_rownum" less 1 if input file has header.

    For other hook-specific keys, see the documentation for associated hook
    point.

    *ACCEPTING ADDITIONAL COMMAND-LINE OPTIONS/ARGUMENTS*

    As mentioned above, you will get additional command-line
    options/arguments in "$r->{util_args}" hashref. Some options/arguments
    are already added by "gen_csv_util", e.g. "input_filename" or
    "input_filenames" along with "input_sep_char", etc (when your utility
    declares "reads_csv"), "output_filename" or "output_filenames" along
    with "overwrite", "output_sep_char", etc (when your utility declares
    "writes_csv").

    If you want to accept additional arguments/options, you specify them in
    "add_args" (hashref, with key being Each option/argument has to be
    specified first via "add_args" (as hashref, with key being argument name
    and value the argument specification as defined in Rinci::function)).
    Some argument specifications have been defined in App::CSVUtils and can
    be used. See existing utilities for examples.

    *READING CSV DATA*

    To read CSV data, normally your utility would provide handler for the
    "on_input_data_row" hook and sometimes additionally
    "on_input_header_row".

    *OUTPUTTING STRING OR RETURNING RESULT*

    To output string, usually you call the provided routine
    "$r->{code_print}". This routine will open the output files for you.

    You can also return enveloped result directly by setting "$r->{result}".

    *OUTPUTTING CSV DATA*

    To output CSV data, usually you call the provided routine
    "$r->{code_print_row}". This routine accepts a row (arrayref or
    hashref). This routine will open the output files for you when needed,
    as well as print header row automatically.

    You can also buffer rows from input to e.g. "$r->{output_rows}", then
    call "$r->{code_print_row}" repeatedly in the "after_read_input" hook to
    print all the rows.

    *READING MULTIPLE CSV FILES*

    To read multiple CSV files, you first specify "reads_multiple_csv".
    Then, you can supply handler for "on_input_header_row" and
    "on_input_data_row" as usual. If you want to do something before/after
    each input file, you can also supply handler for
    "before_open_input_file" or "after_close_input_file".

    *WRITING TO MULTIPLE CSV FILES*

    Similarly, to write to many CSv files, you first specify
    "writes_multiple_csv". Then, you can supply handler for
    "on_input_header_row" and "on_input_data_row" as usual. To switch to the
    next file, set "$r->{wants_switch_to_next_output_file}" to true, in
    which case the next call to "$r->{code_print_row}" will close the
    current file and open the next file.

    *CHANGING THE OUTPUT FIELDS*

    When calling "$r->{code_print_row}", you can output whatever fields you
    want. By convention, you can set "$r->{output_fields}" and
    "$r->{output_fields_idx}" to let other handlers know about the output
    fields. For example, see the implementation of csv-concat.

    This function is not exported by default, but exportable.

    Arguments ('*' denotes required arguments):

    *   add_args => *hash*

        (No description)

    *   add_args_rels => *hash*

        (No description)

    *   add_meta_props => *hash*

        Add additional Rinci function metadata properties.

    *   after_close_input_file => *code*

        (No description)

    *   after_close_input_files => *code*

        (No description)

    *   after_read_input => *code*

        (No description)

    *   before_open_input_file => *code*

        (No description)

    *   before_open_input_files => *code*

        (No description)

    *   before_read_input => *code*

        (No description)

    *   description => *str*

        (No description)

    *   examples => *array*

        (No description)

    *   links => *array[hash]*

        (No description)

    *   name* => *perl::identifier::unqualified_ascii*

        (No description)

    *   on_begin => *code*

        (No description)

    *   on_end => *code*

        (No description)

    *   on_input_data_row => *code*

        (No description)

    *   on_input_header_row => *code*

        (No description)

    *   reads_csv => *bool* (default: 1)

        Whether utility reads CSV data.

    *   reads_multiple_csv => *bool*

        Whether utility accepts CSV data.

        Setting this option to true will implicitly set the "reads_csv"
        option to true, obviously.

    *   summary => *str*

        (No description)

    *   writes_csv => *bool* (default: 1)

        Whether utility writes CSV data.

    *   writes_multiple_csv => *bool*

        Whether utility outputs CSV data.

        Setting this option to true will implicitly set the "writes_csv"
        option to true, obviously.

    Return value: (bool)

  compile_eval_code
    Usage:

     $coderef = compile_eval_code($str, $label);

    Compile string code $str to coderef in 'main' package, without "use
    strict" or "use warnings". Die on compile error.

  eval_code
    Usage:

     $res = eval_code($coderef, $r, $topic_var_value, $return_topic_var);

FAQ
  My CSV does not have a header?
    Use the "--no-header" option. Fields will be named "field1", "field2",
    and so on.

  My data is TSV, not CSV?
    Use the "--tsv" option.

  I have a big CSV and the utilities are too slow or eat too much RAM!
    These utilities are not (yet) optimized, patches welcome. If your CSV is
    very big, perhaps a C-based solution is what you need.

HOMEPAGE
    Please visit the project's homepage at
    <https://metacpan.org/release/App-CSVUtils>.

SOURCE
    Source repository is at
    <https://github.com/perlancar/perl-App-CSVUtils>.

SEE ALSO
  Similar CLI bundles for other format
    App::TSVUtils, App::LTSVUtils, App::SerializeUtils.

  Other CSV-related utilities
    xls2csv and xlsx2csv from Spreadsheet::Read

    import-csv-to-sqlite from App::SQLiteUtils

    Query CSV with SQL using fsql from App::fsql

    csvgrep from csvgrep

  Other non-Perl-based CSV utilities
   Python
    csvkit, <https://csvkit.readthedocs.io/en/latest/>

AUTHOR
    perlancar <perlancar@cpan.org>

CONTRIBUTOR
    Adam Hopkins <violapiratejunky@gmail.com>

CONTRIBUTING
    To contribute, you can send patches by email/via RT, or send pull
    requests on GitHub.

    Most of the time, you don't need to build the distribution yourself. You
    can simply modify the code, then test via:

     % prove -l

    If you want to build the distribution (e.g. to try to install it locally
    on your system), you can install Dist::Zilla,
    Dist::Zilla::PluginBundle::Author::PERLANCAR,
    Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two
    other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps
    required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE
    This software is copyright (c) 2023, 2022, 2021, 2020, 2019, 2018, 2017,
    2016 by perlancar <perlancar@cpan.org>.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

BUGS
    Please report any bugs or feature requests on the bugtracker website
    <https://rt.cpan.org/Public/Dist/Display.html?Name=App-CSVUtils>

    When submitting a bug or request, please include a test-file or a patch
    to an existing test-file that illustrates the bug or desired feature.