comm(1p) — Linux manual page

PROLOG | NAME | SYNOPSIS | DESCRIPTION | OPTIONS | OPERANDS | STDIN | INPUT FILES | ENVIRONMENT VARIABLES | ASYNCHRONOUS EVENTS | STDOUT | STDERR | OUTPUT FILES | EXTENDED DESCRIPTION | EXIT STATUS | CONSEQUENCES OF ERRORS | APPLICATION USAGE | EXAMPLES | RATIONALE | FUTURE DIRECTIONS | SEE ALSO | COPYRIGHT

COMM(1P)                POSIX Programmer's Manual               COMM(1P)

PROLOG         top

       This manual page is part of the POSIX Programmer's Manual.  The
       Linux implementation of this interface may differ (consult the
       corresponding Linux manual page for details of Linux behavior),
       or the interface may not be implemented on Linux.

NAME         top

       comm — select or reject lines common to two files

SYNOPSIS         top

       comm [-123] file1 file2

DESCRIPTION         top

       The comm utility shall read file1 and file2, which should be
       ordered in the current collating sequence, and produce three text
       columns as output: lines only in file1, lines only in file2, and
       lines in both files.

       If the lines in both files are not ordered according to the
       collating sequence of the current locale, the results are
       unspecified.

       If the collating sequence of the current locale does not have a
       total ordering of all characters (see the Base Definitions volume
       of POSIX.1‐2017, Section 7.3.2, LC_COLLATE) and any lines from
       the input files collate equally but are not identical, comm
       should treat them as different lines but may treat them as being
       the same. If it treats them as different, comm should expect them
       to be ordered according to a further byte-by-byte comparison
       using the collating sequence for the POSIX locale and if they are
       not ordered in this way, the output of comm can identify such
       lines as being both unique to file1 and unique to file2 instead
       of being in both files.

OPTIONS         top

       The comm utility shall conform to the Base Definitions volume of
       POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -1        Suppress the output column of lines unique to file1.

       -2        Suppress the output column of lines unique to file2.

       -3        Suppress the output column of lines duplicated in file1
                 and file2.

OPERANDS         top

       The following operands shall be supported:

       file1     A pathname of the first file to be compared. If file1
                 is '-', the standard input shall be used.

       file2     A pathname of the second file to be compared. If file2
                 is '-', the standard input shall be used.

       If both file1 and file2 refer to standard input or to the same
       FIFO special, block special, or character special file, the
       results are undefined.

STDIN         top

       The standard input shall be used only if one of the file1 or
       file2 operands refers to standard input. See the INPUT FILES
       section.

INPUT FILES         top

       The input files shall be text files.

ENVIRONMENT VARIABLES         top

       The following environment variables shall affect the execution of
       comm:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base
                 Definitions volume of POSIX.1‐2017, Section 8.2,
                 Internationalization Variables for the precedence of
                 internationalization variables used to determine the
                 values of locale categories.)

       LC_ALL    If set to a non-empty string value, override the values
                 of all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the collating sequence comm
                 expects to have been used when the input files were
                 sorted.

       LC_CTYPE  Determine the locale for the interpretation of
                 sequences of bytes of text data as characters (for
                 example, single-byte as opposed to multi-byte
                 characters in arguments and input files).

       LC_MESSAGES
                 Determine the locale that should be used to affect the
                 format and contents of diagnostic messages written to
                 standard error.

       NLSPATH   Determine the location of message catalogs for the
                 processing of LC_MESSAGES.

ASYNCHRONOUS EVENTS         top

       Default.

STDOUT         top

       The comm utility shall produce output depending on the options
       selected. If the -1, -2, and -3 options are all selected, comm
       shall write nothing to standard output.

       If the -1 option is not selected, lines contained only in file1
       shall be written using the format:

           "%s\n", <line in file1>

       If the -2 option is not selected, lines contained only in file2
       are written using the format:

           "%s%s\n", <lead>, <line in file2>

       where the string <lead> is as follows:

       <tab>     The -1 option is not selected.

       null string
                 The -1 option is selected.

       If the -3 option is not selected, lines contained in both files
       shall be written using the format:

           "%s%s\n", <lead>, <line in both>

       where the string <lead> is as follows:

       <tab><tab>
                 Neither the -1 nor the -2 option is selected.

       <tab>     Exactly one of the -1 and -2 options is selected.

       null string
                 Both the -1 and -2 options are selected.

       If the input files were ordered according to the collating
       sequence of the current locale, the lines written shall be in the
       collating sequence of the current locale. If the input files
       contained any lines that collated equally but were not identical
       and within each file those lines were ordered according to a
       further byte-by-byte comparison using the collating sequence for
       the POSIX locale, and comm treated them as different lines, then
       lines written that collate equally but are not identical should
       be ordered according to a further byte-by-byte comparison using
       the collating sequence for the POSIX locale.

STDERR         top

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES         top

       None.

EXTENDED DESCRIPTION         top

       None.

EXIT STATUS         top

       The following exit values shall be returned:

        0    All input files were successfully output as specified.

       >0    An error occurred.

CONSEQUENCES OF ERRORS         top

       Default.

       The following sections are informative.

APPLICATION USAGE         top

       If the input files are not properly presorted, the output of comm
       might not be useful.

       When using comm to process pathnames, it is recommended that
       LC_ALL, or at least LC_CTYPE and LC_COLLATE, are set to POSIX or
       C in the environment, since pathnames can contain byte sequences
       that do not form valid characters in some locales, in which case
       the utility's behavior would be undefined. In the POSIX locale
       each byte is a valid single-byte character, and therefore this
       problem is avoided.

       If the collating sequence of the current locale does not have a
       total ordering of all characters, this can affect the behavior of
       comm in the following ways:

        *  If comm treats lines as being the same only if they are
           identical, some lines can be misleadingly identified as being
           both unique to file1 and unique to file2.

        *  If comm treats lines as being the same if they collate
           equally and a line from file1 collates equally with a line
           from file2 but is not identical to it, one of the lines is
           misleadingly identified as being in both files and the other
           is not written to the output at all.

       Such problems can be avoided by forcing the use of the POSIX
       locale; for example, the following identifies lines in both file1
       and file2:

           LC_ALL=POSIX sort file1 > file1.posix
           LC_ALL=POSIX sort file2 > file2.posix
           LC_ALL=POSIX comm -12 file1.posix file2.posix | sort

       The final sort re-sorts the output of comm according to the
       collating sequence of the original locale. Doing this might be
       difficult if more than one column is output and leading <blank>s
       cannot be ignored.

EXAMPLES         top

       If a file named xcu contains a sorted list of the utilities in
       this volume of POSIX.1‐2017, a file named xpg3 contains a sorted
       list of the utilities specified in the X/Open Portability Guide,
       Issue 3, and a file named svid89 contains a sorted list of the
       utilities in the System V Interface Definition Third Edition:

           comm -23 xcu xpg3 | comm -23 - svid89

       would print a list of utilities in this volume of POSIX.1‐2017
       not specified by either of the other documents:

           comm -12 xcu xpg3 | comm -12 - svid89

       would print a list of utilities specified by all three documents,
       and:

           comm -12 xpg3 svid89 | comm -23 - xcu

       would print a list of utilities specified by both XPG3 and the
       SVID, but not specified in this volume of POSIX.1‐2017.

RATIONALE         top

       None.

FUTURE DIRECTIONS         top

       A future version of this standard may require that if any lines
       from the input files collate equally but are not identical, then
       comm treats them as different lines and expects them to be
       ordered according to a further byte-by-byte comparison using the
       collating sequence for the POSIX locale.

       A future version of this standard may require that if the input
       files contained any lines that collated equally but were not
       identical and within each file those lines were ordered according
       to a further byte-by-byte comparison using the collating sequence
       for the POSIX locale, then lines written that collate equally but
       are not identical are ordered according to a further byte-by-byte
       comparison using the collating sequence for the POSIX locale.

SEE ALSO         top

       cmp(1p), diff(1p), sort(1p), uniq(1p)

       The Base Definitions volume of POSIX.1‐2017, Section 7.3.2,
       LC_COLLATE, Chapter 8, Environment Variables, Section 12.2,
       Utility Syntax Guidelines

COPYRIGHT         top

       Portions of this text are reprinted and reproduced in electronic
       form from IEEE Std 1003.1-2017, Standard for Information
       Technology -- Portable Operating System Interface (POSIX), The
       Open Group Base Specifications Issue 7, 2018 Edition, Copyright
       (C) 2018 by the Institute of Electrical and Electronics
       Engineers, Inc and The Open Group.  In the event of any
       discrepancy between this version and the original IEEE and The
       Open Group Standard, the original IEEE and The Open Group
       Standard is the referee document. The original Standard can be
       obtained online at http://www.opengroup.org/unix/online.html .

       Any typographical or formatting errors that appear in this page
       are most likely to have been introduced during the conversion of
       the source files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group               2017                          COMM(1P)

Pages that refer to this page: cmp(1p)diff(1p)join(1p)sort(1p)uniq(1p)