Igium.CommandLine

Status

This document exhaustively describes the behavior of fully conformant POSIX and GNU option parsers.

Igium Implementation Status

The POSIX and GNU option parsers are fully implemented in the Igium JavaScript codebase (PrimordialBlocks) and partially implemented in the Igium C# codebase. The C# implementation has no help printout, and its high-level options class needs a redesign, for which the JavaScript implementation might serve as a template.

POSIX Conformance

Glossary

Command Line Argument Domain

Term | Description
Argument List | A list of strings, e.g. the value of the arguments parameter of the program's main function.
Argument | A single string item from the Argument List.
Option Candidate | An Argument that consists of the '-' character followed by a single alpha-numerical character, e.g. "-o".
Option Group Candidate | An Argument that consists of the '-' character followed by a list of alpha-numerical characters, e.g. "-oabc".
Option String Candidate | An Argument that consists of the '-' character followed by a single alpha-numerical character and a list of arbitrary characters, e.g. "-lsoa+-b".
Operand List Start Candidate | The "--" string.
Non-option Argument | An Argument starting with any character other than '-', or consisting of the sole '-' character.

NOTE: The word "Option" in the terms of the Command Line Argument Domain does not imply that the corresponding Argument must be interpreted as an Option in the Option Domain. For example, an Option Candidate is parsed as a Value if it immediately follows an Option with a Required Value. The naming of the terms in the Command Line Argument Domain reflects the most common use of each syntactic construct but imposes no further limitations on its interpretation.
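The glossary above can be read as a purely syntactic classification of each Argument. The following is a minimal sketch of that reading, not the Igium implementation; the function name classify, the constant names and the is_alnum override parameter are all hypothetical. Where a string fits more than one definition (for example "-oabc" fits both the group and the string definition), the sketch assumes the more specific term wins.

```python
# Hypothetical constants mirroring the glossary terms.
OPTION = "Option Candidate"
OPTION_GROUP = "Option Group Candidate"
OPTION_STRING = "Option String Candidate"
OPERAND_LIST_START = "Operand List Start Candidate"
NON_OPTION = "Non-option Argument"


def classify(argument, is_alnum=str.isalnum):
    """Return the Command Line Argument Domain term that fits `argument`.

    `is_alnum` is an assumed hook for overriding the alpha-numerical test.
    Returns None for strings the POSIX glossary does not cover.
    """
    if argument == "--":
        return OPERAND_LIST_START
    if argument == "-" or not argument.startswith("-"):
        return NON_OPTION
    body = argument[1:]
    if len(body) == 1 and is_alnum(body):
        return OPTION
    if all(is_alnum(ch) for ch in body):
        return OPTION_GROUP
    if is_alnum(body[0]):
        return OPTION_STRING
    return None  # e.g. "-+x" or "--xyz"


for sample in ["-o", "-oabc", "-lsoa+-b", "--", "-", "file.txt"]:
    print(sample, "->", classify(sample))
```

The classification says nothing about how an Argument is finally interpreted; as the note above states, any of these candidates can still end up as a Value.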

Option Domain

Term | Description
Schema | A list of Option names to be parsed as Assignables. For each name, indicates whether the expected value is an Optional Value or a Required Value, in the form (<name of Assignable>: char, valueRequired: boolean).
Element | The atom of the domain; a common term for Options and Operands.
Option | A named Element; the name consists of a single alpha-numerical character; may or may not accept, require or have a Value.
Flag | An Option that does not accept a Value. All Options not matched in the Schema as Assignables are treated as Flags. Flags can be parsed from Option Candidates, e.g. "-f" (f can be a Flag), Option Group Candidates, e.g. "-lsf" (l, s, f can be Flags), and Option String Candidates, e.g. "-lsfa+value" (l, s, f can be Flags).
Assignable | An Option that accepts a Value. Defined by the Schema. Assignables can be parsed from Option Candidates, e.g. "-a", from Option String Candidates, e.g. "-lsa", "-lsasome+value", from Option Candidates combined with an immediately following Argument, e.g. "-a some+value", and from the last character of an Option Group Candidate combined with an immediately following Argument, e.g. "-lsa some+value".
Value | The value part of an Assignable; parsed either from an Argument immediately following an Option Candidate, e.g. "-a some+value", or from an Option String Candidate, e.g. "-lsasome+value".
Optional Value | A Value that the Assignable accepts but does not require (specified by the Schema). Optional Values are parsed strictly from Option String Candidates, e.g. "-asome+value", "-lsasome+value", as opposed to Required Values, which are parsed from separate Arguments as well, e.g. "-a some+value", "-lsa some+value".
Required Value | A non-optional Value. See Optional Value.
Operand | Any Argument that is not parsed as an Option or a Value. Operand parsing is configuration- and context-specific (see Strict Mode).
Strict Mode | A boolean value altering the detection of the Operand list start. In Strict Mode, either the first Non-option Argument that is not a Value together with all subsequent Arguments, or all Arguments following the first Operand List Start Candidate, are parsed as Operands, whichever comes first. In Non-Strict Mode, all Non-option Arguments that are not Values and all Arguments following the first Operand List Start Candidate are parsed as Operands.

NOTE: Although the POSIX Command Line Arguments Parser does handle Required Value validation, which is significant to the parsing process itself, it has no knowledge of which Options are required or mutually exclusive. Such Option rules are subject to subsequent validation of the output produced by the parser.

Further Notes

  • A single-character '-' Argument, as in "app -abc - zz", is parsed as an Operand.
  • An Argument List consisting of a single "--", as in "app --", produces a single Operand List Start Candidate in the Command Line Argument Domain and no Elements in the Option Domain.
  • Flag groups are allowed to end with one Assignable, with its Value given either as a part of the same Argument (Optional Values and Required Values) or as the next Argument (Required Values only), as in "-bcavalue", "-bca value", "-bca" (in the last example a is an Assignable with an Optional Value that has been omitted).
  • Values can start with '-', as in "-a --", "-a--", "-a --xyz", "-a--xyz", "-a -value", "-a-value", "-lsa --", "-lsa--", "-lsa --xyz", "-lsa--xyz", "-lsa -value", "-lsa-value". When parsing an Assignable with a Required Value, the second Argument is always parsed as a Value regardless of its subtype, be it a Non-option Argument, an Option Candidate, an Option Group Candidate, an Option String Candidate or an Operand List Start Candidate.
  • Assignables with Optional Values can be given only as a single Argument, as in "-avalue", "-bcavalue". Assignables with Required Values can be given both as a single Argument, as in "-avalue", "-bcavalue", and as two Arguments, as in "-a value", "-lsa value".
  • The definition of an alpha-numerical character depends on the implementation and the running environment. The getopt C implementation relies on the standard function isalnum. We recommend that implementations in other languages default to the language's built-in alpha-numeric testing method (e.g. System.Char.IsLetterOrDigit(char) for C#, which matches Unicode letters and decimal digits) and provide a way to override this behavior with a custom function.
  • The order of all Options and Operands is considered relevant. We recommend that the option parsers produce three separate collections: 1. an ordered list of all Options, 2. an ordered list of all Operands and 3. an ordered list of all Elements.
  • In Strict Mode, the first "--" Argument may (e.g. "app -o op -- op2") or may not (e.g. "app -o -- op op2") be parsed as an Operand. The first example will produce the Option 'o' and the Operands "op", "--", "op2"; the second example will produce the Option 'o' and the Operands "op", "op2".
  • The POSIX guidelines specify that "The -W (capital-W) option shall be reserved for vendor options.", which implies that the -W Option shall be treated as an Assignable. By default the parser treats "-W" as an Assignable with a Required Value and throws an exception if the Schema already contains an Assignable with the same name. Non-standard behaviors can be implemented and enforced via the parser's configuration.
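The rules in the glossary and the notes above can be combined into a small parser. The following is a minimal sketch under stated assumptions, not the Igium implementation: the function name parse_posix, the dictionary form of the Schema (name mapped to True for a Required Value, False for an Optional Value) and the tuple form of the Elements are all hypothetical, a non-alpha-numerical Option name is assumed to be an error, and the default "-W" handling is left out.

```python
def parse_posix(args, schema, strict=False):
    """Parse `args` into an ordered list of Elements.

    `schema` maps an Assignable name to True (Required Value) or
    False (Optional Value); names absent from it are treated as Flags.
    Elements are ("option", name, value) or ("operand", text) tuples.
    """
    elements = []
    operands_only = False  # set once the Operand list has started
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        if operands_only:
            elements.append(("operand", arg))
        elif arg == "--":
            operands_only = True  # the marker itself produces no Element
        elif arg == "-" or not arg.startswith("-"):
            elements.append(("operand", arg))
            operands_only = strict  # Strict Mode: the rest are Operands
        else:
            body = arg[1:]
            for pos, name in enumerate(body):
                if not name.isalnum():
                    raise ValueError(f"invalid option name {name!r} in {arg!r}")
                if name not in schema:
                    elements.append(("option", name, None))  # Flag
                    continue
                rest = body[pos + 1:]
                if rest:
                    value = rest  # Value in the same Argument
                elif schema[name]:
                    if i == len(args):
                        raise ValueError(f"option -{name} requires a value")
                    value = args[i]  # next Argument, whatever its subtype
                    i += 1
                else:
                    value = None  # Optional Value omitted
                elements.append(("option", name, value))
                break  # an Assignable ends the group
    return elements


# The three recommended collections can be derived from the ordered result.
elements = parse_posix(["-bca", "value", "file"], {"a": True})
options = [e for e in elements if e[0] == "option"]
operands = [e for e in elements if e[0] == "operand"]
print(options, operands)
```

Returning one ordered list and filtering it keeps the relative order of Options and Operands available to later validation steps, which is one possible way to follow the three-collection recommendation.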

GNU Conformance

GNU options implement the POSIX standard, with the following extensions:

Glossary

Command Line Argument Domain

Term | Description
Long Option Candidate | An Argument that consists of the "--" string followed by a string of alpha-numerical characters and dashes ('-'), optionally followed by an equal sign ('=') and an arbitrary Long Option Value string, e.g. "--long-option", "--long-option=", "--long-option=value", "--long-option='option value+++'".
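Splitting a Long Option Candidate into its name and its optional value part can be sketched as follows. This is an illustration only; the function name split_long_option is hypothetical, and the sketch assumes that a missing '=' is reported as None while "--name=" yields an empty string.

```python
def split_long_option(argument):
    """Split a Long Option Candidate into (name, value).

    Returns None if `argument` is not a Long Option Candidate.
    `value` is None when there is no '=' and "" for "--name=".
    """
    if argument == "--" or not argument.startswith("--"):
        return None
    # Only the first '=' separates the name from the value.
    name, separator, value = argument[2:].partition("=")
    if not name or not all(ch.isalnum() or ch == "-" for ch in name):
        return None
    return name, (value if separator else None)


print(split_long_option("--long-option"))
print(split_long_option("--long-option="))
print(split_long_option("--long-option=value"))
```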

Option Domain

Term | Description
Long Option Schema | A list of Long Option names to be parsed as Long Assignables. For each name, indicates whether the expected value is a Long Option Optional Value or a Long Option Required Value, in the form (<long assignable name>: string, <value required>: boolean).
Long Option | A named Element; parsed from a Long Option Candidate; may or may not accept, require or have a Long Option Value.
Long Flag | A Long Option that does not accept a value. All Long Options not matched in the Long Option Schema as Long Assignables are treated as Long Flags. Long Flags can be parsed from Long Option Candidates, e.g. "--long-flag".
Long Assignable | A Long Option that accepts a Long Option Value. Defined by the Long Option Schema. Long Assignables can be parsed from Long Option Candidates, e.g. "--long-option", "--long-option=", "--long-option=value", and from Long Option Candidates combined with an immediately following Argument, e.g. "--long-option value".
Long Option Value | The value part of a Long Assignable; parsed either from an Argument immediately following a Long Option Candidate, e.g. "--long-option value", or from the part following the first equal sign ('=') in a Long Option Candidate, e.g. "--long-option=value".
Long Option Optional Value | A Long Option Value that the Long Assignable accepts but does not require (specified by the Long Option Schema). Long Option Optional Values are parsed strictly from Long Option Candidates, e.g. "--long-option=value", as opposed to Long Option Required Values, which are parsed from separate Arguments as well, e.g. "--long-option value".
Long Option Required Value | A non-optional Long Option Value. See Long Option Optional Value.

Further Notes

  • Long Option Values can start with '-', as in "--long-option=--", "--long-option --", "--long-option=-value", "--long-option -value", "--long-option=--value", "--long-option --value". When parsing a Long Assignable with a Long Option Required Value, the second Argument is always parsed as a Long Option Value regardless of its subtype, be it a Non-option Argument, an Option Candidate, a Long Option Candidate, an Option Group Candidate, an Option String Candidate or an Operand List Start Candidate.
  • Long Assignables with Long Option Optional Values can be given only as a single Long Option Candidate, as in "--long-option=value". Long Assignables with Long Option Required Values can be given both as a single Long Option Candidate, as in "--long-option=value", and as two Arguments, as in "--long-option value".
  • A Long Option is matched as a Long Assignable defined by the Long Option Schema if there is exactly one Long Assignable name in the Long Option Schema that starts with the Long Option name. Given the Long Option Schema [("long-option", ValueOptional), ("long-island", ValueOptional)], the following options will be parsed as Long Assignables: "--long-option", "--long-optio", "--long-o", "--long-i", and the following will not: "--long-", "--long", "--l", "--z".
  • While partial name matching is applicable to both Long Assignables and Long Flags, only the Long Assignable partial name matching is significant for the GNU parser, which needs to know whether a Long Option is a Long Assignable and whether its Long Option Value is required in order to parse the input correctly. A complete partial name matching that also considers Long Flags is a subject of subsequent analysis and processing.
  • The GNU arguments parser does not extend the "-W vendor-specific-options" treatment of the POSIX parser. Translating "-W long-option" to "--long-option" as per the GNU specification (GNU libc 2+) does not fall into the parser's scope and is a subject of subsequent analysis and processing.
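The partial name matching rule for Long Assignables described in the notes above can be sketched as a unique-prefix lookup. This is a minimal sketch assuming the Long Option Schema is a dictionary from Long Assignable name to a value-required flag; the function name match_long_assignable is hypothetical. It follows the "exactly one name starts with the Long Option name" rule literally, so an exact match is not given priority over a longer name sharing the same prefix.

```python
def match_long_assignable(name, long_schema):
    """Return the single schema name that starts with `name`, else None.

    `long_schema` maps Long Assignable names to True (value required)
    or False (value optional). Zero or several matches yield None, in
    which case the Long Option is treated as a Long Flag by the parser.
    """
    matches = [full for full in long_schema if full.startswith(name)]
    return matches[0] if len(matches) == 1 else None


long_schema = {"long-option": False, "long-island": False}
for candidate in ["long-option", "long-optio", "long-o", "long-", "long", "l", "z"]:
    print(candidate, "->", match_long_assignable(candidate, long_schema))
```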

Sources