Igium.Markup.FlowMarkup

The Flow Markup Language 1.0 (FML 1.0) Standard

Status

This document is an early working draft.

Some of the more advanced concepts, such as macro definitions and algorithms, are flawed and require further reconsideration.

Abstract

The Flow Markup Language (FML 1.0) provides a minimal syntax for on-the-fly formatting of Unicode streams to monospaced continuous Unicode media such as Windows and Unix terminals. The FML 1.0 standard defines the handling of all whitespace characters specified by Unicode save for the form feed character (U+000C) and the OGHAM SPACE MARK (U+1680), as well as indentation, tabulation, paragraphs and word wrap.

FML 1.0 guarantees uniform text printout on every monospaced continuous Unicode medium that supports the space character (' ') and the new line character sequence ('\n' or "\r\n", depending on the running environment), given that the input stream contains only Unicode whitespace characters save for the OGHAM SPACE MARK (U+1680) and symbols that are encoded with exactly one character, such as digits, Latin and Cyrillic letters.

Considerations

The Flow Markup Language is designed to meet the following goals:

  • Efficacy - the syntax allows for on-the-fly minimal-state sequential lexing, parsing and translation of input streams; no AST creation is required in the process.
  • Simplicity - where sensible, FML 1.0 makes use of the Unicode whitespace characters as format specifiers; the tag syntax is minimal and the number of tags is limited to eleven; tag names are built-in and consist of a single character; entity names are defined as non-negative integers (in FML 1.0 the only entities are the tabstop definitions); tags start with a single dollar sign ('$') and a single-letter name, optionally followed by one to four required or optional non-negative integer arguments separated by commas (',') (e.g. "$d1,4", "$d1,4,4", "$t1", "$u"); tags can be inserted anywhere in a text without any further considerations regarding preceding and following characters (e.g. "hello$t1world"); an alternative argument list syntax is available for resolving ambiguous combinations of tag and text characters by enclosing the argument list in parentheses (e.g. "$d(1,4),", "$d(1,4,4)12", "$t(1)12").
  • Interoperability - the syntax is based on the dollar sign ('$'), which avoids conflicts with the more traditional string formatting syntaxes like printf (C), std::format (C++), String.Format (C#) and String.format (Java); the conflict with the ECMAScript 6 (JavaScript) template literal syntax `${}` and conflicts with other syntaxes potentially reserving the dollar sign ('$') for formatting are mitigated by gracefully handling unknown tag names as part of the text ("$u" is interpreted by FML 1.0 as a tag; "${color}" is interpreted as plain text).
  • Uniformity - FML 1.0 translates the multitude of possible Unicode whitespace input characters to a strictly limited set of whitespace characters, consisting of the space character (' ') and the new line character sequence ('\n' or "\r\n", depending on the running environment). Additionally, the interpretation of the input '\r', '\n' or "\r\n" character sequences is universal across all running environments (see below).
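The tag syntax described in the list above can be illustrated with a small lexer. This is only a sketch in Python, not a reference implementation: the function name `lex`, the token tuples and the `MAX_ARGS` table are hypothetical, the argument limits are read off the tag table later in this document, and the special modes introduced by "$!" and "$-" are deliberately ignored.

```python
import re

# Maximum argument count per tag name (assumption: derived from the Tags table).
MAX_ARGS = {"i": 3, "d": 4, "t": 1, "n": 1, "w": 1, "p": 1,
            "s": 1, "h": 1, "u": 0, "!": 0, "-": 0}


def lex(text):
    """Split text into ("tag", name, args) and ("text", string) tokens."""
    tokens, buf, pos = [], [], 0

    def flush():
        # Emit the accumulated plain text, if any, as one token.
        if buf:
            tokens.append(("text", "".join(buf)))
            buf.clear()

    while pos < len(text):
        ch = text[pos]
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if ch == "$" and nxt == "$":
            # "$$" is the dollar sign escape sequence.
            buf.append("$")
            pos += 2
        elif ch == "$" and nxt in MAX_ARGS:
            flush()
            pos += 2
            args, limit = [], MAX_ARGS[nxt]
            if limit:
                # Either "(a,b,...)" or bare "a,b,...", at most `limit` integers.
                body = r"\d+(?:,\d+){0,%d}" % (limit - 1)
                m = re.compile(r"\((%s)\)|(%s)" % (body, body)).match(text, pos)
                if m:
                    args = [int(a) for a in (m.group(1) or m.group(2)).split(",")]
                    pos = m.end()
            tokens.append(("tag", nxt, args))
        else:
            # Unknown tag names and ordinary characters are plain text.
            buf.append(ch)
            pos += 1
    flush()
    return tokens
```

With this sketch, "hello$t1world" yields a text token, a "t" tag with the argument 1 and another text token, while "${color}" stays plain text because '{' is not a known tag name.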

TODO

  • (Only in case that more complex constructs like algos and macro-definitions and references are included in FML) To make things easier, introduce a Preamble/Definitions Section, a syntactically distinct part of the document that is dedicated for definitions of tabs and macros and is excluded from the output; such approach will allow for introducing a better syntax for the definitions and will offload the $-syntax from unnecessary complexity.
  • Tags that accept non-negative integer parameters indicating size in characters are currently designed to use default values in case of an explicit value of 0. Whenever 0 has a meaning as a value, change this behavior to accept 0 as an actual value and reserve the default for omitted values only.
  • Force a word break in case of a line consisting of a single word that exceeds the viewport width; maybe allow configuration for this type of behavior (an additional parameter for the $w tag?).
  • For Unicode codepoints from the Space flow control character category that are wider than a single whitespace character ' ', print more whitespaces (' ').
  • Add support for: zero width space, zero width non-joiner?, zero width joiner? (see https://en.wikipedia.org/wiki/Whitespace_character#Unicode) - introduce a breaking point without printing a space.
  • For Integration, consider using three offsets instead of one: offset of the last word printout not followed by a line break (currently provisioned), the size of any whitespace printout following the last word printout not followed by a line break and the size of the last line break printout that is not followed either by a word printout or a whitespace printout.

Output

FML 1.0 defines a chunk as one arbitrary output sequence of characters emitted by the printer in one iteration step of translation. A printout is defined as an uninterrupted part of the printed output consisting of homogeneous characters.

FML 1.0 distinguishes between the following printout types:

Printout types

| Printout type | Allowed characters and character sequences | Description |
|---|---|---|
| Whitespace Printout | ' ' | A printout consisting solely of spaces. |
| Line Break Printout | '\n', "\r\n" | A printout consisting either of new line characters '\n' or new line character sequences "\r\n", depending on the running environment. |
| Word Printout | | A printout consisting of the characters not covered in the other printout types, including spaces (' ') produced from non-break input spaces and excluding the rest of the characters listed in the Flow Control Characters table below. |

Formatting

FML 1.0 provides four text formatting facilities: automatic word wrap, tabulation, explicit indentation and paragraphs.

Automatic Word Wrap

In FML 1.0, the automatic word wrap is done at a whitespace printout, immediately following a word printout that exceeds the current viewport width. Additionally, a line break might be automatically inserted on tabstop extrusion (see below).
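The wrap rule can be sketched as a greedy loop over words. This is an illustration under simplifying assumptions, not the algorithm of the standard: the function name `wrap` is hypothetical, words are separated by exactly one space, tabstops are ignored, a width of 0 disables wrapping and a single word longer than the viewport is left unbroken (both as described for the "$w" tag).

```python
def wrap(words, width=80, newline="\n"):
    """Greedy wrap: break before a word that would not fit in `width` columns."""
    lines, current = [], ""
    for word in words:
        if not current:
            # First word on the line; an over-long single word is kept whole.
            current = word
        elif width and len(current) + 1 + len(word) > width:
            # The word does not fit: close the line at the whitespace.
            lines.append(current)
            current = word
        else:
            current += " " + word
    if current:
        lines.append(current)
    return newline.join(lines)
```

For example, wrapping the words "aaa", "bbb" and "ccc" with a viewport width of 7 keeps "aaa bbb" on the first line and moves "ccc" to the second.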

Tabulation

FML 1.0 allows for two complementary types of tabulation: user-defined tabstops and automatic tabstops. The term next tabstop designates the item from the array of user-defined tabstops sorted by offset that is next to the currently selected user-defined tabstop. A printout is considered extruded if its last character is in the column immediately preceding the current tabstop's offset.

User-defined tabstops are created and altered as named entities via the "$dN[,O[,S[,X]]]" tag, where:

  • N is an arbitrary non-negative integer representing the name of the tabstop; tabstop names are used solely as identifiers and have no further significance, such as tabstop ordering, except for N == 0, which is always considered to be the first tabstop;
  • O is the tabstop offset from the line beginning in characters (optional, see FML 1.0 Defaults below);
  • S is the minimum number of spaces between existing printout extruding over this tabstop and the text printed at this tabstop when on the same line (optional, see FML 1.0 Defaults below), and
  • X is the maximum number of characters of text in the existing printout that are allowed to extrude over this tabstop before an automatic line break is created (for multiline text, only the text printed on the last line is considered) (optional). If this argument is omitted, the extrusion line breaking algorithm is suppressed for this tabstop.

For N == 0, S and X are ignored because at the beginning of the line the "zero-" tabulation is always applied before any printout.

User-defined tabstops are referred to either by name or implicitly:

  • By name (e.g. "$tN", "$t") - if a tabstop with the name N is defined, the current tabstop parameters are set to that definition; if no tabstop exists with the given name N, the default tabstop values are used; if N is omitted ("$t"), the default tabstop with name 0 is used ("$t" is an alias for "$t0").
  • Implicitly (e.g. '\t') - at tabulator character ('\t') occurrence, the next tabstop is selected; if the current tabstop is the last user-defined tabstop, the next automatic tabstop is selected.

Selecting a user-defined tabstop as current either pads the current printout with spaces up to the tabstop offset, or inserts as many spaces as defined by the S parameter of the previous tabstop, depending on the extrusion conditions.

The automatic tabstops are created as calculated entities after the last user-defined tabstop. Automatic tabstops can be referred to only implicitly (e.g. '\t'). Selecting an automatic tabstop causes the current printout to be space-padded to the right up to the closest multiple of a number of spaces as specified by the "$n" tag, counted from the beginning of the line, much like the common tabulation works. If the "$n" tag has not occurred yet, the FML 1.0 default tabulator width is used.
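The padding performed by an automatic tabstop can be sketched as follows. The function name `pad_to_auto_tabstop` is hypothetical, and the sketch assumes that "closest multiple" means the next multiple strictly to the right of the current column, as common tabulation does; it also ignores the user-defined tabstops that precede the automatic ones.

```python
def pad_to_auto_tabstop(line, tab_width=4):
    """Pad `line` with spaces up to the next multiple of `tab_width` columns."""
    # Assumption: the position always advances by at least one column.
    target = (len(line) // tab_width + 1) * tab_width
    return line.ljust(target)
```

With the default width of 4, a two-character line is padded to column 4, and a line that already ends at column 4 is padded to column 8.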

At line break, the current tabstop is set to the default tabstop with name 0 ("$t0").

Explicit Indentation

In FML 1.0 explicit indentation is achieved through the "$i" tag. The "$i[O[,S[,X]]]" tag sets the current tabstop parameters in the same way as "$dN[,O[,S[,X]]]$tN" would but without changing the currently selected tabstop, and stays in effect until the next line break or the next occurrence of an "$i" or "$t" tag or the '\t' character, whichever comes first.

Paragraphs

Paragraphs represent an alternative way to insert new lines into the output. Unlike line breaks, which accumulate, any positive number of subsequent paragraphs is printed to the output as a fixed number of line breaks. All characters from the Paragraph flow control character category are treated as paragraphs. The "$pL" tag specifies the number L of line breaks to be printed on a paragraph sequence. If the argument L is 0 or omitted, the FML 1.0 default value is assumed.

NOTE: For the default values of the optional arguments, see the FML 1.0 Defaults table below.
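The difference between accumulating line breaks and non-accumulating paragraphs can be shown with a short sketch. The function name `expand_paragraphs` is hypothetical; the character set is the Paragraph category listed in the Flow Control Characters table ('\v', '\f', U+2029), and the default count of 2 is the FML 1.0 default paragraph line break count.

```python
import re

# One or more consecutive Paragraph-category characters form one sequence.
PARAGRAPH_RUN = re.compile("[\v\f\u2029]+")


def expand_paragraphs(text, count=2, newline="\n"):
    """Replace every paragraph sequence with exactly `count` line breaks."""
    return PARAGRAPH_RUN.sub(newline * count, text)
```

Three consecutive '\v' characters therefore produce the same two line breaks as a single one.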

Input

FML 1.0 recognizes three input element types: flow control character, tag and word.

Flow Control Characters

FML 1.0 defines the following flow control character categories of input Unicode characters that don't have a visual glyph representation and either control the print position for the next word printout or introduce a non-break whitespace to a word printout:

| Category | C Escape | Codepoint | Description |
|---|---|---|---|
| Space | ' ' | U+0020 | ASCII/Unicode space. |
| Space | | U+2000 | Unicode EN quad space. |
| Space | | U+2001 | Unicode EM quad space. |
| Space | | U+2002 | Unicode EN space. |
| Space | | U+2003 | Unicode EM space. |
| Space | | U+2004 | Unicode three-per-EM space. |
| Space | | U+2005 | Unicode four-per-EM space. |
| Space | | U+2006 | Unicode six-per-EM space. |
| Space | | U+2008 | Unicode punctuation space. |
| Space | | U+2009 | Unicode thin space. |
| Space | | U+200A | Unicode hair space. |
| Space | | U+205F | Unicode medium mathematical space. |
| Space | | U+3000 | Unicode ideographic space. |
| No-Break Space | | U+00A0 | Unicode non-break space. |
| No-Break Space | | U+202F | Unicode narrow non-break space. |
| No-Break Space | | U+2007 | Unicode figure space. |
| Tabulator | '\t' | U+0009 | ASCII/Unicode tabulation. |
| Line Break | '\r' | U+000D | ASCII/Unicode carriage return. |
| Line Break | '\n' | U+000A | ASCII/Unicode line feed. |
| Line Break | | U+2028 | Unicode line separator. |
| Line Break | | U+0085 | Unicode next line. |
| Paragraph | '\v' | U+000B | ASCII/Unicode line tabulation. |
| Paragraph | '\f' | U+000C | ASCII/Unicode form feed. |
| Paragraph | | U+2029 | Unicode paragraph separator. |

FML 1.0 defines a strict set of rules for processing the characters from each of the recognized flow control character categories:

| Category | Subcategory | Processing rules |
|---|---|---|
| Space | | At the end of the line, at the end of the output and immediately before a tabstop, the characters from this category are ignored, otherwise a single input character is translated to a single space character (' ') and is appended to the current whitespace printout. The "$s" tag provides a shorthand for printing a sequence of one or more input space characters.<br /><br />The "$u" tag counts the spaces printed by this category and the corresponding tags. |
| No-Break Space | | A single character matching this category is translated to a single space character (' ') and is appended to the current word printout. The "$h" tag provides a shorthand for printing a sequence of one or more input non-break space characters. |
| Tabulator | | An implicit reference for user-defined and automatic tabstops.<br /><br />The "$u" tag counts the spaces printed by this category and the corresponding tags. |
| Line Break | "\r\n" | A single instance of the "\r\n" sequence is translated to a line break printout, which, depending on the running environment, is represented by '\n' or "\r\n".<br /><br />The "$u" tag counts the line breaks printed by this category. |
| Line Break | | A single character matching this category that is not part of a "\r\n" sequence is translated to a line break printout, which, depending on the running environment, is represented by '\n' or "\r\n".<br /><br />The "$u" tag counts the line breaks printed by this category. |
| Paragraph | | An uninterrupted sequence of characters matching this category is translated to a sequence of as many line breaks as specified by the "$p" tag or set by the FML 1.0 default paragraph line break count if no "$p" tag has been encountered yet.<br /><br />The "$u" tag counts the line breaks printed by this category. |

NOTE: The "\r\n" sequence is recognized with priority as a line break. If a '\r' or '\n' character is encountered that has not been parsed as a part of a "\r\n" sequence, it's recognized as a separate line break.
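The priority of the "\r\n" sequence over its individual characters maps naturally onto an ordered regular expression alternation. The sketch below is an illustration only; the function name `normalize_line_breaks` is hypothetical, and the `newline` parameter stands in for the line break sequence of the running environment.

```python
import re

# "\r\n" is listed first so that it is matched before a lone '\r' or '\n'.
LINE_BREAK = re.compile("\r\n|[\r\n\u2028\u0085]")


def normalize_line_breaks(text, newline="\n"):
    """Translate every Line Break input sequence to one output line break."""
    return LINE_BREAK.sub(newline, text)
```

Because line breaks accumulate, an input of '\n' followed by "\r\n" produces two output line breaks, while "\r\n" alone produces one.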

NOTE: For default values see the FML 1.0 Defaults table below.

Tags

Here is a list of all tag names supported by FML 1.0 (the escape sequence "$$" is also included in the list for completeness):

| Tag Name | Syntax | Description |
|---|---|---|
| i | "$i", "$iO", "$iO,X", "$iO,X,S"<br />"$i(O)", "$i(O,X)", "$i(O,X,S)" | Explicit indentation. |
| d | "$dN", "$dN,O", "$dN,O,S", "$dN,O,S,X"<br />"$d(N)", "$d(N,O)", "$d(N,O,S)", "$d(N,O,S,X)" | Define a named tabstop entity. Repeated use of the "$d" tag with the same name changes the parameters associated with this name. The new parameters take effect the next time a tabstop is selected. Name 0 is reserved for the default tabstop and using the "$d" tag with it changes the tabstop parameters for the default tabstop ("$t", "$t0"), effectively defining the implicit indentation size. Omitted arguments are assumed to have their corresponding FML 1.0 default values. |
| t | "$t", "$tN", "$t(N)" | An explicit reference to a user-defined tabstop. |
| n | "$n", "$nC", "$n(C)" | Set the default tabulator width in characters to C for the processing of the Tabulator flow control character category. If C is 0 or is omitted, the FML 1.0 default value is used. |
| w | "$w", "$wW", "$w(W)" | Set viewport width in characters, where W is a positive integer; text will be wrapped at word boundary so that a line's length never exceeds the viewport width, except for the case when the line consists of a single word that is longer than the current viewport width. If W is omitted, the FML 1.0 default viewport width is used. If W is 0, automatic text wrapping is disabled. |
| p | "$p", "$pL", "$p(L)" | Set paragraph line break count, where L is a positive integer; specify how many line breaks are printed when processing characters from the paragraph flow control character category. If L is 0 or is omitted, the FML 1.0 default paragraph line break count is used. |
| s | "$s", "$sC", "$s(C)" | Whitespace shorthand; simulate an input sequence of C spaces (' '), where C is a non-negative integer. If C is 0 this tag has no effect. |
| h | "$h", "$hC", "$h(C)" | Hard space shorthand; simulate an input sequence of C non-break spaces, where C is a non-negative integer. If C is 0 this tag has no effect. |
| u | "$u" | Whitespace union; create a counter bucket instance for separate counting of the spaces (' ') and line breaks ('\n' or "\r\n", depending on the running environment) that were printed to the output following the last word printout (space count is reset on every line break); create another counter bucket for any spaces and line breaks following this tag, until the EOI (End Of Input), the next word printout or the next "$u" tag, whichever comes first (multiple "$u" tags within a continuous sequence of whitespace and line break printouts create a new counter bucket instance for each "$u" tag). On EOI or word printout, merge all counter bucket instances by selecting the largest number for each count, then destroy all current counter bucket instances (any further "$u" tags will start the counting over). The resulting counts are printed as follows: first, as many line breaks as counted are printed; next, as many spaces as counted are printed. A "$u" tag at the beginning of the input or immediately after a word printout has no effect. |
| $ | "$$" | Dollar sign escape sequence; print a single dollar sign ('$'). |
| ! | "$!" | Parse the rest of the line as an FML 1.0 tag list. Ignore any words and whitespaces, including the new line characters at the end of the line. Treat the "$-" tag as end of line (terminate the tag list and switch to verbatim printout). Can be used to create line comments. |
| - | "$-" | Following this tag, print the rest of the input to the output as a single, unmodified string (no processing of any kind, verbatim printout). |

NOTE: For default values see the FML 1.0 Defaults table below.
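The merge step of the "$u" whitespace union can be sketched with two small helpers. Both names, `count_bucket` and `merge_buckets`, are hypothetical, and the sketch assumes the whitespace runs between "$u" tags have already been reduced to spaces and '\n' line breaks; it only illustrates the "largest number for each count" rule and the print order (line breaks first, then spaces).

```python
def count_bucket(run):
    """Count line breaks and trailing spaces in one whitespace run."""
    breaks = spaces = 0
    for ch in run:
        if ch == "\n":
            breaks += 1
            spaces = 0          # the space count is reset on every line break
        elif ch == " ":
            spaces += 1
    return breaks, spaces


def merge_buckets(buckets, newline="\n"):
    """Merge (line_breaks, spaces) buckets and render the resulting printout."""
    if not buckets:
        return ""
    breaks = max(b for b, _ in buckets)
    spaces = max(s for _, s in buckets)
    # Line breaks are printed first, then spaces.
    return newline * breaks + " " * spaces
```

For an input such as two spaces, "$u", a line break and one space, the buckets are (0, 2) and (1, 1), and the merged printout is one line break followed by two spaces.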

FML 1.0 Defaults

| Description | Default values in FML 1.0 | Explicit setter |
|---|---|---|
| Default viewport width | W=80 | Can be set explicitly with the "$w" tag, e.g. "$w60", "$w(60)". |
| Default paragraph line break count | L=2 | Can be set explicitly with the "$p" tag, e.g. "$p4", "$p(4)". |
| Default tabulator size (in characters) | T=4 | Can be set explicitly with the "$n" tag, e.g. "$n8", "$n(8)". |
| Default tabstop parameters | O=0, S=1, X=-1 | Can be set explicitly with the "$d" tag for the tabstop 0, e.g. "$d0,10,1,6", "$d(0,10,1,6)". The default value of -1 of X can be set by the user by omitting the argument X of the "$dN[,O[,S[,X]]]" tag, e.g. "$d0,10,1", "$d(0,10,1)". |
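An implementation might keep these defaults in one immutable record and resolve tag arguments against it. The sketch below is hypothetical throughout (class, field and function names are not part of FML 1.0); it only restates the values from the table and the rules given for the "$n" and "$w" tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defaults:
    viewport_width: int = 80     # W
    paragraph_breaks: int = 2    # L
    tab_width: int = 4           # T
    tabstop_offset: int = 0      # O
    tabstop_spacing: int = 1     # S
    tabstop_extrusion: int = -1  # X; -1 suppresses extrusion line breaking


DEFAULTS = Defaults()


def resolve_tab_width(arg=None):
    """Apply the $n rule: an argument of 0 or no argument selects the default."""
    return arg if arg else DEFAULTS.tab_width


def resolve_viewport_width(arg=None):
    """Apply the $w rule: no argument selects the default, 0 disables wrapping."""
    return DEFAULTS.viewport_width if arg is None else arg
```

Keeping the "0 means default" and "0 means disabled" rules in separate functions makes the difference between the two tags explicit.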

Words

A word is defined in FML 1.0 as any non-exhaustive character sequence that is not recognized as a tag and does not contain any flow control characters other than non-break spaces. Non-exhaustive means that the implementation of the FML 1.0 lexers and parsers is allowed to recognize longer sequences consisting of word characters as sequences of multiple word elements. The dollar sign ('$') produced from the dollar escape sequence ("$$") is considered a word character and can be recognized by the lexers and parsers as a single lexeme/element or as a part of a word lexeme/element.

Integration

As an optional input parameter, FML 1.0 printers are required to accept a starting offset within the current line and to handle the automatic word wrap and tabulation with regard to that offset. At EOI, FML 1.0 printers are required to provide to the user the offset within the line after the last printed character.
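The offset a printer reports at EOI can be derived from the printed output and the starting offset. The following is a minimal sketch with a hypothetical function name, `end_offset`; it assumes one column per character and that `newline` is the line break sequence of the running environment.

```python
def end_offset(output, start_offset=0, newline="\n"):
    """Return the column after the last printed character."""
    _, sep, tail = output.rpartition(newline)
    if sep:
        # A line break was printed: the offset restarts on the last line.
        return len(tail)
    # No line break: the output continues the line it started on.
    return start_offset + len(output)
```

A caller that prints several FML fragments in sequence could pass the value returned for one fragment as the starting offset of the next.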

Scratchpad

FML 1.0 (Rev. 1) In Brief

FML 1.0 In Brief

NOTE: In the notation below, if a character category or a string is enclosed in parentheses it is not consumed but only detected (e.g. $: stands for a lexeme that will be emitted with text "$:"; ($:) stands for a lexeme that will be emitted with text "").

  • term sheet - a device capable of sequentially and uniformly printing non-whitespace ASCII/Unicode codepoints with single character representation and the ' ' and "\r\n" (alternatively '\n') ASCII/Unicode codepoint sequences in a 2-dimensional grid of columns and rows
  • term tabstop - a known offset from the left margin of the sheet (O)
  • term viewport - a horizontally-limited portion of the screen defined by a left offset from the left margin of the sheet (O), and a right offset from the left margin of the sheet (R) or, alternatively to R, by width in columns (W)
  • term viewport algo - a predefined <N>d algorithm of handling the following events: ViewportChanged, Extruding (occurs after viewport change if the current printout is extruding into the new viewport)
  • term FCCC - Flow Control Character Category
  • term FCCC:Backspace - '\b'
  • term EOI - end of input
  • term EOB - end of (macro) body - in macro lexer state FCCC:Line Break|FCCC:Paragraph|(EOI)|$/|($:)
  • term <T> - tag name, one of '#' '-' '$' '?' '~' 't' 'v' 'p' 'n'
  • term <P> - parameter name, one of 'N' 'O' 'R' 'W' 'L' 'C'
  • term <U> - a non-negative number
  • term <N> - any non-empty string starting with a non-digit character and consisting of alpha-numeric characters (char.IsLetterOrDigit) and any of the following characters: '.', '_', ':', '%'
  • term <TAG> - any qualifiable FML 1.0 full tag
  • term <A> - algo name - none | step
  • term tag chain - multiple FML 1.0 tags with a hierarchical relationship separated by /
  • specify FML version (only parsed at input begin; default is FML 1.0) - $fml1,0 $fml(1,0) $fml(J=1,I=0)
  • switch to tag-only mode - $#...(EOB)|(EOI)|($-)|FCCC:Line Break|FCCC:Paragraph
  • switch to verbatim mode - $-...(EOB)|(EOI)
  • define a macro - $:<U>...EOB $:(<U>)...EOB $:(N=<U>)...EOB $:(<N>)...EOB $:(N=<N>)...EOB
    • define a multi-line macro - use the $:...FCCC:Line Break|FCCC:Paragraph syntax multiple times with the same <U>/<N>
  • define new tabstop - $t<U>,<U> $t(<U>,<U>) $t(N=<U>,O=<U>)
  • configure the viewport - $v[<U>[,<U>]] $v(<U>[,<U>]) $v([O=<U>],[R=<U>|W=<U>]) $v(t<U>,t<U>) ...
    • specify algo - $v/$a: $a<U>[,<U>]... $a(<U>[,<U>]) $a(<A>) $a(<A>[,<U>[,<U>]]...) ...
      • if the $a tag is present, <A> defaults to step
      • if the $a tag is missing, <A> defaults to none
    • for the step algo, specify fml to append on viewport change if the extrusion is more than the corresponding number of columns, as specified in the $a tag - $v/$a[/$:...[/$:.../]]...
    • fire events: ViewportChanged, Extruding
  • configure the paragraph - $p[<U>[,<U>]] $p(<U>[,<U>]) $p([L=<U>],[O=<U>]) ...
  • configure the automatic tabstop character count - $n[<U>] $n(<U>) $n(C=<U>)
  • enforce whitespace union - FCCC:Backspace
  • move to next column - FCCC:Space
  • move to next tabstop - FCCC:Tabulator
  • move to next line - FCCC:Line Break
  • move as defined by paragraph - FCCC:Paragraph
  • print dollar - $$
  • print slash - $/, or '/' outside a tag chain
  • print space as a word character - FCCC:No-Break Space
  • print word
    • print all spaces and line breaks required to apply all accumulated move commands
    • print the accumulated word characters
  • print EOI - print all line breaks required to apply the accumulated vertical move commands
  • invoke a macro - $!<U> $!(<U>) $!(N=<U>) $!(<N>) $!(N=<N>)
    • macro invocation is allowed from within a macro body
    • recursive macro invocation causes an error

FML 1.0 Tag Syntax

Signature

$fml1,0 $fml(1,0) $fml(J=1,I=0) - parsed only if the input starts with it

Tag

$<T>[<U>[,<U>[,...]]]
$<T>[(<U>|<N>|<TAG>[,<U>|<N>|<TAG>[,...]])]                                                    - the ' ' and '\t' characters within the parenthesis are insignificant; no other whitespace is allowed
$<T>[(<U>|<N>|<TAG>[,<U>|<N>|<TAG>[,...][,<P>=<U>|<N>|<TAG>[,<P>=<U>|<N>|<TAG>[,...]]]])]    - mixed arguments syntax; until the first named argument the provided arguments are parsed in order, the rest can be specified as named arguments in arbitrary order
$<T>[(<P>=<U>|<N>[,<P>=<U>|<N>[,...]])]                                                        - the order of the arguments is insignificant

Tag chaining

$<Tag1>/$<Tag2>        - Tag2 extends Tag1
$<Tag>/$:...$/        - the macro body is treated as Tag's argument
/$:...$/$:...$/        - multiple macro-arguments

FML 1.0 Tag Summary

$fml        $fml1,0 $fml(1,0) $fml(J=1,I=0)
$#            $#
$-            $-
$$            $$
$:...$/        $:[<U>] $:(<U>) $:(N=<U>) $:(<N>) $:(N=<N>)
$!            $!<U> $!(<U>) $!(N=<U>) $!(<N>) $!(N=<N>)
$t            $t<U>,<U> $t(<U>,<U>) $t(N=<U>|<N>,O=<U>)
$v            $v[<U>[,<U>]] $v(<U>[,<U>]) $v([O=<U>],[R=<U>|W=<U>]) $v(t<U>,t<U>) ...
$v/$a         $a<U>[,<U>]... $a(<U>[,<U>]) $a(<A>) $a(<A>,<U>[,<U>]...) ...
$v/$a/$:...    $v/$a/[/$:...[/$:...]]...
$p            $p[<U>[,<U>]] $p(<U>[,<U>]) $p([L=<U>],[O=<U>]) ...
$n            $n[<U>] $n(<U>) $n(C=<U>)
 
$:(3.1)   $/
$:(3.2)
$/
$:3$#$v($t4,$t6)/$a(0,$!(3.1),4,$!(3.2))

FML 1.0 For The TAR Help Text

$fml(1,0)$# $t0,0 $t1,1 $t2,2 $t3,6 $t4,30 $t5,32 $t6,80
$:0$#$v($t0,$t6)
$:(e1)$#$v($t2,$t5)
$:(e2)$#$v($t5,$t6)
$:(s)$#$v($t1,$t6)
$:1$#$v($t2,$t3)
$:2$#$v($t3,$t4)
$:(3.1)   $/
$:(3.2)
$/
$:3$#$v($t4,$t6)/$a(0,$!(3.1),4,$!(3.2))
$!0
Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can
restore individual files from the archive.
 
Examples:
$!(e1)tar -cf archive.tar foo bar$!(e2)# Create archive.tar from files foo and bar.
$!(e1)tar -tvf archive.tar$!(e2)# List all files in archive.tar verbosely.
$!(e1)tar -xf archive.tar$!(e2)# Extract all files from archive.tar.
 
$!(s)Local file name selection:
 
$!2--add-file=FILE$!(3)add given FILE to the archive (useful if its name starts with a dash)
$!1-C,$!2--directory=DIR$!(3)change to directory DIR
$!2--exclude=PATTERN$!(3)exclude files, given as a PATTERN
$!2--exclude-backups$!(3)exclude backup and lock files
$!2--exclude-caches$!(3)exclude contents of directories containing CACHEDIR.TAG, except for the tag file itself
$!2--exclude-caches-all$!(3)exclude directories containing CACHEDIR.TAG
$!2--exclude-caches-under$!(3)exclude everything under directories containing CACHEDIR.TAG
$!2--exclude-ignore=FILE$!(3)read exclude patterns for each directory from FILE, if it exists
$!2--exclude-ignore-recursive=FILE$!(3)read exclude patterns for each directory and its subdirectories from FILE, if it exists
$!2--exclude-tag=FILE$!(3)exclude contents of directories containing FILE, except for FILE itself
$!2--exclude-tag-all=FILE$!(3)exclude directories containing FILE
      --exclude-tag-under=FILE   exclude everything under directories
                             containing FILE
      --exclude-vcs          exclude version control system directories
      --exclude-vcs-ignores  read exclude patterns from the VCS ignore files
      --no-null              disable the effect of the previous --null option
      --no-recursion         avoid descending automatically in directories
      --no-unquote           do not unquote input file or member names
      --no-verbatim-files-from   -T treats file names starting with dash as
                             options (default)
      --null                 -T reads null-terminated names; implies
                             --verbatim-files-from
      --recursion            recurse into directories (default)
  -T, --files-from=FILE      get names to extract or create from FILE
      --unquote              unquote input file or member names (default)
      --verbatim-files-from  -T reads file names verbatim (no escape or option
                             handling)
  -X, --exclude-from=FILE    exclude patterns listed in FILE
 
 File name matching options (affect both exclude and include patterns):
 
      --anchored             patterns match file name start
      --ignore-case          ignore case
      --no-anchored          patterns match after any '/' (default for
                             exclusion)
      --no-ignore-case       case sensitive matching (default)
      --no-wildcards         verbatim string matching
      --no-wildcards-match-slash   wildcards do not match '/'
      --wildcards            use wildcards (default for exclusion)
      --wildcards-match-slash   wildcards match '/' (default for exclusion)
 
 Main operation mode:
 
  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create               create a new archive
  -d, --diff, --compare      find differences between archive and file system
      --delete               delete from the archive (not on mag tapes!)
  -r, --append               append files to the end of an archive
  -t, --list                 list the contents of an archive
      --test-label           test the archive volume label and exit
  -u, --update               only append files newer than copy in archive
  -x, --extract, --get       extract files from an archive
 
 Operation modifiers:
 
      --check-device         check device numbers when creating incremental
                             archives (default)
  -g, --listed-incremental=FILE   handle new GNU-format incremental backup
  -G, --incremental          handle old GNU-format incremental backup
      --hole-detection=TYPE  technique to detect holes
      --ignore-failed-read   do not exit with nonzero on unreadable files
      --level=NUMBER         dump level for created listed-incremental archive
  -n, --seek                 archive is seekable
      --no-check-device      do not check device numbers when creating
                             incremental archives
      --no-seek              archive is not seekable
      --occurrence[=NUMBER]  process only the NUMBERth occurrence of each file
                             in the archive; this option is valid only in
                             conjunction with one of the subcommands --delete,
                             --diff, --extract or --list and when a list of
                             files is given either on the command line or via
                             the -T option; NUMBER defaults to 1
      --sparse-version=MAJOR[.MINOR]
                             set version of the sparse format to use (implies
                             --sparse)
  -S, --sparse               handle sparse files efficiently
 
 Overwrite control:
 
  -k, --keep-old-files       don't replace existing files when extracting,
                             treat them as errors
      --keep-directory-symlink   preserve existing symlinks to directories when
                             extracting
      --keep-newer-files     don't replace existing files that are newer than
                             their archive copies
      --no-overwrite-dir     preserve metadata of existing directories
      --one-top-level[=DIR]  create a subdirectory to avoid having loose files
                             extracted
      --overwrite            overwrite existing files when extracting
      --overwrite-dir        overwrite metadata of existing directories when
                             extracting (default)
      --recursive-unlink     empty hierarchies prior to extracting directory
      --remove-files         remove files after adding them to the archive
      --skip-old-files       don't replace existing files when extracting,
                             silently skip over them
  -U, --unlink-first         remove each file prior to extracting over it
  -W, --verify               attempt to verify the archive after writing it
 
 Select output stream:
 
      --ignore-command-error ignore exit codes of children
      --no-ignore-command-error   treat non-zero exit codes of children as
                             error
  -O, --to-stdout            extract files to standard output
      --to-command=COMMAND   pipe extracted files to another program
 
 Handling of file attributes:
 
      --atime-preserve[=METHOD]   preserve access times on dumped files, either
                             by restoring the times after reading
                             (METHOD='replace'; default) or by not setting the
                             times in the first place (METHOD='system')
      --clamp-mtime          only set time when the file is more recent than
                             what was given with --mtime
      --delay-directory-restore   delay setting modification times and
                             permissions of extracted directories until the end
                             of extraction
      --group=NAME           force NAME as group for added files
      --group-map=FILE       use FILE to map file owner GIDs and names
      --mode=CHANGES         force (symbolic) mode CHANGES for added files
      --mtime=DATE-OR-FILE   set mtime for added files from DATE-OR-FILE
  -m, --touch                don't extract file modified time
      --no-delay-directory-restore
                             cancel the effect of --delay-directory-restore
                             option
      --no-same-owner        extract files as yourself (default for ordinary
                             users)
      --no-same-permissions  apply the user's umask when extracting permissions
                             from the archive (default for ordinary users)
      --numeric-owner        always use numbers for user/group names
      --owner=NAME           force NAME as owner for added files
      --owner-map=FILE       use FILE to map file owner UIDs and names
  -p, --preserve-permissions, --same-permissions
                             extract information about file permissions
                             (default for superuser)
      --same-owner           try extracting files with the same ownership as
                             exists in the archive (default for superuser)
  -s, --preserve-order, --same-order
                             member arguments are listed in the same order as
                             the files in the archive
      --sort=ORDER           directory sorting order: none (default), name or
                             inode
 
 Handling of extended file attributes:
 
      --acls                 Enable the POSIX ACLs support
      --no-acls              Disable the POSIX ACLs support
      --no-selinux           Disable the SELinux context support
      --no-xattrs            Disable extended attributes support
      --selinux              Enable the SELinux context support
      --xattrs               Enable extended attributes support
      --xattrs-exclude=MASK  specify the exclude pattern for xattr keys
      --xattrs-include=MASK  specify the include pattern for xattr keys
 
 Device selection and switching:
 
  -f, --file=ARCHIVE         use archive file or device ARCHIVE
      --force-local          archive file is local even if it has a colon
  -F, --info-script=NAME, --new-volume-script=NAME
                             run script at end of each tape (implies -M)
  -L, --tape-length=NUMBER   change tape after writing NUMBER x 1024 bytes
  -M, --multi-volume         create/list/extract multi-volume archive
      --rmt-command=COMMAND  use given rmt COMMAND instead of rmt
      --rsh-command=COMMAND  use remote COMMAND instead of rsh
      --volno-file=FILE      use/update the volume number in FILE
 
 Device blocking:
 
  -b, --blocking-factor=BLOCKS   BLOCKS x 512 bytes per record
  -B, --read-full-records    reblock as we read (for 4.2BSD pipes)
  -i, --ignore-zeros         ignore zeroed blocks in archive (means EOF)
      --record-size=NUMBER   NUMBER of bytes per record, multiple of 512
 
 Archive format selection:
 
  -H, --format=FORMAT        create archive of the given format
 
 FORMAT is one of the following:
 
    gnu                      GNU tar 1.13.x format
    oldgnu                   GNU format as per tar <= 1.12
    pax                      POSIX 1003.1-2001 (pax) format
    posix                    same as pax
    ustar                    POSIX 1003.1-1988 (ustar) format
    v7                       old V7 tar format
 
      --old-archive, --portability
                             same as --format=v7
      --pax-option=keyword[[:]=value][,keyword[[:]=value]]...
                             control pax keywords
      --posix                same as --format=posix
  -V, --label=TEXT           create archive with volume name TEXT; at
                             list/extract time, use TEXT as a globbing pattern
                             for volume name
 
 Compression options:
 
  -a, --auto-compress        use archive suffix to determine the compression
                             program
  -I, --use-compress-program=PROG
                             filter through PROG (must accept -d)
  -j, --bzip2                filter the archive through bzip2
  -J, --xz                   filter the archive through xz
      --lzip                 filter the archive through lzip
      --lzma                 filter the archive through xz
      --lzop                 filter the archive through lzop
      --no-auto-compress     do not use archive suffix to determine the
                             compression program
  -z, --gzip, --gunzip, --ungzip   filter the archive through gzip
      --zstd                 filter the archive through zstd
  -Z, --compress, --uncompress   filter the archive through compress
 
 Local file selection:
 
      --backup[=CONTROL]     backup before removal, choose version CONTROL
  -h, --dereference          follow symlinks; archive and dump the files they
                             point to
      --hard-dereference     follow hard links; archive and dump the files they
                             refer to
  -K, --starting-file=MEMBER-NAME
                             begin at member MEMBER-NAME when reading the
                             archive
      --newer-mtime=DATE     compare date and time when data changed only
  -N, --newer=DATE-OR-FILE, --after-date=DATE-OR-FILE
                             only store files newer than DATE-OR-FILE
      --one-file-system      stay in local file system when creating archive
  -P, --absolute-names       don't strip leading '/'s from file names
      --suffix=STRING        backup before removal, override usual suffix ('~'
                             unless overridden by environment variable
                             SIMPLE_BACKUP_SUFFIX)
 
 File name transformations:
 
      --strip-components=NUMBER   strip NUMBER leading components from file
                             names on extraction
      --transform=EXPRESSION, --xform=EXPRESSION
                             use sed replace EXPRESSION to transform file
                             names
 
 Informative output:
 
      --checkpoint[=NUMBER]  display progress messages every NUMBERth record
                             (default 10)
      --checkpoint-action=ACTION   execute ACTION on each checkpoint
      --full-time            print file time to its full resolution
      --index-file=FILE      send verbose output to FILE
  -l, --check-links          print a message if not all links are dumped
      --no-quote-chars=STRING   disable quoting for characters from STRING
      --quote-chars=STRING   additionally quote characters from STRING
      --quoting-style=STYLE  set name quoting style; see below for valid STYLE
                             values
  -R, --block-number         show block number within archive with each message
 
      --show-defaults        show tar defaults
      --show-omitted-dirs    when listing or extracting, list each directory
                             that does not match search criteria
      --show-snapshot-field-ranges
                             show valid ranges for snapshot-file fields
      --show-transformed-names, --show-stored-names
                             show file or archive names after transformation
      --totals[=SIGNAL]      print total bytes after processing the archive;
                             with an argument - print total bytes when this
                             SIGNAL is delivered; Allowed signals are: SIGHUP,
                             SIGQUIT, SIGINT, SIGUSR1 and SIGUSR2; the names
                             without SIG prefix are also accepted
      --utc                  print file modification times in UTC
  -v, --verbose              verbosely list files processed
      --warning=KEYWORD      warning control
  -w, --interactive, --confirmation
                             ask for confirmation for every action
 
 Compatibility options:
 
  -o                         when creating, same as --old-archive; when
                             extracting, same as --no-same-owner
 
 Other options:
 
  -?, --help                 give this help list
      --restrict             disable use of some potentially harmful options
      --usage                give a short usage message
      --version              print program version
 
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
 
The backup suffix is '~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.
The version control may be set with --backup or VERSION_CONTROL, values are:
 
  none, off       never make backups
  t, numbered     make numbered backups
  nil, existing   numbered if numbered backups exist, simple otherwise
  never, simple   always make simple backups
 
Valid arguments for the --quoting-style option are:
 
  literal
  shell
  shell-always
  shell-escape
  shell-escape-always
  c
  c-maybe
  escape
  locale
  clocale
 
*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/usr/sbin/rmt
--rsh-command=/usr/bin/rsh