
    shlex -- Simple lexical analysis

    Source code: Lib/shlex.py


    The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages (for example, in run control files for Python applications) or for parsing quoted strings.

    The shlex module defines the following functions:

    • shlex.split(s, comments=False, posix=True)
    • Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.

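    Note

    Since the split() function instantiates a shlex instance, passing None for s will read the string to split from standard input. (This behaviour was deprecated in Python 3.9; since Python 3.12, passing None raises an exception instead.)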
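    As a quick illustration (the command string below is made up), this sketch shows how the comments and posix arguments change what split() returns:

```python
import shlex

line = 'ls -l "my file" # list it'

# Default: POSIX mode, comments disabled, so '#' is an ordinary word.
default_tokens = shlex.split(line)

# comments=True drops everything from '#' to the end of the line.
no_comment_tokens = shlex.split(line, comments=True)

# posix=False (compatibility mode) keeps the quote characters in the token.
compat_tokens = shlex.split(line, comments=True, posix=False)

print(default_tokens)     # ['ls', '-l', 'my file', '#', 'list', 'it']
print(no_comment_tokens)  # ['ls', '-l', 'my file']
print(compat_tokens)      # ['ls', '-l', '"my file"']
```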

    • shlex.join(split_command)
    • Concatenate the tokens of the list split_command and return a string. This function is the inverse of split().
    >>> from shlex import join
    >>> print(join(['echo', '-n', 'Multiple words']))
    echo -n 'Multiple words'

    The returned value is shell-escaped to protect against injection vulnerabilities (see quote()).

    New in version 3.8.
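    A small round-trip check, assuming a POSIX shell as the eventual consumer: join() quotes each element, and split() in its default POSIX mode should give back the original list. The argument values are arbitrary examples.

```python
import shlex

args = ['cp', 'report (final).txt', "it's here/"]

# join() applies quote() to every element and joins them with spaces.
command_line = shlex.join(args)
print(command_line)  # cp 'report (final).txt' 'it'"'"'s here/'

# In the default POSIX mode, split() recovers the original list.
assert shlex.split(command_line) == args
```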

    • shlex.quote(s)
    • Return a shell-escaped version of the string s. The returned value is a string that can safely be used as one token in a shell command line, for cases where you cannot use a list.

    This idiom would be unsafe:

    >>> filename = 'somefile; rm -rf ~'
    >>> command = 'ls -l {}'.format(filename)
    >>> print(command)  # executed by a shell: boom!
    ls -l somefile; rm -rf ~

    quote() lets you plug the security hole:

    >>> from shlex import quote
    >>> command = 'ls -l {}'.format(quote(filename))
    >>> print(command)
    ls -l 'somefile; rm -rf ~'
    >>> remote_command = 'ssh home {}'.format(quote(command))
    >>> print(remote_command)
    ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

    The quoting is compatible with UNIX shells and with split():

    >>> from shlex import split
    >>> remote_command = split(remote_command)
    >>> remote_command
    ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
    >>> command = split(remote_command[-1])
    >>> command
    ['ls', '-l', 'somefile; rm -rf ~']

    New in version 3.3.
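    The property that makes quote() safe is that its result always parses back as exactly one token. This sketch checks that for a few arbitrary inputs, including the empty string and an embedded single quote:

```python
import shlex

samples = ['plain-name.txt', '', 'two words', "it's", '$HOME; rm -rf ~']

for s in samples:
    quoted = shlex.quote(s)
    # Every quoted value must parse back as exactly one token holding s.
    assert shlex.split(quoted) == [s], (s, quoted)
    print(f'{s!r:20} -> {quoted}')
```

    Strings made only of safe characters, such as plain-name.txt, come back unchanged, and the empty string becomes ''.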

    The shlex module defines the following class:

    • class shlex.shlex(instream=None, infile=None, posix=False, punctuation_chars=False)
    • A shlex instance or subclass instance is a lexical analyzer object. The initialization argument, if present, specifies where to read characters from. It must be a file-/stream-like object with read() and readline() methods, or a string. If no argument is given, input will be taken from sys.stdin. The second optional argument is a filename string, which sets the initial value of the infile attribute. If the instream argument is omitted or equal to sys.stdin, this second argument defaults to "stdin".

    The posix argument defines the operational mode: when posix is not true (default), the shlex instance will operate in compatibility mode. When operating in POSIX mode, shlex will try to be as close as possible to the POSIX shell parsing rules.

    The punctuation_chars argument provides a way to make the behaviour even closer to how real shells parse. This can take a number of values: the default value, False, preserves the behaviour seen under Python 3.5 and earlier. If set to True, then parsing of the characters ();<>|& is changed: any run of these characters (considered punctuation characters) is returned as a single token. If set to a non-empty string of characters, those characters will be used as the punctuation characters. Any characters in the wordchars attribute that appear in punctuation_chars will be removed from wordchars. See Improved Compatibility with Shells for more information. punctuation_chars can be set only upon shlex instance creation and can't be modified later.

    Changed in version 3.6: The punctuation_chars parameter was added.
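    A minimal sketch of the constructor arguments described above; the input text and the settings.rc file name are hypothetical. It compares compatibility and POSIX mode, passes a file-like object with an infile name, and shows that punctuation_chars cannot be reassigned (the exact AttributeError message varies between Python versions).

```python
import io
import shlex

text = 'set title "Hello world"\n'

# Compatibility mode (posix=False, the default) keeps quotes in tokens.
compat_tokens = list(shlex.shlex(text))

# POSIX mode strips them.
posix_tokens = list(shlex.shlex(text, posix=True))

# Any object with read() and readline() works; infile names it for messages.
lexer = shlex.shlex(io.StringIO(text), infile='settings.rc', posix=True)

# punctuation_chars is a read-only property, fixed at construction time.
try:
    lexer.punctuation_chars = '|'
    read_only = False
except AttributeError:
    read_only = True

print(compat_tokens)            # ['set', 'title', '"Hello world"']
print(posix_tokens)             # ['set', 'title', 'Hello world']
print(lexer.infile, read_only)  # settings.rc True
```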

    See also

    • Module configparser
    • Parser for configuration files similar to the Windows .ini files.

    shlex Objects

    A shlex instance has the following methods:

    • shlex.get_token()
    • Return a token. If tokens have been stacked using push_token(), pop a token off the stack. Otherwise, read one from the input stream. If reading encounters an immediate end-of-file, eof is returned (the empty string ('') in non-POSIX mode, and None in POSIX mode).

    • shlex.push_token(str)

    • Push the argument onto the token stack.
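    The following sketch shows get_token() and push_token() working together, and how the end-of-file value differs between the two modes:

```python
import shlex

lexer = shlex.shlex('alpha beta', posix=True)

first = lexer.get_token()   # 'alpha', read from the input stream
lexer.push_token('pushed')  # stacked tokens are returned before new input
second = lexer.get_token()  # 'pushed'
third = lexer.get_token()   # 'beta'
end = lexer.get_token()     # input exhausted, so this is lexer.eof

print(first, second, third, end is lexer.eof)  # alpha pushed beta True

# The end-of-file value is None in POSIX mode and '' in non-POSIX mode.
print(repr(end), repr(shlex.shlex('').get_token()))  # None ''
```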

    • shlex.read_token()

    • Read a raw token. Ignore the pushback stack, and do not interpret source requests. (This is not ordinarily a useful entry point, and is documented here only for the sake of completeness.)

    • shlex.sourcehook(filename)

    • When shlex detects a source request (see source below) this method is given the following token as argument, and is expected to return a tuple consisting of a filename and an open file-like object.

    Normally, this method first strips any quotes off the argument. If the result is an absolute pathname, or there was no previous source request in effect, or the previous source was a stream (such as sys.stdin), the result is left alone. Otherwise, if the result is a relative pathname, the directory part of the name of the file immediately before it on the source inclusion stack is prepended (this behavior is like the way the C preprocessor handles #include "file.h").

    The result of the manipulations is treated as a filename, and returned as the first component of the tuple, with open() called on it to yield the second component. (Note: this is the reverse of the order of arguments in instance initialization!)

    This hook is exposed so that you can use it to implement directory search paths, addition of file extensions, and other namespace hacks. There is no corresponding 'close' hook, but a shlex instance will call the close() method of the sourced input stream when it returns EOF.

    For more explicit control of source stacking, use the push_source() and pop_source() methods.
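    One way to use the hook is to serve source requests from memory instead of the file system. The InMemoryLexer class and the files dictionary below are hypothetical illustrations, not part of shlex; the override simply follows the contract described above by returning a (filename, file-like object) tuple.

```python
import io
import shlex


class InMemoryLexer(shlex.shlex):
    """Hypothetical lexer that resolves source requests from a dict."""

    def __init__(self, text, files):
        super().__init__(text, infile='<main>', posix=True)
        self.whitespace_split = True  # keep 'defaults.conf' as one token
        self.source = 'source'        # enable inclusion requests
        self.files = files

    def sourcehook(self, newfile):
        # Strip surrounding double quotes, roughly as the default hook does.
        name = newfile.strip('"')
        # Return (filename, open file-like object), in that order.
        return name, io.StringIO(self.files[name])


files = {'defaults.conf': 'x = 1\n'}
lexer = InMemoryLexer('start\nsource defaults.conf\nend\n', files)
print(list(lexer))  # ['start', 'x', '=', '1', 'end']
```

    The included stream is read to EOF, closed, and the lexer then resumes the main input, which is why 'end' follows the included tokens.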

    • shlex.push_source(newstream, newfile=None)
    • Push an input source stream onto the input stack. If the filename argument is specified it will later be available for use in error messages. This is the same method used internally by the sourcehook() method.

    • shlex.pop_source()

    • Pop the last-pushed input source from the input stack. This is the same method used internally when the lexer reaches EOF on a stacked input stream.
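    A sketch of explicit source stacking, assuming io.StringIO streams stand in for real files (inner.txt is only a label used for infile):

```python
import io
import shlex

lexer = shlex.shlex('outer tokens\n', posix=True)
first = lexer.get_token()  # 'outer'

# Stack a second stream; 'inner.txt' becomes lexer.infile while it is read.
lexer.push_source(io.StringIO('inner stuff\n'), 'inner.txt')
second, second_file = lexer.get_token(), lexer.infile  # 'inner', 'inner.txt'
third = lexer.get_token()                              # 'stuff'

# At EOF of the stacked stream the lexer pops it and resumes the outer input.
fourth, fourth_file = lexer.get_token(), lexer.infile  # 'tokens', None

print(first, second, second_file, third, fourth, fourth_file)
```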

    • shlex.error_leader(infile=None, lineno=None)

    • This method generates an error message leader in the format of a Unix C compiler error label; the format is '"%s", line %d: ', where the %s is replaced with the name of the current source file and the %d with the current input line number (the optional arguments can be used to override these).

    This convenience is provided to encourage shlex users to generate error messages in the standard, parseable format understood by Emacs and other Unix tools.
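    For example, a hypothetical configuration checker could prefix its messages like this (server.rc and other.rc are made-up names):

```python
import shlex

lexer = shlex.shlex('port = eighty', infile='server.rc', posix=True)

messages = []
for token in lexer:
    if token == 'eighty':
        # error_leader() supplies the '"file", line N: ' prefix.
        messages.append(lexer.error_leader() + f'expected a number, got {token!r}')

print(messages[0])  # "server.rc", line 1: expected a number, got 'eighty'

# Both parts of the leader can be overridden explicitly.
print(repr(lexer.error_leader('other.rc', 7)))  # '"other.rc", line 7: '
```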

    Instances of shlex subclasses have some public instance variables which either control lexical analysis or can be used for debugging:

    • shlex.commenters
    • The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.

    • shlex.wordchars

    • The string of characters that will accumulate into multi-character tokens. By default, includes all ASCII alphanumerics and underscore. In POSIX mode, the accented characters in the Latin-1 set are also included. If punctuation_chars is not empty, the characters ~-./*?=, which can appear in filename specifications and command line parameters, will also be included in this attribute, and any characters which appear in punctuation_chars will be removed from wordchars if they are present there. If whitespace_split is set to True, this will have no effect.

    • shlex.whitespace

    • Characters that will be considered whitespace and skipped. Whitespace bounds tokens. By default, includes space, tab, linefeed and carriage-return.

    • shlex.escape

    • Characters that will be considered escape characters. This is only used in POSIX mode, and includes just '\' by default.

    • shlex.quotes

    • Characters that will be considered string quotes. The token accumulates until the same quote is encountered again (thus, different quote types protect each other as in the shell). By default, includes ASCII single and double quotes.

    • shlex.escapedquotes

    • Characters in quotes that will interpret escape characters defined in escape. This is only used in POSIX mode, and includes just '"' by default.

    • shlex.whitespace_split

    • If True, tokens will only be split on whitespace. This is useful, for example, for parsing command lines with shlex, getting tokens in a similar way to shell arguments. When used in combination with punctuation_chars, tokens will be split on whitespace in addition to those characters.

    Changed in version 3.8: The punctuation_chars attribute was made compatible with the whitespace_split attribute.

    • shlex.infile
    • The name of the current input file, as initially set at class instantiation time or stacked by later source requests. It may be useful to examine this when constructing error messages.

    • shlex.instream

    • The input stream from which this shlex instance is reading characters.

    • shlex.source

    • This attribute is None by default. If you assign a string to it, that string will be recognized as a lexical-level inclusion request similar to the source keyword in various shells. That is, the immediately following token will be opened as a filename and input will be taken from that stream until EOF, at which point the close() method of that stream will be called and the input source will again become the original input stream. Source requests may be stacked any number of levels deep.

    • shlex.debug

    • If this attribute is numeric and 1 or more, a shlex instance will print verbose progress output on its behavior. If you need to use this, you can read the module source code to learn the details.

    • shlex.lineno

    • Source line number (count of newlines seen so far plus one).

    • shlex.token

    • The token buffer. It may be useful to examine this when catching exceptions.

    • shlex.eof

    • Token used to determine end of file. This will be set to the empty string ('') in non-POSIX mode, and to None in POSIX mode.

    • shlex.punctuation_chars

    • A read-only property. Characters that will be considered punctuation. Runs of punctuation characters will be returned as a single token. However, note that no semantic validity checking will be performed: for example, '>>>' could be returned as a token, even though it may not be recognised as such by shells.

    New in version 3.6.
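    As a sketch of how these attributes combine, the hypothetical tokenize() helper below reads a small key = value configuration in which ';' starts a comment, and optionally extends wordchars so that dotted host names stay in one token:

```python
import shlex

config = 'host = example.com ; main server\nport = 8080\n'


def tokenize(text, extra_wordchars=''):
    """Hypothetical helper: tokenize a key = value config with ';' comments."""
    lexer = shlex.shlex(text, posix=True)
    lexer.commenters = ';'              # ';' starts a comment instead of '#'
    lexer.wordchars += extra_wordchars  # extend the default word characters
    return list(lexer)


# '.' is not a word character by default, so it is returned on its own.
print(tokenize(config))
# ['host', '=', 'example', '.', 'com', 'port', '=', '8080']

# Adding '.' to wordchars keeps the host name in one token.
print(tokenize(config, extra_wordchars='.'))
# ['host', '=', 'example.com', 'port', '=', '8080']
```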

    Parsing Rules

    When operating in non-POSIX mode, shlex will try to obey the following rules.

    • Quote characters are not recognized within words (Do"Not"Separate is parsed as the single word Do"Not"Separate);

    • Escape characters are not recognized;

    • Enclosing characters in quotes preserve the literal value of all characters within the quotes;

    • Closing quotes separate words ("Do"Separate is parsed as "Do" and Separate);

    • If whitespace_split is False, any character not declared to be a word character, whitespace, or a quote will be returned as a single-character token. If it is True, shlex will only split words on whitespace;

    • EOF is signaled with an empty string ('');

    • It's not possible to parse empty strings, even if quoted.

    When operating in POSIX mode, shlex will try to obey the following parsing rules.

    • Quotes are stripped out, and do not separate words ("Do"Not"Separate" is parsed as the single word DoNotSeparate);

    • Non-quoted escape characters (e.g. '\') preserve the literal value of the next character that follows;

    • Enclosing characters in quotes which are not part of escapedquotes (e.g. "'") preserve the literal value of all characters within the quotes;

    • Enclosing characters in quotes which are part of escapedquotes (e.g. '"') preserve the literal value of all characters within the quotes, with the exception of the characters mentioned in escape. The escape characters retain their special meaning only when followed by the quote in use, or the escape character itself. Otherwise the escape character will be considered a normal character.

    • EOF is signaled with a None value;

    • Quoted empty strings ('') are allowed.
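    The rules above can be checked directly with split(), which enables whitespace_split. Each case below pairs an input with the tokens it produces in the given mode:

```python
import shlex

# (input, posix flag, tokens produced by shlex.split)
cases = [
    # Non-POSIX mode
    ('Do"Not"Separate', False, ['Do"Not"Separate']),  # quotes not recognized in words
    ('"Do"Separate', False, ['"Do"', 'Separate']),    # closing quote ends the word
    (r'one\ token', False, ['one\\', 'token']),       # escapes not recognized
    ("a '' b", False, ['a', "''", 'b']),              # no empty tokens
    # POSIX mode
    ('"Do"Not"Separate"', True, ['DoNotSeparate']),   # quotes stripped, no split
    (r'one\ token', True, ['one token']),             # escape keeps the space
    (r"'c\d'", True, ['c\\d']),                       # ' is not in escapedquotes
    (r'"a\"b" "c\d"', True, ['a"b', 'c\\d']),         # \ only escapes " or \ here
    ("a '' b", True, ['a', '', 'b']),                 # quoted empty string allowed
]

for text, posix, expected in cases:
    result = shlex.split(text, posix=posix)
    assert result == expected, (text, posix, result)
    print(f'posix={posix!s:<5} {text!r:22} -> {result}')
```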

    Improved Compatibility with Shells

    New in version 3.6.

    The shlex class provides compatibility with the parsing performed by common Unix shells like bash, dash, and sh. To take advantage of this compatibility, specify the punctuation_chars argument in the constructor. This defaults to False, which preserves pre-3.6 behaviour. However, if it is set to True, then parsing of the characters ();<>|& is changed: any run of these characters is returned as a single token. While this is short of a full parser for shells (which would be out of scope for the standard library, given the multiplicity of shells out there), it does allow you to perform processing of command lines more easily than you could otherwise. To illustrate, you can see the difference in the following snippet:

    >>> import shlex
    >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
    >>> s = shlex.shlex(text, posix=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
    >>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
    '(', 'def', 'ghi', ')']

    Of course, tokens will be returned which are not valid for shells, and you'll need to implement your own error checks on the returned tokens.
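    As one possible shape for such checks, the hypothetical split_pipeline() helper below accepts only simple pipelines of the form 'a | b | c' and rejects empty segments and any other punctuation run. It is a sketch, not a shell parser: because POSIX mode strips quotes, a quoted '|' is indistinguishable from a real pipe here.

```python
import shlex

OPERATOR_CHARS = set('();<>|&')  # the set used by punctuation_chars=True


def split_pipeline(command_line):
    """Hypothetical helper: split 'a | b | c' into a list of argv lists."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True

    segments, current = [], []
    for token in lexer:
        if token == '|':
            if not current:
                raise ValueError("empty command before '|'")
            segments.append(current)
            current = []
        elif token and set(token) <= OPERATOR_CHARS:
            # Runs such as '|||' or '&&' are returned as tokens but rejected here.
            raise ValueError(f'unsupported operator {token!r}')
        else:
            current.append(token)
    if not current:
        raise ValueError('pipeline ends without a command')
    segments.append(current)
    return segments


print(split_pipeline('ls -l | grep py | wc -l'))
# [['ls', '-l'], ['grep', 'py'], ['wc', '-l']]

for bad in ('ls ||| wc', 'ls | | wc', 'ls |'):
    try:
        split_pipeline(bad)
    except ValueError as exc:
        print(f'{bad!r}: {exc}')
```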

    Instead of passing True as the value for the punctuation_chars parameter, you can pass a string with specific characters, which will be used to determine which characters constitute punctuation. For example:

    >>> import shlex
    >>> s = shlex.shlex("a && b || c", punctuation_chars="|")
    >>> list(s)
    ['a', '&', '&', 'b', '||', 'c']
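    To see the effect with whitespace_split enabled, this sketch tokenizes the same (made-up) command line once with punctuation_chars=True and once with only the redirection characters '<>':

```python
import shlex


def tokens(text, punctuation_chars):
    lexer = shlex.shlex(text, posix=True, punctuation_chars=punctuation_chars)
    lexer.whitespace_split = True
    return list(lexer)


line = 'sort<in.txt>out.txt&&echo done'

# True: every character in ();<>|& counts as punctuation.
print(tokens(line, True))
# ['sort', '<', 'in.txt', '>', 'out.txt', '&&', 'echo', 'done']

# A custom string: only redirections split; '&&' stays inside a word.
print(tokens(line, '<>'))
# ['sort', '<', 'in.txt', '>', 'out.txt&&echo', 'done']
```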

    Note

    When punctuation_chars is specified, the wordchars attribute is augmented with the characters ~-./*?=. That is because these characters can appear in file names (including wildcards) and command-line arguments (e.g. --color=auto). Hence:

    >>> import shlex
    >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?',
    ...                 punctuation_chars=True)
    >>> list(s)
    ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']

    However, to match the shell as closely as possible, it is recommended to always use posix and whitespace_split when using punctuation_chars, which will negate wordchars entirely.

    For best effect, punctuation_chars should be set in conjunction with posix=True. (Note that posix=False is the default for shlex.)
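    A sketch of why the recommended settings matter: with punctuation_chars alone, characters outside wordchars (here '%', ',' and '+') still split words, while posix=True plus whitespace_split=True keeps them intact. The command line is an arbitrary example.

```python
import shlex

text = 'git log --format=%h,%s a+b'

# punctuation_chars alone (non-POSIX, no whitespace_split).
loose = list(shlex.shlex(text, punctuation_chars=True))

# Recommended: posix=True plus whitespace_split=True.
strict = shlex.shlex(text, posix=True, punctuation_chars=True)
strict.whitespace_split = True
strict_tokens = list(strict)

print(loose)
# ['git', 'log', '--format=', '%', 'h', ',', '%', 's', 'a', '+', 'b']
print(strict_tokens)
# ['git', 'log', '--format=%h,%s', 'a+b']
```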