Executable formal semantics for the POSIX shell
Michael Greenberg, Austin J. Blatt

TL;DR
This paper introduces Smoosh, a formal, executable semantics for the POSIX shell, demonstrating high conformity and minimal bugs compared to existing shells, and providing a tool for shell analysis and development.
Contribution
We present Smoosh, the first formal, mechanized small-step semantics for the POSIX shell, enabling precise analysis and comparison of shell behaviors.
Findings
Smoosh is the most conformant shell according to POSIX standards.
Smoosh has the fewest bugs among tested shells.
A symbolic stepper was implemented to analyze shell behavior.
Abstract
The POSIX shell is a widely deployed, powerful tool for managing computer systems. The shell is the expert's control panel, a necessary tool for configuring, compiling, installing, maintaining, and deploying systems. Even though it is powerful, critical infrastructure, the POSIX shell is maligned and misunderstood. Its power and its subtlety are a dangerous combination. We define a formal, mechanized, executable small-step semantics for the POSIX shell, which we call Smoosh. We compared Smoosh against seven other shells that aim for some measure of POSIX compliance (bash, dash, zsh, OSH, mksh, ksh93, and yash). Using three test suites---the POSIX test suite, the Modernish test suite and shell diagnosis, and a test suite of our own device---we found Smoosh's semantics to be the most conformant to the POSIX standard. Modernish judges Smoosh to have the fewest bugs (just one, from using…
| Smoosh | bash∗ | dash | zsh† | mksh | ksh | yash∗ | |
| 0.1 | 4.4-12(1) | 0.5.8-2.4 | 5.3.1-4+b2 | 54-2+b4 | 93u+ | 2.43-1 | |
| POSIX test suite (418 tests) | |||||||
| Failing tests | 0 | 4 (8) | 20 | 35 | 23 | 22 (23) | |
| Time to run | 12m41s | 2m43s | 2m43s | 2m52s | 3m24s | 2m45 | |
| Modernish’s shell diagnosis (91 potential bugs, 22 potential quirks) and test suite (312 tests) | |||||||
| Bugs | 1 | 16 | 2 | 3 | 3 | 14 | 1 |
| Quirks | 0 | 4 | 2 | 5 | 2 | 3 | 8 |
| Failing tests | 1 | 20 | 3 | 3 | 3 | 17 | 1 |
| Time to run | 5.5s | 4.8s | 1.4s | 1.2s† | 3.2s | 2.2s | 2.4s |
| Smoosh’s test suite (161 tests) | |||||||
| Failing tests | 0 | 30 (35) | 42 | 52 | 34 | 41 | 43 (39) |
| Time to run | 23s | 28s | 1m13s | 42s | 21s | 28s | 29s |
| Smoosh | bash | dash | zsh | OSH | mksh | ksh | yash | |
| 0.1 | 4.4-12(1) | 0.5.8-2.4 | 5.3.1-4+b2 | 0.6.pre21 | 54-2+b4 | 93u+ | 2.43-1 | |
| nested scope (1) | ||||||||
| local (2) | special | builtin | special | reserved | builtin+ | special | ||
| readonly (3) | error | silent | error | override | override | override | ||
| initial (4) | unset | unset | unset | null | unset | unset | ||
| -p (5) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
1
Executable formal semantics for the POSIX shell
Smoosh: the Symbolic, Mechanized, Observable, Operational Shell
Michael Greenberg
Department of Computer SciencePomona CollegeClaremontCAUSA
and
Austin J. Blatt
Puppet LabsPortlandORUSA
Abstract.
The POSIX shell is a widely deployed, powerful tool for managing computer systems. The shell is the expert’s control panel, a necessary tool for configuring, compiling, installing, maintaining, and deploying systems. Even though it is powerful, critical infrastructure, the POSIX shell is maligned and misunderstood. Its power and its subtlety are a dangerous combination.
We define a formal, mechanized, executable small-step semantics for the POSIX shell, which we call Smoosh. We compared Smoosh against seven other shells that aim for some measure of POSIX compliance (bash, dash, zsh, OSH, mksh, ksh93, and yash). Using three test suites—the POSIX test suite, the Modernish test suite and shell diagnosis, and a test suite of our own device—we found Smoosh’s semantics to be the most conformant to the POSIX standard. Modernish judges Smoosh to have the fewest bugs (just one, from using dash’s parser) and no quirks. To show that our semantics is useful beyond yielding a conformant, executable shell, we also implemented a symbolic stepper to illuminate the subtle behavior of the shell.
Smoosh will serve as a foundation for formal study of the POSIX shell, supporting research on and development of new shells, new tooling for shells, and new shell designs.
command line interfaces, POSIX, formalization, small-step semantics
††journal: PACMPL††journalvolume: 1††journalnumber: CONF††article: 1††journalyear: 2018††publicationmonth: 1††copyright: none††ccs: Software and its engineering Scripting languages††ccs: Software and its engineering Command and control languages††ccs: Software and its engineering Language features††ccs: Software and its engineering Semantics††ccs: General and reference Design††ccs: Human-centered computing Command line interfaces
1. Introduction
The POSIX shell is a line-oriented, potentially interactive scripting language (Austin Group, 2018). The shell is widely used by experts on all three major platforms (Linux, OS X, and Windows); it is often the best—or only!—way to configure, compile, install, manage, deploy, or remove software systems. The shell is used for these tasks because the shell easily combines filesystem manipulations (using standard utilities like cp), system management features (like apt), and asynchronous job control (using pipelines , background jobs , and the wait builtin). Modern container systems rely on the shell: some use the shell extensively (e.g., Docker) while others use the shell as a necessary escape hatch (e.g., Vagrant, Puppet). The POSIX shell is critical software infrastructure.
Critical though it may be, the POSIX shell is not well understood.
It has unusual semantics: the general command language doesn’t
evaluate subexpressions, but rather treats them as strings and
expands the special control codes in them.
This process, called word expansion, is responsible for:
the translation of ~ into one’s home directory;
parameter, i.e., variable substitution;
command substitution (written c\texttt{}c\texttt{}$);
and globbing with * and ?.
Word expansion is simultaneously part of the shell’s power and
part of its danger (Greenberg, 2018b).
It is very easy for word expansion to generate too many or too few
arguments to a command… and commands play for keeps! One famous
example is “Steam
cleaning”,111A misunderstanding of expansion in a Steam script wiped hard drives: http://www.theregister.co.uk/2015/01/17/scary_code_of_the_week_steam_cleans_linux_pcs/.
though similar bugs abound.
The treacherous combination of powerful commands and abstruse semantics
has led to no small amount of disgust with the shell (see, for
example, the UNIX Hater’s Handbook (Garfinkel
et al., 1994)).
We aim to rehabilitate the shell. The POSIX shell is one of the oldest programming languages still in popular use; it ought to have tool support commensurate with the state of the art. In order to have effective tool support, we need to be able to soundly reason about the shell’s semantics: to summarize scripts’ behavior; to calculate scripts’ dependencies or preconditions; or to compile scripts to be faster, safer, or in another language.
To that end, we introduce Smoosh: the symbolic, mechanized, observable, operational shell. Smoosh serves as a mechanized reference semantics for the POSIX standard. Smoosh is, first and foremost, an operational shell: it can be used interactively or for scripting. Smoosh is also symbolic and observable, generating traces given in real and simulated environments.
Even though it is a completely usable shell, the mechanized small-step operational semantics for Smoosh is relatively compact: the two core stepping functions (Section 5) are a combined total of 1 034 SLOC of Lem (Mulligan et al., 2014). The small-step nature of the semantics makes it easy to capture and report information. Smoosh can, e.g., detect and report unspecified and undefined behaviors, trace how and when system calls are made, and log signal handling and traps.
Executability is critical in order to validate our semantics by testing. Smoosh passes all of the locale-independent parts of the POSIX test suite. We compared Smoosh against seven other shells that aim for some measure of POSIX compliance (bash, dash, zsh, OSH, mksh, ksh93, and yash). We used three test suites: the POSIX test suite, the Modernish test suite and shell diagnosis (Dekker, 2019), and a test suite of our own device. On all three suites, Smoosh’s semantics is the most conformant to the POSIX standard; Modernish judges Smoosh to be the least buggy or quirky. (See Section 7 for more detail.)
Smoosh isn’t just an executable shell, though: Smoosh is parameterized over the operating system state, highlighting the OS interface demanded by a POSIX shell and allowing for alternative uses of the semantics. We’ve implemented two possible choices of the OS state: a system mode where system calls actually occur, and a symbolic mode where the system calls are simulated. We use symbolic mode to implement a stepper for the shell (Section 6).
We claim the following contributions:
- •
Smoosh is a new implementation of the POSIX shell specification (Section 2). Its semantics is faithful while still being of manageable size (Section 4).
- •
We parameterize the Smoosh semantics over a suite of operating system functions (Section 5). The parameter is theoretically useful as it creates a barrier between the shell and the OS; it is practically useful in that it allows for alternative instantiations of OS primitives. As an example, we instantiate the OS with a symbolic mode to write a program stepper, which illuminates the shell’s obscure word expansion and evaluation semantics (Section 6).
- •
Smoosh is conformant to the POSIX standard and suitable as a canonical implementation. Smoosh is the most conformant and least buggy of the seven shells tested (Section 7, Table 1).
- •
The development of Smoosh led to the identification of numerous bugs in shells in active use (bash, dash, yash, OSH) and in the POSIX specification and its test suite (Section 7.1).
These contributions lay the foundation for serious tools for improving the shell. Without a semantics, what could we prove about an analysis or compiler for the shell? Smoosh’s semantics it can be a reference implementation of POSIX for desugarers (like CoLiS (Jeannerod, 2017)) and compilers to test and prove against, as well as a semantics against which to prove static analyses sound. Our tested semantics will be a baseline for work improving on the shell’s implementation and design.
2. What is the POSIX shell?
There are two perspectives on what defines the POSIX shell: one places the standard first, and the other places shell implementations first. We aim for Smoosh to accommodate both perspectives.
The first, bureaucratic perspective centers the POSIX spec ((Austin Group, 2018), Volume “Shell & Utilities”), which comprises 21 pages of introduction (§1), 98 pages of core definitions (§2 “Shell Command Language”), and additional documentation on 160 utilities (ranging in scale from awk to true).222We write out ‘Section’ when referring to sections in this paper and use § to refer to sections of the POSIX specification. The part most relevant to shell implementors and programmers is Volume “Shell & Utilities” §2 “Shell Command Language”, which weaves together explanation of the POSIX shell’s lexer, parser, and semantics. Some important information is left to the description of the sh command in §4. The spec is written in the typical style, with specific definitions of words like ‘can’, ‘may’, and that troublesome pair, ‘unspecified’ (“a value or behavior not specified by POSIX.1-2017 which results from use of a valid program construct or valid data input”) and ‘undefined’ (“…which results from use of an invalid program construct or invalid data input”). There are 70 uses of ‘unspecified’ and 17 uses of ‘undefined’ relevant to the shell.
The second, pragmatic perspective is that the POSIX shell is a collection of language implementations (e.g., bash, dash, yash) that more or less agree on a core set of features; the POSIX specification is a document that tries to track what that core set is, knowing that few implementations will conform in all respects.
We consider both of these points of view when we discuss conformance (Section 7). When we refer to shell implementations, we mean the versions in Debian 9 stable: bash 4.4-12(1), dash 0.5.6-2.4, zsh 5.3.1-4+b2, OSH 0.6.pre21, mksh 54-2+b4, ksh93 20120801-3.1, and yash 2.43-1.
There is already precedent for taking something like the POSIX spec and producing a formal model, with recent work on C being a good example (Blazy and Leroy, 2009; Ellison and Rosu, 2012; Krebbers et al., 2014; Kang et al., 2015; Memarian et al., 2016). There is similar precedent for taking a collection of implementations and producing a formal model, with being a good example (Guha et al., 2010). What is special about doing this process for the POSIX shell? The shell has two distinctive features that make its semantics more interesting:
- •
Word expansion. While typical languages evaluate an expression by evaluating its subexpressions, many shell expressions evaluate their parts by performing word expansion rather than recursive evaluation. Many familiar shell features are implemented via expansion: ~ expanding to your home directory; variables () expanding to the captured STDOUT of the inner command. (See Section 2.1.)
- •
System calls. While typical languages make system calls via library functions, the shell makes system calls in its semantics. Core operations depend on forking a new process (fork), replacing the current process with an executable (execve), waiting on processes (wait and waitpid), handling and sending signals (signal, kill), working with files (open, close, access, stat, lstat, write, read), and creating and manipulating file descriptors (pipe, fcntl, dup2).
These two features present different problems when trying to understand the shell well enough to construct a sound working model. Modeling expansion is tricky because it (a) is mutually recursive with evaluation; (b) is a subtly structured, four-stage process; and (c) is used by the evaluation semantics in several slightly different configurations. Modeling system calls is tricky because (a) there are many of them, (b) their behavior is subtle, and (c) giving faithful meaning to them can bring arbitrarily much of the operating system into the formal model.
2.1. What is word expansion?
The POSIX spec indicates that there are four stages of expansion with a total of seven components: (1) tilde expansion, parameter expansion, command substitution, and arithmetic expansion; (2) field splitting; (3) pathname expansion; and (4) quote removal. Shells perform this word expansion left-to-right in stage order (see Section 4.2 for Smoosh’s semantics). These seven components are best illustrated by brief examples (Figure 1). For a more detailed description of these phases, we refer readers to Greenberg’s arguments surrounding word expansion (2018b) and to the POSIX specification (Austin Group, 2018) §2.6. The examples in Figures 1(b) and 1(d) take place in a directory with three files (a, b, and c); the example in Figure 1(e) takes place in a different directory, where the result of the first echo indicates which files begin with the letter a.
2.2. Smoosh: a foundational, formal interpretation of the POSIX specification
Our work aims to offer (a) a formal interpretation of the specification that is also (b) a conforming implementation. Such a compromise is imperfect: we don’t model all possible behaviors in the specification, but rather interpret the specification deterministically. Our formal model is in code and not a tiny core calculus. Nevertheless, we believe our model is interesting and our implementation is useful. Analogously, the CompCert compiler doesn’t conform to MSVC, ICC, GCC, or Clang exactly, and yet its model of the C language can be seen as canonical (Blazy and Leroy, 2009; Krebbers et al., 2014). Building such a foundational model is a necessary first step towards more minimal calculi and more useful tools in a setting where implementations (currently and unyieldingly) disagree on a number of issues (see Section 8).
Our paper has three parts: a formal description, to show that while the shell is complex, it is amenable to conventional techniques (Sections 3 and 4); a description of our implementation, to give more concrete detail on our semantics (Sections 5 and 6); and a discussion of conformance, showing that we have adequately modeled the POSIX shell (Section 7) and extensions (Section 8).
3. Syntax
The shell has an idiosyncratic concrete syntax, which we mirror in our abstract syntax (Figure 2) but do not follow strictly. Since we’re giving abstract syntax, we ignore the details resolved during parsing, particularly backslash escapes of reserved symbols and single quoting. A few conventions: we use fixed-width fonts for system concepts (e.g., fd) and for abstract syntax that mirrors the shell’s concrete syntax (e.g., >|); we use fonts for Smoosh concepts and for metavariables. We occasionally use subscript “hints” to translate English names into concrete shell syntax (e.g., ). We write for optional nonterminals. We use Kleene star to represent lists and to represent non-empty lists. We use to refer to natural numbers and to refer to booleans, tagging such variables with a descriptive name. For example, is a number representing an exit status and is a boolean that indicates whether a command substitution has been performed (which helps determine the exit status; see CmdAssignDoneNoCmd in Figure LABEL:fig:eval-simple).
The two primary AST constructs are the command, , and the words, . Commands are the top-level construct: a user enters a command which has words ; the words are expanded and the command is eventually evaluated.
The base case for the command AST is the simple command. Simple commands model command invocations (i.e., builtins, functions, and executables), plain assignments, and plain redirections. All other command forms are composite. First, there are some “modifiers” of simple commands: pipelines, , which may be in the background; redirected commands, ; background commands a/k/a asynchronous commands, ; and subshells, , noting that curly braces are used for disambiguating parsing. There are sequencing commands and logic: sequence, ; short-circuiting conjunction, ; short-circuiting disjunction, ; and negation, . There are two iteration constructs—while loops, ; and for loops, —and two conditional constructs—the conventional conditional, ; and string pattern matching, , where each case branch pairs one or more patterns (as words ) with a command to run when a pattern matches. Function definition, , is also a command.
Redirections come in three fundamental forms: to a file (), to an existing file descriptor (), and from a given string a/k/a heredoc (). File redirections, , specify a source file descriptor, a file mode (e.g., to write to a file), and a file target . File descriptor redirections, , can copy and close file descriptors. Finally, heredoc redirections, , expand words and make the resulting string available for reading on a given file descriptor fd. Redirections are typically scoped—that is, they take effect only for a given simple command or composite redirection command; the exec special builtin suppresses scoping, causing the redirection to take permanent, global effect in the shell (see discussion of in Section 4.3).
A word is the primary sub-part of commands. Words are subject to expansion, showing up in several places: in the assignments and arguments of simple commands; as the targets of redirections; as the strings iterated over in for loops; and as both the scrutinees and patterns of case conditionals. A word is a possibly empty list of one of: a user string—a non-empty string from some character set, e.g., UTF-8, which we designate ; a statically-parsed field separator ␣; or, a control code, . In our abstract syntax, escaped control codes—like the literal $—are ordinary strings. In the initial stages of word expansion, only control codes are expanded. You might expect * to appear as a control code, but pathname expansion is a dynamic search on the results of expansion and is not statically parsed.
There are five control codes: tilde-and-prefix, , is used to refer to users’ home directories (noting that the tilde prefix may be empty); parameters {s\phi} take a variable name s\phis() hold commands that ought to be run with their output captured; arithmetic substitutions www$" inhibits field splitting and pathname expansion.
Parameter formats come in two flavors: they allow for convenient defaulting when variables are unset or null (i.e., hold the empty string); and they enable post-processing of the result of lookup. The format is the default and performs standard variable lookup. There are four defaulting parameter formats: , , , and . All four take words and a flag : when , then a variable set to the empty string is still considered set; when (where : is a concrete syntax hint), then a variable set to the empty string is considered unset. When is used on a variable considered unset (because it’s actually unset or because ), then the words are returned for further expansion (rather than the empty string). When is used on a variable considered unset, the words are returned for further expansion—and the result of expanding those words is assigned to . When is used on a variable considered unset, the words are returned for further expansion—and then used as an error message. The format is the opposite of : when it is used on a variable considered unset, the null string is returned; when it is used on a variable considered set, the words are expanded and returned. The two post-processing parameter formats are and . The former calculates the length of the value of the given parameter; the latter does or removal using either a or match policy on a given pattern .
Beyond simplifying the concrete syntax and escaping, several shell features are omitted from our abstract syntax: our parser desugars until loops into while loops; the tab-stripping heredoc redirection <<- is handled in the parser.
4. Semantics
POSIX specifies a broad set of behaviors for the shell; our semantics is not small. We show excerpts here of the shell’s semantics to (a) show that the shell is nevertheless amenable to standard techniques, and (b) give a sense of the level of detail of our semantics. This section is a selective transcription of Smoosh’s implementation (Section 5), meant to communicate the core ideas in a concise mathematical form without minutae (like logging or symbolic value manipulation). We do show all of our state definitions (Section 4.1, Figure 3) and all of the intermediate forms used by our small step semantics (Figure 4). We take care lay out these forms of state plainly, even if we do not give our entire semantics in mathematical notation.
We explain expansion (Section 4.2) before evaluation (Section 4.3). Rather than give an up-front description of every helper function and system call, we introduce them as needed.
4.1. State
The POSIX shell has to track a significant amount of state (Figure 3). Each line of the description of Smoosh’s shell state characterizes different levels of detail. The first line is about high-level process info: what is the root process ID (, to be used for $$, even in subshells)? Is this the outermost shell or a subshell ()? What shell options are set ()? What jobs are running ()? Smoosh tracks not only the current shell’s signal handlers a/k/a traps (), but also any traps from a supershell () in order to properly implement the trap special builtin.
The second line of the shell’s state characterizes the environment: is the global, dynamically scoped environment; the current positional parameters are stored in a list . In addition to environment and positional parameters, Smoosh also tracks a stack of local environments, . These local environments exist not only to support the non-POSIX builtin local, but also for scoped assignments for function calls (see Section 8.3). The shell tracks the read-only and exported variables in via and , respectively. Aliases and the current locale information are also tracked. (Smoosh only supports the ambient locale, though; see Section 5.1.)
The third and final line of the shell’s state holds on to finer grained information: the current working directory (), the PID of the last background command (), the exit status of the last command (), how deeply nested we are in the current loop (), and the offset into the argument for parsing in getopts () (see Section 8.2).
Smoosh’s state is just one way to keep track of everything a shell needs to know. For example, the entire last line of the shell state could instead be kept in the shell’s environment, . Some shell state in fact must be kept there, e.g., the OPTIND variable used by the getopts builtin. We find it more convenient to manually track the last exit status in as a number than to pretend it is an actual environment variable in . The POSIX standard specifies the minimum state for a shell in §2.12. Confusingly, some of the specified state is kept by the OS, not the shell (e.g., “open files inherited upon invocation of the shell, plus open files controlled by exec”).
Taking Smoosh’s general approach as a given, there are still alternative designs. For example, we track an explicit stack of local variables in the shell state , but we don’t have such a stack for function parameters: instead we use stack frames to save the old positional parameters (Figure 4). Not having a stack of positional parameters makes it easy to ensure that we never have an underflow of the positional parameter stack; having an explicit stack of locals makes it easier to do scoped lookup. Our variable lookup routine works as follows: it immediately returns the value for a special variable (e.g., the value of \sigma.n_{\texttt{\textdollar{}?}}\sigma.s^{}_{\mathsf{\texttt{\textdollar{}}}}\sigma.\ell^{*}\sigma.\rho$.
4.2. Word expansion
Word expansion occurs when evaluating assignments, commands, redirections, the iteratee of a for loop, and both the scrutinee and the patterns of case conditionals. The general outline of the process takes a word and expands it to a list of strings, called fields . We sketched the features of word expansion by example already (Section 2.1); we now give a formal model of word expansion.
We model expansion as a transition system between expansion states (Figure 4). There are seven such expansion states (Figure LABEL:fig:expansion), not to be confused with POSIX’s seven components of its four stages. The states are: the initial state, ; initial word expansion, , where we perform the first stage of POSIX word expansion (tildes, parameters, command substitution, and arithmetic); field splitting, ; pathname expansion, ; quote removal, ; and two terminal states, one for errors, , and one for successful completion, . A diagram of the possible transitions (Figure LABEL:fig:expansion, top) shows that errors can only occur during control code expansion; that field splitting can be skipped wholesale (dashed line); and that pathname expansion is always run, but sometimes it doesn’t actually glob but rather unescapes globbing characters (dashed line). The colors follow the legend in Figure 11. We give a lower level understanding of the possible transitions (Figure LABEL:fig:expansion, bottom) with inference rules for a small-step semantics, where means that in shell state , the expansion state transitions to a new shell state and a new expansion state . The boolean flag indicates whether or not a command substitution was run. The transition system conditions on the expansion options .
Initial word expansion in the expansion state is defined as a small-step operational semantics , where the initial and represent the current progress expanding the words into the expanded words (or an error, as recorded in the intial word expansion result ). We track whether or not a command substitution is performed (), along with a variety of options particular to initial word expansion (): whether splitting should happen (for a special case of $*), whether the current control codes are quoted, and whether the current control codes are from user input or are indirectly generated.
The rules map over each component of the words under expansion, pulling an element off the front, processing it, and adding the result to the already expanded words . We offer here only an excerpt of the control code expansion rules: those dealing with command substitution (Figure LABEL:fig:word-expansion). Command substitution is exemplary because (a) it ties the recursive knot between evaluation and expansion, (b) it makes several system calls, and (c) its evaluation produces more control codes.
How does command substitution work? Given a control code cc\texttt{fd}{r}\texttt{fd}{w}c\texttt{fd}{w}\texttt{fd}{r}\mathsf{cmdsubst}\texttt{fd}{r}\mathsf{cmdwait}\mathsf{exp}~{}sb{\texttt{\textdollar{}()}}$ to true.
4.3. Evaluation
Shell programs are evaluated in a line-oriented fashion: a command is read and then immediately evaluated. The most interesting bits of evaluation are simple commands , which make a variety of system calls. We give a detailed, formal semantics for them in Section 4.3.1, but two other command forms merit discussion: the case conditional and the for loop.
The case conditional has the form case in () ;; esac, where are words to expand and then match against the patterns . The ‘branch-on-pattern’ style of conditional is a cousin of the switch conditional seen in imperative languages like C, though the shell uses string pattern matching instead of equality. Case analysis doesn’t involve so many steps, but it involves a restricted form of expansion (stopping at arithmetic expansion) and pattern matching; the quoting rules for patterns have subtle interactions with the way dynamic pattern escaping (as opposed to parse-time escaping) is treated in the shell.
The for loop has the form for in ; do ; done, where is a variable name (without the wcwcxIFS.
We give a semantics to commands as a small-step relation , which we read as “in state , when indicates whether someone else is responsible for checking our exit status, the command steps to a new state and a new command ”. Much of the evaluation relation is standard: control (;, if, &&, ||, !, while, function definition) works more or less the usual way, though rather than having actual boolean values, the last exit status is looked up in the shell state.
As a warmup, we give the four rules for negation (Figure LABEL:fig:eval-negation). The first rule is a standard congruence rule: the term takes a step by stepping itself (Not). Note that we set in the premise. When the shell has the flag errexit set, we will exit on a non-zero exit status, but that behavior is proscribed “when executing the compound list following the while, until, if, or elif reserved word, a pipeline beginning with the ! reserved word, or any command of an AND-OR list other than the last” ((Austin Group, 2018), §4, set). Setting tells subsidiary rules that someone else will be checking the exit status, and the shell should not exit on error. The second rule propagates control: if the command is one of the internal AST nodes used to represent control, then we must propagate the control command through the negation (NotCtrl). The final two rules do the actual work of negation, turning a successful exit status of 0 into 1 (NotSuccess) and an unsuccessful exit status other than 0 into a 0 (NotFail). We show these four rules to assuage concerned readers: much of the semantics is standard.
Before offering a complete formal semantics for simple commands (Section 4.3.1), we comment on some of the more interesting runtime forms necessary in Smoosh’s semantics (Figure 4). The shell has a variety of line-oriented read/eval loops: the outermost shell itself, the ‘dot’ (.) a/k/a source command, and the eval command. In order to nest such loops, read/eval loops are part of Smoosh’s AST: nodes representing an eval loop reading commands and nodes representing an eval loop executing a command. There are several other interesting AST forms: holds on to information for restoring redirections; turns into a call to execve; turns into a call to waitpid; and and are stack frames representing the current trap being processed and the current function being called, respectively.
4.3.1. Simple Commands
Here we highlight one of the key parts of the shell’s semantics: the evaluation of simple commands (Figures LABEL:fig:eval-simple-expand, 9, and LABEL:fig:eval-simple). Assignments, commands, and redirections are agglomerated in the POSIX specification into a single form: a simple command of the form , where the are variable names, each is a word for expansion either in an assignment or as the command and arguments itself (), and each is a redirection. How does the specification explain them? Simple commands expand arguments, redirections, and then assignments—though assignments may be expanded before redirections if there are no arguments (i.e., ). A five-step, twelve-point outline determines how to perform assignments; a two-step, eleven-point outline determines how to find out if a command should be resolved as a special builtin, a builtin, a function, or an executable. Simple commands hardly live up to their name: they are not simple—in fact, they are the most complex part of command evaluation—and they may not even run a command!
In our model, evaluation of the simple command proceeds as follows. First, the words of the command arguments () are expanded from left to right (CmdStart, CmdArgExp, CmdArgErr, CmdArgDone). Next, redirections are expanded (CmdRedir, CmdRedirDoneErr) and applied (CmdRedirDone), saving file descriptor information for later unwinding. Then we expand the words in the assignments (CmdAssign, CmdAssignErr) and store the results in a fresh local environment (CmdAssignSet). At thus point, we are already in dangerous territory: the POSIX specification allows assignments to be expanded before redirections when there are no arguments. Our deterministic semantics always does redirections first; bash and ksh will run assignments first when there is no command.
Even in defining the relatively simple behavior of these expansions, there’s been call for a variety of helper functions (Figure 9). Some of these helpers are quite simple— just updates the command options to reflect whether expansion performed command substitution. Others have significant logic and make system calls: is 65 SLOC of Lem code that includes executable resolution using \mathsf{exec}\mathsf{redir}\mathsf{unredir}$ make a variety of system calls for manipulating files (fileRedir, closeAndSaveFD, renumberFD, and heredoc).
Resuming our explanation of simple commands: after all of the expansion has been done (Figure LABEL:fig:eval-simple-expand), it is time to try to actually run the command (Figure LABEL:fig:eval-simple). It may be the case that our simple command has no command at all, in which case we pop the local assignments off the stack and add them to the global environment (CmdAssignDoneNoCmd). The redirections are undone via . Now pays off: if a command substitution was performed, we should return its exit status in ; if not, we should succeed with exit status zero. If, on the other hand, there is a command, then we step to a ‘ready’ state, capturing the assignments accumulated in the local environment. Our use of is a particular choice, not at all mandated by POSIX, which doesn’t have local variables. We use local environments because (a) we need quasi-local things to track assignments, and (b) it is convenient to reuse the architecture we built for local variables (see Section 8.3).
When faced with , there are three possibilities: we could have disabled execution (CmdRunNoexec); we could be running a special builtin, in which case POSIX mandates that our assignment be global (CmdRunSpecial); or we could be running an ordinary command (CmdRun). In the latter two cases, we step to the form to actually run a command.
Finally, the form triggers a call to the helper function. One of three things happens: we encounter an error of some kind, in which case noninteractive shells may exit (RunFail); the command runs and produces a non-zero exit status, in which case the may exit due to the errexit option (RunErr); or, the command may run with no need to exit, in which case we continue with the command returned from (Run). The new command yielded by will be one of three things: , because a builtin was run to completion; a form, because a function was called; or a form, because an executable was forked but should run in the foreground, and so it needs to be waited on. These forms will in almost all cases be wrapped with a form; the exec special builtin is an exception, since its redirections should have global effect. The form will call to restore the saved file descriptors , as was done in CmdAssignDoneNoCmd.
It is the shell’s responsibility to (a) receive and record signals as they arrive so that (b) those pending signals can be periodically checked and handled. We check for traps after completing the evaluation of commands. The helper function uses the pendingSignal system call to track these signals and interrupt the main line of execution with trap handlers when necessary. The field of the shell state tracks a partial function from signals to strings. Signals not in the domain of have default dispositions; signals that map to empty strings are ignored; otherwise, the strings are parsed and interpreted when signals are handled. Of the three Run* rules, only one rule actually checks traps: RunFail. RunErr need not check traps because the shell is exiting. Run need not check traps because all three of the possible forms from will check traps when they complete.
5. Smoosh’s implementation
Having sketched in mathematical notation the Smoosh semantics, we turn our attention to Smoosh’s actual executable semantics, i.e., the code. Smoosh is written primarily in Lem (Mulligan et al., 2014), an ML-like language that extracts to OCaml and Coq, among others. Smoosh uses dash’s parser via libdash,333 https://github.com/mgree/libdash a library that tracks dash’s main repository but has hooks for accessing the parser and a few other functions, along with OCaml bindings to the dash AST. Smoosh is implemented across 20 files comprising 10 814 SLOC, of which 9 305 lines are in Lem and 1 509 are in OCaml. By way of comparison, dash has 14 633 SLOC, of which 13 318 are C code, 122 are shell scripts, and 1 193 are header files. The Smoosh code contains considerable extra material (about 1k SLOC): shim code for reading the dash AST and for rendering Smoosh ASTs in JSON for the stepper (Section 6). Smoosh is open source under the MIT license.444 https://github.com/mgree/smoosh
Smoosh’s architecture consists of a core semantics with two configurable ‘ports’: one for the driver, which determines how the Smoosh semantics are used; and one for the OS implementation, which determines what system calls do and how the filesystem behaves. The core semantics is a small-step operational semantics, as sketched in Section 4. The semantics itself is not outrageously large (4 880 SLOC for the semantics and builtins, 2 605 SLOC for the OS interface, and 3 252 for AST definitions), with a roughly 3:1 ratio of supporting code to code in the two core stepping functions:
val step_expansion : forall . OS =>
os_state * expansion_state expansion_step * os_state * expansion_state
val step_eval : forall . OS =>
os_state checking_mode stmt evaluation_step * os_state * stmt
The step_expansion function (328 SLOC, the core expansion semantics) corresponds to the relation (Section 4.2); the step_eval function (706 SLOC, the core evaluation semantics) corresponds to the relation (Section 4.3). In addition to semantics for the POSIX shell language itself, we also provide implementations for all of the special builtins,555 break, :, continue, . a/k/a source, eval, exec, exit, export, readonly, return, set, shift, times, trap, and unset (Austin Group, 2018) §2.14 mandatory builtins,666 alias, bg, cd, command, false, fc, fg, getopts, hash, jobs, kill, newgrp, pwd, read, true, umask, unalias, and wait (Austin Group, 2018) §2.9.1 part 1(d) and several others.777echo, help, history, local, printf, test a/k/a [, type.
The OS typeclass
The most interesting aspect of the Smoosh implementation is its OS typeclass.We have already introduced several system calls in the formal model (Figures LABEL:fig:word-expansion and 9). The system calls described there are part of 40 different calls that can be made to the operating system. We can break the OS interface into three areas of interest: true system calls (14 functions), used to work with processes, signals, and job control; file system calls (24 functions), used to traverse and manipulate the file system, where 10 of these calls correspond to POSIX stat and lstat; and parser interactions (2 functions), used to communicate values of PS1 and PS2 to the libdash parser. Note that the calls described here don’t necessarily correspond to POSIX-defined functions or any particular operating system’s interface. Some system calls correspond clearly (e.g., execve); other Smoosh system calls (e.g., heredoc) correspond to several system calls (create a pipe, possibly spawn a process if the heredoc string is bigger than the OS buffer size, write the heredoc text to the pipe). Some other system calls don’t correspond to true “system calls” in any sense at all: pendingSignal doesn’t actually make any system calls, but merely looks at a data structure. We put pendingSignal in the OS typeclass because the nature of that signal-tracking data structure depends on the OS typeclass instance.
We have implemented two instances of the OS typeclass: the system implementation makes real system calls to the host OS, allowing us to use Smoosh as a real shell; the symbolic implementation makes no real system calls to the host OS but instead simulates a POSIX OS and filesystem. We use the symbolic instance in our stepper (Section 6). The OS typeclass forces us to confront difficult issues in program structure and API design. We can see the root of the issue in waitpid. Different implementations of waitpid must do drastically different things. In system mode, we expect waitpid to actually call wait4 or some similar host OS function, blocking until the given PID has terminated. In symbolic mode, we expect the current symbolic process to suspend while another symbolic process proceeds. That is, in system mode the OS typeclass is just a shim between Smoosh and the host OS, and we can rely on the host OS’s scheduler entirely. In symbolic mode, the OS typeclass must actually implement the scheduler itself; see Section 6.1, “Scheduling”.
5.1. Limitations
We have not attempted to implement any of the POSIX locale functionality. It is a longer term goal to give a precise formal account of locales. For now, however, Smoosh uses OCaml functions for ordering strings and formatting numbers, which are locale-independent as of 2018. Smoosh’s handling of the various terminal/TTY functions is incomplete.
Parsing
Smoosh currently supports only the libdash parser, as extracted from dash. The dash parser is certainly “good enough”, as dash is the default /bin/sh on Debian and Ubuntu systems. Using its parser in Smoosh has some drawbacks, though. First, prompting using PS1 and PS2 is built in to dash’s parser. These prompt variables ought to be subject to variable expansion each time they are displayed—and dash will (incorrectly) use its own, internal expansion routine and environment to expand these variables. Second, the dash parser doesn’t support common extensions to the POSIX spec, like statically parsed tests with the [[ form. Third, dash’s lexer doesn’t correctly support multi-byte characters, because dash’s parser doesn’t use structured terms to represent words but instead inserts literal control codes with negative signed character values. Fourth and finally, dash’s parser is written in C and may therefore result in memory errors or security vulnerabilities. We are interested in connecting Smoosh to other parsers, like Morbig (Régis-Gianas et al., 2018) and the bash-compatible parser from OSH (Chu, 2019).
6. Smoosh’s shell stepper
We have used the Smoosh semantics to implement a program stepper, which we call the Shtepper. The Shtepper traces shell programs in a simulated environment. We use lightweight symbolic execution: symbolic values are treated symbolically as much as possible, but we don’t support branches, i.e., we only explore a single path. The Shtepper is implemented in two parts: a tracer and a visualizer. The tracer takes a shell program and a description of the environment in which to run it and produces a trace in JSON. The visualizer takes a JSON trace and displays it in a browser using JavaScript. The Shtepper is publicly available online.888 http://shell.cs.pomona.edu/shtepper
The tracer comprises 1.2k SLOC of Lem and OCaml, implementing a synthetic POSIX environment (processes and filesystem), JSON mappings for the Smoosh AST, and a driver for the semantics. The visualizer comprises 1.5k SLOC of Ruby, ERB templates, JavaScript, and CSS; the bulk of the code (1.2k SLOC) is in the JavaScript trace rendering logic.
The visualizer
The Shtepper shows the main shell’s thread of execution, highlighting evaluation steps in blue and expansion steps in pink (Figure 11). The shell contains a great deal of state and performs its work in many small steps. It is difficult to know what to highlight, what to merely show, and what to hide. By default, we show each step of expansion and evaluation, showing the environment, STDOUT, and STDERR at the beginning, after changes, and at the end of execution. We plan to study which information best helps novices learn the shell and avoid pitfalls.
6.1. Simulating POSIX
Our stepper runs in a simulated POSIX environment, using the symbolic OS instance. We use ‘fuel’ to limit the extent of symbolic execution. The web interface puts a conservative limit on fuel; at the risk of nontermination, the fuel can be made infinite in the tracer.
The symbolic shell state tracks the entirety of the POSIX environment: a list of FIFO pipes; a list of processes; the root of the symbolic filesystem; the current shell’s mapping of file descriptors to pipes and files; whether or not the current shell has exited; the current shell’s umask; and, the contents of /etc/passwd to simulate calls to getpwnam during tilde expansion. Our symbolic model is sufficient to run interesting pipes between subshells, but insufficient to run executables.
Scheduling
When simulating a symbolic POSIX system, which processes get to run when? There is a tension between faithfully modeling all possible interleavings of processes and offering concise, legible information to the user of the stepper. To motivate the question, consider the following two pipelines: while true; do echo 5; done | true and while true; do echo 5; done | { read x; echo $((x+42)); }. Both pipelines spawn two processes, both of which use shell builtins exclusively: neither of these pipelines needs to make an execve system call (though some systems may implement true or echo as executables, Smoosh and most shells build them in). In both cases, the first process will send 5 on STDOUT infinitely many times. In the first pipeline, the second process ignores its input entirely, terminating immediately. In the second pipeline, the second process reads a line of input, emits the line 47, and terminates.
In a real POSIX system, both processes will be scheduled concurrently in both pipelines. The while loop will write as fast as it can until the pipe between the two processes is full, at which point the looping process will block. The first pipeline’s second process will terminate nearly immediately; the second pipeline’s second process will terminate right after it’s been able to read one line of input from the pipe. In either case, when the second process terminates, the pipe between the two processes has no more readers, and a SIGPIPE will be sent to the first process, terminating it.
How should we simulate these pipelines? We could use real threads to run the simulation, resulting in a different schedule each time. Such an approach would be error prone and nondeterministic; it is more appealing to use deterministic, simulated threads. How, then, do we schedule our simulated threads? If we always run the pipeline left to right, then the pipelines above will block when the pipe buffers fill up (or diverge if they’re unbounded). If we always run the pipeline right to left, then the first pipeline above will terminate immediately but the second one will block, waiting to read from a process that’s never scheduled. We need to do some scheduling, but how?
We prioritize determinism and clarity over faithfully exploring all possible schedules. Scheduling takes place in rounds; every process in the system will take a step every round (which may mean blocking). Our scheduler will try to step each process in creation order: the root shell will get to go first, and then the first subshell, and so on. But: whenever a shell waits on another process (e.g., because of a foreground command or a command substitution) or would block reading from a pipe, we take the opportunity to step the waited on or writing process, making the scheduling order demand-driven. Scheduling according to demand keeps our scheduling deterministic without rigidly following creation order.
When waiting for a command to terminate, we put no bound on the number of steps we’re willing to wait—blocking is the right behavior.999Note that this waiting still counts as fuel usage, and our symbolic execution may give up while waiting. When a process calls waitpid in symbolic mode, the waited-on process is looked up and stepped. For a blocking wait—like waitpid or readAll—our scheduling works well. When waiting for a writer to supply input, we allow batches of 10 steps at a time, since one need not read all of the input: the read builtin only needs to read a line, and we’ll get a shorter trace if we step the processes only enough to produce as much output as we need.
Our scheduler is not a completely realistic model of POSIX scheduling—we don’t get to see every possible interleaving. Our scheduling does, however, model the way pipes and processes interact well enough for us to simulate interesting shell pipelines. Our scheduler is acceptable in light of its benefits: simple engineering; API compatibility between system and symbolic modes; a straightforward, linear visualization of shell stepping; and determinism without rigidity.
Filesystem
Our model of the filesystem is quite simplistic: we track a hierarchy of files and directories, but not file contents. Such a simple model suffices to simulate pathname expansion and file descriptor redirections () and heredoc redirections (), but not file redirections (). It is a matter of engineering effort to produce a better symbolic filesystem. It would be interesting to link our symbolic system up with SibylFS (Ridge et al., 2015) or Forest (Fisher et al., 2011), or to use Ntzik et al.’s reasoning (2017; 2018). We can also imagine implementing a read-only filesystem that allows access to the real, underlying filesystem, but treats writes (and other dangerous operations, like execve) as noops. We consider platform-specific filesystems, like /proc, as out of scope; we could in theory apply platform-dependent reasoning (Nita et al., 2008).
7. POSIX conformance
Smoosh is meant to serve as a formal foundation for the POSIX shell. But is Smoosh a good model? We use three execution-based test suites to test Smoosh for conformance: the official POSIX test suite, the Modernish test suite, and a suite of our own devising. What are these test suites, why does using them support our claim, and how well do we conform?
The POSIX test suite
The Open Group VSC test suite101010https://www.opengroup.org/testing/testsuites/vscpcts2003.htm is a broad set of tests for POSIX conformance systems. We use VSC-PCTS2016 version 2.15. The POSIX shell test suite has 494 tests: 1 is ‘not in use’, 31 are ‘untestable’ and are checked by hand during certification, and 44 are locale-dependent and marked ‘unresolved’ or ‘unsupported’. There are 418 locale-independent tests; these are the tests we use to compare against other shells.
The Modernish test suite
Modernish is a substantial library for the POSIX shell (Dekker, 2019) aimed at simplifying the shell language. It uses existing shell features and compatibility testing to construct a new, shell-like language on top of an existing shell. Modernish is implemented in 12k SLOC of shell scripts. It is very portable. To achieve this portability, Modernish runs a series of diagnostics against its host shell, detecting a variety of bugs and quirks.
Our test suite
Our third test suite is our own. Our test suite has two parts: internal unit tests and external system tests. Both parts serve as regression tests, but we can only use the external system tests on other shells, as the internal unit tests run Smoosh subsystems in symbolic mode. Each external system test pairs a short shell scripts with expected output and exit status. These are “whole system tests” but often test obscure corners of the shell’s behavior. A test harness runs the scripts and checks outputs for a given shell executable. There are 161 external system tests, in five categories: 2 speed tests, 67 tests of builtin commands, 2 parsing tests, 82 tests of shell semantics, and 8 tests of the sh executable’s interface. Some of the external system tests are adapted from bugs found by the POSIX test suite and Modernish. Our test suite is occasionally too picky, demanding particular exit status when any non-zero exit status would be conformant.
Why use these tests?
We use the POSIX test suite because it is the de facto standard of what POSIX compliance is. It is imperfect (see Section 7.1), but covers a great deal of ground. It is also a significant stress test of the shell, comprising 29k SLOC of tests in addition to 58k SLOC of shell code in the harness (which, by default, runs in the shell under test). The Modernish test suite condenses a great deal of knowledge and experience with real shells into a very small package. Modernish’s shell scripts also rely on detailed characteristics of shell behavior—during development, the Modernish test suite exercised several bugs in Smoosh’s semantics that the POSIX test suite did not detect. Finally, our test suite not only tests against our own regressions from development, but also highlights corner cases not covered in other test suites.
Conformance
We summarize the results of our various test suites in Table 1. Of all of the shells tested, Smoosh is the only one to have no failing tests in the POSIX test suite or in our own. In Modernish, Smoosh has no quirks and one bug: Smoosh’s parser (borrowed from dash) cannot handle multibyte characters or characters with codes over 128; this bug triggers the one Modernish test that Smoosh fails.
7.1. Bugs found
We found several POSIX compliance bugs in our two primary reference shells, dash and yash, rediscovering a subtlety in the semantics for printf that had already been independently addressed. We also identified a typographical error in the POSIX spec and several bugs in the POSIX test suite. Identifying and reporting these bugs makes up only a small part of the dividends of formal semantics, but we mention them to highlight that even before we’ve significantly applied the semantics, the process of development has been useful.
Bugs in shells
In dash, there were several issues: in arithmetic expansion, variables that were unset or empty were improperly treated; the times command reported incorrect numbers; and the empty alias was mishandled. We submitted patches for these bugs; the first was superseded by a different, independent fix of the same bug a year later; the second and third are under review. In yash, asynchronous commands (e.g., curl ... &) do not have their STDIN redirected to /dev/null and fg issues too much output. Other shells clearly have significant bugs, but we have not had the time to track them down in detail and report them. Neither zsh nor OSH can run the POSIX test suite; OSH cannot run the Modernish suite, either, but it can run our suite, where it fails 77 of our tests.
Bugs in the POSIX test suite and specification
We found ten bugs in the POSIX test suite, all of which have been confirmed as true bugs and will be fixed in the next version of the test suite. We also found typographical errors in the POSIX specification and in the POSIX test suite. We also discovered a number of important shell behaviors that were not being tested. We are planning to submit our new tests to be added to the POSIX test suite.
7.2. Performance
Smoosh is substantially slower than existing implementations—about 4x slower on the POSIX test suite, slightly better in Modernish (Table 1).111111Tests were run in Docker on an 2.8 GHz Intel Core i7 with 16GB RAM. Timings are from a single run, but there is little variance between runs. The timings in our own test suite should not be taken too seriously: the only way to fail some of our tests is to time out. Why is Smoosh slow? Most shells are implemented as recursive evaluators in C, performing expansion by mutation of compact data structures, whereas Smoosh is implemented as iterated small-step semantics in OCaml (via Lem), performing expansion on immutable, non-compact ASTs. Smoosh’s slowness is not perceptible to us at the command line—most interactive sessions spend the majority of their wall clock time running executables, not scripts.
We speculated that some of Smoosh’s slowness was because we weren’t hashing the filesystem calls in command name $PATH resolution. We implemented hashing, but our performance on the POSIX test suite was unchanged. Hashing may not be an optimization on modern systems.
8. Beyond the POSIX specification
The POSIX specification defines a common core for shells to implement, but every shell has to make decisions about what is left unspecified. Every shell we encountered implements extensions, too. In order for Smoosh’s semantics to be a good model of the practical, implementation-oriented interpretation of the POSIX shell, we must understand how shells handle unspecified behavior, underspecified behavior, and extensions. We offer examples of each below.
When evaluating , Guha et al. use the Mozilla test suite—they ensure that they get the same answers as Rhino, V8, and SpiderMonkey (Guha et al., 2010). Implementations of the POSIX shell do not have nearly the same level of agreement as JavaScript interpreters do. When developing Smoosh, we compared to many other shells, which frequently disagree in corner cases. All of the shells implement unspecified extensions of the POSIX standard, too. Since none of these shells is perfectly POSIX compliant, we declined to precisely match any of them. We strived instead for POSIX conformance, clear semantics, and a lack of quirks.
8.1. Unspecified behavior: non-local break and continue
Consider the case of non-lexical control. Does the following program print hi or not?
f() { break; echo hi; }; while true; do f; break; done
According to the POSIX specification, the break command in the function f is not lexically enclosed in the loop. Non-lexical use of the control builtins break and continue is left unspecified. Shells behave differently! The more sensible option is to forbid such non-lexical control, as most shells do, printing hi (and possibly a diagnostic message from the non-lexically enclosed break in f). But bash 3.2.57(1) (which comes with OS X) and zsh both allow the break to pass through the function call and exit the non-lexically enclosing loop. In Modernish, zsh and the old bash’s behavior is called a quirk. There is also ambiguity around the permissible behaviors of break and continue when used without a non-lexically or with no enclosing loop at all: the standard seems to forbid warning the user when control commands have no effect. The issue is under discussion with The Open Group.
Prioritizing safety and avoiding quirks means that Smoosh uses lexical control (and will print hi), though non-lexical control can be enabled with a flag.
8.2. Underspecified behavior: getopts and hidden state
The POSIX specification mandates that the getopts builtin should use the user-visible shell variables OPTIND and OPTARG to parse command-line and function arguments. The rules are subtle, but the general idea is that OPTIND tracks the index of the current option argument; the getopts utility also takes the name of a variable to set with the current option. If getopts finds an option that takes an argument, the argument value is stored in OPTARG. There’s a problem in the POSIX specification, though: getopts needs more information than OPTIND and OPTARG to work properly. In particular, getopts needs to keep track of the offset into the current argument to handle ‘grouped’ arguments.
As a concrete example, consider getopts "ab:c:" opt -ab hi -c hello, where "ab:c:" is the optstring specifying that -a is an option without arguments and that -b and -c are options that take arguments, opt is the variable name to set with the current option, and -ab hi -c hello is the argument list to process. After parsing the first option, opt ought to be a, but what should OPTIND be? And where should the shell record the fact that the a has already been processed? The choice we’ve seen taken in shells is to keep some extra state to record the offset of the next character to process inside of a grouped option. Every shell but yash keeps this state hidden; Smoosh stores it in the variable. yash makes this state visible by adding the current offset as in OPTIND=1:2. Even so, shells differ slightly in their behavior, with some incrementing OPTIND earlier than others.
The issue has two root causes. First, the POSIX specification only implies that more state is needed. Second, the specification is silent on how to handle the implicit state, leading to divergent behaviors. The issue is under discussion on the Open Group mailing list.
8.3. Unspecified behavior and extensions: scope and the local builtin
The POSIX shell has dynamic scope. Variable assignments on a command line, as in LD_PRELOAD=... cmd bar baz have three possible behaviors, depending on the nature of cmd (per Section 4.3). If cmd is a special builtin, then the assignments are globally visible. If cmd is a program or a non-special builtin, then the assignments are visible only to that program or builtin. Finally, if cmd is a function, then the assignments are visible during the dynamic extent of the function, but it is unspecified whether or not the assignments are globally visible afterwards: scope may or may not be ‘nested’. Half of the shells we considered implement nested scope for functions (Table 2).
Many shells implement a local builtin, which has syntax analogous to export and readonly (Table 2). That is, one can write local x=5 inside of a function to declare a variable x that will be (globally) bound to 5 until the function returns, when x will revert to whatever value it had before local was used. Considering that the facilities to implement local line up nearly exactly with those needed for nested scope, it is unsurprising that of the four shells with nested scope, only mksh doesn’t have local. OSH implements nested scope; our tests, however, revealed a bug in getopts along with a “serious bug” with scoping. These bugs will be fixed in OSH’s next release.
We implemented local in Smoosh, erring on the side of featureful safety. By analogy to export and readonly, we’ve treated local as a special builtin. Since erroneous conditions in special builtins cause shell scripts to abort, making this choice assures early failure in case local is misused. Like in dash, it is an error in Smoosh to create a local variable with the same name as a readonly variable. It may seem unduly restrictive, since the variable will be locally scoped, but triggering an error makes readonly variables unforgeable, even locally.
9. Related work
Research on the shell
The POSIX shell has seen relatively little academic attention. There are only two recent works that take the shell’s semantics seriously: ABash (Mazurak and Zdancewic, 2007) and CoLiS (Jeannerod et al., 2017b; Régis-Gianas et al., 2018; Jeannerod, 2017; Jeannerod et al., 2017a). ABash is a static analysis for bash scripts in particular; it checks for common expansion bugs along with taint tracking. ABash has some limitations in its parser and in its model of expansion; we hope that Smoosh’s semantics can combine with ABash’s approach to provide a more precise analysis that can cover more of the shell. The CoLiS project takes a core calculus approach to the shell, not unlike (Guha et al., 2010). Jeannerod et al. define an interpreter for a tiny shell-like language (2017; 2017a) to which they elaborate shell using their hand-built parser, Morbig (Régis-Gianas et al., 2018). The CoLiS interpreter is not yet a usable shell—it passed only 8 of our 161 tests, mostly due to a variety of unsupported shell features (e.g., assignments in commands, heredocs, break); even so, its symbolic evaluator has already found numerous bugs in Debian maintainer scripts (Jeannerod et al., 2017b). Smoosh has taken the “full semantics” approach rather than the “elaborate to a core calculus” approach; by implementing the full semantics, we have a baseline against which we can prove an elaboration correct. We believe it would have been much more challenging for us to achieve Smoosh’s level of conformance if we had started with elaboration (see “Whole-language semantics” below). We have left parsing out of our scope for now (see Section 5.1); we hope to extend Smoosh to use Morbig and other parsers. Greenberg has made arguments about why certain features of the shell are useful for concurrency (2018a) and interactivity (2018b); see the latter paper for references on much older research on command-line interfaces, interactivity, and live programming (Collins et al., 2003). NoFAQ (D’Antoni et al., 2016) uses machine learning to suggest repairs to commands, but treats shell syntax as unstructured text.
Tools for the shell
Outside of academia, programmers have created a variety of tools to support shell programming. Modernish is a library that rebuilds the shell language using its own features (Dekker, 2019). ShellCheck (koalaman, 2016) is a linter that can process a variety of shell extensions; it is purely syntactic, though, and could be improved by having it reason using Smoosh’s semantics. ExplainShell (Kamara, 2016) patches together a parser for bash121212https://github.com/idank/bashlex and some post-processed man pages to explain each part of a shell structure. ExplainShell isn’t about semantic insight: e.g., asking it about {#*} doesn’t mention that its expansion is unspecified; nothing indicates the difference between @ and "$@". Finally, maybe allows for tentative use of shell commands (Weidmann, 2016).
Replacement shells, scripting languages, and shell libraries
A variety of replacement shells have been proposed, both in academia, with projects like Scsh, Shill, and Rash (Shivers, 2006; Moore et al., 2014; Hatch and Flatt, 2018) and outside of academia, with popular interactive shells like fish and zsh, along with less well known replacement shells like Xonsh and OSH (Scopatz et al., 2019; Chu, 2019). Only zsh and OSH aim for any real POSIX compatibility; Xonsh strives to have some measure of compatibility with bash. Others are more scripting languages than shells: Shivers planned but never developed an interactive mode for Scsh; Shill isn’t meant to be interactive, either. Shell libraries, like Plumbum, Turtle, and Shcaml aim to bring shell-like idioms into conventional programming languages (Schreiner et al., 2018; Gonzalez, 2018; Heller and Tov, 2008). None of these quite match our scope: POSIX shells that can be used both interactively and programmatically.
Whole-language semantics
There have been a variety of efforts at building language semantics for whole, real languages: for JavaScript (Maffeis et al., 2008; Guha et al., 2010), for C (Blazy and Leroy, 2009; Ellison and Rosu, 2012; Krebbers et al., 2014; Kang et al., 2015; Memarian et al., 2016), and for Rust (Weiss et al., 2018; Jung et al., 2017; Weiss et al., 2019), to name a few. Approaches to such whole-language semantics fall broadly into two styles of modeling: large semantics that cover a broad range of language constructs, using light syntactic sugar; and small semantics that cover a narrow range of language constructs, using heavy syntactic sugar/elaboration. K and Lem epitomize the former style (Roşu and Şerbănuţă, 2010; Mulligan et al., 2014), while is emblematic of the latter style, defining a small calculus to which it elaborates JavaScript (Guha et al., 2010). Why did we choose the large semantics route? First, the shell’s complexity makes it hard to precisely encode every aspect of a shell construct in terms of other, simpler ones; moreover, using a large AST lets us better match parts of the semantics to the POSIX spec. Second, leaves out eval—to include it requires including not only a parser in the desugarer or semantics, but also the desugarer itself. We cannot do without eval (Richards et al., 2011): it’s used in a variety of testing frameworks and is the only way to achieve certain forms of indirection in the shell (e.g., a tilde expansion with a variable). Now that we have much more experience with the shell—and a reference semantics against which to prove our elaboration correct—we can think about desugaring and compilation.
10. Conclusion
Smoosh is a small-step operational semantics for a new shell implementing the POSIX shell standard. The executable semantics is implemented in Lem code but corresponds well to legible, practicably complex inference rules. Of the shells we have considered, Smoosh best conforms to the POSIX specification and has the fewest bugs and quirks. In the process of developing and testing Smoosh, we found numerous bugs, confusions, and underspecifications in existing shells used in production, in the POSIX test suite, and in the POSIX specification itself.
Smoosh cleanly separates OS functionality from the core shell semantics, allowing us to use our tested semantics in symbolic settings; we demonstrate this capability with a stepper. The semantics we’ve developed here promises to support research on and development of shells and tools for shells. Our semantics will also form the basis of design innovations and revisions in the shell and, we hope, interactive programming in general.
Acknowledgements.
Much of this work was done while the first author was on sabbatical at Harvard University; he gratefully acknowledges the Harvard CS department in general and Steve Chong in particular for their support. Brian Selves of the Open Group provided invaluable help with understanding the POSIX specification and the POSIX test suite, the latter of which the Open Group helpfully provided. The Austin Group mailing list offered helpful, timely feedback and discussion. Discussions with Ralf Treinen, Yann Régis-Gianas, and Andy Chu were helpful and interesting. Aaron Bembenek and David Holland had helpful comments. Hannah de Keijzer proofread.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Austin Group (2018) The Austin Group. 2018. POSIX.1 2017: The Open Group Base Specifications Issue 7 (IEEE Std 1003.1-2008).
- 3Blazy and Leroy (2009) Sandrine Blazy and Xavier Leroy. 2009. Mechanized Semantics for the Clight Subset of the C Language. Journal of Automated Reasoning 43, 3 (01 Oct 2009), 263–288. https://doi.org/10.1007/s 10817-009-9148-3 · doi ↗
- 4Chu (2019) Andy Chu. 2019. Oil Shell. https://www.oilshell.org/
- 5Collins et al . (2003) Nick Collins, Alex Mc Lean, Julian Rohrhuber, and Adrian Ward. 2003. Live coding in laptop performance. Organised sound 8, 3 (2003), 321–330.
- 6D’Antoni et al . (2016) Loris D’Antoni, Rishabh Singh, and Michael Vaughn. 2016. No FAQ: Synthesizing Command Repairs from Examples. Co RR abs/1608.08219 (2016). http://arxiv.org/abs/1608.08219
- 7Dekker (2019) Martijn Dekker. 2019. Modernish. https://github.com/modernish/modernish
- 8Ellison and Rosu (2012) Chucky Ellison and Grigore Rosu. 2012. An Executable Formal Semantics of C with Applications. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’12) . ACM, New York, NY, USA, 533–544. https://doi.org/10.1145/2103656.2103719 · doi ↗
