docs: add language/string-literals.md

This commit is contained in:
Ryan Hendrickson 2024-07-31 19:07:57 -04:00
parent 6ed67d35ed
commit 9e8afc68e5
7 changed files with 199 additions and 172 deletions

View file

@ -344,6 +344,7 @@ const redirects = {
}, },
"language/syntax.html": { "language/syntax.html": {
"scoping-rules": "scoping.html", "scoping-rules": "scoping.html",
"string-literal": "string-literals.html",
}, },
"installation/installing-binary.html": { "installation/installing-binary.html": {
"linux": "uninstall.html#linux", "linux": "uninstall.html#linux",

View file

@ -29,6 +29,7 @@
- [String context](language/string-context.md) - [String context](language/string-context.md)
- [Syntax and semantics](language/syntax.md) - [Syntax and semantics](language/syntax.md)
- [Variables](language/variables.md) - [Variables](language/variables.md)
- [String literals](language/string-literals.md)
- [Identifiers](language/identifiers.md) - [Identifiers](language/identifiers.md)
- [Scoping rules](language/scope.md) - [Scoping rules](language/scope.md)
- [String interpolation](language/string-interpolation.md) - [String interpolation](language/string-interpolation.md)

View file

@ -16,7 +16,7 @@ An *identifier* is an [ASCII](https://en.wikipedia.org/wiki/ASCII) character seq
# Names # Names
A name can be an [identifier](#identifier) or a [string literal](./syntax.md#string-literal). A name can be an [identifier](#identifier) or a [string literal](string-literals.md).
> **Syntax** > **Syntax**
> >

View file

@ -8,6 +8,10 @@ Such a construct is called *interpolated string*, and the expression inside is a
[path]: ./types.md#type-path [path]: ./types.md#type-path
[attribute set]: ./types.md#attribute-set [attribute set]: ./types.md#attribute-set
> **Syntax**
>
> *interpolation_element*`${` *expression* `}`
## Examples ## Examples
### String ### String

View file

@ -0,0 +1,189 @@
# String literals
A *string literal* represents a [string](types.md#type-string) value.
> **Syntax**
>
> *expression* → *string*
>
> *string*`"` ( *string_char*\* [*interpolation_element*][string interpolation] )* *string_char*\* `"`
>
> *string*`''` ( *indented_string_char*\* [*interpolation_element*][string interpolation] )* *indented_string_char*\* `''`
>
> *string* → *uri*
>
> *string_char* ~ `[^"$\\]|\$(?!\{)|\\.`
>
> *indented_string_char* ~ `[^$']|\$\$|\$(?!\{)|''[$']|''\\.|'(?!')`
>
> *uri* ~ `[A-Za-z][+\-.0-9A-Za-z]*:[!$%&'*+,\-./0-9:=?@A-Z_a-z~]+`
Strings can be written in three ways.
The most common way is to enclose the string between double quotes, e.g., `"foo bar"`.
Strings can span multiple lines.
The results of other expressions can be included into a string by enclosing them in `${ }`, a feature known as [string interpolation].
[string interpolation]: ./string-interpolation.md
The following must be escaped to represent them within a string, by prefixing with a backslash (`\`):
- Double quote (`"`)
> **Example**
>
> ```nix
> "\""
> ```
>
> "\""
- Backslash (`\`)
> **Example**
>
> ```nix
> "\\"
> ```
>
> "\\"
- Dollar sign followed by an opening curly bracket (`${`) "dollar-curly"
> **Example**
>
> ```nix
> "\${"
> ```
>
> "\${"
The newline, carriage return, and tab characters can be written as `\n`, `\r` and `\t`, respectively.
A "double-dollar-curly" (`$${`) can be written literally.
> **Example**
>
> ```nix
> "$${"
> ```
>
> "$\${"
String values are output on the terminal with Nix-specific escaping.
Strings written to files will contain the characters encoded by the escaping.
The second way to write string literals is as an *indented string*, which is enclosed between pairs of *double single-quotes* (`''`), like so:
```nix
''
This is the first line.
This is the second line.
This is the third line.
''
```
This kind of string literal intelligently strips indentation from
the start of each line. To be precise, it strips from each line a
number of spaces equal to the minimal indentation of the string as a
whole (disregarding the indentation of empty lines). For instance,
the first and second line are indented two spaces, while the third
line is indented four spaces. Thus, two spaces are stripped from
each line, so the resulting string is
```nix
"This is the first line.\nThis is the second line.\n This is the third line.\n"
```
> **Note**
>
> Whitespace and newline following the opening `''` is ignored if there is no non-whitespace text on the initial line.
> **Warning**
>
> Prefixed tab characters are not stripped.
>
> > **Example**
> >
> > The following indented string is prefixed with tabs:
> >
> > ''
> > all:
> > @echo hello
> > ''
> >
> > "\tall:\n\t\t@echo hello\n"
Indented strings support [string interpolation].
The following must be escaped to represent them in an indented string:
- `$` is escaped by prefixing it with two single quotes (`''`)
> **Example**
>
> ```nix
> ''
> ''$
> ''
> ```
>
> "$\n"
- `''` is escaped by prefixing it with one single quote (`'`)
> **Example**
>
> ```nix
> ''
> '''
> ''
> ```
>
> "''\n"
These special characters are escaped as follows:
- Linefeed (`\n`): `''\n`
- Carriage return (`\r`): `''\r`
- Tab (`\t`): `''\t`
`''\` escapes any other character.
A "double-dollar-curly" (`$${`) can be written literally.
> **Example**
>
> ```nix
> ''
> $${
> ''
> ```
>
> "$\${\n"
Indented strings are primarily useful in that they allow multi-line
string literals to follow the indentation of the enclosing Nix
expression, and that less escaping is typically necessary for
strings representing languages such as shell scripts and
configuration files because `''` is much less common than `"`.
Example:
```nix
stdenv.mkDerivation {
...
postInstall =
''
mkdir $out/bin $out/etc
cp foo $out/bin
echo "Hello World" > $out/etc/foo.conf
${if enableBar then "cp bar $out/bin" else ""}
'';
...
}
```
Finally, as a convenience, *URIs* as defined in appendix B of
[RFC 2396](http://www.ietf.org/rfc/rfc2396.txt) can be written *as
is*, without quotes. For instance, the string
`"http://example.org/foo.tar.bz2"` can also be written as
`http://example.org/foo.tar.bz2`.

View file

@ -6,175 +6,7 @@ This section covers syntax and semantics of the Nix language.
### String {#string-literal} ### String {#string-literal}
*Strings* can be written in three ways. See [String literals](string-literals.md).
The most common way is to enclose the string between double quotes, e.g., `"foo bar"`.
Strings can span multiple lines.
The results of other expressions can be included into a string by enclosing them in `${ }`, a feature known as [string interpolation].
[string interpolation]: ./string-interpolation.md
The following must be escaped to represent them within a string, by prefixing with a backslash (`\`):
- Double quote (`"`)
> **Example**
>
> ```nix
> "\""
> ```
>
> "\""
- Backslash (`\`)
> **Example**
>
> ```nix
> "\\"
> ```
>
> "\\"
- Dollar sign followed by an opening curly bracket (`${`) "dollar-curly"
> **Example**
>
> ```nix
> "\${"
> ```
>
> "\${"
The newline, carriage return, and tab characters can be written as `\n`, `\r` and `\t`, respectively.
A "double-dollar-curly" (`$${`) can be written literally.
> **Example**
>
> ```nix
> "$${"
> ```
>
> "$\${"
String values are output on the terminal with Nix-specific escaping.
Strings written to files will contain the characters encoded by the escaping.
The second way to write string literals is as an *indented string*, which is enclosed between pairs of *double single-quotes* (`''`), like so:
```nix
''
This is the first line.
This is the second line.
This is the third line.
''
```
This kind of string literal intelligently strips indentation from
the start of each line. To be precise, it strips from each line a
number of spaces equal to the minimal indentation of the string as a
whole (disregarding the indentation of empty lines). For instance,
the first and second line are indented two spaces, while the third
line is indented four spaces. Thus, two spaces are stripped from
each line, so the resulting string is
```nix
"This is the first line.\nThis is the second line.\n This is the third line.\n"
```
> **Note**
>
> Whitespace and newline following the opening `''` is ignored if there is no non-whitespace text on the initial line.
> **Warning**
>
> Prefixed tab characters are not stripped.
>
> > **Example**
> >
> > The following indented string is prefixed with tabs:
> >
> > ''
> > all:
> > @echo hello
> > ''
> >
> > "\tall:\n\t\t@echo hello\n"
Indented strings support [string interpolation].
The following must be escaped to represent them in an indented string:
- `$` is escaped by prefixing it with two single quotes (`''`)
> **Example**
>
> ```nix
> ''
> ''$
> ''
> ```
>
> "$\n"
- `''` is escaped by prefixing it with one single quote (`'`)
> **Example**
>
> ```nix
> ''
> '''
> ''
> ```
>
> "''\n"
These special characters are escaped as follows:
- Linefeed (`\n`): `''\n`
- Carriage return (`\r`): `''\r`
- Tab (`\t`): `''\t`
`''\` escapes any other character.
A "double-dollar-curly" (`$${`) can be written literally.
> **Example**
>
> ```nix
> ''
> $${
> ''
> ```
>
> "$\${\n"
Indented strings are primarily useful in that they allow multi-line
string literals to follow the indentation of the enclosing Nix
expression, and that less escaping is typically necessary for
strings representing languages such as shell scripts and
configuration files because `''` is much less common than `"`.
Example:
```nix
stdenv.mkDerivation {
...
postInstall =
''
mkdir $out/bin $out/etc
cp foo $out/bin
echo "Hello World" > $out/etc/foo.conf
${if enableBar then "cp bar $out/bin" else ""}
'';
...
}
```
Finally, as a convenience, *URIs* as defined in appendix B of
[RFC 2396](http://www.ietf.org/rfc/rfc2396.txt) can be written *as
is*, without quotes. For instance, the string
`"http://example.org/foo.tar.bz2"` can also be written as
`http://example.org/foo.tar.bz2`.
### Number {#number-literal} ### Number {#number-literal}
@ -253,7 +85,7 @@ Attribute sets are written enclosed in curly brackets (`{ }`).
Attribute names and attribute values are separated by an equal sign (`=`). Attribute names and attribute values are separated by an equal sign (`=`).
Each value can be an arbitrary expression, terminated by a semicolon (`;`) Each value can be an arbitrary expression, terminated by a semicolon (`;`)
An attribute name is a string without context, and is denoted by a [name] (an [identifier](./identifiers.md#identifiers) or [string literal](#string-literal)). An attribute name is a string without context, and is denoted by a [name] (an [identifier](./identifiers.md#identifiers) or [string literal](string-literals.md)).
[name]: ./identifiers.md#names [name]: ./identifiers.md#names

View file

@ -45,7 +45,7 @@ The function [`builtins.isBool`](builtins.md#builtins-isBool) can be used to det
A _string_ in the Nix language is an immutable, finite-length sequence of bytes, along with a [string context](string-context.md). A _string_ in the Nix language is an immutable, finite-length sequence of bytes, along with a [string context](string-context.md).
Nix does not assume or support working natively with character encodings. Nix does not assume or support working natively with character encodings.
String values without string context can be expressed as [string literals](syntax.md#string-literal). String values without string context can be expressed as [string literals](string-literals.md).
The function [`builtins.isString`](builtins.md#builtins-isString) can be used to determine if a value is a string. The function [`builtins.isString`](builtins.md#builtins-isString) can be used to determine if a value is a string.
### Path {#type-path} ### Path {#type-path}