depot/third_party/nixpkgs/pkgs/development/tools/build-managers/gradle/README.md

# Gradle Setup Hook

## Introduction

Gradle build scripts are written in a DSL, computing the list of Gradle
dependencies is a turing-complete task, not just in theory but in
practice. Fetching all of the dependencies often requires building some
native code, running some commands to check the host platform, or just
fetching some files using either JVM code or commands like `curl` or
`wget`.

This practice is widespread and isn't considered a bad practice in the
Java world, so all we can do is run Gradle to check what dependencies
end up being fetched, and allow derivation authors to apply workarounds
so they can run the code necessary for fetching the dependencies our
script doesn't fetch.

"Run Gradle to check what dependencies end up being fetched" isn't a
straightforward task. For example, Gradle usually uses Maven
repositories, which have features such as "snapshots", a way to always
use the latest version of a dependency as opposed to a fixed version.
Obviously, this is horrible for reproducibility. Additionally, Gradle
doesn't offer a way to export the list of dependency URLs and hashes (it
does in a way, but it's far from being complete, and as such is useless
for nixpkgs). Even if did, it would be annoying to use considering
fetching non-Gradle dependencies in Gradle scripts is commonplace.

That's why the setup hook uses mitm-cache, a program designed for
intercepting all HTTP requests, recording all the files that were
accessed, creating a Nix derivation with all of them, and then allowing
the Gradle derivation to access these files.

## Maven Repositories

(Reference: [Repository
Layout](https://cwiki.apache.org/confluence/display/MAVENOLD/Repository+Layout+-+Final))

Most of Gradle dependencies are fetched from Maven repositories. For
each dependency, Gradle finds the first repo where it can successfully
fetch that dependency, and uses that repo for it. Different repos might
actually return different files for the same artifact because of e.g.
pom normalization. Different repos may be used for the same artifact
even across a single package (for example, if two build scripts define
repositories in a different order).

The artifact metadata is specified in a .pom file, and the artifacts
themselves are typically .jar files. The URL format is as follows:

`<repo>/<group-id>/<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`

For example:

- `https://repo.maven.apache.org/maven2/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.pom`
- `https://oss.sonatype.org/content/groups/public/com/tobiasdiez/easybind/2.2.1-SNAPSHOT/easybind-2.2.1-20230117.075740-16.pom`

Where:

- `<repo>` is the repo base (`https://repo.maven.apache.org/maven2`)
- `<group-id>` is the group ID with dots replaced with slashes
  (`org.slf4j` -> `org/slf4j`)
- `<artifact-id>` is the artifact ID (`slf4j-api`)
- `<base-version>` is the artifact version (`2.0.9` for normal
  artifacts, `2.2.1-SNAPSHOT` for snapshots)
- `<version>` is the artifact version - can be either `<base-version>`
  or `<version-base>-<timestamp>-<build-num>` (`2.0.9` for normal
  artifacts, and either `2.2.1-SNAPSHOT` or `2.2.1-20230117.075740-16`
  for snapshots)
  - `<version-base>` - `<base-version>` without the `-SNAPSHOT` suffix
  - `<timestamp>` - artifact build timestamp in the `YYYYMMDD.HHMMSS`
    format (UTC)
  - `<build-num>` - a counter that's incremented by 1 for each new
    snapshot build
- `<classifier>` is an optional classifier for allowing a single .pom to
  refer to multiple .jar files. .pom files don't have classifiers, as
  they describe metadata.
- `<ext>` is the extension. .pom

Note that the artifact ID can contain `-`, so you can't extract the
artifact ID and version from just the file name.

Additionally, the files in the repository may have associated signature
files, formed by appending `.asc` to the filename, and hashsum files,
formed by appending `.md5` or `.sha1` to the filename. The signatures
are harmless, but the `.md5`/`.sha1` files are rejected.

The reasoning is as follows - consider two files `a.jar` and `b.jar`,
that have the same hash. Gradle will fetch `a.jar.sha1`, find out that
it hasn't yet downloaded a file with this hash, and then fetch `a.jar`,
and finally download `b.jar.sha1`, locate it in its cache, and then
*not* download `b.jar`. This means `b.jar` won't be stored in the MITM
cache. Then, consider that on a later invocation, the fetching order
changed, whether it was because of a running on different system,
changed behavior after a Gradle update, or any other source of
nondeterminism - `b.jar` is fetched before `a.jar`. Gradle will first
fetch `b.jar.sha1`, not find it in its cache, attempt to fetch `b.jar`,
and fail, as the cache doesn't have that file.

For the same reason, the proxy strips all checksum/etag headers. An
alternative would be to make the proxy remember previous checksums and
etags, but that would complicate the implementation - however, such a
feature can be implemented if necessary. Note that checksum/etag header
stripping is hardcoded, but `.md5/.sha1` file rejection is configured
via CLI arguments.

**Caveat**: Gradle .module files also contain file hashes, in md5, sha1,
sha256, sha512 formats. It posed no problem as of yet, but it might in
the future. If it does pose problems, the deps derivation code can be
extended to find all checksums in .module files and copy existing files
there if their hash matches.

## Snapshots

Snapshots are a way to publish the very latest, unstable version of a
dependency that constantly changes. Any project that depends on a
snapshot will depend on this rolling version, rather than a fixed
version. It's easy to understand why this is a bad idea for reproducible
builds. Still, they can be dealt with by the logic in `gradle.fetchDeps`
and `gradle.updateDeps`.

First, as you can see above, while normal artifacts have the same
`base-version` and `version`, for snapshots it usually (but not
necessarily) differs.

Second, for figuring out where to download the snapshot, Gradle consults
`maven-metadata.xml`. With that in mind...

## Maven Metadata

(Reference: [Maven
Metadata](https://maven.apache.org/repositories/metadata.html),
[Metadata](https://maven.apache.org/ref/3.9.8/maven-repository-metadata/repository-metadata.html)

Maven metadata files are called `maven-metadata.xml`.

There are three levels of metadata: "G level", "A level", "V level",
representing group, artifact, or version metadata.

G level metadata is currently unsupported. It's only used for Maven
plugins, which Gradle presumably doesn't use.

A level metadata is used for getting the version list for an artifact.
It's an xml with the following items:

- `<groupId>` - group ID
- `<artifactId>` - artifact ID
- `<versioning>`
  - `<latest>` - the very latest base version (e.g. `2.2.1-SNAPSHOT`)
  - `<release>` - the latest non-snapshot version
  - `<versions>` - the version list, each in a `<version>` tag
  - `<lastUpdated>` - the metadata update timestamp (UTC,
    `YYYYMMDDHHMMSS`)

V level metadata is used for listing the snapshot versions. It has the
following items:

- `<groupId>` - group ID
- `<artifactId>` - artifact ID
- `<versioning>`
  - `<lastUpdated>` - the metadata update timestamp (UTC,
    `YYYYMMDDHHMMSS`)
  - `<snapshot>` - info about the latest snapshot version
    - `<timestamp>` - build timestamp (UTC, `YYYYMMDD.HHMMSS`)
    - `<buildNumber>` - build number
  - `<snapshotVersions>` - the list of all available snapshot file info,
    each info is enclosed in a `<snapshotVersion>`
    - `<classifier>` - classifier (optional)
    - `<extension>` - file extension
    - `<value>` - snapshot version (as opposed to base version)
    - `<updated>` - snapshot build timestamp (UTC, `YYYYMMDDHHMMSS`)

## Lockfile Format

The mitm-cache lockfile format is described in the [mitm-cache
README](https://github.com/chayleaf/mitm-cache#readme).

The nixpkgs Gradle lockfile format is more complicated:

```json
{
  "!comment": "This is a nixpkgs Gradle dependency lockfile. For more details, refer to the Gradle section in the nixpkgs manual.",
  "!version": 1,
  "https://oss.sonatype.org/content/repositories/snapshots/com/badlogicgames/gdx-controllers": {
    "gdx-controllers#gdx-controllers-core/2.2.4-20231021.200112-6/SNAPSHOT": {

      "jar": "sha256-Gdz2J1IvDJFktUD2XeGNS0SIrOyym19X/+dCbbbe3/U=",
      "pom": "sha256-90QW/Mtz1jbDUhKjdJ88ekhulZR2a7eCaEJoswmeny4="
    },
    "gdx-controllers-core/2.2.4-SNAPSHOT/maven-metadata": {
      "xml": {
        "groupId": "com.badlogicgames.gdx-controllers"
      }
    }
  },
  "https://repo.maven.apache.org/maven2": {
    "com/badlogicgames/gdx#gdx-backend-lwjgl3/1.12.1": {
      "jar": "sha256-B3OwjHfBoHcJPFlyy4u2WJuRe4ZF/+tKh7gKsDg41o0=",
      "module": "sha256-9O7d2ip5+E6OiwN47WWxC8XqSX/mT+b0iDioCRTTyqc=",
      "pom": "sha256-IRSihaCUPC2d0QzB0MVDoOWM1DXjcisTYtnaaxR9SRo="
    }
  }
}
```

`!comment` is a human-readable description explaining what the file is,
`!version` is the lockfile version (note that while it shares the name
with mitm-cache's `!version`, they don't actually have to be in sync and
can be bumped separately).

The other keys are parts of a URL. Each URL is split into three parts.
They are joined like this: `<part1>/<part2>.<part3>`.

Some URLs may have a `#` in them. In that case, the part after `#` is
parsed as `#<artifact-id>/<version>[/SNAPSHOT][/<classifier>].<ext>` and
expanded into
`<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`.

Each URL has a value associated with it. The value may be:

- an SRI hash (string)
- for `maven-metadata.xml` - an attrset containing the parts of the
  metadata that can't be generated in Nix code (e.g. `groupId`, which is
  challenging to parse from a URL because it's not always possible to
  discern where the repo base ends and the group ID begins).

`compress-deps-json.py` converts the JSON from mitm-cache format into
nixpkgs Gradle lockfile format. `fetch.nix` does the opposite.

## Security Considerations

Lockfiles won't be human-reviewed. They must be tampering-resistant.
That's why it's imperative that nobody can inject their own contents
into the lockfiles.

This is achieved in a very simple way - the `deps.json` only contains
the following:

- `maven-metadata.xml` URLs and small pieces of the contained metadata
  (most of it will be generated in Nix, i.e. the area of injection is
  minimal, and the parts that aren't generated in Nix are validated).
- artifact/other file URLs and associated hashes (Nix will complain if
  the hash doesn't match, and Gradle won't even access the URL if it
  doesn't match)

Please be mindful of the above when working on Gradle support for
nixpkgs.
Project import generated by Copybara. GitOrigin-RevId: b73c2221a46c13557b1b3be9c2070cc42cf01eb3 2024-07-27 06:49:29 +00:00			`# Gradle Setup Hook`

			`## Introduction`

			`Gradle build scripts are written in a DSL, computing the list of Gradle`
			`dependencies is a turing-complete task, not just in theory but in`
			`practice. Fetching all of the dependencies often requires building some`
			`native code, running some commands to check the host platform, or just`
			fetching some files using either JVM code or commands like `curl` or
			`wget`.

			`This practice is widespread and isn't considered a bad practice in the`
			`Java world, so all we can do is run Gradle to check what dependencies`
			`end up being fetched, and allow derivation authors to apply workarounds`
			`so they can run the code necessary for fetching the dependencies our`
			`script doesn't fetch.`

			`"Run Gradle to check what dependencies end up being fetched" isn't a`
			`straightforward task. For example, Gradle usually uses Maven`
			`repositories, which have features such as "snapshots", a way to always`
			`use the latest version of a dependency as opposed to a fixed version.`
			`Obviously, this is horrible for reproducibility. Additionally, Gradle`
			`doesn't offer a way to export the list of dependency URLs and hashes (it`
			`does in a way, but it's far from being complete, and as such is useless`
			`for nixpkgs). Even if did, it would be annoying to use considering`
			`fetching non-Gradle dependencies in Gradle scripts is commonplace.`

			`That's why the setup hook uses mitm-cache, a program designed for`
			`intercepting all HTTP requests, recording all the files that were`
			`accessed, creating a Nix derivation with all of them, and then allowing`
			`the Gradle derivation to access these files.`

			`## Maven Repositories`

			`(Reference: [Repository`
			`Layout](https://cwiki.apache.org/confluence/display/MAVENOLD/Repository+Layout+-+Final))`

			`Most of Gradle dependencies are fetched from Maven repositories. For`
			`each dependency, Gradle finds the first repo where it can successfully`
			`fetch that dependency, and uses that repo for it. Different repos might`
			`actually return different files for the same artifact because of e.g.`
			`pom normalization. Different repos may be used for the same artifact`
			`even across a single package (for example, if two build scripts define`
			`repositories in a different order).`

			`The artifact metadata is specified in a .pom file, and the artifacts`
			`themselves are typically .jar files. The URL format is as follows:`

			`<repo>/<group-id>/<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`

			`For example:`

			- `https://repo.maven.apache.org/maven2/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.pom`
			- `https://oss.sonatype.org/content/groups/public/com/tobiasdiez/easybind/2.2.1-SNAPSHOT/easybind-2.2.1-20230117.075740-16.pom`

			`Where:`

			- `<repo>` is the repo base (`https://repo.maven.apache.org/maven2`)
			- `<group-id>` is the group ID with dots replaced with slashes
			(`org.slf4j` -> `org/slf4j`)
			- `<artifact-id>` is the artifact ID (`slf4j-api`)
			- `<base-version>` is the artifact version (`2.0.9` for normal
			artifacts, `2.2.1-SNAPSHOT` for snapshots)
			- `<version>` is the artifact version - can be either `<base-version>`
			or `<version-base>-<timestamp>-<build-num>` (`2.0.9` for normal
			artifacts, and either `2.2.1-SNAPSHOT` or `2.2.1-20230117.075740-16`
			`for snapshots)`
			- `<version-base>` - `<base-version>` without the `-SNAPSHOT` suffix
			- `<timestamp>` - artifact build timestamp in the `YYYYMMDD.HHMMSS`
			`format (UTC)`
			- `<build-num>` - a counter that's incremented by 1 for each new
			`snapshot build`
			- `<classifier>` is an optional classifier for allowing a single .pom to
			`refer to multiple .jar files. .pom files don't have classifiers, as`
			`they describe metadata.`
			- `<ext>` is the extension. .pom

			Note that the artifact ID can contain `-`, so you can't extract the
			`artifact ID and version from just the file name.`

			`Additionally, the files in the repository may have associated signature`
			files, formed by appending `.asc` to the filename, and hashsum files,
			formed by appending `.md5` or `.sha1` to the filename. The signatures
			are harmless, but the `.md5`/`.sha1` files are rejected.

			The reasoning is as follows - consider two files `a.jar` and `b.jar`,
			that have the same hash. Gradle will fetch `a.jar.sha1`, find out that
			it hasn't yet downloaded a file with this hash, and then fetch `a.jar`,
			and finally download `b.jar.sha1`, locate it in its cache, and then
			not download `b.jar`. This means `b.jar` won't be stored in the MITM
			`cache. Then, consider that on a later invocation, the fetching order`
			`changed, whether it was because of a running on different system,`
			`changed behavior after a Gradle update, or any other source of`
			nondeterminism - `b.jar` is fetched before `a.jar`. Gradle will first
			fetch `b.jar.sha1`, not find it in its cache, attempt to fetch `b.jar`,
			`and fail, as the cache doesn't have that file.`

			`For the same reason, the proxy strips all checksum/etag headers. An`
			`alternative would be to make the proxy remember previous checksums and`
			`etags, but that would complicate the implementation - however, such a`
			`feature can be implemented if necessary. Note that checksum/etag header`
			stripping is hardcoded, but `.md5/.sha1` file rejection is configured
			`via CLI arguments.`

			`Caveat: Gradle .module files also contain file hashes, in md5, sha1,`
			`sha256, sha512 formats. It posed no problem as of yet, but it might in`
			`the future. If it does pose problems, the deps derivation code can be`
			`extended to find all checksums in .module files and copy existing files`
			`there if their hash matches.`

			`## Snapshots`

			`Snapshots are a way to publish the very latest, unstable version of a`
			`dependency that constantly changes. Any project that depends on a`
			`snapshot will depend on this rolling version, rather than a fixed`
			`version. It's easy to understand why this is a bad idea for reproducible`
			builds. Still, they can be dealt with by the logic in `gradle.fetchDeps`
			and `gradle.updateDeps`.

			`First, as you can see above, while normal artifacts have the same`
			`base-version` and `version`, for snapshots it usually (but not
			`necessarily) differs.`

			`Second, for figuring out where to download the snapshot, Gradle consults`
			`maven-metadata.xml`. With that in mind...

			`## Maven Metadata`

			`(Reference: [Maven`
			`Metadata](https://maven.apache.org/repositories/metadata.html),`
			`[Metadata](https://maven.apache.org/ref/3.9.8/maven-repository-metadata/repository-metadata.html)`

			Maven metadata files are called `maven-metadata.xml`.

			`There are three levels of metadata: "G level", "A level", "V level",`
			`representing group, artifact, or version metadata.`

			`G level metadata is currently unsupported. It's only used for Maven`
			`plugins, which Gradle presumably doesn't use.`

			`A level metadata is used for getting the version list for an artifact.`
			`It's an xml with the following items:`

			- `<groupId>` - group ID
			- `<artifactId>` - artifact ID
			- `<versioning>`
			- `<latest>` - the very latest base version (e.g. `2.2.1-SNAPSHOT`)
			- `<release>` - the latest non-snapshot version
			- `<versions>` - the version list, each in a `<version>` tag
			- `<lastUpdated>` - the metadata update timestamp (UTC,
			`YYYYMMDDHHMMSS`)

			`V level metadata is used for listing the snapshot versions. It has the`
			`following items:`

			- `<groupId>` - group ID
			- `<artifactId>` - artifact ID
			- `<versioning>`
			- `<lastUpdated>` - the metadata update timestamp (UTC,
			`YYYYMMDDHHMMSS`)
			- `<snapshot>` - info about the latest snapshot version
			- `<timestamp>` - build timestamp (UTC, `YYYYMMDD.HHMMSS`)
			- `<buildNumber>` - build number
			- `<snapshotVersions>` - the list of all available snapshot file info,
			each info is enclosed in a `<snapshotVersion>`
			- `<classifier>` - classifier (optional)
			- `<extension>` - file extension
			- `<value>` - snapshot version (as opposed to base version)
			- `<updated>` - snapshot build timestamp (UTC, `YYYYMMDDHHMMSS`)

			`## Lockfile Format`

			`The mitm-cache lockfile format is described in the [mitm-cache`
			`README](https://github.com/chayleaf/mitm-cache#readme).`

			`The nixpkgs Gradle lockfile format is more complicated:`

			```json
			`{`
			`"!comment": "This is a nixpkgs Gradle dependency lockfile. For more details, refer to the Gradle section in the nixpkgs manual.",`
			`"!version": 1,`
			`"https://oss.sonatype.org/content/repositories/snapshots/com/badlogicgames/gdx-controllers": {`
			`"gdx-controllers#gdx-controllers-core/2.2.4-20231021.200112-6/SNAPSHOT": {`

			`"jar": "sha256-Gdz2J1IvDJFktUD2XeGNS0SIrOyym19X/+dCbbbe3/U=",`
			`"pom": "sha256-90QW/Mtz1jbDUhKjdJ88ekhulZR2a7eCaEJoswmeny4="`
			`},`
			`"gdx-controllers-core/2.2.4-SNAPSHOT/maven-metadata": {`
			`"xml": {`
			`"groupId": "com.badlogicgames.gdx-controllers"`
			`}`
			`}`
			`},`
			`"https://repo.maven.apache.org/maven2": {`
			`"com/badlogicgames/gdx#gdx-backend-lwjgl3/1.12.1": {`
			`"jar": "sha256-B3OwjHfBoHcJPFlyy4u2WJuRe4ZF/+tKh7gKsDg41o0=",`
			`"module": "sha256-9O7d2ip5+E6OiwN47WWxC8XqSX/mT+b0iDioCRTTyqc=",`
			`"pom": "sha256-IRSihaCUPC2d0QzB0MVDoOWM1DXjcisTYtnaaxR9SRo="`
			`}`
			`}`
			`}`
			```

			`!comment` is a human-readable description explaining what the file is,
			`!version` is the lockfile version (note that while it shares the name
			with mitm-cache's `!version`, they don't actually have to be in sync and
			`can be bumped separately).`

			`The other keys are parts of a URL. Each URL is split into three parts.`
			They are joined like this: `<part1>/<part2>.<part3>`.

			Some URLs may have a `#` in them. In that case, the part after `#` is
			parsed as `#<artifact-id>/<version>[/SNAPSHOT][/<classifier>].<ext>` and
			`expanded into`
			`<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`.

			`Each URL has a value associated with it. The value may be:`

			`- an SRI hash (string)`
			- for `maven-metadata.xml` - an attrset containing the parts of the
			metadata that can't be generated in Nix code (e.g. `groupId`, which is
			`challenging to parse from a URL because it's not always possible to`
			`discern where the repo base ends and the group ID begins).`

			`compress-deps-json.py` converts the JSON from mitm-cache format into
			nixpkgs Gradle lockfile format. `fetch.nix` does the opposite.

			`## Security Considerations`

			`Lockfiles won't be human-reviewed. They must be tampering-resistant.`
			`That's why it's imperative that nobody can inject their own contents`
			`into the lockfiles.`

			This is achieved in a very simple way - the `deps.json` only contains
			`the following:`

			- `maven-metadata.xml` URLs and small pieces of the contained metadata
			`(most of it will be generated in Nix, i.e. the area of injection is`
			`minimal, and the parts that aren't generated in Nix are validated).`
			`- artifact/other file URLs and associated hashes (Nix will complain if`
			`the hash doesn't match, and Gradle won't even access the URL if it`
			`doesn't match)`

			`Please be mindful of the above when working on Gradle support for`
			`nixpkgs.`