When using Renovate bot to detect available upgrades, some dependencies cannot be tracked through a repository manager because their versions are only published on web pages.

Renovate supports a custom datasource with the “plain” format for that:

  1. It fetches the page content (HTML or any other text)
  2. It transforms the content to JSON, with each line mapped to a “version”: {"releases": [{"version": "raw line 1 content"}, {"version": "next line..."}]}
  3. It applies a JSONata query, which is expected to filter and transform that JSON into the same schema, but with each version being an actual version
  4. The usual version upgrade process follows
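
To make steps 2 and 3 concrete, here is a minimal Python sketch of the same idea, run on a few hard-coded lines instead of a fetched page. The sample lines, the function names, and the use of Python's re module in place of JSONata are all assumptions made for illustration; this is not how Renovate itself is implemented.

```python
import json
import re

# Hypothetical sample of a fetched page; real content comes from the datasource URL.
SAMPLE_PAGE = """<item>
<title>1.31.1-do.3</title>
<description>Some release notes</description>
<title>1.30.5-do.5</title>
</item>"""

# Illustrative pattern: a version such as 1.31.1-do.3 wrapped in <title> tags.
VERSION_LINE = re.compile(r"<title>(\d+\.\d+\.\d+(?:\.|-)do(?:\.|-)\d+)</title>")


def plain_to_releases(text: str) -> dict:
    """Step 2: map every raw line to a {"version": line} entry."""
    return {"releases": [{"version": line} for line in text.splitlines()]}


def keep_versions(data: dict) -> dict:
    """Step 3: keep matching lines and reduce each one to the captured version."""
    releases = []
    for entry in data["releases"]:
        match = VERSION_LINE.search(entry["version"])
        if match:
            releases.append({"version": match.group(1)})
    return {"releases": releases}


if __name__ == "__main__":
    raw = plain_to_releases(SAMPLE_PAGE)
    print(json.dumps(keep_versions(raw), indent=2))
```

The output of keep_versions has the same {"releases": [...]} shape as its input, which is the property the JSONata query has to preserve.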

Such datasources are fragile because the page structure can change at any time, but in practice it is not too bad, and Renovate adds a warning to every merge request it creates if a custom datasource causes an error.

This is configured in renovate.json with a customDatasources entry. The example below fetches DigitalOcean Kubernetes versions by parsing the RSS feed (slightly more stable than the changelog page). The query iterates over releases, keeps only the lines that match a release (a <title> element in my case), and extracts just the version from each kept line with a regular expression.

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": [
    "config:base"
  ],
  "customDatasources": {
    "doks": {
      "defaultRegistryUrlTemplate": "https://docs.digitalocean.com/products/kubernetes/details/changelog/index.xml",
      "format": "plain",
      "transformTemplates": [
        "{\"releases\": $map($.releases[version ~> /<title>(\\d+\\.\\d+\\.\\d+(?:\\.|-)do(?:\\.|-)\\d+)<\\/title>/], function ($v) { {\"version\": $replace($v.version, /<title>(\\d+\\.\\d+\\.\\d+(?:\\.|-)do(?:\\.|-)\\d+)<\\/title>/, \"$1\")} })}"
      ]
    }
  }
}

A custom manager can then use this datasource, for example one that detects a regular expression pattern in Terraform resources and links each occurrence to the custom datasource by name (a custom datasource named doks is referenced as datasource custom.doks):

resource "digitalocean_kubernetes_cluster" "cluster" {
  # renovate: datasource=custom.doks depName=doks versioning=regex:^(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)(?:\.|-)do(?:\.|-)(?<build>\d+)$
  version = "1.31.1-do.3"
}

{
  "customManagers": [{
    "customType": "regex",
    "description": "Update Terraform resources properties",
    "fileMatch": [
      "\\.tf$"
    ],
    "matchStrings": [
      "# renovate: datasource=(?<datasource>[a-z-.]+?) depName=(?<depName>.+?) versioning=(?<versioning>[^ ]+?)\\s+[a-z_]+\\s*=\\s*\"(?<currentValue>.+?)\""
    ]
  }]
}
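
As a quick sanity check of what the manager pattern captures, the sketch below applies it to the Terraform snippet above. It is written in Python purely for illustration, so the pattern is translated: named groups become (?P<name>...) and the hyphen in the first character class is moved last (same character set). The function names are hypothetical, and Renovate's own regular expression engine may differ in details.

```python
import re

# Terraform snippet from the article, kept verbatim in a raw string.
TERRAFORM = r'''resource "digitalocean_kubernetes_cluster" "cluster" {
  # renovate: datasource=custom.doks depName=doks versioning=regex:^(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)(?:\.|-)do(?:\.|-)(?<build>\d+)$
  version = "1.31.1-do.3"
}
'''

# The matchStrings pattern, JSON-unescaped and translated to Python syntax.
MATCH_STRING = re.compile(
    r'# renovate: datasource=(?P<datasource>[a-z.-]+?) depName=(?P<depName>.+?) '
    r'versioning=(?P<versioning>[^ ]+?)\s+[a-z_]+\s*=\s*"(?P<currentValue>.+?)"'
)


def extract_dependency(text: str) -> dict:
    """Return the named groups captured from the first annotated property."""
    match = MATCH_STRING.search(text)
    return match.groupdict() if match else {}


def parse_version(versioning: str, value: str) -> dict:
    """Apply a 'regex:' versioning scheme to a value.

    Only (?<name>...) groups are translated; lookbehinds are not handled.
    """
    pattern = versioning.removeprefix("regex:").replace("(?<", "(?P<")
    match = re.match(pattern, value)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    dep = extract_dependency(TERRAFORM)
    print(dep["datasource"], dep["depName"], dep["currentValue"])
    print(parse_version(dep["versioning"], dep["currentValue"]))
```

Because the versioning group excludes only spaces, it runs to the end of the comment line, and the following \s+ consumes the line break before the property name.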

Understanding how plain text parsing actually works was time-consuming, and the query needs to be updated whenever the page structure changes. A few small shell commands make it easy to troubleshoot the query in JSONata explorer:

  • Download the page in expected JSON format: curl https://example.com/page | jq --raw-input '.' | jq --slurp '{"releases": (. | map({"version": .}))}'
  • Extract the JSON-unescaped query from the configuration (doks being the custom datasource name): jq -r '.customDatasources.doks.transformTemplates[0]' renovate.json
  • JSON-escape the query once modified to paste it in configuration: echo '%updated query%' | jq --raw-input '.'

These commands require curl and jq.
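
The last two steps are a JSON unescape and escape round trip. The Python sketch below shows the same round trip with the standard json module. The configuration text is a hypothetical, shortened stand-in for renovate.json, and the function names are made up for this example.

```python
import json

# Hypothetical minimal configuration; a real one would be read from renovate.json.
CONFIG_TEXT = r'''{
  "customDatasources": {
    "doks": {
      "format": "plain",
      "transformTemplates": ["$.releases[version ~> /<title>\\d+<\\/title>/]"]
    }
  }
}'''


def read_query(config_text: str, datasource: str) -> str:
    """Return the first transform template with JSON escaping removed."""
    config = json.loads(config_text)
    return config["customDatasources"][datasource]["transformTemplates"][0]


def escape_query(query: str) -> str:
    """Return the query as a JSON string literal, ready to paste into the config."""
    return json.dumps(query)


if __name__ == "__main__":
    query = read_query(CONFIG_TEXT, "doks")
    print(query)                 # single backslashes, as typed in a JSONata tool
    print(escape_query(query))   # doubled backslashes, as stored in the config
```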