Message ID | 20240927190650.128263-1-alexander.heinisch@siemens.com |
---|---|
Series | Added support for apt caching |
On Fri, 2024-09-27 at 21:06 +0200, alexander.heinisch via isar-users wrote:
> From: Alexander Heinisch <alexander.heinisch@siemens.com>
>
> Added DISTRO_APT_SNAPSHOT_PREMIRROR_BASE to specify the base-url of
> the mirror used.
>
> This enables the use of local caches like apt-cacher-ng when using
> isar's snapshot facility. We also added Kconfig support for such
> and a small (non-exhaustive) documentation how to use it in
> `doc/apt-cache.md`.
>
>
> Alexander Heinisch (3):
>   Added DISTRO_APT_SNAPSHOT_PREMIRROR_BASE to specify the base-url of
>     the mirror used.
>   Added Kconfig for cached snapshot mirror
>   Added doc to setup apt cache.
>
>  doc/apt-cache.md                        | 55 +++++++++++++++++++++++++
>  kas/opt/Kconfig                         | 25 ++++++++++-
>  kas/opt/mirror-snapshot.yaml            |  3 ++
>  meta-isar/conf/distro/ubuntu-common.inc |  3 +-
>  meta/conf/distro/debian-common.conf     |  3 +-
>  5 files changed, 86 insertions(+), 3 deletions(-)
>  create mode 100644 doc/apt-cache.md
>
> --
> 2.43.0
>

Hello all.

The series has passed CI multiple times and is ready to be merged, but
p1 is still not addressed. Is v2 expected?
On 08.10.24 08:43, Heinisch, Alexander (FT RPD CED SES-AT) wrote:
>> Hello all.
>>
>> The series has passed CI multiple times and is ready to be merged, but
>> p1 is still not addressed. Is v2 expected?
>
> Repost for new readers:
>
>>> This enables the use of local caches like apt-cacher-ng when using
>>> isar's snapshot facility.
>>> e.g. DISTRO_APT_SNAPSHOT_PREMIRROR_BASE=localhost:3142/snapshot.debian.org
>>
>> Why "BASE"? Also with regular PREMIRROR, you do not need to rewrite the
>> whole URL, thus this is also with some "BASE" semantic. Just trying to
>> make the name shorter.
>
> I was struggling with the naming, too, and still am finding a better one!
> What about `ISAR_APT_SNAPSHOT_MIRROR`?
>
> Any suggestions for naming welcome :-)
>

MIRROR = used as fallback if official URL does not work
PREMIRROR = tried before official URL, falling back to that one on failure

Jan
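
To illustrate the ordering Jan describes, here is a minimal Python sketch. It is not how isar or bitbake implement this; the helper `candidate_urls` and its parameters are hypothetical, and the only facts taken from the thread are the PREMIRROR/MIRROR semantics and the example value `localhost:3142/snapshot.debian.org`.

```python
from typing import List, Optional


def candidate_urls(path: str,
                   official_base: str,
                   premirror_base: Optional[str] = None,
                   mirror_base: Optional[str] = None) -> List[str]:
    """Return the URLs to try for `path`, in order (hypothetical helper)."""
    bases = []
    if premirror_base:
        bases.append(premirror_base)  # PREMIRROR: tried before the official URL
    bases.append(official_base)       # official URL
    if mirror_base:
        bases.append(mirror_base)     # MIRROR: only used as a fallback
    return [f"http://{base.rstrip('/')}/{path.lstrip('/')}" for base in bases]


if __name__ == "__main__":
    for url in candidate_urls(
            "archive/debian/20240904T000000Z/dists/bookworm/InRelease",
            official_base="snapshot.debian.org",
            premirror_base="localhost:3142/snapshot.debian.org"):
        print(url)
```

With only a premirror set, the local cache URL comes first and the official snapshot URL second, which is the fallback behaviour described above.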
On Tue, 2024-10-08 at 14:38 +0200, 'Jan Kiszka' via isar-users wrote:
> On 08.10.24 08:43, Heinisch, Alexander (FT RPD CED SES-AT) wrote:
> > > Hello all.
> > >
> > > The series has passed CI multiple times and is ready to be
> > > merged, but p1 is still not addressed. Is v2 expected?
> >
> > Repost for new readers:
> >
> > > > This enables the use of local caches like apt-cacher-ng when
> > > > using isar's snapshot facility.
> > > > e.g.
> > > > DISTRO_APT_SNAPSHOT_PREMIRROR_BASE=localhost:3142/snapshot.debian.org
> > >
> > > Why "BASE"? Also with regular PREMIRROR, you do not need to
> > > rewrite the whole URL, thus this is also with some "BASE"
> > > semantic. Just trying to make the name shorter.
> >
> > I was struggling with the naming, too, and still am finding a
> > better one!
> > What about `ISAR_APT_SNAPSHOT_MIRROR`?
> >
> > Any suggestions for naming welcome :-)
> >
>
> MIRROR = used as fallback if official URL does not work
> PREMIRROR = tried before official URL, falling back to that one on
> failure

Hi,

this series is much needed to work with the still unreliable snapshot mirrors.

@Alexander: Do you plan to send a v2?

At the same time I'm working on adding internal apt-cacher-ng support to kas to let the build pass the initial bootstrapping.

Best regards,
Felix

>
> Jan
>
> --
> Siemens AG, Technology
> Linux Expert Center
>
> Hi, this series is much needed to work with the still unreliable snapshot mirrors.
>
> @Alexander: Do you plan to send a v2?
>
> At the same time I'm working on adding internal apt-cacher-ng support to kas to let the build pass the initial bootstrapping.
>
> Best regards,
> Felix

Hi Felix

Thank you for coming back.

Even when using apt-cacher-ng, index files oftentimes got updated from snapshot.debian.org, which caused problems when our company was on a blacklist for some time again.

Unfortunately, I didn't find the time to analyze why that was the case.
I did a tcpdump during one of our builds, but didn't analyze it for 2 weeks or so :-(

But I suspect that either the apt client sends a reload request or the expiry date returned from upstream is too limited.
While this could be relevant when fetching packages from "main" mirrors, it should not have much impact on snapshot mirrors.

To mitigate that issue, we have since switched to squid as a proxy for snapshot.debian.org.
Squid has an offline mode, which means that, no matter what happens, cache entries once seen are never updated from upstream. As stated above, while this could have drastic impacts when using main mirrors, it shouldn't cause issues on snapshots, by definition.

Thus, I dropped apt-cacher-ng in our project in favour of squid.
I also prepared documentation for this, but while preparing the patch, I was not sure whether it is worth a separate doc/ file or whether we should merge it with doc/offline.md. I was struggling with that decision since it does not really solve an offline case, as it only caches packages already seen once, and further, it only solves the offline case for apt and not for other sources like git, ...

What is your opinion?

BR Alexander

PS: Appended the patch, I was referring to:

From cf64db474c2f2477633bfe3fd111156d2ac7495a Mon Sep 17 00:00:00 2001
From: Alexander Heinisch <alexander.heinisch@siemens.com>
Date: Thu, 24 Oct 2024 20:06:23 +0200
Subject: [PATCH] doc: Added setup guide for squid as a caching proxy for apt
 (snapshot) mirrors.

Signed-off-by: Alexander Heinisch <alexander.heinisch@siemens.com>
---
 doc/apt-caching-proxy.md | 142 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 doc/apt-caching-proxy.md

diff --git a/doc/apt-caching-proxy.md b/doc/apt-caching-proxy.md
new file mode 100644
index 00000000..2a23a313
--- /dev/null
+++ b/doc/apt-caching-proxy.md
@@ -0,0 +1,142 @@
+# Setup Squid as APT Caching Proxy
+
+Limited download bandwidth is often an issue and increases build times drastically. Further, large corporate networks can get rate limited by Debian mirrors, as many people, pipelines, and so on fetch huge amounts of packages from there.
+
+In such cases a proxy caching the packages is quite useful as it reduces download times and reduces pressure on Debian mirrors.
+
+## Install Squid Proxy
+```
+apt install squid
+```
+
+## Configure Proxy for Caching (with APT in mind)
+
+1. /etc/squid/squid.conf
+This file contains the main configuration for `squid`.
+We configure it to listen on port `4242` and cache all requests to sites listed in `/etc/squid/mirror-dstdomain.acl`. Further, to enable offline use cases (or use cases where your IP got temporarily blacklisted by `snapshot.debian.org` or similar), we set `offline_mode on`
+to not fetch already cached packages from upstream.
+
+> Note: While `offline_mode on` is totally fine for `snapshot.debian.org` when using a timestamp to fix your package archive version, this could cause unintended behaviour (most probably outdated packages) when used against a non-archive mirror.
+
+> Hint: If you are planning to work against non-archive mirrors, and you are not sure, it's recommended to set `offline_mode off` and probably tweak cache behaviour with a `refresh_pattern`.
+
+### /etc/squid/squid.conf:
+```
+# File: /etc/squid/squid.conf
+
+# default to a different port than stock squid
+http_port 4242
+
+# user visible name
+visible_hostname squid-apt-caching-proxy
+
+# do not fetch already cached packages from upstream
+offline_mode on
+
+# we need a big cache, some debs are huge
+maximum_object_size 512 MB
+
+# increase available disk space for cache dir to 40G
+cache_dir aufs /var/cache/squid 40000 16 256
+
+# logs
+access_log /var/log/squid/access.log
+cache_log /var/log/squid/cache.log
+cache_store_log /var/log/squid/store.log
+
+# tweaks to speed things up
+cache_mem 256 MB
+maximum_object_size_in_memory 10240 KB
+
+# only allow ports we trust
+acl Safe_ports port 80
+acl Safe_ports port 443
+
+http_access deny !Safe_ports
+
+# Deny access to blacklisted sites
+acl blockedpkgs urlpath_regex "/etc/squid/pkg-blacklist-regexp.acl"
+http_access deny blockedpkgs
+
+# List of domains to cache
+acl to_archive_mirrors dstdomain "/etc/squid/mirror-dstdomain.acl"
+# don't cache domains not listed in the mirrors file
+cache deny !to_archive_mirrors
+
+# Allow access to the proxy only from networks listed in allowed-networks-src.acl
+acl allowed_networks src "/etc/squid/allowed-networks-src.acl"
+http_access allow allowed_networks
+
+# And finally deny all other access to this proxy
+http_access deny all
+```
+
+### /etc/squid/mirror-dstdomain.acl:
+```
+# File: /etc/squid/mirror-dstdomain.acl
+
+snapshot.debian.org
+```
+
+### /etc/squid/pkg-blacklist-regexp.acl:
+```
+# File: /etc/squid/pkg-blacklist-regexp.acl
+# Empty for now
+```
+
+### /etc/squid/allowed-networks-src.acl:
+```
+# File: /etc/squid/allowed-networks-src.acl
+
+# network sources that you want to allow access to the cache
+
+# private networks
+10.0.0.0/8
+172.16.0.0/12
+192.168.0.0/16
+127.0.0.1
+
+# IPv6 private addresses
+fe80::/64
+::1/128
+
+# IPv6 mesh local
+fd00::/8
+```
+
+Restart squid: `systemctl restart squid`
+
+## Use the Proxy in ISAR Build System
+
+To forward the proxy settings to apt inside the ISAR build system just export `http_proxy`
+as follows:
+
+```
+export http_proxy=http://<proxy-server-ip>:4242
+```
+
+> Hint: Consider also setting `https_proxy`.
+
+### Validation
+
+The first time you build your image the cache will fetch all packages from upstream.
+During that phase you will see log entries like
+
+```
+... TCP_MISS/200 1574478 GET http://snapshot.debian.org/file/7cfaf...
+```
+in `/var/log/squid/access.log`.
+
+From that time on, only the following entries appear for already cached packages:
+
+```
+... TCP_OFFLINE_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+... TCP_MEM_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+```
+
+> Note: When you add new packages to your image, these have to be fetched first, so you will encounter `TCP_MISS`es whenever you add packages you didn't fetch before. The same holds true when upgrading the snapshot timestamp (`ISAR_APT_SNAPSHOT_TIMESTAMP` or `ISAR_APT_SNAPSHOT_DATE`).
+
+> Hint: You can observe your cache misses using:
+> ```
+> tail -f /var/log/squid/access.log | grep -e TCP_MEM_HIT -e TCP_OFFLINE_HIT -v
+> ```
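
As a complement to the `tail | grep` hint at the end of the patch, the following is a minimal Python sketch that counts squid result codes in `access.log` lines. It is an illustration only: it looks for the `TCP_*/<status>` field shown in the sample log entries, and treating every code ending in `_HIT` as a cache hit is a simplifying assumption. The helper names `summarize` and `hit_ratio` are made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "<result code>/<HTTP status>" field, e.g. "TCP_MISS/200".
RESULT_RE = re.compile(r"\b(TCP_[A-Z_]+)/(\d{3})\b")


def summarize(lines: Iterable[str]) -> Counter:
    """Count squid result codes found in access.log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def hit_ratio(counts: Counter) -> float:
    """Share of requests whose result code ends in _HIT (a simplification)."""
    total = sum(counts.values())
    hits = sum(n for code, n in counts.items() if code.endswith("_HIT"))
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "... TCP_MISS/200 1574478 GET http://snapshot.debian.org/file/7cfaf...",
        "... TCP_OFFLINE_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...",
        "... TCP_MEM_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...",
    ]
    counts = summarize(sample)
    print(dict(counts), f"hit ratio: {hit_ratio(counts):.2f}")
```

To run it against a real log, pass an open file object instead of the sample list, for example `summarize(open("/var/log/squid/access.log"))`.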
On Thu, 2024-10-31 at 15:40 +0000, Heinisch, Alexander (FT RPD CED SES-AT) wrote:
> > Hi, this series is much needed to work with the still unreliable
> > snapshot mirrors.
> >
> > @Alexander: Do you plan to send a v2?
> >
> > At the same time I'm working on adding internal apt-cacher-ng
> > support to kas to let the build pass the initial bootstrapping.
> >
> > Best regards,
> > Felix
>
> Hi Felix
>
> Thank you for coming back.
>
> Even when using apt-cacher-ng index files oftentimes got updated from
> snapshot.debian.org which caused problems when our company was on a
> blacklist for some time again.

I know, but this is partially also due to bugs in the apt-cacher-ng implementation. Today it completely broke after an upstream change on snapshot.d.o, requiring a backport of the fix in [1]. I just sent an email to the snapshot ML requesting the backport.

[1] https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=1074404

>
> Unfortunately, I didn't find the time to analyze why that was the
> case.
> I did a tcpdump during one of our builds, but didn't analyze it for 2
> weeks or so :-(
>
> But I suspect either apt client sends a reload request or the expiry
> date returned from upstream is too limited.

It could also simply be due to incorrect parsing of the expiry dates in apt-cacher-ng. Recently there were a lot of fixes regarding time parsing. Tricky to debug, though...

> While this could be relevant when fetching packages from "main"
> mirrors, it should not have much impact on snapshot mirrors.
>
> To mitigate that issue, since then we switched to squid as a proxy
> for snapshot.debian.org
> Squid has an offline mode, which says, no matter what happens, cache
> entries once seen are never updated upstream. As stated above, while
> this could have drastic impacts when using main mirrors, it shouldn't
> cause issues on snapshots, by definition.
>
> Thus, I dropped apt-cacher-ng in our project in favour of squid.
> I also prepared documentation for such, but during preparing the
> patch, I was not sure if that is worth a separate doc/ file or if we
> should merge that with doc/offline.md. I was struggling with that
> decision since it does not really solve an offline case, as it only
> caches packages already seen once, and further, only solves the
> offline case for apt and not for other sources like git, ...

Actually I'm more interested in having stable builds against snapshot.d.o, not so much in 100% offline builds. The situation upstream also got a bit better by rate-limiting on an HTTP basis instead of a TCP basis, so clients (including apt-cacher-ng and squid) should be able to back off correctly. But I also did not check if the rate-limiting is implemented correctly, so that the client knows when to retry...

Anyways, we have a dilemma here: We need a stable baseline to build against (both due to product requirements, as well as for the SState cache). But currently it is REALLY hard to get this working in CI builds.

>
> What is your opinion?

Probably we need both, until it is clear which solution is long-term stable.

Felix

>
> BR Alexander
>
> PS: Appended the patch, I was referring to:
>
> From cf64db474c2f2477633bfe3fd111156d2ac7495a Mon Sep 17 00:00:00 2001
> From: Alexander Heinisch <alexander.heinisch@siemens.com>
> Date: Thu, 24 Oct 2024 20:06:23 +0200
> Subject: [PATCH] doc: Added setup guide for squid as an caching proxy
>  for apt (snapshot) mirrors.
> On Thu, 2024-10-31 at 15:40 +0000, Heinisch, Alexander (FT RPD CED SES-AT) wrote:
> > > Hi, this series is much needed to work with the still unreliable
> > > snapshot mirrors.
> > >
> > > @Alexander: Do you plan to send a v2?
> > >
> > > At the same time I'm working on adding internal apt-cacher-ng
> > > support to kas to let the build pass the initial bootstrapping.
> > >
> > > Best regards,
> > > Felix
> >
> > Hi Felix
> >
> > Thank you for coming back.
> >
> > Even when using apt-cacher-ng index files oftentimes got updated from
> > snapshot.debian.org which caused problems when our company was on a
> > blacklist for some time again.
>
> I know, but this is partially also due to bugs in the apt-cacher-ng implementation. Today it completely broke after an upstream change on snapshot.d.o requiring a backport of the fix in [1]. I just sent an email to the snapshot ML requesting the backport.

Yes, apt-cacher-ng is by far not the most stable software!

>
> [1] https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=1074404
>
> >
> > Unfortunately, I didn't find the time to analyze why that was the
> > case.
> > I did a tcpdump during one of our builds, but didn't analyze it for 2
> > weeks or so :-(
> >
> > But I suspect either apt client sends a reload request or the expiry
> > date returned from upstream is too limited.
>
> It could also simply due to incorrect parsing of the expiry dates in apt-cacher-ng. Recently there were a lot of fixes regarding time parsing. Tricky to debug, though...
>
> > While this could be relevant when fetching packages from "main"
> > mirrors, it should not have much impact on snapshot mirrors.
> >
> > To mitigate that issue, since then we switched to squid as a proxy for
> > snapshot.debian.org Squid has an offline mode, which says, no matter
> > what happens, cache entries once seen are never updated upstream. As
> > stated above, while this could have drastic impacts when using main
> > mirrors, it shouldn't cause issues on snapshots, by definition.
> >
> > Thus, I dropped apt-cacher-ng in our project in favour of squid.
> > I also prepared documentation for such, but during preparing the
> > patch, I was not sure if that is worth a separate doc/ file or if we
> > should merge that with doc/offline.md. I was struggling with that
> > decision since it does not really solve an offline case, as it only
> > caches packages already seen once, and further, only solves the
> > offline case for apt and not for other sources like git, ...
>
> Actually I'm more interested in having stable builds against snapshot.d.o, not so much in 100% offline builds. The situation upstream also got a bit better by rate-limiting on HTTP basis instead of TCP basis, so clients (including apt-cacher-ng and squid) should be able to correctly backoff. But I also did not check if the rate-limiting is implemented correctly, so that the client knows when to retry...

Stable builds are priority 1 for us.
But even when using squid as a proxy ("non-offline mode"), we experienced some trouble with snapshot.d.o when downloading the index file [1].
Even though we were already caching most of the data, snapshot.debian.org denied downloading that file, most probably because the whole company network was blacklisted, thus breaking our build again.
That was the reason why we decided to go with squid's "offline-mode".

[1] http://snapshot.debian.org/archive/debian/20240904T000000Z/dists/bookworm/InRelease

>
> Anyways, we have a dilemma here: We need a stable baseline to build against (both due to product requirements, as well as for the SState cache). But currently it is REALLY hard to get this working in CI builds.
>
> >
> > What is your opinion?
>
> Probably we need both, until it is not clear which solution is long-term stable.
>
> Felix

BR Alexander
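
For anyone who wants to check whether the index file in [1] is reachable through the proxy, here is a minimal Python sketch. The URL layout is taken from the link above, and port `4242` from the squid configuration in the patch. The function names, the `archive`/`host` defaults, and the proxy address in the usage note are assumptions for illustration, and nothing here is part of isar.

```python
import urllib.request


def snapshot_inrelease_url(timestamp: str, suite: str,
                           archive: str = "debian",
                           host: str = "snapshot.debian.org") -> str:
    """Build the InRelease URL for one snapshot timestamp and suite."""
    return f"http://{host}/archive/{archive}/{timestamp}/dists/{suite}/InRelease"


def fetch_status_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> int:
    """Request `url` through an HTTP proxy and return the HTTP status code.

    Raises urllib.error.HTTPError or URLError if the request fails,
    for example when upstream denies the download.
    """
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URL; no network access happens here.
    print(snapshot_inrelease_url("20240904T000000Z", "bookworm"))
```

A call such as `fetch_status_via_proxy(url, "http://127.0.0.1:4242")` would then go through a locally running squid; with `offline_mode on`, a previously cached InRelease file should be answered without contacting upstream.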
From: Alexander Heinisch <alexander.heinisch@siemens.com>

Added DISTRO_APT_SNAPSHOT_PREMIRROR_BASE to specify the base-url of the mirror used.

This enables the use of local caches like apt-cacher-ng when using isar's snapshot facility.
We also added Kconfig support for this and brief (non-exhaustive) documentation on how to use it in `doc/apt-cache.md`.

Alexander Heinisch (3):
  Added DISTRO_APT_SNAPSHOT_PREMIRROR_BASE to specify the base-url of
    the mirror used.
  Added Kconfig for cached snapshot mirror
  Added doc to setup apt cache.

 doc/apt-cache.md                        | 55 +++++++++++++++++++++++++
 kas/opt/Kconfig                         | 25 ++++++++++-
 kas/opt/mirror-snapshot.yaml            |  3 ++
 meta-isar/conf/distro/ubuntu-common.inc |  3 +-
 meta/conf/distro/debian-common.conf     |  3 +-
 5 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 doc/apt-cache.md