Resolving issues with latest commit on k8s

I run TTRSS on a Kubernetes cluster (v1.27.2) with the latest stock docker images.

Now when I upgrade ttrss-web-nginx to the latest image (1fe1132a) I get these errors:
2023/11/13 16:18:29 [error] 36#36: send() failed (111: Connection refused) while resolving, resolver: 127.0.0.11:53

So it seems the latest commit (1fe1132a1a68bd0fedc313823b942130167fad86) with changes to nginx.conf is causing some internal Kubernetes resolving issues for my setup.

When reverting to the previous commit (image cthulhoo/ttrss-web-nginx:61910acb) it works perfectly again.

Does anybody else have the same issue with the stock docker images on k8s?

https://gitlab.tt-rss.org/tt-rss/tt-rss/-/blob/master/.docker/web-nginx/Dockerfile?ref_type=heads#L18
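For context, the change in that commit moves nginx from a static upstream (resolved once at startup via the system resolver) to runtime resolution via the `resolver` directive. A rough sketch of the pattern (assumed shape based on the linked template and the error logs, not the exact file; `$backend` and the `/tt-rss/` location are illustrative):

```
# Sketch: resolver-based upstream lookup (illustrative, not the actual nginx.conf).
# Old behavior: `upstream app { server app:9000; }` resolved once at startup
# through the system resolver, which honors resolv.conf search domains.
# New behavior: nginx re-resolves per request via `resolver`, querying the
# given nameserver directly - so that address must be reachable from the pod.
resolver ${RESOLVER};

server {
    location /tt-rss/ {
        # Using a variable forces nginx to resolve the name at request time.
        set $backend ${APP_UPSTREAM};
        fastcgi_pass $backend:9000;
    }
}
```

Under plain docker, `${RESOLVER}` defaults to docker's embedded DNS at 127.0.0.11, which does not exist inside a Kubernetes pod network namespace, hence the `Connection refused` above.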

I have the exact same issue @FWD. Thanks for opening a post, I was about to. I also run ttrss on a k8s cluster and get the exact same message. I haven’t tried all the images between the broken one (the latest) and the one that was working 12 days ago, so I can’t confirm which commit broke it, but I believe you are right.

Thanks. I’m passing the environment variable for the Kubernetes resolver to nginx now. However somehow resolving is still not working.

I now get this error:
2023/11/13 17:59:59 [error] 39#39: *7 app could not be resolved (3: Host not found), client: 10.244.x.x, server: , request: "GET /tt-rss/ HTTP/1.1", host: "MYDOMAINNAME"

However when I use a shell on this nginx pod resolving is working fine:
nslookup MYDOMAINNAME kube-dns.kube-system.svc.cluster.local
Server: kube-dns.kube-system.svc.cluster.local
Address: 10.96.x.x:53

Non-authoritative answer:
Name: MYDOMAINNAME
Address: CORRECT_PUBLIC_IP_ADDRESS

NB: I use traefik in front of nginx to do SSL termination / Let’s Encrypt renewals.
If you spot what I’m doing wrong, please enlighten me. I’ll debug my k8s setup more when I have more time…

I use a similar setup except with nginx instead of Traefik and I get a similar error:

2023/11/13 18:20:25 [error] 36#36: *1 app could not be resolved (3: Host not found), client: 10.1.199.118, server: , request: "POST /tt-rss/api/ HTTP/1.1", host: "149.X.X.X"
10.1.199.118 - - [13/Nov/2023:18:20:25 +0000] "POST /tt-rss/api/ HTTP/1.1" 502 157 "-" "Tiny Tiny RSS (Android)…"

I setup my web-nginx deployment with this env var:

    env:
    - name: RESOLVER
      value: "kube-dns.kube-system.svc.cluster.local"

i have no trouble running the tt-rss demo on k8s using this helm chart - https://gitlab.tt-rss.org/tt-rss/helm-charts/tt-rss & these values - https://gitlab.tt-rss.org/tt-rss/tt-rss/-/blob/master/.helm/values-demo.yaml?ref_type=heads (nb: this won’t help you though because it’s a one-pod setup)

you’re looking up MYDOMAINNAME, while the name that doesn’t resolve is the backend upstream (‘app’), which is also configurable via environment. do pay more attention to the error messages.

figure out correct hostname for the app container/pod and go from there.

i think i see the problem - using resolver like this (as opposed to the built-in resolution the nginx upstream directive did before) does not apply the implied dns search suffix from resolv.conf, which needs to be appended manually.
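to illustrate: with a typical pod resolv.conf, libc (and nslookup) tries the search suffixes, so the short service name resolves; nginx’s resolver queries the literal name only (the namespace and addresses below are made-up examples):

```
# /etc/resolv.conf inside a pod in namespace "t2" (illustrative values):
nameserver 10.96.0.10
search t2.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# libc / nslookup: "app" -> tries "app.t2.svc.cluster.local" -> resolves.
# nginx resolver:  "app" -> queried literally -> NXDOMAIN,
#                  logged as "app could not be resolved (3: Host not found)".
```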

e:

--- a/templates/web-deployment.yaml
+++ b/templates/web-deployment.yaml
@@ -24,11 +24,13 @@ spec:
       containers:
         - env:
             - name: APP_UPSTREAM
-              value: {{ include "common.fullname" . }}-app
+              value: {{ include "common.fullname" . }}-app.{{ .Release.Namespace }}.svc.cluster.local
             - name: APP_WEB_ROOT
               value: "{{ .Values.web.root }}"
             - name: APP_BASE
               value: "{{ .Values.web.base }}"
+            - name: RESOLVER
+              value: kube-dns.kube-system.svc.cluster.local

this should work, i think (note APP_UPSTREAM env var).

e2: i need to stop editing this post :slight_smile:

I also ran into this issue (using podman). web-nginx turned unhealthy because "app" could not be found:

send() failed (111: Connection refused) while resolving, resolver: 127.0.0.11:53
and 
app could not be resolved (110: Operation timed out), client: ::1, server: , request: "GET /tt-rss/index.php HTTP/1.1", host: "localhost"

Traefik therefore returned 404.

I switched to the official NGINX image (docker.io/nginx:alpine) and volume-mounted a crafted nginx.conf based on the template .docker/web-nginx/nginx.conf, with the RESOLVER statement commented out (https://gitlab.tt-rss.org/tt-rss/tt-rss/-/blob/master/.docker/web-nginx/nginx.conf?ref_type=heads#L19).

It seems podman (aardvark-dns) listens on the bridge interface for DNS requests and writes a nameserver entry into /etc/resolv.conf for each bridge the container is connected to. Since these bridge IP addresses can change, I cannot hardcode them in advance in the NGINX config.
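For illustration, a container attached to two podman bridge networks might end up with a resolv.conf like this (addresses and search domain are hypothetical examples of podman defaults):

```
# Hypothetical /etc/resolv.conf inside a podman container on two bridges;
# aardvark-dns listens on each bridge gateway address, and these addresses
# are assigned dynamically, so they can't be baked into nginx.conf.
nameserver 10.89.0.1
nameserver 10.89.1.1
search dns.podman
```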

But podman / aardvark-dns has problems of its own, e.g. DNS requests timeout · Issue #389 · containers/aardvark-dns · GitHub. It doesn’t seem finished/polished yet.

Using this podman setup with the official NGINX image (docker.io/nginx:alpine), I experience occasional DNS lookup failures: "Uncaught PDOException: SQLSTATE[08006] [7] could not translate host name \"ttrss_db\" to address: Name does not resolve".

I’m not using helm, so I hardcoded the namespace / local FQDN in the variable.
Working again now, thanks.

    spec:
      containers:
        - image: cthulhoo/ttrss-web-nginx
          env:
            - name: RESOLVER
              value: kube-dns.kube-system.svc.cluster.local
            - name: APP_UPSTREAM
              value: app.t2.svc.cluster.local