We have a small-scale Indico setup that works reasonably well for us, and we are now thinking about making it more resilient against all sorts of crashes and issues. We run our setup on a CentOS 7 VM, with the PostgreSQL server sharing the same VM. Our interim, imperfect backup solution is an identical VM to which we copy all relevant data (a database dump/restore plus all configuration and attachments) on a regular basis. If the production VM fails, we would have to do a manual failover by switching DNS to the second machine, which is a potentially serious disruption. We would like a more or less automated failover procedure, without human intervention if possible.
Could anyone share their experience with configuring Indico and the backend database for HA? We are especially interested in learning how it is done at CERN: do you have automated failover and/or database replication? (I also tried googling for this information but could not find any details; apologies if I missed something.)
- A DNS alias pointing to both of them, with an automated system that removes unresponsive hosts from the alias, so if one LB goes down, no requests go there after a few minutes
- The haproxies perform regular health checks on the workers; if a worker is unhealthy, they stop sending it requests until it is healthy again
- 4 workers that the haproxies send requests to
- 1 VM running only Celery (this could run on one of the workers, but this way we have a less important machine without a high request load, which can also be used for administrative tasks etc.)
- 1 Redis cache VM (plus a spare, but no hot failover)
- 1 database. We have a replica but no hot failover either; so far we have not had major issues, and for database maintenance (e.g. on the VM's host system) we don't mind (too much) taking the service down for half an hour. Of course we'd rather avoid that, and a read-only mode served from the replica would be better, but Indico currently does not have a read-only mode that avoids all DB writes (which would fail against a read-only replica)
- Files are currently stored on AFS, and we plan to move to S3 (or rather an S3-compatible API, since we want to keep hosting our files at CERN) using the new storage_s3 plugin
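The "automated system that removes unresponsive hosts" from the DNS alias could look roughly like the sketch below. This is a minimal illustration of the idea, not CERN's actual tooling; the function name, port, and timeout are made up for the example:

```python
import socket


def healthy_hosts(hosts, port=443, timeout=3.0):
    """Return the subset of hosts that accept TCP connections on `port`.

    A DNS-alias manager would run a probe like this periodically and
    rewrite the alias so it only contains the healthy load balancers.
    """
    alive = []
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                alive.append(host)
        except OSError:
            # Unreachable, refused, or timed out: drop from the alias for now.
            pass
    return alive
```

In practice this would run from a cron job or monitoring hook that then updates the DNS records; a few minutes of propagation delay is what makes the "after a few minutes" caveat above unavoidable.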
Here’s a simplified version of the haproxy config we’re using. Nothing special in there though:
```
frontend indico
    bind :::443 v4v6 ssl crt /etc/ssl/haproxy/indico.cern.ch.pem crt
    bind :::80 v4v6
    capture request header Referer len 128
    capture request header User-Agent len 128
    default_backend indico-workers
    redirect prefix https://indico.cern.ch code 301 unless { hdr(host) -i indico.cern.ch }
    redirect scheme https code 301 if !{ ssl_fc }

backend indico-workers
    http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request del-header X-Forwarded-Host
    http-request set-header Host indico.cern.ch
    option forwardfor
    option allbackups
    option httpchk HEAD /ping HTTP/1.0\r\nHost:\ indico.cern.ch
    server indico-wk1 indico-wk1.cern.ch:443 ssl check cookie indico-wk1
    server indico-wk2 indico-wk2.cern.ch:443 ssl check cookie indico-wk2
    server indico-wk3 indico-wk3.cern.ch:443 ssl check cookie indico-wk3
    server indico-wk4 indico-wk4.cern.ch:443 ssl check cookie indico-wk4
    server indico-maintenance1 indico-maintenance1.cern.ch:80 backup check
    server indico-maintenance2 indico-maintenance2.cern.ch:80 backup check
```
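The `option httpchk` line is what drives the health checks: each haproxy sends `HEAD /ping` (with the expected `Host` header) to every worker and only keeps a worker in rotation while the check succeeds. The same probe is easy to reproduce for external monitoring; here is a hedged Python sketch (the path and vhost follow the config above, everything else is made up for illustration):

```python
from http.client import HTTPConnection  # use HTTPSConnection for the TLS workers


def ping_ok(host, port=80, vhost="indico.cern.ch", timeout=3.0):
    """Send HEAD /ping with the expected Host header, roughly as haproxy's
    `option httpchk` does, and report whether the worker looks healthy."""
    try:
        conn = HTTPConnection(host, port, timeout=timeout)
        conn.request("HEAD", "/ping", headers={"Host": vhost})
        status = conn.getresponse().status
        conn.close()
        return 200 <= status < 400
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False
```

Note that the `backup` servers at the end of the config are only used once all four workers fail their checks (`option allbackups`), so they typically serve a maintenance page rather than Indico itself.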
There’s also a short talk I gave during the Indico workshop, but it’s just a rough overview, nothing in-depth: