Unexpected exception

Hello!

I get quite a lot of exceptions of the type below (currently running 3.2.2), but I am not quite sure which action triggers them.
Can you give me some pointers?

Million thanks!

2024-09-06 11:55:01,472 bde4bbdc265d456b - indico.cache - ERROR cache.py:149 - set('flask-session/c3db94b0-984d-429d-92ef-e48bd37e86c4') failed

Traceback (most recent call last):
  File "/opt/indico/.venv/lib/python3.9/site-packages/indico/core/cache.py", line 145, in set
    super().set(key, value, timeout=timeout)
  File "/opt/indico/.venv/lib/python3.9/site-packages/flask_caching/__init__.py", line 198, in set
    return self.cache.set(*args, **kwargs)
  File "/opt/indico/.venv/lib/python3.9/site-packages/cachelib/redis.py", line 87, in set
    result = self._write_client.setex(
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/commands/core.py", line 2327, in setex
    return self.execute_command("SETEX", name, time, value)
  File "/opt/indico/.venv/lib/python3.9/site-packages/sentry_sdk/integrations/redis.py", line 207, in sentry_patched_execute_command
    return old_execute_command(self, name, *args, **kwargs)
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1258, in execute_command
    return conn.retry.call_with_retry(
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1262, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1248, in _disconnect_raise
    raise error
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1259, in <lambda>
    lambda: self._send_command_parse_response(
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1235, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/client.py", line 1275, in parse_response
    response = connection.read_response()
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/connection.py", line 865, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/opt/indico/.venv/lib/python3.9/site-packages/redis/connection.py", line 501, in read_response
    raise response
redis.exceptions.BusyLoadingError: Redis is loading the dataset in memory

{'data': {'get': {'next': '/category/3/overview?date=2371-06-01&detail=event&period=day'},
          'headers': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                      'Accept-Encoding': 'gzip, deflate',
                      'Connection': 'close',
                      'Host': 'indico.ectstar.eu',
                      'Referer': 'Indico',
                      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; '
                                    'rv:95.0) Gecko/20100101 Firefox/95.0',
                      'X-Forwarded-By': '10.192.1.4:443',
                      'X-Forwarded-For': '52.40.18.77',
                      'X-Forwarded-Proto': 'https',
                      'X-Real-Ip': '52.40.18.77'},
          'json': None,
          'post': {},
          'url': {'provider': None}},
 'endpoint': 'auth.register',
 'id': 'bde4bbdc265d456b',
 'ip': '10.192.1.4',
 'method': 'GET',
 'referrer': 'Indico',
 'rh': 'RHRegister',
 'time': '2024-09-06T11:55:01.489948',
 'url': 'ECT* Indico',
 'user': None,
 'user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) '
               'Gecko/20100101 Firefox/95.0'}

Looks like Redis is (re)starting. Check whether you are running into OOM errors for Redis.
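
A minimal sketch of how to check both, assuming shell access to the VM and a systemd-based system:

# is Redis still busy loading its dump? (loading:1 while the RDB/AOF is being replayed)
redis-cli info persistence | grep ^loading

# did the kernel OOM killer terminate redis-server recently?
journalctl -k | grep -iE 'killed process|out of memory' | grep -i redis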

Thank you. We have meanwhile doubled the RAM of our VM. If I understand correctly, this should solve the issue for the time being.

It could, but I recommend investigating why the cache grew to such a size that it caused OOM. Our cache, for example, currently uses ~40 MB of RAM. So either you really have so many users that the cache grows that big, or you may be experiencing an issue with a rogue bot overloading your instance.

indico user search -e '@' --substring

returns 2887 users.

redis-cli info | grep memory_human

returns

used_memory_human:1.32G
total_system_memory_human:6.77G
maxmemory_human:0B

That looks like a fairly large amount of used memory for a VM that is dedicated solely to running Indico. Should I ask the IT people to check it?

Yes, that seems quite huge, unless all your users are actively using Indico at the same time.
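
As an aside, maxmemory_human:0B in the output above means no memory limit is configured, so Redis will keep growing until the OS itself runs out of memory. A hedged sketch of capping it (the 1 GB value is only an example; once the cap is hit, Redis starts evicting keys):

redis-cli config get maxmemory

# cap the cache and evict keys that carry a TTL first; adjust the size to your VM
redis-cli config set maxmemory 1gb
redis-cli config set maxmemory-policy volatile-lru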

You could run KEYS * inside redis-cli to see what Redis keys there are…
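
KEYS * is O(N) and blocks Redis while it runs, so on a large keyspace the non-blocking variants below are a gentler sketch of the same check:

# per-database key counts at a glance
redis-cli info keyspace

# iterate keys incrementally instead of blocking the server with KEYS *
redis-cli --scan | head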

As for the keys, I do not see anything wrong:

1) "_kombu.binding.celeryev"
2) "_kombu.binding.celery.pidbox"
3) "_kombu.binding.celery"

Do the same in redis-cli -n 1, which will get you the cache DB instead of the DB used for Celery task communication.
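
For example (a rough sketch; the flask-session pattern is just based on the keys that turn out to be there below):

# how many keys the cache db holds
redis-cli -n 1 dbsize

# sample a few of them without blocking the server
redis-cli -n 1 --scan --pattern '*flask-session*' | head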

Oops. The count there is 3,365,040. They are all of the same type, e.g.:

indico_v3.2_flask-session/fdbd9c46-edc0-4f20-9b94-f6a1c1c59d5b

It looks to me like these are piling up somehow. Is there anything I can do to clean the cache without harming current operations?

FLUSHDB in redis-cli -n 1 will wipe all sessions. Users will be logged out, but otherwise it won't cause any issues.

Also, FWIW, sessions expire after a week of inactivity by default.
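
If you want to verify that the expiry is actually applied, checking the TTL of one key is enough (using the example key from above):

# remaining lifetime in seconds; -1 would mean the key never expires
redis-cli -n 1 ttl indico_v3.2_flask-session/fdbd9c46-edc0-4f20-9b94-f6a1c1c59d5b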

PS: You should now run DEL indico_v3.2_flask-session/fdbd9c46-edc0-4f20-9b94-f6a1c1c59d5b to revoke that session, since its session ID has been posted here.

OK. Done that. The cache is back to more normal numbers (1 MB).
However:

  1. In the time elapsed, >3,500 new keys have been created
  2. I've got a few unexpected exceptions in my mailbox

With respect to 2), there is something I noticed: the logger report comes with a link that invariably points to the same event category overview, but with a date either very far in the future (e.g., the year 2371) or in the past (the year 1543, probably looking for a seminar by the late Copernicus…).
Things like:

https://indico.ectstar.eu/category/7/overview?date=2504-04-17&detail=event&period=day

Can you check your access logs for strange accesses? It's usually pretty easy to spot requests from misbehaving bots/crawlers, for example. Feel free to share the access_log file privately with me (to avoid posting client IP addresses in a public place) if you want me to have a look.
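
A minimal sketch of such a check, assuming a combined-format access log (the path, log format and field positions depend on your web server and proxy setup):

# who is hammering the category overview pages? ($1 is the client IP in the default format;
# behind a reverse proxy you may need to look at the X-Forwarded-For field instead)
grep 'GET /category/.*/overview' access_log | awk '{print $1}' | sort | uniq -c | sort -rn | head

# requests with clearly bogus years (before 1900 or after 2199)
grep -E 'overview\?date=(1[0-8][0-9]{2}|2[2-9][0-9]{2})-' access_log | head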

Usually you should NOT have that many sessions created on a typical small-ish instance.


FWIW, this is what our own Redis instance has, and db0 is indico.cern.ch (i.e. a VERY large and active instance):

db0:keys=876828,expires=866777,avg_ttl=252740870
db1:keys=17602,expires=17344,avg_ttl=275680265
db2:keys=101762,expires=101534,avg_ttl=250761963
db3:keys=30972,expires=30968,avg_ttl=297008205
db4:keys=883040,expires=883015,avg_ttl=290822382
db5:keys=40986,expires=40262,avg_ttl=293775528

Contacting the IT guys right now (my access to the VM is very limited so /var/log is out of reach). Will let you know their findings, and thanks for all the help!

By default it's in /opt/indico/log/apache or /opt/indico/log/nginx, where you might have access :wink:

After being granted permission to access the logs, we discovered a single machine doing weird stuff with registration. We have banned it, and the creation of keys has dropped to basically zero.
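
For reference, a sketch of one way such a ban can be done; the IP is a placeholder and the thread does not say how the block was actually implemented. Note that behind a reverse proxy the block has to happen on the proxy itself, since the backend only sees the proxy's address:

# drop all traffic from the offending client (not persistent across reboots)
iptables -I INPUT -s 203.0.113.42 -j DROP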

I am running 3.2.8; could this be an exploit of the XSS vulnerability patched in 3.3.4 (which I will now proceed to install)?

No, that vuln cannot be exploited except by getting someone to register a new account using a “special” link.

Can you share the “weird” log entries with me so I can see what they’ve been trying to do?

I've sent you a message, I believe, with an excerpt of the logs.

This definitely looks like a bot following the next/previous links on the calendar page indefinitely.