Image download bug / Option to set custom user-agent

I’ve just finished tracking down an issue where indico reports “TooManyRedirects: Exceeded 30 redirects” and sends an exception email when generating a contribution PDF. This turned out to be a pain to diagnose, as there’s nothing in the traceback that helps to identify which URL or contribution is the culprit - and the event I was looking at has over 500 contributions to investigate.

In this case an academic has linked to some images hosted on their ‘staff home page’ type website. The images work fine in a web browser, but their institution seems to block all requests with the ‘python-requests’ user-agent by issuing a 302 redirect to /. Unfortunately, the / page also issues a redirect to itself, so the request loops until the redirect limit is hit and we end up with a failed PDF. Clearly this is broken behaviour on the part of their webserver, but it’s not the first time we’ve seen websites doing funny things based on user-agent. Cloudflare-fronted websites are particularly prone to this kind of issue too, as they are often configured to block known ‘bots’ by default.

I’ve bodged this by changing the default user-agent in requests/utils.py, but this is clearly not a long-term solution :slight_smile:

  1. It would be nice if a failed URL request actually logged the URL (and possibly the contribution friendly_id?) in the failure email - at least we could then quickly notify the contributor that there’s an issue with their content. It’s not always going to be something that individual has control over, though, so:
  2. A sensible improvement would be for indico to allow you to set a custom user-agent, and it might be wise for it to do so by default as this might affect other indico users.
  3. Ideally indico would also catch this exception and put a ‘broken image’ placeholder in the PDF, much like already happens when a downloaded image isn’t a valid image.
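
To make the three points above concrete, here is a rough sketch of the kind of thing I have in mind. It is not based on indico’s actual image-download code: `CUSTOM_USER_AGENT`, `make_session` and `fetch_image` are made-up names for illustration, and the user-agent string and redirect limit are arbitrary values. It only relies on the standard `requests` session API.

```python
import logging

import requests

logger = logging.getLogger(__name__)

# Hypothetical value for illustration; not an existing indico setting.
CUSTOM_USER_AGENT = "Mozilla/5.0 (compatible; Indico PDF generator)"


def make_session(user_agent=CUSTOM_USER_AGENT):
    """Return a requests session that sends a custom User-Agent header."""
    session = requests.Session()
    session.headers["User-Agent"] = user_agent
    # Give up on redirect loops sooner than the default of 30 (arbitrary choice).
    session.max_redirects = 10
    return session


def fetch_image(url, session, friendly_id=None):
    """Return the image bytes, or None if the download fails.

    A None result is the caller's cue to draw a 'broken image' placeholder.
    """
    try:
        response = session.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # TooManyRedirects is a subclass of RequestException, so redirect
        # loops end up here too. Log the URL and contribution so the
        # offending content can be found without checking every entry.
        logger.warning(
            "Image download failed: url=%s contribution=%s error=%r",
            url, friendly_id, exc,
        )
        return None
    return response.content
```

The PDF generator would then call something like `fetch_image(url, make_session(), friendly_id=contrib_id)` and fall back to the placeholder whenever it gets `None` back, instead of letting the exception abort the whole PDF.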