Questions about migration Indico 2.3.5 to 3.x

mknoerzer · January 4, 2022, 9:23am

Dear Indico developers,

I have some questions regarding the migration of Indico 2.3.5 to 3.x:

I would like to do a fresh installation of Indico 3 and run some tests before migrating our Indico 2.3 data. To migrate the Indico 2.3 data to the Indico 3 server, which already has a database set up with indico db prepare and has been used for testing, can I just do the following?

su - postgres
dropdb indico
pg_restore -C -d postgres indico.sql

Assuming indico.sql is the output file from

pg_dump -U postgres -d indico > indico.sql

on the old Indico 2.3 server.

I guess I do not need to set the ownership of the database or create database extensions after using pg_restore, since these should be restored from the dump.

Then I would delete everything in /opt/indico/archive and copy the old data there.

Is there anything else to consider? E.g. emptying cache/tmp folders?

Thanks a lot!

Michael

ThiefMaster · January 4, 2022, 11:57am

Hi, looks fine!

I’d rather run the pg_restore as the indico user to make sure there are no ownership screwups, but generally what you do should work as well since the dump should contain everything important including the correct ownership (it would only be a problem when using different postgres users on the different systems).

FWIW, this is how I would do it:

su - postgres -c 'dropdb indico'
su - postgres -c 'createdb -O indico indico'
su - postgres -c 'psql indico -c "CREATE EXTENSION unaccent; CREATE EXTENSION pg_trgm;"'

and then as the indico user:

pg_restore -d indico -O /path/to/indico.dump

For creating the backup I’d use pg_dump -Fc -f /tmp/indico.dump indico to get the custom Postgres dump format; it’s compressed and AFAIK also more efficient (and you don’t need a raw readable SQL file anyway if you don’t plan to modify any data in the SQL dump).

Yes, emptying the cache/tmp folders is a good idea.
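
For reference, the sequence discussed above can be laid out in one place. The following is only a sketch that prints the commands in order rather than running them; the helper names (`render`, `migration_plan`) are made up for illustration, and the dump path is the placeholder from the post. Note that pg_restore reads archive formats such as the custom format produced by -Fc, so a plain SQL dump would be loaded with psql instead.

```python
import shlex

# Placeholder path, as in the post above; adjust to the real dump location.
DUMP_PATH = "/path/to/indico.dump"

# On the old server: dump the database in PostgreSQL's custom format.
dump_cmd = ["pg_dump", "-Fc", "-f", "/tmp/indico.dump", "indico"]

# On the new server, as the postgres user: recreate an empty database
# owned by the indico role and create the two extensions.
as_postgres = [
    ["dropdb", "indico"],
    ["createdb", "-O", "indico", "indico"],
    ["psql", "indico", "-c",
     "CREATE EXTENSION unaccent; CREATE EXTENSION pg_trgm;"],
]

# On the new server, as the indico user: restore without ownership commands.
restore_cmd = ["pg_restore", "-d", "indico", "-O", DUMP_PATH]


def render(cmd):
    """Turn an argument list into a shell-quoted command line."""
    return shlex.join(cmd)


def migration_plan():
    """Return (where/who, command line) pairs in the order they would be run."""
    plan = [("old server", render(dump_cmd))]
    plan += [("new server, postgres user", render(c)) for c in as_postgres]
    plan.append(("new server, indico user", render(restore_cmd)))
    return plan


if __name__ == "__main__":
    for who, cmd in migration_plan():
        print(f"[{who}] {cmd}")
```

Printing the plan first makes it easy to review each step before typing it on the server.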

mknoerzer · January 4, 2022, 3:05pm

Thank you also for this!

I guess you mean pg_restore -d indico -O /path/to/indico.dump ?

ThiefMaster · January 4, 2022, 3:07pm

Yes… that happens when copying commands from your own shell history and forgetting to fully change them.

mknoerzer · February 14, 2022, 2:56pm

@ThiefMaster: I am now in the middle of the migration process and have run into a problem accessing some old timetable files in the legacy archive whose filenames contain special characters. Everything else works fine, and I would prefer not to undo everything, so I am hoping for some quick help.

Accessing the files on the old Indico 2.3.5 server is no problem, but accessing them on the Indico 3.1 server fails.

The apache error log on the new server contains this (I replaced some content with XXX):

[XXX] [:error] [pid XXX:tid XXX] (2)No such file or directory: [client XXX] xsendfile: cannot open file: /data/indico/legacy-archive/XXX/Pr\xe4sentation_Mainz_end2.pdf

I am trying to access the file using this URL (which is from the timetable; again, I replaced part of the URL with XXX): https://XXX/Praesentation_Mainz_end2.pdf

What can I do? I did the database dump and restore in the way you suggested.

Uploading a new file with special characters and downloading it again works. Most of the old files also work.

Best,

Michael

Edit: The name of the file in the filesystem on the server is “Präsentation_Mainz_end2.pdf”.

ThiefMaster · February 14, 2022, 3:02pm

Check what Attachment.get(ID).file.storage_file_id gives you in indico shell. The ID is the last path segment before the filename in the URL. Then compare the result with what you have in the file system.

I have no idea what exactly is going wrong there. It feels like a charset difference between the local file system and what’s in the database, but some ideas:

Is the filesystem using UTF-8 now, while it used latin1/iso-8859-1 before? Or is that \xe4 actually part of the storage_file_id? (You may want to print() the value in indico shell; otherwise you see a Python repr that may contain the escape sequence.)
If it’s just very few files, you could rename them on disk and update the storage_file_id in the DB accordingly.
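
As a small illustration of the charset idea: a single \xe4 byte is how "ä" is encoded in latin1/iso-8859-1, whereas UTF-8 uses the two bytes \xc3\xa4. The sketch below only shows the two encodings side by side for the file name from the error log; it does not show where such a conversion would happen between Indico and Apache, which is an open question here.

```python
# File name from the error log; "\u00e4" is the single code point for "ä".
name = "Pr\u00e4sentation_Mainz_end2.pdf"

utf8_bytes = name.encode("utf-8")      # how a UTF-8 filesystem stores the name
latin1_bytes = name.encode("latin-1")  # how latin1/iso-8859-1 would encode it

print(utf8_bytes)    # b'Pr\xc3\xa4sentation_Mainz_end2.pdf'
print(latin1_bytes)  # b'Pr\xe4sentation_Mainz_end2.pdf'

# The two byte strings differ, so a path sent in one encoding would not
# match a file stored under the other.
print(utf8_bytes == latin1_bytes)  # False

# ascii() escapes non-ASCII characters, which makes such differences visible
# even when the terminal renders "ä" normally.
print(ascii(name))  # 'Pr\xe4sentation_Mainz_end2.pdf'
```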

mknoerzer · February 14, 2022, 3:18pm

@ThiefMaster: The last part of the URL is /1747/Praesentation_Mainz_end2.pdf. So the ID is 1747. I am sorry but I never used indico shell. How can I start/use it? I did not find information in the doc.

You mean I should enter “print(Attachment.get(1747).file.storage_file_id)” somewhere?

When I type “locale” in the Linux shell, the output on the old and the new system is the same.
I have no idea how many files have that problem.
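
To get a rough idea of how many files might be affected, something like the following could be run on the new server. This is only a sketch: the function name `non_ascii_files` is made up, the root path is taken from the error log above, and it assumes the problem is limited to file names containing non-ASCII characters, which has not been confirmed.

```python
import os


def non_ascii_files(root):
    """Yield paths below root whose file name contains non-ASCII characters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.isascii():
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Path taken from the Apache error log; adjust it to the local setup.
    for path in non_ascii_files("/data/indico/legacy-archive"):
        print(path)
```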

ThiefMaster · February 14, 2022, 3:21pm

Login as the indico user on the server, then run indico shell. This basically gives you an interactive Python interpreter where you can execute code snippets.

mknoerzer · February 14, 2022, 3:32pm

In [1]: print(Attachment.get(1747).file.storage_file_id)
2015/XXX/Präsentation_Mainz_end2.pdf

Same folder and file name as in the file system. What to do?

ThiefMaster · February 14, 2022, 3:35pm

Can you show me the repr() instead of print() as well?

Also, the output of this:

import os
os.listdir('/data/indico/legacy-archive/..../XXX/')

(ie the path to the file without the file name)

mknoerzer · February 14, 2022, 3:39pm

In [2]: repr(Attachment.get(1747).file.storage_file_id)
Out[2]: "'2015/XXX/Präsentation_Mainz_end2.pdf'"

In [3]: import os

In [4]: os.listdir('/data/indico/legacy-archive/2015/XXX/')
Out[4]: ['Präsentation_Mainz_end2.pdf']

ThiefMaster · February 14, 2022, 3:44pm

How about this one?

import sys
sys.getfilesystemencoding()

And does open('/data/indico/legacy-archive/2015/XXX/Präsentation_Mainz_end2.pdf', 'r') work?

mknoerzer · February 14, 2022, 3:48pm

In [6]: import sys

In [7]: sys.getfilesystemencoding()
Out[7]: 'utf-8'

In [8]: open('/data/indico/legacy-archive/2015/XXX/Präsentation_Mainz_end2.pdf', 'r')
Out[8]: <_io.TextIOWrapper name='/data/indico/legacy-archive/2015/XXX/Präsentation_Mainz_end2.pdf' mode='r' encoding='UTF-8'>

ThiefMaster · February 14, 2022, 3:50pm

Now the filesystem encoding would be interesting to see on the old Indico server as well (assuming you moved to a new server/VM as recommended in the docs)…

PS: Please wrap your output in triple backticks; that way it’s more readable and I don’t need to edit your posts to add them.

mknoerzer · February 14, 2022, 3:57pm

Yes, I moved to a new VM. On the old server:

Indico v2.3.5 is ready for your commands
In [1]: import sys

In [2]: sys.getfilesystemencoding()
Out[2]: 'UTF-8'

I am sorry

ThiefMaster · February 14, 2022, 4:00pm

hm so the main question is… where does this escaping happen between indico and apache… and without an environment where it happens it’s also kind of hard to debug

BTW, as a workaround you could remove/comment out the STATIC_FILE_METHOD line in indico.conf. That way files would be sent directly by Indico instead of being handed off to Apache. For a small instance that’d probably be OK, even though it’s not ideal of course (Apache/nginx are much better at serving static file content).
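
In indico.conf, which uses Python syntax, the workaround would look roughly like this. The value shown is an assumption based on the xsendfile message in the Apache error log; the actual line in a given installation may differ.

```python
# indico.conf (excerpt)

# Assumed existing setting for Apache with mod_xsendfile. Commenting it out
# makes Indico send the files itself instead of handing them to the web server.
# STATIC_FILE_METHOD = 'xsendfile'
```

Indico most likely needs to be restarted afterwards for the change to take effect.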

mknoerzer · February 14, 2022, 4:15pm

OK, without STATIC_FILE_METHOD it works. Is it only a question of performance? I need to decide if I will continue migration and work with the new server or roll back (which I really do not want to do). The problem is: If I continue to work with the migrated data I will be unable to get the old state in case it turns out there was some kind of error when migrating the data. But it seems this is not the case, right?

In case you will have some ideas in future I am happy to try them out.

Thanks a lot for your help!

ThiefMaster · February 14, 2022, 4:18pm

Yes, it’s just a matter of performance. So no need to rollback. The problem is only when passing the path to Apache - the open() test showed that the files can be read just fine.

PS: I think theoretically downgrading the DB to the 2.3 state would be possible, but it’s something we did not test and of course do not recommend.

mknoerzer · February 15, 2022, 11:47am

@ThiefMaster:

I don’t know if it’s correlated, but I find these errors in my syslog.all:

XXX indico-uwsgi[XXX]: uwsgi_response_write_body_do() TIMEOUT !!!
XXX indico-uwsgi[XXX]: OSError: write error

Is there some kind of timeout time I need to change? Where do these errors come from?

ThiefMaster · February 15, 2022, 11:49am

I think that happens if a client disconnects while data is being sent to them. Should be fine to ignore.