Generally you should first test your code on a dev setup (i.e. locally) before running it in a production (or production-like) environment. Debugging in a production environment is much more complicated in any case…
I had a devel server that used to work, but not anymore. I reinstalled the packages, redis and postgresql are running, and re-created the SSL keys. Still getting the same errors when I go to 127.0.0.1:8000. Are they familiar to you?
127.0.0.1 - - [26/Apr/2019 11:23:41] code 400, message Bad request syntax ('\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x030\xcf\x07J\xa7\xb1\x84\xe6-\x92<1Uy\x8f\xab\x04t\xe0\x15\x01dD\xa5\x1c \xe0\x0c\xb5\x97,\xc5 \x0c`\xc8\x1b\x13\xc7\x98\x83D\xd0KD\xd2a\xef3(\xdd\xb4\xbd\xb5\x9bT36\xe3\xb4\x11\x01\x02H_\x00"\xba\xba\x13\x01\x13\x02\x13\x03\xc0+\xc0/\xc0,\xc00\xcc\xa9\xcc\xa8\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00')
127.0.0.1 - - [26/Apr/2019 11:23:41] code 400, message Bad HTTP/0.9 request type ("\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03-\xa8\xd2\xa0*\xe7\xbc\x7f5\xc4\xe7\x1eD\x99\x12Oi\xa9\x833)\x11'\xb8hE\xf8\x9a\xd5\xf62\x9e")
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 56228)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 593, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 293, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 327, in handle_one_request
    elif self.parse_request():
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 286, in parse_request
    self.send_error(400, "Bad request syntax (%r)" % requestline)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 368, in send_error
    self.send_response(code, message)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 332, in send_response
    self.log_request(code)
  File "/home/fakeusername/dev/indico/src/indico/cli/devserver.py", line 161, in log_request
    super(QuietWSGIRequestHandler, self).log_request(code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 373, in log_request
    self.log('info', '"%s" %s %s', msg, code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 384, in log
    message % args))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 18: ordinal not in range(128)
----------------------------------------
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 56226)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 593, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 293, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 327, in handle_one_request
    elif self.parse_request():
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 281, in parse_request
    "Bad HTTP/0.9 request type (%r)" % command)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 368, in send_error
    self.send_response(code, message)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 332, in send_response
    self.log_request(code)
  File "/home/fakeusername/dev/indico/src/indico/cli/devserver.py", line 161, in log_request
    super(QuietWSGIRequestHandler, self).log_request(code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 373, in log_request
    self.log('info', '"%s" %s %s', msg, code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 384, in log
    message % args))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 18: ordinal not in range(128)
----------------------------------------
127.0.0.1 - - [26/Apr/2019 11:23:41] code 400, message Bad request syntax ("\x16\x03\x01\x00\xb5\x01\x00\x00\xb1\x03\x03m\x90\xd6\xda\x16!'3;\x03\xd8_\xa1\xf80\xcd\xe6\xe9\xd0\xd9\x055\xe78F\x9e\xfb\xf5Vv\xca\xb5\x00\x00\x1cJJ\xc0+\xc0/\xc0,\xc00\xcc\xa9\xcc\xa8\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00")
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 56230)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 593, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 293, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 327, in handle_one_request
    elif self.parse_request():
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 286, in parse_request
    self.send_error(400, "Bad request syntax (%r)" % requestline)
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 368, in send_error
    self.send_response(code, message)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 332, in send_response
    self.log_request(code)
  File "/home/fakeusername/dev/indico/src/indico/cli/devserver.py", line 161, in log_request
    super(QuietWSGIRequestHandler, self).log_request(code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 373, in log_request
    self.log('info', '"%s" %s %s', msg, code, size)
  File "/home/fakeusername/dev/indico/env/lib/python2.7/site-packages/werkzeug/serving.py", line 384, in log
    message % args))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 14: ordinal not in range(128)
----------------------------------------
Thanks,
Jose
Looks like you’re accessing a dev server that’s running in http mode via https.
Hmm, my notes included https… But yes, using http works.
The dev server has some options to use https, or you can put e.g. nginx in front of it (the dev setup docs mention this as an option). But by default it’s http-only since that’s the easiest way to use it in development.
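Incidentally, you can see this from the log itself: a TLS connection opens with a ClientHello record, whose first bytes are 0x16 (handshake record) followed by 0x03 (the record-layer version), which is exactly the `\x16\x03\x01` prefix in the 400 errors above. A tiny illustrative check (the function name here is mine, not part of Indico or werkzeug):

```python
def looks_like_tls_client_hello(data):
    """Heuristic: a TLS connection starts with a handshake record,
    content type 0x16 followed by the record-layer version byte 0x03."""
    return len(data) >= 3 and data[0:1] == b"\x16" and data[1:2] == b"\x03"

# First bytes of the rejected "request" from the log above:
print(looks_like_tls_client_hello(b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc"))  # True
# A normal HTTP request line does not match:
print(looks_like_tls_client_hello(b"GET / HTTP/1.1\r\n"))  # False
```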
As agreed during yesterday’s meeting, I have created a docker-compose file that sets up the CERN Search microservice alongside Nginx, Postgres, Redis, ElasticSearch and Tika. This should be enough to get us started with the development of the plugin:
In order to run it, you should download the file to the root folder of the cern-search repo. You will also have to generate the test certificates by hand (we could have it in a separate Dockerfile for nginx, though…)
$ sh scripts/gen-cert.sh
$ mkdir nginx/tls
$ mv nginx.crt nginx/tls/tls.crt
$ mv nginx.key nginx/tls/tls.key
$ rm nginx.csr
If OpenSSL complains about the password being too short, just replace pass:x with pass:12345 in gen-cert.sh (I’ll send a PR to fix that upstream).
Then do docker-compose up and you should have your development cluster running.
I managed to log in to Invenio (https://localhost:8080) with username test@example.com and password test1234.
Retrieving records through the REST API results in an error, probably because I haven’t set up the ElasticSearch indices properly. In any case, it’s a start.
Apache Tika seems to work fine when I connect to it using tika-python:
In [16]: from tika import parser
In [17]: parser.from_file('/tmp/test.docx', serverEndpoint="http://localhost:9998")
Out[17]:
{'content': u'\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTEST2\n',
'metadata': {u'Application-Name': u'LibreOffice/5.3.6.1$Linux_X86_64 LibreOffice_project/30$Build-1',
...
@pferreir I would appreciate it if you could provide a bit more information about docker (I have never used docker before, apart from an initial test running HelloWorld and listing the docker images…).
I installed docker (version: 1.13.1, API version 1.26) and docker-compose (version: 1.24.0)
I created a directory with the docker-compose.yml file you created and tried to run docker-compose up, but this required the Dockerfile. What should the Dockerfile contain? And obviously I am missing the commands to initialize docker and the “development cluster”.
Also, you are using gen-cert.sh to create the certificates. What is the content of this file? A simple openssl command?
I think this answers your question
In order to run it, you should download the file to the root folder of the cern-search repo.
The repo to clone is GitHub - inveniosoftware-contrib/citadel-search: Citadel: Enterprise Search - it includes the Dockerfile and the gen-cert.sh script.
THANK YOU! Yes it does.
Hi,
quick question: does the search_invenio plugin actually work? Has anyone seen it working?
As you know, I am trying to write a new plugin based on that one, reusing as much as possible. But was it functional?
For example, are all templates correct?
At this point, until we have CERN Search deployed, I am trying to mock an EventEntry() object as fake output of a query. I thought, naively, that if I build that object properly, I should see its content in the web page. But I am getting “Build Errors”.
- It could be (hopefully!) that I am not creating the object correctly.
- Or it could be that the template interpolation is failing.
This is what makes me wonder whether the code and architecture of search_invenio are correct…
Speaking of classes in entries.py, are they documented somewhere?
The input options are:
- result_id
- title
- location
- start_date
- materials
- authors
- description
The types of some of them (strings, integers, …) and their meaning are unclear to me. Where can I find some documentation?
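For reference, the mock I am experimenting with looks roughly like this; every value and type here is a guess based on the option list above (the real EventEntry signature is exactly what I am asking about):

```python
from datetime import datetime

# Hypothetical values for the options listed above; the types are my
# guesses, not documented behavior of entries.py.
fake_entry_kwargs = dict(
    result_id=123,                       # guess: integer record id
    title="Test event",                  # presumably a string
    location="CERN",                     # presumably a string
    start_date=datetime(2019, 4, 26),    # guess: a datetime
    materials=[],                        # guess: list of attachments
    authors=["J. Doe"],                  # guess: list of names/Author()
    description="A mocked search hit",   # presumably a string
)

# The idea would then be EventEntry(**fake_entry_kwargs), assuming the
# constructor accepts these options as keyword arguments.
print(fake_entry_kwargs["title"])  # Test event
```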
replying to myself…
I have just been reminded that the invenio plugin does not work.
So the idea then is to interpolate the HTML templates directly with the JSON from the queries, correct?
It always helps to have the correct network configuration…
I followed @pferreir’s instructions and it was really simple to get docker running.
The following are all the commands I used on my RHEL 7 VM:
$ yum -y install docker
$ pip install docker-compose
$ systemctl start docker
$ git clone https://github.com/inveniosoftware-contrib/cern-search
$ cd cern-search
$ wget https://gist.githubusercontent.com/pferreir/77ede49adb292879c52e3e4a02e28582/raw/c26b7031a2fd5d01c9ac82293300b785d54dd7c9/docker-compose.yml
$ sh scripts/gen-cert.sh
$ mv nginx.crt nginx/tls/tls.crt
$ mv nginx.key nginx/tls/tls.key
$ rm nginx.csr
$ docker-compose up -d
Then, I was able to access Invenio from my desktop (https://web4604.fnal.gov:8080) using username test@example.com and password test1234.
As for tika, I was able to access it from another server without a problem:
>>> from tika import parser
>>> parser.from_file("./penelope.py", serverEndpoint="http://web4604.fnal.gov:9998")
{'status': 200, 'content': u'\n\n\n\n\n\n\n\n\nfrom __future__ import unicode_literals\
.......
The next steps will be to access the cern-search-api: send indico livesync data to be indexed and then send search requests and receive the search results.
I assume that the example at http://cernsearchdocs.web.cern.ch/cernsearchdocs/example/ and the rest of the documentation should be our starting point.
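If it helps, this is the rough shape I have in mind for that step. The endpoint path, the payload fields and the auth details below are assumptions to be checked against the CERN Search docs, not something I have verified:

```python
import json

# Assumed endpoint of the dockerized CERN Search app (unverified):
API_URL = "https://web4604.fnal.gov:8080/api/records/"

def build_record(title, description, read_acl):
    """Assemble a minimal record payload; the field names are guesses
    modeled on the _access/read convention in the CERN Search docs."""
    return {
        "_access": {"read": read_acl},
        "title": title,
        "description": description,
    }

record = build_record("Test event", "An indexed test record", ["ANONYMOUS"])
body = json.dumps(record)

# The actual call would then be something like:
#   import requests
#   requests.post(API_URL, data=body,
#                 headers={"Content-Type": "application/json"},
#                 verify=False)  # self-signed dev certificate
print(json.loads(body)["_access"]["read"])  # ['ANONYMOUS']
```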
Just a small suggestion: do not use pip install docker-compose - it installs TONS of dependencies, and when used outside a virtualenv it leaves behind a huge mess of python packages in your system python environment.
Better to download a single-file bundle from https://github.com/docker/compose/releases (e.g. https://github.com/docker/compose/releases/download/1.24.0/docker-compose-Linux-x86_64), save it as /usr/local/bin/docker-compose and chmod +x it.
@ThiefMaster Thank you for the information. Yes, I did notice all the packages it installs, but I followed the instructions as I was not sure what is needed.
Yes, that’s the idea. I wouldn’t spend tons of time on the interface, however. We will have someone working on a fancy UI on our side this summer. So a simple Google-like thing would be enough for now.
OK.
Would you then recommend adapting the search_cern plugin (https://github.com/indico/indico-plugins-cern/tree/master/search_cern)?
I was not planning on changing either the interface or the rendering. My plan was just to change the plugin to handle the new JSON output from the queries to “CERN Search” and let the existing code do the rest. Right?
So, if that sounds like a reasonable approach, then I guess the steps here are:
- find the exact method where the queries are performed. In the search_invenio plugin it was _fetch_data(). I need to find out where exactly this is done in search_cern
- find out where the output of the query is being used to fill the HTML templates.
- massage, if needed, the JSON output to be able to fill the HTML templates with it. The templates in search_cern are supposed to be correct, I assume…
Does that sound correct to you?
The search_cern plugin is not a great example because it uses an <iframe> to show the results. So, it does absolutely no rendering of any results; it just displays the page that is sent by the search engine (Sharepoint in this case). So, yes, you can adapt it, but then you’ll have to write a very basic interface. You can actually just “steal” it from the old Invenio plugin: https://github.com/indico/indico-plugins/blob/master/search_invenio/indico_search_invenio/templates/results.html
find the exact method where the queries are performed. In the search_invenio plugin it was _fetch_data(). I need to find out where exactly this is done in search_cern
You probably want https://github.com/indico/indico-plugins-cern/blob/master/search_cern/indico_search_cern/engine.py#L28.
find out where the output of the query is being used to fill the HTML templates.
It’s not. But you can steal that from the Invenio plugin as I’ve said.
massage, if needed, the JSON output to be able to fill the HTML templates with it.
Yes!
Oh. Then, my original approach was not that bad after all.
I was studying the invenio plugin and I more or less got the general idea.
In this case, the output is converted (or an attempt is made to convert it) to Author(), EventEntry(), ContributionEntry(), and SubContributionEntry() objects. After that, they are supposed to be used to fill the HTML templates, if I got the logic correctly.
Therefore, I was working on the assumption that, if you create those Entry() objects properly, the rendering would work.
From your answer I gather that the templates in invenio are correct.
So I assume the idea is to reuse
- code from search_cern, as much as possible
- templates from search_invenio
Did I get your comments correctly?
Yes, that’s what I mean, and it should be possible.
The Elasticsearch (CERN search) marshmallow schema is almost finished (I placed a draft at: Elasticsearch_Docs/schemas.py at master · penelopec/Elasticsearch_Docs · GitHub)
The following are the only fields for which I am not able to get any values, and I am not able to find any information about them:
ContributionSchema:
creation_date = mm.DateTime(attribute='created_dt')
SubContributionSchema:
creation_date = mm.DateTime(attribute='created_dt')
start_date = mm.DateTime(attribute='start_dt')
end_date = mm.DateTime(attribute='end_dt')
For the implementation I made the following assumptions.
ACL assumptions (for the read entry of _access):
- For public access the ACL will contain only one entry, 'ANONYMOUS', or it could be just empty, depending on what Pablo expects.
- For private access it will contain the users’ IDs and the users’ emails.
- The ACL for subcontributions is that of the contribution it belongs to.
- The ACL of the EventNote is that of the object it belongs to (contribution, session or event).
- For all mappings I have added a URL field to contain the external URL for accessing the object from the search results.
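To make the assumptions concrete, here is what I picture the _access block looking like (illustrative only; the exact entry format for user IDs and emails, and the URLs, are made up and still need to be confirmed):

```python
# Public object: a single 'ANONYMOUS' entry (or possibly an empty list).
public_contribution = {
    "_access": {"read": ["ANONYMOUS"]},
    "url": "https://indico.example.com/event/1/contributions/2/",  # made-up URL
}

# Private object: the user's ID and the user's email (format is a guess).
private_contribution = {
    "_access": {"read": ["42", "jdoe@example.com"]},
    "url": "https://indico.example.com/event/1/contributions/3/",  # made-up URL
}

# A subcontribution would simply copy the read ACL of its contribution:
subcontribution_acl = dict(private_contribution["_access"])
print(subcontribution_acl["read"])  # ['42', 'jdoe@example.com']
```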
The following are the questions that I have in order to move forward with livesync_json (I decided that this is a better name for this plugin):
- How do I access the CERN search app (assuming that I have installed in docker what Pedro has supplied)?
- How should I call the CERN search app for populating ES?
For the ES I have the following requests.post line:
response = requests.post(self.url, auth=(self.username, self.password), data={'json': jsondata})
where jsondata is a string that complies with the form appropriate for the Bulk API of ES (Bulk API | Elasticsearch Guide [8.11] | Elastic).
I am using only the index and delete operations; _index is the mapping that I am accessing and _id is the object’s id:
POST _bulk
{ "index" : { "_index" : "events", "_id" : 1 } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "notes", "_id" : 2 } }
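One detail worth double-checking when wiring this up: the _bulk endpoint expects the newline-delimited JSON as the raw request body, terminated by a final newline and sent with Content-Type application/x-ndjson, whereas data={'json': jsondata} would form-encode it. A small sketch of assembling such a body (the helper name is mine):

```python
import json

def build_bulk_body(actions):
    """Build an ES _bulk body from (action_dict, source_dict_or_None)
    pairs: one JSON object per line, terminated by a final newline."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"index": {"_index": "events", "_id": 1}}, {"field1": "value1"}),
    ({"delete": {"_index": "notes", "_id": 2}}, None),
])
print(body.count("\n"))  # 3: each of the three lines is newline-terminated

# The post would then look roughly like (URL/credentials as above):
#   requests.post(self.url, auth=(self.username, self.password), data=body,
#                 headers={"Content-Type": "application/x-ndjson"})
```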
- What information should the web setup page of the plugin contain?
- tika server URL
- CERN search app URL
- Access username / password for ES(?)
- ??