Selected applications can be queried via an API. API access always requires user authentication, even for applications that can be used without logging in through the standard web interface. Personal access tokens should be used for authentication.
Any number of personal access tokens can be created, and tokens can be deleted. Once a token expires, a new one must be created.
The token is sent to https://korpus.cz/login in an HTTP POST request, in the personal_access_token parameter.
The response returns a session cookie, which should be sent with every subsequent request. It can be useful to save the session cookie and reuse it over the next few hours.
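One way to decide whether a saved session cookie is still worth reusing is to check how old the cookie file is. The sketch below is only an illustration: the helper name cookie_file_is_fresh and the two-hour threshold are assumptions (the text above only says "the next few hours"), and the server may invalidate a session earlier, so a failed request should still trigger a new login.

```python
import os
import time

# Assumption: "a few hours" is not specified precisely; 2 hours is an illustrative value.
MAX_COOKIE_AGE_SECONDS = 2 * 60 * 60


def cookie_file_is_fresh(path, max_age=MAX_COOKIE_AGE_SECONDS, now=None):
    """Return True if the file at `path` exists and was modified less than
    `max_age` seconds before `now` (defaults to the current time)."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable file: treat as "no usable cookie".
        return False
    return (now - modified) < max_age
```

A script could call cookie_file_is_fresh('cookies.pickle') before logging in and skip the login request when it returns True.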
The APIs can be queried at the following URLs:
https://korpus.cz/kontext-api/v0.17
Each application also imposes its own limits. Exceeding them may result in access to the API being blocked.
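Because the actual limits are application-specific and not stated here, a client may want to space out its requests defensively. The following is a minimal sketch of a client-side throttle; the class name MinIntervalThrottle and any interval passed to it are assumptions for illustration, not values published by the service.

```python
import time


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive wait() calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous wait() call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous call: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each API request keeps the request rate at or below one per min_interval seconds. The right interval has to be chosen per application.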
#!/usr/bin/env bash

# Log in; the session cookie is stored in cookies.txt
curl --cookie cookies.txt --cookie-jar cookies.txt -X POST -F 'personal_access_token=0a1b2c3d4e5f6-abc012...' 'https://korpus.cz/login'

# Query Treq (--globoff stops curl from treating the [ ] in pkgs[0] as a URL glob)
curl --globoff --cookie cookies.txt --cookie-jar cookies.txt 'https://treq.korpus.cz/api/v1?from=cs&to=en&multiword=false&regex=true&lemma=true&ci=true&pkgs[0]=SYNDICATE&pkgs[1]=CORE&query=pravda&order=perc&asc=false'
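The long Treq query string in the shell example can also be assembled programmatically, which avoids hand-escaping characters such as the square brackets in pkgs[0]. The sketch below uses only the Python standard library. The parameter names and the base URL are copied from the shell example; the function name build_treq_url and its argument names are hypothetical, and the meaning of each parameter is not documented here.

```python
from urllib.parse import urlencode

# Base URL as it appears in the shell example.
TREQ_URL = 'https://treq.korpus.cz/api/v1'


def build_treq_url(query, packages, source_lang='cs', target_lang='en'):
    """Build a Treq query URL with the same parameters as the shell example."""
    params = [
        ('from', source_lang),
        ('to', target_lang),
        ('multiword', 'false'),
        ('regex', 'true'),
        ('lemma', 'true'),
        ('ci', 'true'),
    ]
    # Packages are passed as indexed parameters: pkgs[0], pkgs[1], ...
    params += [(f'pkgs[{i}]', name) for i, name in enumerate(packages)]
    params += [('query', query), ('order', 'perc'), ('asc', 'false')]
    # urlencode percent-encodes the brackets and any non-ASCII query text.
    return TREQ_URL + '?' + urlencode(params)


print(build_treq_url('pravda', ['SYNDICATE', 'CORE']))
```

The resulting URL can be passed to a logged-in requests.Session, or to curl in place of the hand-written one.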
#!/usr/bin/env python3
import pickle

import requests

personal_access_token = '0a1b2c3d4e5f6-abc012...'
cookies_file = 'cookies.pickle'

with requests.Session() as s:
    # Load cookies saved by a previous run, if any
    try:
        with open(cookies_file, 'rb') as f:
            s.cookies.update(pickle.load(f))
    except FileNotFoundError:
        pass

    # Log in
    r = s.post('https://korpus.cz/login', data={'personal_access_token': personal_access_token})

    # Creating a concordance query
    request_body = {
        "type": "concQueryArgs",
        "maincorp": "syn2015",
        "usesubcorp": None,
        "viewmode": "kwic",
        "pagesize": 40,
        "attrs": ["word", "tag"],
        "attr_vmode": "visible-kwic",
        "base_viewattr": "word",
        "ctxattrs": [],
        "structs": ["text", "p", "g"],
        "refs": [],
        "fromp": 0,
        "shuffle": 0,
        "queries": [
            {
                "qtype": "advanced",
                "corpname": "syn2015",
                "query": "[word=\"celou\"] [lemma=\"pravda\"]",
                "pcq_pos_neg": "pos",
                "include_empty": False,
                "default_attr": "word"
            }
        ],
        "text_types": {},
        "context": {
            "fc_lemword_wsize": [-5, 5],
            "fc_lemword": "",
            "fc_lemword_type": "all",
            "fc_pos_wsize": [-5, 5],
            "fc_pos": [],
            "fc_pos_type": "all"
        },
        "async": False
    }
    r = s.post('https://korpus.cz/kontext-api/v0.17/query_submit', params={'format': 'json'}, json=request_body)
    response_json = r.json()
    print(response_json)

    # Displaying a concordance
    conc_persistence_op_id = response_json['conc_persistence_op_id']
    r = s.get('https://korpus.cz/kontext-api/v0.17/view', params={'format': 'json', 'q': '~' + conc_persistence_op_id})
    print(r.json())

    # Store cookies for reuse in later runs
    with open(cookies_file, 'wb') as f:
        pickle.dump(s.cookies, f)