KonText interface: Version history
The corpus interface KonText is designed for general interaction with CNC corpora. A comprehensive list of KonText’s available functions can be found in the manual.
KonText is an extended and visually modified version of the original NoSketch Engine application. It is developed by the Institute of the Czech National Corpus (Faculty of Arts, Charles University) and the Institute of Formal and Applied Linguistics (Faculty of Mathematics and Physics, Charles University) under the GNU GPL 2 license (with Tomáš Machálek and Martin Zimandl as the main developers). Just as the NoSketch Engine, KonText uses Manatee as its backend.
The version history overview below contains only the most significant changes as seen from the end-user perspective. A complete list of all changes and bug fixes can be found on KonText's GitHub page, which also hosts the complete source code repository.
Release 0.18.0
Publication date: 7. 2. 2024
User changes:
displaying translation equivalents directly in a concordance in parallel corpora by clicking on the selected word (new tokens_linking plug-in)
possibility to download a list of documents matching selected text types
JSONL as a new optional format for saving results (concordance, word list, collocations, frequency list, document list), in which each line contains a separate JSON record; the format is particularly suitable for further automated processing
improved linking from external applications to KonText
“Federated Content Search” module supports searching in multiple corpora at the same time
Technical changes:
dropped support for Celery as the calculation backend (RQ remains)
new internal HTTP client for querying external data sources (authentication, translation equivalents, etc.)
improved installation script
KonText can optionally use a custom modification of Manatee-open with additional statistical measures for keyword analysis
Release 0.17.0
Publication date: 17. 2. 2023
User changes:
enhanced and refined subcorpora
by default, every subcorpus is available to all users, addressing issues with URLs shared between users
if a user does not provide a description, the subcorpus remains undiscoverable
a subcorpus can be archived, in which case all its URLs remain functional but the subcorpus is no longer listed among the author's subcorpora (unless explicitly requested in the listing filter)
on the concordance query page, users can create a subcorpus draft from selected text types for future use
easily copy a subcorpus or create a new variant
a new function graphically displays the dispersion of a search term across the corpus data
highlighted translation equivalents (as retrieved from the Treq application) directly in the parallel concordance
sharing individual frequency tables through exported URLs
in the line selection function, users can navigate to the page with the first selected line
customizable “nice” backlinks allow other applications to reference KonText results, making integration with other applications easier
detection of overly time-consuming queries for large corpora (typically the ones producing large result sets) and suggestion of an alternative corpus
Technical changes:
core web application framework changed from Gunicorn+Werkzeug to Sanic
upgrade to React 18
server backend rewritten with async/await
checking of background tasks from the client side is now done via WebSockets by default
support for Manatee 2.2xx
improved caching of frequency distribution results for faster navigation between result pages
moved from HTTP sessions stored on the server to JWT
possibility to apply individual “cutoff” for large concordances
Release 0.16.0
Publication date: 23. 2. 2022
User changes:
new query type: paradigmatic query
enhanced “word list” query type
query history now supports all query types (concordance, word list, paradigmatic query)
enhanced frequency distribution
enhanced audio playback
option to create a subcorpus directly on the concordance query page
search suggestions with sublemma support (syn2020, syn_v9) and faster responses
Technical changes:
integration of a number of modules (e.g. “liveattrs”, query history) with an internal database system
reorganization of server code
transition from CSS files to Styled Components
Docker support
support for automatic testing of the user interface
removal of unnecessary attributes from the configuration
Release 0.15.0
Publication date: 18. 12. 2020
User changes:
number of query types reduced to two:
advanced (equivalent to the original “CQL”)
simple
new calendar-based widget for specifying date intervals in the “Restrict search” section of the main query form
syntax_viewer plug-in enhancement – added support for new features of SYN2020
new query_suggest plug-in providing interactive help with writing a query
token_connect plug-in can now also be used as a source for an alternative KWIC detail view
taghelper plug-in now supports “key-value” tagsets and it is also possible to define multiple tagsets for a corpus
new option for displaying additional positional attributes (below the main text tokens)
possibility to set any positional attribute as the main one in the concordance view
more user-friendly “Corpus-specific settings” module
redesigned “Specify context” section of the main query form
possibility to perform more complex queries (billion-word corpora, aligned corpora when querying only the primary language) without being limited by the web server's time limit
an archived URL of a frequency distribution or a collocation can now be restored even for complex queries, regardless of the web server time-out
Technical changes:
Release 0.13.0
Publication date: 9. 12. 2019
rewrite of HTML templates to Jinja2
transition to the React.js framework, which resulted mainly in extensive changes to the code and, to a lesser extent, also in user interface elements (e.g. corpus-specific view settings are now in three tabs)
preparation for supporting future functionality
Release 0.12.0
Publication date: 30. 10. 2018
translation equivalents based on Treq directly displayed in KonText for parallel corpora (set up for InterCorp v10 and v11)
CQL editor with syntax highlighting and basic value validation
mixed mode for attribute display (directly in text for KWIC, on mouse-over for other tokens)
sharing a named subcorpus and its description with other users
new filter functions
asynchronous exports with notification
improved keyboard navigation on the query result page
possibility to minimize individual text type boxes in the subcorpus form
Release 0.11.0
Publication date: 15. 12. 2017
2-dimensional frequency distribution with confidence intervals, including export of the data into Excel
added support for undo in the interactive text selection
added support for undo in the tag builder
improved query history
on-demand i.p.m. calculation now works only in well-defined situations (i.e. when a subcorpus is selected using the respective form rather than via a CQL query)
chart depicting line group proportions can be exported into Excel
word list
added support for hiding individual columns of parallel corpora in concordance view
Release 0.10.0
Publication date: 11. 4. 2017
for spoken corpora, concordance detail views are rendered as dialogues with clear indication of speaker turns and overlaps where audio can be played back by clicking the “speaker” icon
documents for subcorpora can newly also be selected according to user-defined text type ratios
individual query processing steps within the breadcrumb navigation can now be edited, allowing the user to change the parameters of previous operations
working corpus can now be changed without losing other information from the current query form
manually grouped concordance lines are now distinguished by colours
web page titles contain query information (for use in bookmarks and better browser history navigation)
Release 0.9.0
Publication date: 26. 9. 2016
displaying of syntactic structures
asynchronous creation of subcorpora, including support for creating alignment-based subcorpora from parallel corpora
for attributes with long lists of values, a text input auto-complete function has been added for easier subcorpus creation
positional attributes can be displayed also on a mouse-over
navigation between concordance pages without reloading the whole page
frequency distribution and collocation results are now cached on the server for faster pagination
user-defined numeric concordance line labels can be now renamed/removed
added support for displaying line numbers in a concordance
Release 0.8.0
Publication date: 8. 3. 2016
concordance lines can have user-defined numeric labels attached for manual grouping/categorization
i.p.m. calculation for ad-hoc subcorpora (on demand; previous versions calculated i.p.m. from the whole corpus which could have been confusing)
support for creating subcorpora based on conditions that contain different structures (e.g. <speaker sex="male" /> and <session id="foo.+" />)
added a breadcrumb navigation depicting consecutive steps that led to the current query result
Release 0.7.0
Publication date: 5. 10. 2015
new widget for corpus selection including favourite corpora, featured corpora etc.
rewritten “View” menu functions
enhancements of user interface usability (e.g. adding an aligned corpus)