
KonText interface: Version history

The corpus interface KonText is designed for general interaction with CNC corpora. A comprehensive list of KonText’s available functions can be found in the manual.

KonText is an extended and visually modified version of the original NoSketch Engine application. It is developed under the GNU GPL 2 license by the Institute of the Czech National Corpus (Faculty of Arts, Charles University) and the Institute of Formal and Applied Linguistics (Faculty of Mathematics and Physics, Charles University), with Tomáš Machálek and Martin Zimandl as the main developers. Like NoSketch Engine, KonText uses Manatee as its backend.

The version history overview below contains only the most significant changes as seen from the end-user perspective. A complete list of all changes and bug fixes can be found on KonText's GitHub page, which also hosts the complete source code repository.

Release 0.18.0

Publication date: 7. 2. 2024

User changes:

  • new “Keyword Analysis” module
  • displaying translation equivalents directly in a concordance in parallel corpora by clicking on the selected word (new tokens_linking plug-in)
  • possibility to download a list of documents matching selected text types
  • JSONL as a new optional format for storing results (concordance, word list, collocations, frequency list, document list), in which each line of the file contains a separate JSON string; the format is particularly suitable for further automated processing
  • improved linking from external applications to KonText
    • multi-step operations (e.g. query + filter) with the possibility of subsequent editing in respective query forms
    • support for non-token filter ranges when linking to KonText (e.g. “from -1s to 1s”)
  • “Federated Content Search” module supports searching in multiple corpora at the same time
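
Because each line of a JSONL export is a self-contained JSON value, a saved result can be processed line by line without loading the whole file. The following is a minimal Python sketch of that idea; the field names `word` and `freq` and the sample data are hypothetical illustrations, not KonText's actual export schema.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed JSON value per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample standing in for an exported frequency list;
# the real field names in a KonText export may differ.
sample = io.StringIO(
    '{"word": "dům", "freq": 120}\n'
    '{"word": "byt", "freq": 45}\n'
)

rows = list(read_jsonl(sample))
total = sum(row["freq"] for row in rows)
```

With a real export, `sample` would be replaced by an opened file, e.g. `open(path, encoding="utf-8")`.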

Technical changes:

  • dropped support for Celery as the calculation backend (Rq remains)
  • new internal HTTP client for querying external data sources (authentication, translation equivalents, etc.)
  • improved installation script
  • KonText uses (optionally) a custom modification of Manatee-open with more statistical measures for keyword analysis

Release 0.17.0

Publication date: 17. 2. 2023

User changes:

  • enhanced and refined subcorpora
    • by default, every subcorpus is available to all users, addressing issues with URLs shared between users
    • if a user does not provide a description, the subcorpus remains undiscoverable
    • a subcorpus can be archived, in which case all its URLs remain functional but the subcorpus is not listed among the author's subcorpora (unless explicitly requested in the listing filter)
    • on the concordance query page, users can create a subcorpus draft from selected text types for future use
    • easily copy a subcorpus or create a new variant
  • a new function graphically displays the dispersion of a search term across the corpus data
  • highlighted translation equivalents (as retrieved from the Treq application) directly in the concordance
  • sharing individual frequency tables through exported URLs
    • when a frequency result page contains multiple tables, users can now easily obtain URLs for each table to share or publish the table
  • in the line selection function, users can navigate to the page with the first selected line
    • for manually categorized lines in extensive concordances where the first selected line starts far beyond the initial page, this feature enables automatic location of the first selection
  • customizable “nice” backlinks that let other applications reference KonText results, for easier integration
  • detection of overly time-consuming queries for large corpora (typically the ones producing large result sets) and suggestion of an alternative corpus

Technical changes:

  • core web application framework changed from Gunicorn+Werkzeug to Sanic
  • upgrade to React 18
  • server backend rewritten with async/await
  • checking of background tasks from the client side is now done via WebSockets by default
  • support for Manatee 2.2xx
  • improved caching of frequency distribution results for faster navigation between result pages
  • moved from HTTP sessions stored on server to JWT
  • possibility to apply individual “cutoff” for large concordances

Release 0.16.0

Publication date: 23. 2. 2022

User changes:

  • new query type: paradigmatic query
  • enhanced “word list” query type
    • improved user interface
    • optimization of saved results for faster subsequent access
  • query history now supports all query types (concordance, word list, paradigmatic query)
  • enhanced frequency distribution
    • graphical mode
      • special support for time-based distributions
    • displaying of confidence intervals
    • default display option can now be set by the user (tables vs. figures)
  • enhanced audio playback
    • possibility to shift the playback in time
    • waveform display
  • option to create a subcorpus directly on the concordance query page
  • search suggestion with sublemma support (syn2020, syn_v9) and faster response

Technical changes:

  • integration of a number of modules (e.g. “liveattrs”, query history) with an internal database system
  • reorganization of server code
  • transition from CSS files to Styled Components
  • Docker support
  • support for automatic testing of the user interface
  • removing unnecessary attributes from the configuration

Release 0.15.0

Publication date: 18. 12. 2020

User changes:

  • number of query types reduced to two:
    • advanced (equivalent to the original “CQL”)
    • simple
      • multi-word search
      • optional support for regular expressions
      • optional (per corpus) default search attributes
  • new calendar-based widget for specifying date intervals in the “Restrict search” section of the main query form
  • syntax_viewer plug-in enhancement – added support for new features of SYN2020
  • new query_suggest plug-in providing interactive help with writing a query
  • token_connect plug-in can now also be used as a source for an alternative KWIC detail view
    • added a new module “formatted text”
  • taghelper plug-in now supports “key-value” tagsets and it is also possible to define multiple tagsets for a corpus
  • new option for displaying additional positional attributes (below the main text tokens)
  • possibility to set any positional attribute as the main one in the concordance view
  • more user-friendly “Corpus-specific settings” module
  • redesigned “Specify context” section of the main query form
  • possibility to perform more complex queries (billion-word corpora, aligned corpora when querying only the primary language) without the web-server's time limit constraint
  • an archived URL of a frequency distribution or a collocation can now be restored even for complex queries, regardless of the web server time-out

Technical changes:

  • server-side rewritten to Python 3
  • added support for a new asynchronous task processing backend, Rq, which is now the default
  • client-side rewritten using the same framework as in WaG
  • synchronization between the web server and the back-end worker queue rewritten for concordance calculation
  • changes in HTTP API

Release 0.13.0

Publication date: 9. 12. 2019

  • rewrite of HTML templates to Jinja2
  • transition to React.JS framework, which resulted mainly in extensive changes of the code and, to a lesser extent, also in user interface elements (e.g. corpus-specific view settings are now in three tabs)
  • groundwork for future functionality

Release 0.12.0

Publication date: 30. 10. 2018

  • translation equivalents based on Treq directly displayed in KonText for parallel corpora (set up for InterCorp v10 and v11)
  • CQL editor with syntax highlighting and basic value validation
  • mixed mode for attribute displaying (directly in text for KWIC, on mouse-over for other tokens)
  • sharing a named subcorpus and its description with other users
  • new filter functions
    • nested matches filter
    • first hits in docs filter
  • asynchronous exports with notification
  • improved keyboard navigation on the query result page
  • possibility to minimize individual text type boxes in the subcorpus form

Release 0.11.0

Publication date: 15. 12. 2017

  • 2-dimensional frequency distribution with confidence intervals, including export of the data into Excel
  • added support for undo in the interactive text selection
  • added support for undo in the tag builder
  • improved query history
    • query history items can be archived with a custom name for later reuse
    • the full query form is now saved, including the selected texts
  • on-demand i.p.m. calculation now works only in well-defined situations (i.e. when the subcorpus is selected using the respective form rather than a CQL query)
  • chart depicting line group proportions can be exported into Excel
  • word list
    • more convenient upload and in-browser editing of uploaded black/white-lists
    • it is now possible to go directly to the last page
  • added support for hiding individual columns of parallel corpora in concordance view

Release 0.10.0

Publication date: 11. 4. 2017

  • for spoken corpora, concordance detail views are rendered as dialogues with clear indication of speaker turns and overlaps where audio can be played back by clicking the “speaker” icon
  • documents for subcorpora can now also be selected according to user-defined text type ratios
  • individual query processing steps within the breadcrumb navigation can now be edited, allowing the user to change the parameters of previous operations
  • working corpus can now be changed without losing other information from the current query form
  • manually grouped concordance lines are now distinguished by colours
  • web page titles contain query information (for use in bookmarks and better browser history navigation)

Release 0.9.0

Publication date: 26. 9. 2016

  • displaying of syntactic structures
  • asynchronous creation of subcorpora, including support for creating alignment-based subcorpora from parallel corpora
  • for attributes with long lists of values, a text input auto-complete function has been added for easier subcorpus creation
  • positional attributes can also be displayed on mouse-over
  • navigation between concordance pages without reloading the whole page
  • frequency distribution and collocation results are now cached on the server for faster pagination
  • user-defined numeric concordance line labels can now be renamed/removed
  • added support for displaying line numbers in a concordance

Release 0.8.0

Publication date: 8. 3. 2016

  • concordance lines can have user-defined numeric labels attached for manual grouping/categorization
  • i.p.m. calculation for ad-hoc subcorpora (on demand; previous versions calculated i.p.m. from the whole corpus which could have been confusing)
  • support for creating subcorpora based on conditions that contain different structures (e.g. <speaker sex="male" /> and <session id="foo.+" />)
  • added a breadcrumb navigation depicting consecutive steps that led to the current query result
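
The difference between the two i.p.m. bases mentioned in this release can be illustrated with a small sketch. It assumes the usual definition of i.p.m. (instances per million tokens); the function name `ipm` and all numbers are hypothetical and are not taken from KonText.

```python
def ipm(hits: int, size_in_tokens: int) -> float:
    """Relative frequency in instances per million tokens."""
    if size_in_tokens <= 0:
        raise ValueError("corpus size must be positive")
    return hits * 1_000_000 / size_in_tokens


# Hypothetical figures: 250 hits, all of them inside an ad-hoc subcorpus.
hits = 250
whole_corpus_size = 100_000_000  # tokens in the whole corpus
subcorpus_size = 5_000_000       # tokens in the ad-hoc subcorpus

ipm_whole = ipm(hits, whole_corpus_size)  # base: whole corpus
ipm_sub = ipm(hits, subcorpus_size)       # base: the subcorpus actually searched
```

The same number of hits yields a much higher relative frequency when the base is the subcorpus that was actually searched, which is why the choice of base matters.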

Release 0.7.0

Publication date: 5. 10. 2015

  • new widget for corpus selection including favourite corpora, featured corpora etc.
  • rewritten “View” menu functions
  • enhancements of user interface usability (e.g. adding an aligned corpus)