BASIC PRINCIPLES OF CORPUS LINGUISTIC PROCESSING OF LEGISLATIVE TEXTS

  • Tinatin Tenterashvili, PhD Student in Philology, Ivane Javakhishvili Tbilisi State University, Ilia Tchavchavadze Avenue 1, Tbilisi, 0179, Georgia, http://orcid.org/0009-0002-8113-7115

Abstract

An important step in the current development of the humanities is the creation of large databases and of modern research methods. Digital corpora make efficient research possible, both for resolving individual linguistic questions and for carrying out complex linguistic studies. One such resource for the Georgian language is the Georgian National Corpus (GNC), which contains 202 million tokens, includes texts of different genres, and is extended with two thematic corpora, one of political texts and one of legal texts, although neither is complete. In particular, the sub-corpus of legal texts includes only historical legal documents. From this point of view, the genre balance of the Georgian National Corpus is an important challenge that must be overcome for the corpus to be complete and fully functional. One way to solve this problem is to create a technological framework for the digital processing of legislative texts using modern research methods.

For texts to be processed digitally, they must be available in electronic format. Legislative texts have already been digitized, and their electronic versions are available on the website matsne.gov.ge; however, they are not linguistically processed and therefore cannot support corpus-based research. In this report we present one way to address this issue, describing the basic principles of corpus linguistic processing of legal texts using specific legal texts as examples. The resulting resource served as the empirical basis for linguistic research in which linguistic means with legal semantics were analyzed.

The content of legal texts is conveyed in different linguistic forms. When studying sector-specific vocabulary in legal texts, we should focus on cases in which the vocabulary conveying sector-specific content consists of more than one component and acquires a meaning specific to the sector only in this combined form. Considered independently of each other, the components are in most cases not related to the legal field. In this respect, two-component verbs are important: combinations of a main verb with a closely related noun or infinitive, for example, imprisonment, sentencing, etc. Such combinations are often found in legal texts to describe legal processes.

Identifying and taking into account the characteristic features of legal texts is important both for the corpus linguistic processing of legal texts and for populating the bank of legal terms, since two-component verbs should be processed corpus-linguistically not at the level of individual lexical units, but in the form in which they relate to the field of law.
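
The idea of treating a two-component verb as a single unit rather than as two separate lexical units can be illustrated with a minimal sketch. It is not the processing pipeline used for the GNC: the lexicon, the English stand-in lemmas, the function name merge_two_component, and the assumption that the two components are adjacent and already lemmatized are all hypothetical simplifications chosen for illustration.

```python
# Hypothetical lexicon of two-component units, given as pairs of lemmas.
# English stand-ins are used here; a real lexicon would hold Georgian lemmas.
TWO_COMPONENT_UNITS = {
    ("impose", "sentence"),
    ("apply", "imprisonment"),
}


def merge_two_component(lemmas, lexicon=TWO_COMPONENT_UNITS):
    """Join adjacent lemma pairs found in the lexicon into one unit.

    Assumes the components are adjacent and appear in the listed order,
    which is a simplification of real legal text.
    """
    merged = []
    i = 0
    while i < len(lemmas):
        pair = tuple(lemmas[i:i + 2])
        if len(pair) == 2 and pair in lexicon:
            # Keep the combination as a single unit of analysis.
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(lemmas[i])
            i += 1
    return merged


lemmas = ["court", "impose", "sentence", "on", "defendant"]
print(merge_two_component(lemmas))
# ['court', 'impose_sentence', 'on', 'defendant']
```

With units merged in this way, frequency counts and term extraction operate on the combination that carries the legal meaning, not on its components taken separately.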

 

Keywords: GNC, Corpus Linguistics, Legislative, Term, Two-Component.

Published
2025-06-23
Section
SCIENTIFIC ARTICLES - LINGUISTICS SECTION