Самые трудные формальные языки
В теории формальных языков большое значение имеет понятие гомоморфной сводимости: язык называется сводимым к другому, если он представляется в виде его прообраза при каком-то гомоморфизме строк. Самым трудным в семействе языков называется тот, к которому сводим любой другой; классическими являются такие результаты, как существование самого трудного языка для обыкновенных (бесконтекстных) грамматик и несуществование такового для линейных грамматик. Данная магистрская работа исследует существование самого трудного языка для двух семейств: двусторонних автоматов линейного времени (родственных односторонним автоматам реального времени, которые уже были рассмотрены в бакалаврской работе) и LL(k) грамматик (классического подсемейства обыкновенных грамматик, разбираемого за линейное время). Для обоих семейств успешно построен самый трудный язык: в первом случае применён метод сигналов, чтобы смоделировать функцию перехода как поиск в фиксированной таблице; во втором случае разработана конструкция-“скрепка”, позволяющая детерминистически переносить информацию о разборе между соседними образами символов.
1 Introduction 3
2 Linear-time cellular automata 4
3 The hardest language for linear-time cellular automata 5
4 LL grammars 12
5 The hardest LL(0) language 14
6 Non-existence under standard definitions 17
7 Hardest language for LL(k) grammars 18
8 Conclusions
24
The notion of a complete set for a family of formal languages is among the central concepts of theoretical computer science. Given a reduction mechanism, such as logarithmic-space Turing machines, a language, to which every language from a family is reducible, is called hard for that family. If, furthermore, this language belongs to the family, it is complete for that family.
For this definition to make sense, the reduction mechanism should be computationally weaker than the family itself. Furthermore, the weaker the reduction mechanism, the stronger are the results on the completeness of some language with respect to those reductions. The weakest reduction mechanism are homomorphisms, under which every symbol a from the alphabet of the given language is mapped to a substring h(a) over the alphabet of the hard language. There is a notable result involving such reductions: Greibach’s [8] “hardest context-free language”, that is, a fixed language L0 over an alphabet Σ0 defined by a formal grammar, to which one can reduce every language L defined by a grammar over any alphabet Σ, by a suitable homomorphism h: Σ∗ → Σ∗0, so that a nonempty string w ∈ Σ+ belongs to L if and only if its image h(w) is in L0.
Greibach’s result inspired a line of research on the existence of hardest languages under homomorphic reductions in various families of formal languages. Already Greibach [9] proved the first negative result, that for the family of languages described by LR(1) grammars, or, equivalently, recognized by deterministic pushdown automata, there cannot exist such a hardest language. For another important subfamily of grammars, the linear grammars, non-existence of hardest languages was established by Boasson and Nivat [3]. On the other hand, as proved by Okhotin [24], two generalizations of ordinary (“context-free”) grammars do have hardest languages: these are conjunctive grammars [20, 23], with an explicit conjunction operation in the rules, and Boolean grammars [22], which are further equipped with a negation operation.
Hardest languages under homomorphic reductions have been investigated beyond formal grammars. For the family of regular languages, Cˇul ́ık and Maurer [7] proved that a hardest language under homomorphic reductions cannot exist; a related stronger result was later estab- lished by Harju, Karhum ̈aki and Kleijn [10]. Autebert [1] obtained the same negative result for one-counter automata. On the other hand, Greibach [8] proved a hardest language theorem for a certain restricted type of Turing machines, whereas Cˇul ́ık and Maurer [7] found a hardest language in the complexity class NSPACE(n); their argument extends to NSPACE(f(n)) and DSPACE(f(n)) for every space-constructible function f.
I have already researched the problem of hardest languages as my Bachelor’s thesis. The first topic of research was the family generated by grammars with one-sided context operators [2]—an extension of conjunctive grammars with a special operator that restricts the form of the prefix of the entire string preceding the derived substring. This allows the grammar to succinctly express “declaration before use” and similar conditions. As it turned out, the construction of a hardest language for conjunctive grammars could be augmented to handle this extension as well [17]. The second topic of my research was the family of linear conjunctive grammars—an intermediate step between linear and conjunctive grammars. The key to solving this problem was using an alternative characterization [21] of the generated languages as those recognized by one-way real-time cellular automata (also known as trellis automata): a classical model, first studied by Smith [26], that represents the process of parsing a string as a linear array of cells, each initialized with a single symbol of the string, passing finite information to their neighbours in one direction in discrete steps.
The present work continues this research by constructing a hardest language for a close rel- ative of the latter model—[two-way] linear-time cellular automata, which have been introduced by Smith [26] as well, but have since been found to be much more powerful than trellis automata: forinstance,theycanrecognizethelanguages{a2n |n 0}[5]and{anbin|n,i 1}[4].There is an important theoretical result on linear speed-up for this model: every language recognized by a cellular automaton in time C · n can be recognized in time (1 + ε)n, see Ibarra et al. [12].
3
LL
LR
UnambLin
Unamb
Lin
Ordinary (CF)
Conj+
UnambTAG TAG Multi
Reg LLLin
LRLin
IDPDA
Bool Conj
UnambBool
RT-CA
UnambConj LinConj=TA
Figure 1: Inclusion graph of language families for which the problem has already been solved (circled: hardest language exists; crossed: hardest language does not exist; both: hardest lan- guage exists, but only under a slightly relaxed definition), alongside some intermediate and related families (neither circled nor crossed).
Whether these automata are equivalent to the intermediate model of [two-way] real-time cel- lular automata operating in time exactly n, is a long-standing open problem raised already by Smith [26] and addressed many times in the literature. For more details about cellular automata as language-recognition devices, the reader is directed to a survey by Terrier [28].
Additionally, this work also investigates the existence of hardest languages for the classical language family parsable in linear time, the LL(k) languages, which are recognized by top-down parsers with a k-symbol lookahead. This family is not closed under inverse homomorphisms, see Lehtinen and Okhotin [15]. This does not affect possible existence of hardest languages, but that if they do exist, then their inverse homomorphic images would include languages beyond LL(k). Indeed, in spite of this non-closure, there is a hardest language theorem for LL: the first such result, presented in Section 5, is that LL(1) grammars in the Greibach normal form (these are “simple grammars” of Korenjak and Hopcroft [13]) have a hardest language with respect to reductions by homomorphisms. It is natural to try extending this result to the entire LL(k) hierarchy. An apparent obstacle is presented in Section 6: it is proved that already for the family of LL(1) languages, there is no hardest language under the strict definition of Greibach, that is, with respect to reductions by homomorphisms. This obstacle leads to a small modification of the definition of hardest languages: as shown in Section 7, if a single end-marker is appended to the input, then a hardest LL(k) language does exist. In fact, it can be generated by a single LL(1) grammar in the Greibach normal form uniformly for all k.
With the new contributions of this thesis, the current state of the art in hardest languages for basic language families is presented in Figure 1 (with newly proved results framed). For families represented by black circles, namely for multi-component grammars of several kinds as well as for unambiguous variations of ordinary, conjunctive and Boolean grammars, the existence of hardest languages remains an open problem. The results of this thesis have been accepted for presentation at formal language conferences: at LATA 2021 [18] for the results of Sections 2–3, and at DLT 2021 [19] for the results of Sections 4–7.
It has been proved that linear-time cellular automata admit a hardest language, but for LL(k) languages the answer is more complicated. Whereas LL(0) languages have a hardest language is the strict sense of Greibach, LL(k) languages admit one if the definition is slightly relaxed by adding an end-marker; in this case, there is a single hardest language for the whole LL(k) hierarchy, defined by an LL(0) grammar.
Firstly, of big interest is the family of two-way real-time cellular automata [26], intermediate between one-way real-time and two-way linear-time. While it it known to be separated from the former, no such result is known for the latter, only conditional ones: Ibarra and Jiang [11] showed that if the family accepted by real-time cellular automata is closed under reversal, then this model is equivalent in power to linear-time cellular automata, while Terrier [27] proved the same for the closure under cyclic shift. On the other hand, if this family has no hardest language under homomorphic reductions, this would now imply that real-time cellular automata and linear-time cellular automata define different families of languages.
Secondly, the way the hardest language for LL(k) was constructed raises another question: are there any similar examples of “hardest languages” existing under minimally relaxed defini- tions for language families known not to contain hardest languages in the sense of Greibach (for example, the closely related LR languages)?
Finally, there are still more language families that yet remain resilient to attempts to con- struct a hardest language or disprove its existence: for example, unambiguous grammars, for which the problem has been formulated almost half a century ago by Boasson and Nivat [3], and their relatives like unambiguous conjunctive and unambiguous boolean grammars.
[1] J.-M. Autebert, “Non-principalit ́e du cylindre des langages `a compteur”, Mathematical Systems Theory, 11:1 (1977), 157–167.
[2] M. Barash, A. Okhotin, “An extension of context-free grammars with one-sided context specifications”, Information and Computation, 237 (2014), 268–293.
[3] L. Boasson, M. Nivat, “Le cylindre des langages lin ́eaires”, Mathematical Systems Theory, 11 (1977), 147–155.
[4] W. Bucher, K. Cˇul ́ık II, “On real time and linear time cellular automata”, RAIRO Infor- matique Th ́eorique et Applications, 18:4 (1984), 307–325.
[5] Ch. Choffrut, K. Cˇul ́ık II, “On real-time cellular automata and trellis automata”, Acta Informatica, 21 (1984), 393–407.
[6] K. Cˇul ́ık II, “Variations of the firing squad problem and applications”, Information Pro- cessing Letters, 30:3 (1989), 152–157.
[7] K. Cˇul ́ık II, H. A. Maurer, “On simple representations of language families”, RAIRO In- formatique Th ́eorique et Applications, 13:3 (1979), 241–250.
[8] S. A. Greibach, “The hardest context-free language”, SIAM Journal on Computing, 2:4 (1973), 304–310.
[9] S. A. Greibach, “Jump PDA’s and hierarchies of deterministic context-free languages”, SIAM Journal on Computing, 3:2 (1974), 111–127.
[10] T. Harju, J. Karhum ̈aki, H. C. M. Kleijn, “On morphic generation of regular languages”, Discrete Applied Mathematics, 15 (1986), 55–60.
[11] O. H. Ibarra, T. Jiang, “Relating the power of cellular arrays to their closure properties”, Theoretical Computer Science, 57 (1988), 225–238.
[12] O. H. Ibarra, M. A. Palis, S. M. Kim, “Some results concerning linear iterative (systolic) arrays”, Journal of Parallel and Distributed Computing, 2:2 (1985), 182–218.
[13] A. J. Korenjak, J. E. Hopcroft, “Simple deterministic languages”, 7th Annual Symposium on Switching and Automata Theory (SWAT 1966, Berkeley, California, USA, 23–25 October 1966), IEEE Computer Society, 36–46.
[14] R. Kurki-Suonio, “Notes on top-down languages”, BIT Numerical Mathematics, 9:3 (1969), 225–238.
[15] T. Lehtinen, A. Okhotin, “Homomorphisms preserving deterministic context-free lan- guages”, International Journal of Foundations of Computer Science, 24:7 (2013), 1049– 1066.
[16] J. Mazoyer, “A six-state minimal time solution to the firing squad synchronization problem” Theoretical Computer Science, 50 (1987), 183–238.
[17] M. Mrykhin, A. Okhotin, “The hardest language for grammars with context operators”, arXiv:2012.03596 [cs:FL] (2020).
[18] M. Mrykhin, A. Okhotin, “On hardest languages for one-dimensional cellular automata”, Language and Automata Theory and Applications (LATA 2021, Milan, Italy, 1–5 March 2021), LNCS 12638, 118–130.
25
[19] M. Mrykhin, A. Okhotin, “The hardest LL(k) language”, International Conference on De- velopments in Language Theory (DLT 2021, Porto, Portugal, 16–20 August 2021), to ap- pear.
[20] A. Okhotin, “Conjunctive grammars”, Journal of Automata, Languages and Combinatorics, 6:4 (2001), 519–535.
[21] A. Okhotin, “On the equivalence of linear conjunctive grammars and trellis automata”, RAIRO Informatique Th ́eorique et Applications, 38:1 (2004), 69–88.
[22] A. Okhotin, “Boolean grammars”, Information and Computation, 194:1 (2004), 19–48.
[23] A. Okhotin, “A tale of conjunctive grammars”, Developments in Language Theory (DLT
2018, Tokyo, Japan, 10–14 September 2018), LNCS 11088, 36–59.
[24] A. Okhotin, “Hardest languages for conjunctive and Boolean grammars”, Information and
Computation, 266 (2019), 1–18.
[25] D. J. Rosenkrantz, R. E. Stearns, “Properties of deterministic top-down grammars”, Infor-
mation and Control, 17 (1970), 226–256.
[26] A. R. Smith III, “Real-time language recognition by one-dimensional cellular automata”,
Journal of Computer and System Sciences, 6 (1972), 233–252.
[27] V. Terrier, “Closure properties of cellular automata”, Theoretical Computer Science, 352:1–
3 (2006), 97–107.
[28] V. Terrier, “Language recognition by cellular automata”, in: G. Rozenberg, Th. B ̈ack, J.
N. Kok (Eds.), Handbook of Natural Computing, Springer, 2012, 123–158.
Последние выполненные заказы
Хочешь уникальную работу?
Больше 3 000 экспертов уже готовы начать работу над твоим проектом!