score: has_bounded_lag works on different automata
The commit f893b2f1 had a huge impact on has-bounded-lag:
0 1
0.67 0.00 a.has_bounded_lag() # a = std(['(a,x)'-'(b,y)']*{600}), 20x
1.18 0.24 a.has_bounded_lag() # a = std(['(a,x)'-'(b,y)']{1000}*), 10x
0 1
0. v2.3-041-g869ee51 style: minor improvements
1. v2.3-042-gf893b2f tupleset: fix the generation of generators
but actually, it reveals a nasty problem. The benches use expressions such as ['(a,x)'-'(b,y)']{1000}*
whose meaning has changed in f893b2f1 since now there can be 'letters' with the empty word on one of the tapes. This completely changes the meaning of the expression, and the automata are different, so of course, the results are different too.
The following program helps to contrast both results:
import vcsn
ctx = vcsn.context("lat<lan_char(abc), lan_char(xyz)>, b")
print(ctx.expression("['(a,x)'-'(b,y)']"))
e = ctx.expression("['(a,x)'-'(b,y)']{1000}*")
a = e.standard()
print("{:i}".format(a))
print(a.has_bounded_lag())
With the "old" vcsn:
a|x+a|y+a|z+b|x+b|y
type: mutable_automaton<lat<nullableset<letterset<char_letters(abc)>>, nullableset<letterset<char_letters(xyz)>>>, b>
number of states: 5001
number of lazy states: 0
number of initial states: 1
number of final states: 6
number of accessible states: 5001
number of coaccessible states: 5001
number of useful states: 5001
number of codeterministic states: N/A
number of deterministic states: N/A
number of transitions: 25005
number of spontaneous transitions: 0
is complete: N/A
is deterministic: N/A
is codeterministic: N/A
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 1
is trim: 1
is useless: 0
is valid: 1
True
With the current version:
a|x+a|y+a|z+b|\e+b|x+b|y
type: mutable_automaton<lat<nullableset<letterset<char_letters(abc)>>, nullableset<letterset<char_letters(xyz)>>>, b>
number of states: 6001
number of lazy states: 0
number of initial states: 1
number of final states: 7
number of accessible states: 6001
number of coaccessible states: 6001
number of useful states: 6001
number of codeterministic states: N/A
number of deterministic states: N/A
number of transitions: 36006
number of spontaneous transitions: 0
is complete: N/A
is deterministic: N/A
is codeterministic: N/A
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 1
is trim: 1
is useless: 0
is valid: 1
False
It's easy to fix vcsn-score. The question is also: is this change of semantics sane?
The original motivation to allow \e|x
and the like in the generators, is to support derivation in derived-term for multitape automata.
Note that a bench about compose on Thompson automata has exactly the same problem:
akim@erebus ~/src/lrde/2 $ v score-compare -O 'lag|compose' +scores/36s/{v2.3-041-g869ee51,v2.3-042-gf893b2f}
0 1
0.57 0.56 a.compose(a) # a = std(((\e|a+a|b+b|c+c|d+d|a)(w|x+x|y+y|z+z|w+z|\e)){300}), c = [a-z]?x[a-z]? -> B, 20x
0.83 0.80 a.compose(a) # a = std(((a|b+b|c+c|d+d|a)(w|x+x|y+y|z+z|w)){300}), c = [a-z]?x[a-z]? -> B, 50x
0.66 0.62 a.compose(a) # a = std(((a|b+b|c+c|d+d|a)(w|x+x|y+y|z+z|w)){300}), c = [a-z]x[a-z] -> B, 50x
6.49 6.43 a.compose(a) # a = std(['(a,a)'-'(i,z)']{4}), c = [a-z]x[a-z] -> B
0.78 0.79 a.compose(a) # a = thm(((\e|a+a|b+b|c+c|d+d|a)(w|x+x|y+y|z+z|w+z|\e)){300}), c = [a-z]?x[a-z]? -> B, 10x
0.74 0.75 a.compose(a) # a = thm(((a|b+b|c+c|d+d|a)(w|x+x|y+y|z+z|w)){300}), c = [a-z]?x[a-z]? -> B, 20x
0.68 1.98 a.compose(a) # a = thm(['(a,a)'-'(i,z)']{4}), c = [a-z]?x[a-z]? -> B, 2x
0.67 0.00 a.has_bounded_lag() # a = std(['(a,x)'-'(b,y)']*{600}), 20x
1.18 0.24 a.has_bounded_lag() # a = std(['(a,x)'-'(b,y)']{1000}*), 10x
0 1
0. v2.3-041-g869ee51 style: minor improvements
1. v2.3-042-gf893b2f tupleset: fix the generation of generators