csv.org 8 KB
Newer Older
1
#+TITLE: Reading and writing CSV files
2
3
#+SETUPFILE: setup.org
#+HTML_LINK_UP: tools.html
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237

This page discusses features available in Spot's command-line
tools to produce an consume CSV files.

* Producing CSV files

All the tools that normally produce formulas (like [[file:genltl.org][=genltl=]], [[file:randltl.org][=randltl=]],
and [[file:ltlfilt.org][=ltlfilt=]]) have a [[file:ioltl.org][=--format= option]].  That can be used to
customize the way output is formatted.

For instance here is how we could use =genltl= to generate a CSV file
with three columns: the family name of the formula, its parameter, and
the formula itself.

#+BEGIN_SRC sh :results verbatim :exports both
genltl --and-gf=1..5 --u-left=1..5 --format='%F,%L,%f' > gen.csv
cat gen.csv
#+END_SRC
#+RESULTS:
#+begin_example
and-gf,1,GFp1
and-gf,2,GFp1 & GFp2
and-gf,3,GFp1 & GFp2 & GFp3
and-gf,4,GFp1 & GFp2 & GFp3 & GFp4
and-gf,5,GFp1 & GFp2 & GFp3 & GFp4 & GFp5
u-left,1,p1
u-left,2,p1 U p2
u-left,3,(p1 U p2) U p3
u-left,4,((p1 U p2) U p3) U p4
u-left,5,(((p1 U p2) U p3) U p4) U p5
#+end_example

Tools that produce automata (like [[file:ltl2tgba.org][=ltl2tgba=]], or [[file:dstar2tgba.org][=dstar2tgba=]]) have a
=--stats= option that can be used to output various statistics about
the constructed automaton (these statistics are shown *instead* of
printing the automaton).

For instance, the following command will translate all the previous
formulas, and show the resulting number of states (=%s=) and edges
(=%e=) of the automaton constructed for each formula.

#+BEGIN_SRC sh :results verbatim :exports both
genltl --and-gf=1..5 --u-left=1..5 | ltl2tgba -F- --stats '%f,%s,%e'
#+END_SRC
#+RESULTS:
#+begin_example
GFp1,1,2
G(Fp1 & Fp2),1,4
G(Fp1 & Fp2 & Fp3),1,8
G(Fp1 & Fp2 & Fp3 & Fp4),1,16
G(Fp1 & Fp2 & Fp3 & Fp4 & Fp5),1,32
p1,2,2
p1 U p2,2,3
(p1 U p2) U p3,4,10
((p1 U p2) U p3) U p4,8,34
(((p1 U p2) U p3) U p4) U p5,16,116
#+end_example

If the translated formulas may contain commas, or double-quotes, this
simple output may prove difficult to process by other tools.  For
instance consider the translation of the following two formulas:

#+BEGIN_SRC sh :results verbatim :exports both
ltl2tgba -f Xa -f 'G("switch == on" -> F"tab[3,5] < 12")' --stats '%f,%s,%e'
#+END_SRC
#+RESULTS:
: Xa,3,3
: G(!"switch == on" | F"tab[3,5] < 12"),2,4

The second line of this input does no conform to [[https://www.rfc-editor.org/rfc/rfc4180.txt][RFC 4180]] because
non-escaped fields are not allowed to contain comma or double-quotes.
To fix this, use =ltl2tgba='s =--csv-escape= option: this causes
"=%f=" to produce a double-quoted string properly escaped.

#+BEGIN_SRC sh :results verbatim :exports both
ltl2tgba -f Xa -f 'G("switch == on" -> F"tab[3,5] < 12")' --stats '%f,%s,%e' --csv-escape
#+END_SRC

#+RESULTS:
: "Xa",3,3
: "G(!""switch == on"" | F""tab[3,5] < 12"")",2,4


The tool [[file:ltlcross.org][=ltlcross=]] has its own =--csv=FILENAME= option to format the
statistics it gathers in a CSV file, but you have very little control
hover how this CSV file is formatted (it can only be changed
via option such as =--products= or =--omit-missing=).

* Reading CSV files

All the tools that read formulas from files extend the filename syntax
to support the specification of a CSV column.  The notation
=filename/COL= denotes the column =COL= of that file.

For instance let's consider the file =gen.csv= built with the first command of
this page.  It contains:

#+BEGIN_SRC sh :results verbatim :exports results
cat gen.csv
#+END_SRC
#+RESULTS:
#+begin_example
and-gf,1,GFp1
and-gf,2,GFp1 & GFp2
and-gf,3,GFp1 & GFp2 & GFp3
and-gf,4,GFp1 & GFp2 & GFp3 & GFp4
and-gf,5,GFp1 & GFp2 & GFp3 & GFp4 & GFp5
u-left,1,p1
u-left,2,p1 U p2
u-left,3,(p1 U p2) U p3
u-left,4,((p1 U p2) U p3) U p4
u-left,5,(((p1 U p2) U p3) U p4) U p5
#+end_example

We can run =ltl2tgba= on the third column to produce
the same output as in a previous example:

#+BEGIN_SRC sh :results verbatim :exports both
ltl2tgba -F gen.csv/3 --stats '%f,%s,%e'
#+END_SRC
#+RESULTS:
#+begin_example
GFp1,1,2
G(Fp1 & Fp2),1,4
G(Fp1 & Fp2 & Fp3),1,8
G(Fp1 & Fp2 & Fp3 & Fp4),1,16
G(Fp1 & Fp2 & Fp3 & Fp4 & Fp5),1,32
p1,2,2
p1 U p2,2,3
(p1 U p2) U p3,4,10
((p1 U p2) U p3) U p4,8,34
(((p1 U p2) U p3) U p4) U p5,16,116
#+end_example

When =ltlfilt= is used on a CSV file, it will preserve the
text before and after the matched formula in the CSV file.
For instance:

#+BEGIN_SRC sh :results verbatim :exports both
ltlfilt -F gen.csv/3 --size-min=8 --relabel=abc
#+END_SRC
#+RESULTS:
: and-gf,3,GFa & GFb & GFc
: and-gf,4,GFa & GFb & GFc & GFd
: and-gf,5,GFa & GFb & GFc & GFd & GFe
: u-left,5,(((a U b) U c) U d) U e

For security, in case a formula may contain double-quotes or
commas, you should use the =--csv-escape= option:

#+BEGIN_SRC sh :results verbatim :exports both
ltlfilt -F gen.csv/3 --size-min=8 --relabel=abc --csv-escape
#+END_SRC
#+RESULTS:
: and-gf,3,"GFa & GFb & GFc"
: and-gf,4,"GFa & GFb & GFc & GFd"
: and-gf,5,"GFa & GFb & GFc & GFd & GFe"
: u-left,5,"(((a U b) U c) U d) U e"

The preservation in the output of the text before and after the
selected column can be altered using the =--format= option.  The =%<=
escape sequence represent the (comma-separated) data of all the
columns before the selected column, and =%>= is the same for the
trailing data.  Note that the comma that separate formulas' column
from the other column are excluded and should be added in the format
string.

For instance this moves the first two columns after the formulas.

#+BEGIN_SRC sh :results verbatim :exports both
ltlfilt -F gen.csv/3 --size-min=8 --csv-escape --format='%f,%<'
#+END_SRC
#+RESULTS:
: "GFp1 & GFp2 & GFp3",and-gf,3
: "GFp1 & GFp2 & GFp3 & GFp4",and-gf,4
: "GFp1 & GFp2 & GFp3 & GFp4 & GFp5",and-gf,5
: "(((p1 U p2) U p3) U p4) U p5",u-left,5


Typical uses of =ltlfilt= on CSV file include:
- Filtering lines based on an LTL criterion, as above.
- Changing the syntax of LTL formulas.  For instance =ltl2tgba='s
  =--stats= option, and =ltlcross='s =--csv= option always output
  formulas in Spot's format.  If that is inappropriate, simply
  use =ltlfilt= to rewrite the relevant column in your prefered
  syntax.

* Dealing with header lines

Some CSV contain a header lines that should not be processed.
The CSV file produced by =ltlcross= have such a line:

#+BEGIN_SRC sh :results verbatim :exports both
randltl -n 2 a b | ltlfilt --remove-wm |
  ltlcross --csv=results.csv 'ltl2tgba -s %f >%N' 'ltl3ba -f %s >%N'
cat results.csv
#+END_SRC
#+RESULTS:
: "formula","tool","exit_status","exit_code","time","states","edges","transitions","acc","scc","nonacc_scc","terminal_scc","weak_scc","strong_scc","nondet_states","nondet_aut","terminal_aut","weak_aut","strong_aut","product_states","product_transitions","product_scc"
: "(1)","ltl2tgba -s %f >%N","ok",0,0.0247303,1,1,1,1,1,0,1,0,0,0,0,1,0,0,200,3994,1
: "(1)","ltl3ba -f %s >%N","ok",0,0.00314673,1,1,1,1,1,0,1,0,0,0,0,1,0,0,200,3994,1
: "(0)","ltl2tgba -s %f >%N","ok",0,0.0246916,1,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1
: "(0)","ltl3ba -f %s >%N","ok",0,0.00343519,1,0,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1
: "(!(G((F(b)) | (F(!((b) | (G(b))))))))","ltl2tgba -s %f >%N","ok",0,0.0233752,1,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1
: "(!(G((F(b)) | (F(!((b) | (G(b))))))))","ltl3ba -f %s >%N","ok",0,0.00316933,1,0,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1
: "(G((F(b)) | (F(!((b) | (G(b)))))))","ltl2tgba -s %f >%N","ok",0,0.0238983,1,1,1,1,1,0,1,0,0,0,0,1,0,0,200,4083,1
: "(G((F(b)) | (F(!((b) | (G(b)))))))","ltl3ba -f %s >%N","ok",0,0.00315896,1,1,1,1,1,0,1,0,0,0,0,1,0,0,200,4083,1

If we run =ltlfilt= on the first column, it will process the =formula=
header as if it was an LTL formula.

#+BEGIN_SRC sh :results verbatim :exports both
ltlfilt -F results.csv/1 --format='%f' --unique
#+END_SRC

#+RESULTS:
: formula
: 1
: 0
: !G(Fb | F!(b | Gb))
: G(Fb | F!(b | Gb))

In such case, the syntax =FILENAME/-COL= (with a minus sign before the
column number) can be used to discard the first line of a CSV file.

#+BEGIN_SRC sh :results verbatim :exports both
ltlfilt -F results.csv/-1 --format='%f' --unique
#+END_SRC

#+RESULTS:
: 1
: 0
: !G(Fb | F!(b | Gb))
: G(Fb | F!(b | Gb))