I have a project that translates a physics exercise into its known variables, and I use only awk to do it. This is the example exercise, in Indonesian:
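Dari salah satu bagian gedung yang tingginya 20 m, dua buah batu dijatuhkan secara berurutan. Massa kedua batu masing-masing 1/2 kg dan 5 kg. Bila percepatan gravitasi bumi di tempat itu g = 10 m/s2, tentukan waktu jatuh untuk kedua batu itu (Abaikan gesekan udara).
(In English: From a part of a building 20 m high, two stones are dropped one after the other. The masses of the two stones are 1/2 kg and 5 kg respectively. If the gravitational acceleration at that place is g = 10 m/s2, determine the fall time for both stones. Ignore air friction.)
I need to turn it into a list of known variables ("diketahui" in Indonesian), like this: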
Tinggi = 20 m
Massa 1 = 1/2 kg
Massa 2 = 5 kg
Gravitasi = 10 m/s2
I tried awk and got stuck on this code for getting the numbers.
{ for(i=1; i<=NF; i++){ if($i ~ /^[[:digit:]]+/) print $i } }
And this is my second code, for getting the units (such as m, kg, m/s2).
{ for(i=1; i<=NF; i++){ if(($i ~ /^m\/s2/) || ($i ~ /^kg$/) || ($i ~ /^m$/)) print $i } }
And I tried to join those two codes into one.
BEGIN { FS = "[, ]+" }

# getting units
{
    for (i = 1; i <= NF; i++) {
        if (($i ~ /^m\/s2/) || ($i ~ /^kg$/) || ($i ~ /^m$/))
            print $i
    }
}

# getting numbers
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^[[:digit:]]+/)
            print $i
    }
}
Result
master@master:~/Dokumen/Pelajaran/Semester 4/Pak Anom$ awk -f plasma.awk soal1
m
20
kg
m/s2
1/2
5
10
master@master:~/Dokumen/Pelajaran/Semester 4/Pak Anom$
But none of them gave the output I wanted. Why did I fail? Because I did not understand awk's syntax and logic. I asked on Stack Overflow (you can see my question at http://stackoverflow.com/questions/17312343/parsing-physic-exercise-in-awk), and within two to five minutes I got an answer. So quick. The best code was this:
{
    for (i = 1; i <= NF; i++) {
        # strip commas and periods from the next field, so "m," becomes "m"
        gsub(/[,.]/, "", $(i+1))
        if ($i ~ /^[[:digit:]]/ && $(i+1) == "m") {
            print "Height = " $i, $(i+1)
        } else if ($i ~ /^[[:digit:]]/ && $(i+1) == "kg") {
            print "Mass " ++x " = " $i, $(i+1)
        } else if ($i ~ /^[[:digit:]]/ && $(i+1) == "m/s2") {
            print "Gravity = " $i, $(i+1)
        }
    }
}
Result
Height = 20 m
Mass 1 = 1/2 kg
Mass 2 = 5 kg
Gravity = 10 m/s2
Short Analysis
My first code follows my basic idea: scan every field (read: column), save every match into a variable, then print the variable's content. But the main problem is that I don't understand how to use variables in awk.
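The "save into a variable, then print" idea above can be sketched like this. It is only a minimal sketch: awk variables need no declaration, and the names found, n and j are hypothetical ones I picked for illustration. The input is a single sentence from the exercise, and I assume an awk that supports POSIX character classes such as [[:digit:]].

```shell
# Collect every field that starts with a digit into the array "found",
# then print the collected values in the END block.
result=$(printf 'Massa kedua batu masing-masing 1/2 kg dan 5 kg.\n' |
awk '
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[[:digit:]]/)
            found[++n] = $i      # n counts how many numbers were saved
}
END {
    for (j = 1; j <= n; j++)
        print j ": " found[j]
}')
printf '%s\n' "$result"
```

With this one sentence as input it should print "1: 1/2" and "2: 5", each on its own line.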
for(i=1; i<=NF; i++)
This awk for() loop works the same as a C for() loop. The main difference is the NF variable. It is a built-in awk variable that holds the Number of Fields (read: number of columns) in the current line. So, with this for() loop I scan my whole exercise line by line, and I use the i variable to step through the fields one by one.
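To see NF and the loop counter at work, here is a small sketch on a four-word sample line taken from the exercise. The shell variable name fields is just a hypothetical holder for the output.

```shell
# Print the field count (NF), then each field together with its index.
fields=$(printf 'g = 10 m/s2\n' |
awk '{
    print "NF = " NF
    for (i = 1; i <= NF; i++)
        print i " -> " $i    # i is the counter, $i is the text of field i
}')
printf '%s\n' "$fields"
```

For this line awk reports NF = 4 and then lists g, =, 10 and m/s2 as fields 1 to 4.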
if($i ~ /^[[:digit:]]+/)
This if() statement tests a pattern. It checks whether the content of field $i matches the regex I specified, and when it does, the print that follows it runs. It does not save anything. Remember that $i is different from plain i: i is the loop counter, while $i is the content of field number i. My regex here is /^[[:digit:]]+/, which means:
- ^ = must be at the start. The pattern may not match in the middle or at the end of the word, only at the front. Example: /^anu/ matches anu1 and anu3, but not banu, 8anu, or any other word that does not begin with anu. That is the ^ (caret).
- [[:digit:]] = the POSIX character class for digits. It restricts the match to the digits 0 to 9, so no letter or other character can match.
- + = one or more. It makes [[:digit:]] match a run of digits such as 20 instead of a single digit, and it never matches the empty string: there must be at least one digit. Note that because this regex has no $ at the end, the field only has to start with a digit, so 20 and 1/2 would both match even without the +.
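A quick way to check the three parts of this regex is to run it against a few sample words. This is a minimal sketch: the sample words are made up, the variable name digits is hypothetical, and I assume an awk that supports POSIX character classes.

```shell
# Test each word against /^[[:digit:]]+/ and report the result.
digits=$(printf '20 1/2 anu1 8anu x5\n' |
awk '{
    for (i = 1; i <= NF; i++)
        print $i, (($i ~ /^[[:digit:]]+/) ? "match" : "no match")
}')
printf '%s\n' "$digits"
```

Only the words that begin with a digit (20, 1/2 and 8anu) match. anu1 and x5 contain digits, but not at the start, so the ^ rejects them.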
if(($i ~ /^m\/s2/) || ($i ~ /^kg$/) || ($i ~ /^m$/))
What does that code mean? It is actually simple. We can divide it into three parts. First:
($i ~ /^m\/s2/)
Second:
($i ~ /^kg$/)
Third:
($i ~ /^m$/)
These three parts are joined by the || (OR) operator. Together they mean: the condition is true if field $i matches m/s2 OR kg OR m, and in that case the field is printed. Nothing is saved into $i; it is only tested. Why must we write m\/s2 and not m/s2 directly? Because awk has to tell a pattern apart from its own syntax. The slash (/) is awk syntax that delimits a regex: anything enclosed between two slashes is a pattern. So, when we want to search for a literal slash, we must tell awk that this slash is not a delimiter. How? Put awk's escape character, the backslash (\), right in front of the slash.
And what is the $ (dollar) sign? It is the opposite of the ^ sign. If ^ anchors the front, then $ anchors the rear. The regex /anu$/ matches lanu, 8anu, banu, and any other word that ends with anu. It does not match anu9, anuyu, anum, or any word that has characters after anu.
My third code, the joined code, contains my first and second code. But there is one difference.
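The effect of ^ and $ on the unit test can be tried on a few sample words. This is only a sketch: the words are made up, the variable name units is hypothetical, and the condition is the same three-part one discussed above.

```shell
# Run the three-part unit test on words with and without trailing punctuation.
units=$(printf 'm m/s2, kg kg. km\n' |
awk '{
    for (i = 1; i <= NF; i++) {
        if (($i ~ /^m\/s2/) || ($i ~ /^kg$/) || ($i ~ /^m$/))
            print $i, "unit"
        else
            print $i, "no"
    }
}')
printf '%s\n' "$units"
```

Here "m/s2," still counts as a unit because /^m\/s2/ has no $ at the end, while "kg." is rejected because /^kg$/ allows nothing after kg. "km" fails all three patterns.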
BEGIN { FS = "[, ]+" }
What is that? This section sets FS, the built-in awk variable for the Field Separator. The value "[, ]+" is a regex that means one or more commas or spaces, so I use the comma (,) and the space ( ) as field separators. If there is m, or kg, or m/s2, in my exercise, awk will not treat the comma as part of the field. In other words, I specify the field separator so that awk takes the unit but not the comma. So, later I get the output kg and not kg, (notice the comma).
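The difference FS makes can be seen by printing the same field with and without it. A minimal sketch, using a fragment of the exercise as input. The shell variable names line, default_fs and custom_fs are hypothetical.

```shell
line='g = 10 m/s2, tentukan waktu'

# Default field separator: the comma stays attached to the unit.
default_fs=$(printf '%s\n' "$line" | awk '{ print $4 }')

# Custom field separator: runs of commas and spaces separate the fields.
custom_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[, ]+" } { print $4 }')

printf 'default: %s\ncustom:  %s\n' "$default_fs" "$custom_fs"
```

With the default separator the fourth field is "m/s2," and with FS = "[, ]+" it is "m/s2". A period, as in "kg.", is not in the separator set, so it would still stay attached.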