perl expression find & replace to modify an external function definition

Is it possible to use Perl to remove parameters from calls to a function? For example, if my file contains the following text:

var1 = myfunc(sss,'ROW DRILL',1,1,0);
var2 = myfunc(fff,'COL DRILL',1,1,0);
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space',1,1,0);
donotcapture=myfunc2(rr,'some string',1,1,0); 

I need to change it so that it becomes:

var1 = myfunc(sss,'ROW DRILL');
var2 = myfunc(fff,'COL DRILL');
var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space');
donotcapture=myfunc2(rr,'some string',1,1,0);

Essentially just removing ,1,1,0 from all instances where myfunc is called, but preserving the first two parameters.

I have tried the following, but this approach would mean I have to write rules for each permutation...

perl -pi -w -e "s/myfunc\(rr,'COL SUBSET',1,1,0\)/myfunc\(rr,'COL SUBSET'\)/g;" *.txt

1 answer

  • answered 2017-06-17 19:48 Yunnosch

    • To reduce complexity, generalize your regex so that one flexible pattern covers every call.
      • The regex for "anything between ( and ,, except , " : \(([^,]+),
      • The regex for "anything between ' and ', except ' " : \'([^']+)\'
    • In order to get the output right for the specific input (in spite of the flexibility),
      use capture groups, i.e. (...).
      They populate variables, which you can use as $1 in the substitution.
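    • To prevent matching functions with names ending in your function's name, e.g. notmyfunc(),
      use the regex for word boundary, i.e. \b.
      Names that merely start with it, e.g. myfunc2(), are already excluded, because the pattern requires ( directly after myfunc.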
    • Ikegami's edit (separated to keep visible what you and I learned the hard way):
      • Avoid double-quotes for the program argument.
        It's just asking for trouble and requires so much extra escaping.
        Note that \x27 is a single quote when used inside Perl double-quoted strings or regex literals.
        \' -> \x27
      • Only use one capture group (myfunc\([^,]+,\x27[^\x27]+\x27)
      • Remove the ;, which is not needed for a single statement.
      • Add a . to the input file wildcard, assuming you actually meant it like that.

    Working code
    (Compared with the version posted in chat, note the \((; the backslash got lost there, eaten by the chat I believe.):

    perl -pi -w -e "s/(\bmyfunc)\(([^,]+),\'([^']+)\'(?:,\d+){3}\)/\$1\(\$2,\'\$3\'\)/g;" *txt
    

    Ikegami's nice edit
    (The detail which was so time-consuming in our chat is not easily visible anymore,
    because the ( for the capture group was moved somewhere else.):

    perl -i -wpe's/\b(myfunc\([^,]+,\x27[^\x27]+\x27)(?:,\d+){3}\)/$1)/g' *.txt
    

    Input:

    var1 = myfunc(sss,'ROW DRILL',1,1,0);
    var2 = myfunc(fff,'COL DRILL',1,1,0);
    var3 = myfunc(s,'ROW SUBSET',1,1,0);
    var4 = myfunc(rr,'COL SUBSET',1,1,0);
    var5 = myfunc(rr,'COL SUBSET',2,12,50); with different values
    var6 = notmyfunc(rr,'COL SUBSET',1,1,0); tricky differet name
    var1 = myfunc(sss,'ROW DRILL',1,1,0);
    var2 = myfunc(fff,'COL DRILL',1,1,0);
    var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space',1,1,0);
    donotcapture=myfunc2(rr,'some string',1,1,0);
    

    Output (version "even more relaxed"):

    var1 = myfunc(sss,'ROW DRILL');
    var2 = myfunc(fff,'COL DRILL');
    var3 = myfunc(s,'ROW SUBSET');
    var4 = myfunc(rr,'COL SUBSET');
    var5 = myfunc(rr,'COL SUBSET'); with different values
    var6 = notmyfunc(rr,'COL SUBSET',1,1,0); tricky differet name
    var1 = myfunc(sss,'ROW DRILL');
    var2 = myfunc(fff,'COL DRILL');
    var3 = myfunc(anyAlphaNum123,'anyAlphaNum123 or space');
    donotcapture=myfunc2(rr,'some string',1,1,0);
    

    Lessons learned:

    • I had made a habit of writing regexes that fit the input as tightly as possible.
      That caught us/me unprepared when the regex was applied to sample input by someone inexperienced with regexes. (Absolutely no blame on you.)
    • Posting code quotes into chat is dangerous; be careful with \.