Finding and replacing matched letters at same position

This is a regex question, but I could not find a suitable option for my case on the wiki page, so I decided to ask here. Maybe a regex option I don't know about can solve it.

I have a log file (a.txt) containing multiple lines of strings. I want to compare the lines in pairs (1st vs. 2nd, 3rd vs. 4th, ...) and replace every character that matches at the same position (single characters, not substrings) with "." (or any other special character).

a.txt:

1100110010
1100101100
0011001100
0110101111
.
.
.

result.txt:

.....1001.
.....01100
.0.10...00
.1.01...11
.
.
.

This may be an XOR problem on two strings, so I tried that approach, but the strings had to be converted to ASCII before XOR was doable (maybe this approach is not right). I think there is a very simple way to do this with sed or Perl. Any suggestions or guidance are appreciated. Thank you for reading my question.

5 answers

  • answered 2018-01-14 06:44 Cyrus

    With bash:

    #!/bin/bash
    
    while read -r line1; do
      read -r line2
      for ((i=0; i<${#line1}; i++)); do
        if [[ "${line1:$i:1}" == "${line2:$i:1}" ]]; then
          new1+="."
          new2+="."
        else
          new1+="${line1:$i:1}"
          new2+="${line2:$i:1}"
        fi
      done
      echo "$new1"
      echo "$new2"
      unset new1 new2
    done < file
    

    Output:

    .....1001.
    .....0110.
    .0.10...00
    .1.01...11
    

  • answered 2018-01-14 06:44 Yunnosch

    Here is an answer in sed.
    It assumes that the lines are always equally long and contain only "0"s and "1"s.
    The latter in particular covers the assumption that no ">" occurs anywhere, because ">" is used as a cursor.
    It seems to be somewhat robust against lines of different lengths (I did a few simple tests), but there is no guarantee.

    sed -En "N;s/^(.*)\n(.*)$/>\1\n>\2/;:a;s/>([01])(.*)\n(.*)>\1/.>\2\n\3.>/;ta;s/>([^$\n])/\1>/g;ta;s/>//g;p"
    

    The code means:

    • -En use extended regexes, do not print automatically
    • N look at this and next line at once
    • s/// do a single replace, non-globally because of the absence of g
    • first replace introduces a cursor ">" at the start of both lines
    • :a introduce a label for looping
    • second replace replaces
      cursor, 0 or 1, rest of first line,
      start of second line, cursor, same 0 or 1
      by
      dot, cursor, rest of first line,
      start of second line, dot, cursor
    • then, if the replace succeeded, loop to the label
    • otherwise the third replace moves both cursors one character ahead and loops,
      unless the end of the line has been reached
    • fourth replace removes the cursors
    • p print result

    Output for your sample input (interleaved with sample input):

    1100110010
    1100101100
    .....1001.
    .....0110.
    0011001100
    0110101111
    .0.10...00
    .1.01...11
    

    The output differs from your stated desired output in line two, which ends in "." instead of "0",
    but with all due respect, I think your desired output is incorrect there.

    Using: GNU sed version 4.2.1
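
    As a usage sketch, the command above can be wrapped in a shell function and fed the sample input from the question. The function name mask_pairs_sed is made up for this illustration, and the script is put in single quotes rather than double quotes so that the shell leaves "$" and "\" untouched. It assumes GNU sed, as stated above.

```shell
# mask_pairs_sed is a hypothetical name, not part of the answer.
# Reads the named files (or stdin) and prints each pair of lines with the
# characters that match at the same position replaced by ".".
mask_pairs_sed() {
    sed -En 'N;s/^(.*)\n(.*)$/>\1\n>\2/;:a;s/>([01])(.*)\n(.*)>\1/.>\2\n\3.>/;ta;s/>([^$\n])/\1>/g;ta;s/>//g;p' "$@"
}

# Feed the sample input from the question through the function.
# With the file from the question this would be: mask_pairs_sed a.txt > result.txt
printf '%s\n' 1100110010 1100101100 0011001100 0110101111 | mask_pairs_sed
```

    Because of -n and the final p, only the masked lines are printed, without the input lines in between.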

  • answered 2018-01-14 06:44 shawnhcorey

    Here is a Perl version:

    #!/usr/bin/env perl
    
    # always use these two
    use strict;
    use warnings;
    
    # handle errors in open and close
    use autodie; # See http://perldoc.perl.org/autodie.html
    
    while( ! eof( DATA ) ){
        chomp( my $line1 = <DATA> );
        chomp( my $line2 = <DATA> );
    
        my @data1 = split //, $line1;
        my @data2 = split //, $line2;
    
        # do the first
        for my $i ( 0 .. $#data1 ){
            if( $data1[$i] eq $data2[$i] ){
                print ".";
            }else{
                print $data1[$i];
            }
        }
        print "\n";
    
        # do the second
        for my $i ( 0 .. $#data2 ){
            if( $data1[$i] eq $data2[$i] ){
                print ".";
            }else{
                print $data2[$i];
            }
        }
        print "\n";
    
    }
    
    __DATA__
    1100110010
    1100101100
    0011001100
    0110101111
    

  • answered 2018-01-14 06:44 Miller

    Perl using bitwise operators:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use v5.10;
    
    while ( !eof(DATA) ) {
        chomp( my $line1 = <DATA> );
        chomp( my $line2 = <DATA> );
    
        ( my $uniq_mask = $line1 ^ $line2 ) =~ s/[^\0]/\xFF/g;
    
        my $uniq1 = $line1;
        my $uniq2 = $line2;
    
        for ( $uniq1, $uniq2 ) {
            $_ &= $uniq_mask;
            s/\0/./g;
        }
    
        say for $line1, $line2, $uniq1, $uniq2, '';
    }
    
    __DATA__
    1100110010
    1100101100
    0011001100
    0110101111
    

    Outputs:

    1100110010
    1100101100
    .....1001.
    .....0110.
    
    0011001100
    0110101111
    .0.10...00
    .1.01...11
    

  • answered 2018-01-14 06:44 ikegami

    Since you mentioned xor,

    my $xor = $s1 ^ $s2;
    my $mask = $xor =~ tr/\x01-\xFF/\xFF/r;
    my $dots = $xor =~ tr/\x00\x01-\xFF/.\x00/r;
    
    say $s1 & $mask | $dots;
    say $s2 & $mask | $dots;
    

    Assumes the line feed has been removed, and assumes that $s1 and $s2 have the same length.