Perl script for comparing files: List missing lines, regardless of order

The other day, I was comparing two different sitemap files of the same site. One had more links than the other, and I was trying to get a list of what was missing from the shorter one. However, since they were from different sitemap generators, the order of the links was completely different in each file.

Surprisingly, this turned out to be a much bigger challenge than I thought. I figured I could use some variation of a grep command line, or diff, but I wasn't able to find a simple combination of command line options for either that would do what I was looking for. It seems like everything I found was more geared toward comparing files that were in the same order. Diff simply dumped a large list of all the lines in file2; since the order was different from file1's, every line was considered a mismatch.

Knowing this was a fairly trivial operation to do in Perl, I decided to write a quick script to do it. I'm sharing it here in case it can benefit anyone else:

#!/usr/bin/perl
use strict;
use warnings;

# The purpose of this script is to print the lines in file2 that are not present in file1, regardless of order.

unless (@ARGV >= 2) {die "Usage: different-lines.pl [file1.txt] [file2.txt]\n";}

open (my $file1, '<', $ARGV[0]) || die "Unable to open $ARGV[0]: $!\n";
open (my $file2, '<', $ARGV[1]) || die "Unable to open $ARGV[1]: $!\n";

# Store the contents of file1 in an array, stripping trailing whitespace (including the newline)
my @lines_one;
while (my $line = <$file1>) {
        $line =~ s/\s+$//;
        push (@lines_one, $line);
}

close $file1;

# Iterate through each line of file2, checking for presence in file1, and setting a flag if it's found.
my @missing_from_1;
while (my $candidate = <$file2>) {
        $candidate =~ s/\s+$//;
        my $flag = 0;
        foreach my $line (@lines_one) {
                # Compare as plain strings. A regex match would treat characters such as . and ? in a URL
                # as metacharacters, and would also count a partial (substring) match as "found".
                if ($line eq $candidate) {$flag = 1; last;}
        }
        unless ($flag) {push (@missing_from_1, $candidate);}
}

close $file2;

# Dump the results (missing lines)
foreach my $line (@missing_from_1) {
        print $line."\n";
}
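
The nested loop in the script above checks every line of file2 against every line of file1, which can get slow on large sitemaps. Here is a minimal sketch of one alternative, assuming an exact match (after trimming trailing whitespace) is what you want: load file1 into a hash, then do a single lookup per line of file2. The sub name missing_lines is just something I made up for illustration, and this is an untuned sketch rather than a drop-in replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): returns the lines of
# $file2 that do not appear in $file1, ignoring order and trailing whitespace.
sub missing_lines {
    my ($file1, $file2) = @_;

    # Record every line of file1 as a hash key so each lookup is a single hash access
    my %seen;
    open(my $fh1, '<', $file1) or die "Unable to open $file1: $!\n";
    while (my $line = <$fh1>) {
        $line =~ s/\s+$//;
        $seen{$line} = 1;
    }
    close $fh1;

    # Keep each line of file2 that was never seen in file1, in file2's order
    my @missing;
    open(my $fh2, '<', $file2) or die "Unable to open $file2: $!\n";
    while (my $line = <$fh2>) {
        $line =~ s/\s+$//;
        push @missing, $line unless $seen{$line};
    }
    close $fh2;

    return @missing;
}

# When given two file names, behave like the script above
if (@ARGV >= 2) {
    print "$_\n" for missing_lines($ARGV[0], $ARGV[1]);
}
```

It takes the same two arguments as the script above (file1 first, then file2). Because the lookup is a plain string comparison, characters like ? and . in a URL are matched literally.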
