Cleaning house with Ruby
Tuesday, August 8th, 2006

I started consolidating all my old laptops today, so I copied my iTunes library from the different Macs and noticed that I had all sorts of duplicate files with slightly different names on 'em.
As usual, the solution is a small script, in this case in Ruby. This tiny script recurses through a directory tree (I pointed it at my iTunes library) and computes a SHA-1 hash of the contents of every file. Files with the same hash have, for all practical purposes, identical contents, so whenever two match, the script keeps the one with the shorter path and moves the other to the Trash for my perusal. I figure writing this in C++ or Cocoa would have taken me much, much longer, even though I'm a total rookie Rubyist.
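The core idea fits in a few lines: grouping paths by the SHA-1 digest of their contents finds every set of duplicates at once, not just pairs. Here's a minimal sketch of that grouping step on its own. `duplicate_groups` is a hypothetical helper name, not part of the script in this post, and like the script it skips directories and zero-length files.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical helper: group paths by the SHA-1 digest of their contents and
# return only the groups with more than one member, i.e. sets of duplicates.
def duplicate_groups(paths)
  by_digest = Hash.new { |h, k| h[k] = [] }
  paths.each do |path|
    next unless File.file?(path)   # skip dirs (and broken symlinks)
    next if File.zero?(path)       # skip zero-length files
    # Digest::SHA1.file reads the file in chunks instead of all at once
    by_digest[Digest::SHA1.file(path).hexdigest] << path
  end
  by_digest.values.select { |group| group.size > 1 }
end

# Usage: two files with the same bytes but different names, plus one unrelated file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "song.mp3"), "same bytes")
  File.write(File.join(dir, "song 1.mp3"), "same bytes")
  File.write(File.join(dir, "other.mp3"), "different bytes")
  groups = duplicate_groups(Dir.glob(File.join(dir, "**", "*")))
  p groups.map { |g| g.map { |f| File.basename(f) }.sort }   # prints [["song 1.mp3", "song.mp3"]]
end
```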
Scripting for the win
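#!/usr/bin/env ruby
require 'digest/sha1'
require 'fileutils'   # the library file is fileutils.rb; 'FileUtils' only loads on case-insensitive filesystems

map = {}     # SHA-1 digest => path of the copy we're keeping
dupes = []   # [path to trash, path kept] pairs

# parse the cmd line args, removing any trailing dir separator
abort "usage: #{File.basename($0)} DIRECTORY" if ARGV.empty?
root = ARGV[0].chomp("/")
# add the magic pattern that makes glob recurse into subdirectories
root += "/**/*"

total_size = 0
start = Time.now
Dir.glob(root) do |path|
  next unless File.file?(path)   # skip dirs, we just want (regular) files
  sz = File.size(path)
  next if sz == 0                # skip zero-length files
  print "processing #{path}: "
  $stdout.flush
  # Digest::SHA1.file reads the file in chunks and closes it when done
  result = Digest::SHA1.file(path).hexdigest
  puts result
  total_size += sz
  if map.has_key?(result)
    # it's a dupe. Our highly sophisticated algo: keep the one with the
    # shorter name and queue the other for the trash. Updating map means a
    # third copy gets compared against the survivor, not a trashed file.
    kept = map[result]
    if path.length < kept.length
      map[result] = path
      dupes << [kept, path]
    else
      dupes << [path, kept]
    end
  else
    map[result] = path   # first copy of this content, just remember it
  end
end
elapsed = Time.now - start
if elapsed > 0   # avoid dividing by zero on an empty directory
  puts "scan completed #{((total_size / 1024) / elapsed).to_i}K per sec"
end

trash = File.expand_path("~/.Trash")
dupes.each do |delme, kept|
  puts "dupe: #{delme} and #{kept}"
  puts "moving #{delme} to the trash."
  # don't clobber a same-named file that's already in the trash
  ext  = File.extname(delme)
  base = File.basename(delme, ext)
  dest = File.join(trash, base + ext)
  n = 2
  while File.exist?(dest)
    dest = File.join(trash, "#{base} #{n}#{ext}")
    n += 1
  end
  FileUtils.mv(delme, dest)
end
puts "**** No duplicates found ****" if dupes.empty?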
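The "keep the shorter name" rule can also be written as a small function over a whole group of identical files rather than one pair at a time. This is a sketch with a hypothetical `plan_trash` helper, assuming the duplicates have already been grouped (one array of paths per distinct content). Unlike a plain length comparison, it breaks ties alphabetically, so the outcome doesn't depend on the order the files were scanned in.

```ruby
# Hypothetical helper: in each group of identical files, keep the shortest
# path (ties broken alphabetically) and plan to trash all the others.
# Returns an array of [path_to_trash, path_kept] pairs.
def plan_trash(groups)
  groups.flat_map do |group|
    keeper = group.min_by { |p| [p.length, p] }
    (group - [keeper]).map { |p| [p, keeper] }
  end
end

plan = plan_trash([
  ["Music/Song 1.mp3", "Music/Song.mp3", "Old Mac/Music/Song.mp3"],
  ["a/x.aac", "b/x.aac"]
])
plan.each { |trashed, kept| puts "trash #{trashed} (keeping #{kept})" }
```

Computing the keeper once per group also makes it obvious that exactly one copy of each file survives, however many copies the old laptops had.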
Note that this script doesn't prompt you for anything, so if you use it, be prepared and back up yer stuff first!
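If you'd rather be asked first, here's a minimal sketch of a confirmation step, assuming the moves have already been worked out as `[source, destination]` pairs. `confirm_and_trash` and its `dry_run:` option are hypothetical, not part of the script above. Input and output are parameters so the prompt can be driven from a `StringIO`; a real run would just use the `$stdin`/`$stdout` defaults.

```ruby
require 'fileutils'
require 'stringio'
require 'tmpdir'

# Hypothetical helper: ask before each move; only answers starting with "y"
# (any case) go ahead. With dry_run: true nothing moves, it only reports.
# Returns the list of source paths that were actually moved.
def confirm_and_trash(moves, input: $stdin, output: $stdout, dry_run: false)
  moved = []
  moves.each do |src, dest|
    if dry_run
      output.puts "would move #{src} -> #{dest}"
      next
    end
    output.print "move #{src} to the trash? [y/N] "
    answer = input.gets
    next unless answer && answer.strip.downcase.start_with?("y")
    FileUtils.mv(src, dest)
    moved << src
  end
  moved
end

# Usage: two files, answering "y" to the first and "n" to the second.
Dir.mktmpdir do |dir|
  trash = File.join(dir, "Trash")
  Dir.mkdir(trash)
  a = File.join(dir, "a.mp3")
  b = File.join(dir, "b.mp3")
  File.write(a, "x")
  File.write(b, "x")
  moves = [[a, File.join(trash, "a.mp3")], [b, File.join(trash, "b.mp3")]]
  moved = confirm_and_trash(moves, input: StringIO.new("y\nn\n"), output: StringIO.new)
  puts moved.map { |p| File.basename(p) }.inspect
end
```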