Sunday, April 23, 2006

iTunes Cleanup: "Rolling Stones" vs. "The Rolling Stones"

As a database nerd, I'm bothered by inconsistencies in iTunes song data that look easy to fix but take too long to fix by hand in a large library.

I'm using a combination of Perl and AppleScript to simplify cleaning up information that is similar but not identical across different songs and artists. I wrap everything in Perl and call AppleScript only when necessary. That way, people on Windows (or even Linux) can still use the code to identify problems, even if they have to fix the song info by hand. On the Mac, the relevant songs can be put into a playlist automatically for later editing or fixing via other AppleScripts.

For example, here's a little snippet that identifies all the artists who are represented as both "something" and "The something". Copy and paste it into the Terminal to run it.

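grep "<key>Artist</key>" ~/Music/iTunes/"iTunes Music Library.xml" |
sort -u |
sed -e 's/>The />/' |
sort -d |
perl -e '
# Input: unique Artist lines, leading "The " removed, sorted.
my $last_artist = "";

while (my $line = <>)
{
    chomp $line;
    if ($line =~ /<key>Artist<\/key><string>(.*?)<\/string>/)
    {
        my $artist = $1;

        # Two identical names in a row mean both forms were in the library.
        if ($artist eq $last_artist)
        {
            print "Inconsistent THE: [$artist] and [The $artist]\n";
        }
        $last_artist = $artist;
    }
}
'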

This builds an alphabetical list of the unique artist names, with any initial "The" stripped off. Because exact duplicates were removed before the "The" was stripped, two identical consecutive names can only mean that the library contains both the "The" and "no-The" forms of that artist.
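To see the idea on a small scale, here is a minimal sketch that runs the same stages on a handful of made-up names instead of the iTunes XML. The function name find_the_variants and the sample names are my own assumptions for illustration, and uniq -d stands in for the Perl loop: it prints each line that occurs more than once in a row.

```shell
# Hypothetical helper (not part of the original snippet): reads one
# artist name per line on stdin and prints every name that appears
# both with and without a leading "The ".
find_the_variants() {
    sort -u |              # drop exact duplicates first
    sed -e 's/^The //' |   # strip a leading "The "
    sort |                 # identical names now sit next to each other
    uniq -d                # keep only names that occur more than once
}

# Made-up sample data; "The Beatles" is listed twice on purpose.
printf '%s\n' \
    "The Rolling Stones" "Rolling Stones" \
    "The Beatles" "The Beatles" \
    "The Who" "Who" |
find_the_variants
```

With this sample it prints "Rolling Stones" and "Who". "Beatles" is not reported, because the repeated "The Beatles" entries collapse into one line before the "The" is stripped.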

Advantages:
  • All the songs for an artist will be filed under a single folder, rather than separate folders for "Beatles" and "The Beatles" and so on. That makes it easier to copy or transfer the files via the command line.
  • When going through the iPod "Artists" menu, you'll be able to get to all of the songs by that artist, instead of having separate entries with different groups of songs.
  • Having consistent names makes it easier to check for duplicate songs and weed out other sorts of problems.

Watch this space for more developments on this front!
