Fixing PyLyrics
PyLyrics is a great package for retrieving lyrics.
It uses BeautifulSoup to scrape the lyrics from lyrics.fandom.com
BeautifulSoup or bs4 provides;
Python idioms for iterating, searching, and modifying the parse tree.
See here for more information.
Mulyrica
PyLyrics is a central part of Mulyrica, my lyrics search/sentiment analysis app. I was pretty disappointed when it did not work for some queries.
I thought this was a parsing error but upon inspection, it turned out to be something else entirely.
It works!
PyLyrics worked until I tested it on a few songs.
It returned the following error;
ValueError: Song or Singer does not exist or the API does not have the lyrics
I checked the website but the lyrics were there. So I opened up a debug session to investigate and find out where the error was being thrown.
Source of the problem
@staticmethod
def getLyrics(singer, song):
#Replace spaces with _
singer = singer.replace(' ', '_')
song = song.replace(' ', '_')
r = requests.get('http://lyrics.wikia.com/{0}:{1}'.format(singer,song))
s = BeautifulSoup(r.text)
#Get main lyrics holder
lyrics = s.find("div",{'class':'lyricbox'})
if lyrics is None:
raise ValueError("Song or Singer does not exist or the API does not have Lyrics")
return None
#Remove Scripts
[s.extract() for s in lyrics('script')]
#Remove Comments
comments = lyrics.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
#Remove unecessary tags
for tag in ['div','i','b','a']:
for match in lyrics.findAll(tag):
match.replaceWithChildren()
#Get output as a string and remove non unicode characters and replace <br> with newlines
output = str(lyrics).encode('utf-8', errors='replace')[22:-6:].decode("utf-8").replace('\n','').replace'<br/>','\n')
try:
return output
except:
return output.encode('utf-8')specifically
if lyrics is None:
raise ValueError("Song or Singer does not exist or the API does not have Lyrics")
return NoneI quickly did a print lyrics and sure enough they showed up in the debugging shell running in my browser
so the problem was further down.
More hunting
artist = form.artist.data
song = form.song.data
lyrics = PyLyrics.getLyrics(artist, song)This time I got the following error;
ascii code can't decode byte 0xc3 in position 1245: ordinal not in range(128) in file
/venv/lib/python2/site/packages/functions.pyThat was on line 96 in functions.py under getLyrics(singer,song)
#Remove unecessary tags
for tag in ['div','i','b','a']:
for match in lyrics.findAll(tag):
match.replaceWithChildren()
#Get output as a string and remove non unicode characters and replace <br> with newlines
output = str(lyrics).encode('utf-8', errors='replace')[22:-6:].decode("utf-8").replace('\n','').replace('<br/>','\n')
try:
return output
except:
return output.encode("utf-8")I thought encoding the lyrics twice didn’t make sense so I removed call to the encoding method;
output = str(lyrics).encode('utf-8', errors='replace')[22:-6:].decode("utf-8").replace('\n','').replace('<br/>','\n')New code;
output = str(lyrics)[22:-6:].decode("utf-8").replace('\n','').replace('<br/>','\n')Saved the script and restarted the flask app again.
I now had a situation were queries that had caused problems before were working and a few of
previously error-free queries were suddenly not working.
Curiously, I tried running all queries under two environments, one used python2.7 the other
used python3.5.
It was then that I realized that this was a matter of different versions of python.
The package was made for and tested only on python3+
I quickly seperated what worked for python3 from what workd for python2 and everything finally worked.
#Python 3
if sys.version_info.major > 2:
output = str(lyrics).encode('utf-8', errors='replace')[22:-6:].\
decode("utf-8").replace('\n','').replace('<br/>','\n')
else: # Python 2
output = str(lyrics)[22:-6:].decode("utf-8").replace('\n','').\
replace('<br/>','\n')
try:
return [album, output]
except:
return [album, output.encode('utf-8')]Happy pythoning.
Let me know what you think of this article on twitter @agabalevis or leave a comment below!