I got a decent amount of feedback and advice on my post the other day about getting a Dell warranty expiration date with web scraping. People recommended changing my scraping to use regex, Beautiful Soup, or Scrapy. I figured I’d do all three and write a post on each one.

As you know if you’ve read any of this blog so far, I’m just learning Python, so I have a ton to learn. What better way than to try the different options presented to me?

The first option I decided to try was scraping with regex, since I had already done a little of it for other scripts I haven’t blogged about yet. This was quite challenging for a noob like me. I couldn’t quite get the expression right to grab all the dates I needed. The original expression I was using would grab the last date but skip right over the first one. I have no clue why.
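One common cause of that symptom is a greedy `.*` in front of the capture group. This is only a guess, since the original expression isn’t shown here. The sketch below uses a made-up HTML fragment (the real Dell markup will differ) to show how a greedy prefix swallows the earlier dates, and how matching on the date pattern alone with `re.findall` returns every date.

```python
import re

# Hypothetical fragment standing in for the warranty table; not Dell's real markup.
html = "<td>Start</td><td>7/15/2010</td><td>Start</td><td>7/15/2013</td>"

# Greedy: the leading ".*" consumes as much as it can and only backs off
# far enough to match the LAST date, so the first one is skipped.
greedy = re.findall(r".*<td>(\d{1,2}/\d{1,2}/\d{4})</td>", html)

# Matching on the date pattern itself lets findall return every date, in order.
all_dates = re.findall(r"\d{1,2}/\d{1,2}/\d{4}", html)

print(greedy)
print(all_dates)
```

If the expression does need surrounding context, a non-greedy `.*?` or a narrower character class such as `[^<]*` usually avoids the problem.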

One thing I learned from doing this scrape with regex is that my original script was wrong. It was grabbing a date, but not necessarily the correct date. Dell’s website can show multiple expiration dates. If you renew, it shows both the original warranty and the renewed one. If you have a default warranty and upgraded to a better one, it shows both. By using regex, I was able to grab all the dates and compare them to find the correct expiration.

Another thing I learned about this task in particular is that the slowness comes not so much from my code as from Dell’s crappy website. As a network/systems guy, I have to go on Dell’s site a lot, and it is horribly painful to use because of how slow it is.

OK, so here’s a quick rundown of the code and then the actual code.

First it grabs the page at the URL as a string. Then it runs a regular expression search for the dates, which produces a list of tuples with the date as the second item in each tuple. Next, a while loop runs through those tuples, pulls each date out as integers, and puts them into a new list of tuples so they can be compared. Once the dates are in that form, I just use the max function to find the correct one. I’m not sure this is the greatest way to do it, but it seems to work on the service tags I’ve tested. Lastly, I convert the winning date back into a string and return it.
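The steps above can be sketched roughly like this. It is a minimal illustration, not the original script: the pattern `DATE_RE`, the function name `latest_warranty`, and the sample HTML are all assumptions, and the download step is left out so the function simply takes the page text as a string. It also uses a for loop where the original used a while loop.

```python
import re

# Assumed pattern: group 1 is a label cell, group 2 is the M/D/YYYY date after it.
# Dell's real markup will differ, so the expression would need adjusting.
DATE_RE = re.compile(r"<td>([^<]*)</td>\s*<td>(\d{1,2}/\d{1,2}/\d{4})</td>")


def latest_warranty(page):
    """Return the latest M/D/YYYY date found in page, or None if there is none."""
    matches = DATE_RE.findall(page)  # list of (label, date) tuples
    dates = []
    for _label, date_text in matches:
        month, day, year = (int(part) for part in date_text.split("/"))
        # Year goes first so tuple comparison orders the dates chronologically.
        dates.append((year, month, day))
    if not dates:
        return None
    year, month, day = max(dates)
    # Convert the winning date back into a string.
    return f"{month}/{day}/{year}"


# Made-up sample rows, only for demonstration.
sample = (
    "<tr><td>Initial</td> <td>7/15/2013</td></tr>"
    "<tr><td>Extended</td> <td>7/15/2015</td></tr>"
)
print(latest_warranty(sample))
```

Putting the year first in each tuple matters: comparing `(month, day, year)` tuples directly would rank 12/1/2012 above 1/5/2013.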

As I said in my original post, I’m sure this could be improved a million ways. I’m just learning, so any pointers would be appreciated.

 

 


Author
Jason Vanzin is the Director of Business Consulting and Services at EssentiaLink. He has over 15 years of IT experience and lives in Pittsburgh, PA. He blogs on topics related to Business Continuity, Python programming, and technology in general.
