For those of you not following this mess of me learning to program in Python, this is the third option so far for getting Dell warranty expirations via web scraping. The first option I posted was the one I did without any direction from anyone who knows what they are doing. I used a string function. You can read that post here. After I posted that version on Google+ and Reddit, I got recommendations to do this with regex, Scrapy and BeautifulSoup. My last post was getting the expiration date via regex. This post is getting it with BeautifulSoup, which, I must say, was much better once I figured out how to do what I wanted.

Here’s a quick rundown of how I’m doing this. Again, I’m sure some of this could be done much better.

The modules I use are sys for getting the command line arguments, requests to pull the data from Dell’s site, and lastly BeautifulSoup to parse the HTML. The function is only a few lines. First, I pull the HTML from Dell and parse it with BeautifulSoup. Next, I find all the TopTwoWarrantyListItem elements and assign them to the variable lis. Lastly, I compare those list items to pull out the max value, which is assigned and returned as the warranty expiration date.
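As a rough illustration of those steps (a sketch, not the actual script): the sample HTML below is made up, since the real Dell markup isn't shown here, and the M/D/YYYY date format, the idea that the date sits in the last b tag of each item, and the function name get_warranty_expiration are all assumptions for the example.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Dell page; the real layout may differ.
SAMPLE_HTML = """
<ul>
  <li class="TopTwoWarrantyListItem"><a href="#">Basic</a><b>9/30/2013</b></li>
  <li class="TopTwoWarrantyListItem"><a href="#">ProSupport</a><b>3/14/2015</b></li>
</ul>
"""


def get_warranty_expiration(html):
    """Return the latest date found in the warranty list items, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Find every element carrying the TopTwoWarrantyListItem class.
    lis = soup.find_all(class_="TopTwoWarrantyListItem")
    date_texts = []
    for li in lis:
        bolds = li.find_all("b")
        if bolds:
            # Assumption: the date is the text of the last <b> in the item.
            date_texts.append(bolds[-1].get_text(strip=True))
    if not date_texts:
        return None
    # Compare as real dates (assumed M/D/YYYY), not as strings.
    return max(date_texts, key=lambda text: datetime.strptime(text, "%m/%d/%Y"))


print(get_warranty_expiration(SAMPLE_HTML))
```

In a real run the html argument would come from something like requests.get(url).text, with url being whatever Dell page the service tag maps to.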

Let me know what you think, good and bad. Every time I post one of these, I get some new advice that helps me learn.



I got a decent amount of feedback and advice on my post the other day about getting a Dell warranty expiration with web scraping. It was recommended to change my scraping to use regex, Beautiful Soup or Scrapy. I figured I’d do all three and make a post on each one.

As you know if you’ve read any of this blog so far, I’m just learning Python, so I have a ton to learn. What better way than to try the different options presented to me?

The first option I decided to try, since I already did a little bit of it for other scripts I haven’t blogged about yet, is scraping via regex. This was quite challenging for a noob like me. I couldn’t seem to quite get the expression down to grab all the dates needed. The original expression I was using would grab the last date, but would skip right over the first one. I have no clue why.

One thing I learned from doing this scrape with regex is that my original script was wrong. It was grabbing a date, but it wasn’t necessarily grabbing the correct date. Dell’s website can have multiple expiration dates. If you renew, it’s going to show the original warranty and the renewed warranty. If you have a default warranty and upgraded to a better warranty, it’s going to show both. By using regex, I was able to grab all the dates and compare them to find the correct expiration.

Another thing I learned about this task in particular is that the slowness is not so much my code but Dell’s crappy website. As a network/systems guy, I have to go on Dell’s site a lot, and it is horribly painful to use because of the speed.

OK, so here’s a quick rundown of the code and then the actual code.

First it grabs the page at the URL as a string. Then it performs a regular expression search looking for the dates and creates a list of tuples, with the date being the second item in each tuple. With that list in hand, I have a while loop that runs through the tuples, pulls the dates out as integers and puts them into a new list of tuples so they can be compared. After I have the dates in a list of tuples, I just use the max function to find out which is the correct date. I’m not sure this is the greatest way to do this, but it seems to work on the service tags I’ve tested out. Lastly, I convert the warranty back into a string so the function returns it as a string.
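A minimal sketch of that flow, under some loud assumptions: the sample HTML, the pattern, the M/D/YYYY date format and the name latest_warranty are all made up for illustration, and a for loop stands in for the while loop described above.

```python
import re

# Hypothetical markup standing in for the Dell page; the real layout may differ.
SAMPLE_HTML = (
    '<li class="TopTwoWarrantyListItem"><a href="#">Basic</a><b>9/30/2013</b></li>'
    '<li class="TopTwoWarrantyListItem"><a href="#">ProSupport</a><b>3/14/2015</b></li>'
)

# Two groups, so findall returns a list of tuples with the date as the
# second item. The pattern itself is an assumption, not the original one.
DATE_PATTERN = re.compile(
    r"(TopTwoWarrantyListItem.*?)(\d{1,2}/\d{1,2}/\d{4})", re.DOTALL
)


def latest_warranty(html):
    """Return the latest M/D/YYYY date in html as a string, or None."""
    matches = DATE_PATTERN.findall(html)
    dates = []
    for _, date_text in matches:
        month, day, year = (int(part) for part in date_text.split("/"))
        # Year goes first so tuple comparison orders the dates correctly.
        dates.append((year, month, day))
    if not dates:
        return None
    year, month, day = max(dates)
    # Convert the winning tuple back into a string.
    return "{}/{}/{}".format(month, day, year)


print(latest_warranty(SAMPLE_HTML))
```

Converting to integers matters here: compared as plain strings, "9/30/2013" would beat "3/14/2015", which is exactly the kind of wrong answer the comparison is meant to avoid.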

As I said with my original post, I’m sure this could be improved a million ways. I’m just learning, so any pointers would be appreciated.



As I mentioned in my blog on my first learning resources, right after finishing the Google training, the first script I wrote was a script to get a Dell hardware warranty expiration given the service tag. It was my first attempt at web scraping, something you learn with the Google Python videos. I am not sure if this is the best or most efficient way to do this, but it was a way for me to test out what I just learned. If you have some advice, don’t hesitate to share it in the comments.

I’m actually going to do more with this as I learn more. The goal is to access our database of hardware, grab the service tags, get the warranty expiration from Dell, and lastly update a warranty field in the database. From that field, we’ll generate alerts to notify customers of their pending expiration date and hopefully get some warranty sales. This script will be a part of that bigger picture later.

For now, all you do is run the script with SERVICETAG as its argument to get the date back in a string format.

This script grabs the HTML and then searches for TopTwoWarrantyListItem. From that starting point, it looks for four greater-than brackets (>) to find the beginning position of the date. Then it looks for the very next less-than bracket (<) as the ending point for the date.
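The search described above can be sketched with nothing but str.find. This is an illustration rather than the original script: the sample HTML is invented so that the date really does sit after the fourth >, and find_warranty_date is a hypothetical name.

```python
# Hypothetical markup standing in for the Dell page; the real layout may differ.
SAMPLE_HTML = (
    '<li class="TopTwoWarrantyListItem"><a href="#">Basic</a><b>3/14/2013</b></li>'
)


def find_warranty_date(html):
    """Return the text after the fourth '>' following the marker, or None."""
    pos = html.find("TopTwoWarrantyListItem")
    if pos == -1:
        return None
    # Step past four '>' characters to reach the start of the date.
    for _ in range(4):
        pos = html.find(">", pos)
        if pos == -1:
            return None
        pos += 1
    # The date ends at the very next '<'.
    end = html.find("<", pos)
    if end == -1:
        return None
    return html[pos:end].strip()


print(find_warranty_date(SAMPLE_HTML))
```

Note that this only ever returns the first date after the first marker, and it breaks as soon as the page adds or removes a tag in between.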

When I started typing up this blog, I ran the script and it was giving me errors. It worked when I wrote it, but I’ve since rebuilt my laptop with Linux Mint. My laptop was previously running CentOS, which I believe runs Python 2.6 as default. I’m assuming something changed from there. Considering the current training I’m doing is in Python 3, I made the script check the Python version and error out if it is not version 3.
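One simple way to do that kind of check looks roughly like this. It is a sketch, not necessarily how the original script does it, and require_python3 is a hypothetical name.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Exit with an error message unless running on Python 3 or newer."""
    if version_info[0] < 3:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("This script requires Python 3.")


require_python3()
```

The version is passed in as a parameter only so the check can be tried with a fake value such as (2, 6, 0) without needing an old interpreter.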

Without further ado, here’s the Dell warranty script code.