I used Python and Selenium WebDriver for a project in which a client needed a program that logs into around 25 different web sites and downloads a total of 750-1000 different documents. Automating operations across so many different sites has been a huge learning opportunity for me. It’s a lot of fun!
I had a passing familiarity with Selenium at the start of this, but my knowledge was dated. I had used BeautifulSoup more recently, but not… recently. So, this was very slow going at first, but now that I’m over the hump of getting reacquainted with it, and learning quite a number of new things about it, I thought I’d post the notes here for anyone else who might find them useful (including future me – because my memory is terrible).
Please note that in this project, the point of the work is automating sites with no API to download data and documents. I was not constrained to emulating behaviors that actual end users are expected to take, as web QA automation often demands. If I wanted to do something really hacky that a real person would never do on the site, I was free to do that so long as I got the data as a result. These notes reflect that.
Enjoy!
Program Structure
This might be the last thing you need to hear when you’re trying to make forward progress fast, but I found that doing this kind of automation scripting seems to demand a different structure from, say, writing backend queue workers that update data or cache systems, or writing API endpoints that perform simple CRUD operations.
I wound up in a slightly foreign-looking structure through trial and error. Because a selenium program controls a browser that is meant to be used by a human, the flow can feel pretty strange when you’re used to the usual back end coding constructs. The general pattern I’ve found success with looks like this:
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException

go_to_listing_page(driver)
items = find_items_I_care_about(driver)
for item in items:
    try:
        go_from_listing_to_detail_page(driver, item)
        go_from_detail_to_invoice_page(driver)
        download_invoice(driver)
        go_from_invoice_page_to_listing_page(driver)
    except NoSuchElementException:
        handle_session_timeout(driver)
    except ElementNotInteractableException:
        handle_feedback_modal(driver)
So, at a high level, there is a function for each possible navigation path, unless there are static urls. If there are static urls, you don’t need the ‘go_from_x_to_y’ functions – they can just be “go_to_y” functions instead, which is great if it’s feasible. In my experience, it is NOT always feasible.
File Downloads With Selenium Webdriver
In order to control the browser so it doesn’t pop up a download dialog or open a PDF in a new tab by default, I configured my browser driver with the magical settings that will tell it where to put downloaded files. I also disabled the settings that cause it to open new tabs, etc. This is straight from my code:
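from selenium import webdriver
from selenium.webdriver import FirefoxProfile

def get_driver():
    # download_destination (a directory path) and mime_types (a comma-separated
    # string of MIME types) are defined elsewhere in the script.
    fp = FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)  # 2 = use the custom directory set below
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_destination)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    fp.set_preference("pdfjs.disabled", True)  # don't open PDFs in the built-in viewer
    # Written against an older Selenium API; recent Selenium 4 releases have changed some of these arguments.
    driver = webdriver.Firefox(firefox_profile=fp)
    driver.implicitly_wait(45)
    return driver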
Once this browser profile is in place, it’s a matter of finding and clicking the download link or button. I’m sure you can find similar profiles for Chrome or other browsers all over the web.
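One thing the profile does not tell you is when a download has actually finished. Here is a minimal sketch of one way to wait for it. It assumes Firefox keeps in-progress downloads in the target directory with a `.part` suffix, and it polls that directory until a new file appears and no `.part` file remains. The helper name `wait_for_download` and its parameters are illustrative assumptions, not part of the project code.

```python
import os
import time


def wait_for_download(download_dir, known_files, timeout=60.0, poll_interval=0.5):
    """Return the sorted names of new, fully written files in download_dir.

    known_files is the set of file names that were present before the click.
    Raises TimeoutError if nothing new finishes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = set(os.listdir(download_dir))
        # Assumption: in-progress Firefox downloads carry a .part suffix.
        in_progress = {name for name in current if name.endswith(".part")}
        finished = current - set(known_files) - in_progress
        if finished and not in_progress:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No completed download in {download_dir}")
        time.sleep(poll_interval)
```

To use something like this, take a snapshot with `set(os.listdir(download_destination))` before clicking the download link, then call the helper right after the click.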
Login
I had no intention of writing this section, but since the sites all work so similarly with regard to login, it’s worth sharing. I created a sort of site login config dictionary the code could use to find, populate, and submit login forms.
So, here’s an example dictionary, a function that uses it, and a call to the function passing the site dict:
sites = {
    "ExampleSite": {
        "login_url": "https://example.com/login",
        "login_form_id": "login_form",
        "uname": "billing@mycompany.com",
        "uname_input_id": "input-Email",
        "passwd": "dTowNRu13z!",
        "passwd_input_id": "input-Password",
        "login_button_input_id": "login-submit",
    },
    "OtherSite": {
        "login_url": "https://othersite.com/login",
        "login_form_id": "fctl-login",
        "uname": "billing@mycompany.com",
        "uname_input_id": "uname",
        "passwd": "myPasswordRocks",
        "passwd_input_id": "passwd",
        "login_button_input_id": "Login",
    },
}
def do_login(browser, site):
    print("Getting login url")
    browser.get(site["login_url"])
    # fill out the fields
    browser.find_element_by_id(site["uname_input_id"]).send_keys(site["uname"])
    browser.find_element_by_id(site["passwd_input_id"]).send_keys(site["passwd"])
    print("Submitting login form")
    login_button = site["login_button_input_id"]
    browser.find_element_by_xpath(f"//button[@id='{login_button}']").click()
    print("Login form submitted")

if __name__ == '__main__':
    browser = get_driver()  # the driver factory from the File Downloads section
    do_login(browser, sites["ExampleSite"])
There isn’t a lot happening here that’s special, but it’s worth noting the send_keys method, which maps keypresses to form text inputs.
Also worth noting: I believe that selenium will understand what a form is and you can just call submit() on a form like you can call click() on a button element. Read the docs – don’t quote me. I haven’t tried it yet.
Handling Infinite Scroll With Selenium
Sometimes you need to get a list of all entries on a site before you process any of them, and if there’s an infinite scroll implementation, it’s not straightforward to do. The best solutions I’ve found consist of telling selenium to just execute javascript code to try to handle this in a reasonable way. Here’s what worked for me:
import time

def scroll(browser):
    # Get scroll height
    last_height = browser.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait to load page
        time.sleep(3)
        # Calculate new scroll height and compare with last scroll height
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # If heights are the same it will exit the function
            break
        last_height = new_height
The above code will keep scrolling until the “infinite” scrolling is exhausted & you’ve exposed all of the items. It’s pretty easy to just scroll one time if you want to handle the scrolling differently:
def scroll_once(browser):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(3)
    return None
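Here is a variation on the same idea, sketched on the assumption that a page might keep loading content for longer than you care to wait: cap the number of scroll rounds. The function name `scroll_with_limit` and the `max_rounds` and `pause` parameters are additions made for this sketch, not part of the project code. The `browser` argument only needs an `execute_script` method.

```python
import time


def scroll_with_limit(browser, max_rounds=50, pause=3.0):
    """Scroll to the bottom repeatedly; stop when the page height stops
    growing or after max_rounds scrolls. Returns the number of scrolls done."""
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to load more items
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

The return value makes it easy to log how much scrolling a site needed, and to spot the case where the cap was hit before the list was exhausted.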
Compound Attributes (Tag Attributes Containing Spaces)
I had to find an anchor tag that was a descendant of a div. I intended to use the class attribute of both tags to locate them. However, both tags had “compound” class attributes, meaning the class attribute was a space-delimited series of names instead of just one. Nothing was being found at all, so I was stuck until someone on Stack Overflow mentioned the phrase “compound classes”. I added that phrase to my Google search and presto!
Here’s some code ripped right from the project.
div_selector = "div[contains(concat(' ', normalize-space(@class), ' '), ' billing-account-header ')]"
link_selector = "a[contains(concat(' ', normalize-space(@class), ' '), ' toggle-link ')]"
detail_buttons = browser.find_elements_by_xpath(f"//{div_selector}//{link_selector}")
Sadly, I’m not 100% certain (yet) why this works. But, it does work fantastically well for me when I need to locate things with compound classes.
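One way to read the expression: it turns a substring test into a whole-name test. `normalize-space` trims the class string and collapses runs of whitespace, `concat` pads it with a space on each side, and `contains` then looks for the class name with a space on each side as well. An exact comparison like `@class='toggle-link'` fails when other class names are present, and a plain `contains(@class, 'toggle-link')` would also match `toggle-link-disabled`. The sketch below mimics that logic in plain Python so it can be checked without a browser. The function names are made up for illustration, and `normalize_space` is only a rough stand-in for the XPath function.

```python
def normalize_space(value):
    """Rough stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(value.split())


def has_class_xpath_style(class_attr, name):
    """Mimic contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    padded = " " + normalize_space(class_attr) + " "
    return f" {name} " in padded


def has_class_naive(class_attr, name):
    """Mimic a plain contains(@class, 'name') substring test."""
    return name in class_attr


attr = "btn  toggle-link-disabled\n small"
print(has_class_naive(attr, "toggle-link"))        # substring match: True
print(has_class_xpath_style(attr, "toggle-link"))  # whole-name match: False
print(has_class_xpath_style("btn toggle-link small", "toggle-link"))  # True
```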
Handling Modals With Selenium
So far, it’s been pretty easy to handle modals. The hardest part is that it’s become easier to create them, so it’s become an aspect of sites that changes frequently. As a result, I expect I’ll make updates for this a lot. The pattern is that you go to a page, you look for an element and, usually when you try to interact with it, you’ll get an ElementNotInteractableException or an ElementClickInterceptedException, and the message might even say that the element is being obscured. This is because a modal has popped up.
Depending on the situation, you can either trap that exception and deal with the modal then, or you can preemptively check to see if there’s a modal before you do anything else.
Here’s a function for handling a modal:
def detect_and_dismiss_modal(browser):
    print("Looking for a modal...")
    try:
        modal_dismissal_button = browser.find_element_by_xpath("//button[@aria-label='No, thanks']")
    except NoSuchElementException:
        print("Looks like there's no modal. Moving on.")
        return None
    else:
        print("There's a modal. Clicking to dismiss it now...")
        modal_dismissal_button.click()
        print("Modal should be gone now. Returning...")
        return None
You can either call something like this at the top of another function that’s going to eventually do something on the page, or you can call it within a try/except, etc. Here’s an example using it in an except block:
for button in detail_buttons:
    try:
        button.click()
    except ElementClickInterceptedException:
        print("looks like there's a modal. Just a sec...")
        detect_and_dismiss_modal(browser)
        print("Ok, gonna try that detail expansion button click again...")
        button.click()
This code is a little hacky in that it doesn’t have any formal retry semantics, but you get the idea. Overall, the hardest part of dealing with modals in my experience so far is just detecting that they’re there. If you know there’s a modal there, dismissing it amounts to finding and clicking a button on the page, just like any other button on any other page.
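Here is a minimal sketch of what retry semantics could look like, under a few assumptions. The helper name `click_with_retry`, the `attempts` parameter, and passing the exception type in as an argument are all choices made for this sketch. In the project code the exception would be selenium's ElementClickInterceptedException and the cleanup step would be detect_and_dismiss_modal.

```python
def click_with_retry(element, dismiss, blocked_exc, attempts=3):
    """Click element; when the click raises blocked_exc, call dismiss()
    and try again, up to `attempts` clicks in total.

    Returns the number of clicks it took. Re-raises blocked_exc if the
    last attempt is still blocked.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            element.click()
            return attempt
        except blocked_exc:
            if attempt == attempts:
                raise  # out of attempts; let the caller decide what to do
            dismiss()  # e.g. close the modal that covers the element
```

In the loop above, the call might look like `click_with_retry(button, lambda: detect_and_dismiss_modal(browser), ElementClickInterceptedException)`.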
Clicking With Javascript
It turns out that it’s far, far more reliable (at least in my experience so far) to ask javascript to click on something for you than to have selenium webdriver locate something, scroll it into view, move to the element, and then click on it. All of those operations are important if what you need is to emulate user behavior, but I don’t need that. I just need content! So, in the case where you only need content and do not need to emulate a human using a web site, you can click on something by asking javascript to do it for you, like this:
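def click_page_link(browser, page_number):  # illustrative name; the original is an excerpt from a larger function
    print(f"Looking for the pagination link for page {page_number}")
    page_link = browser.find_element_by_link_text(str(page_number))
    print("Found link. Clicking now.")
    # Let the page's own JavaScript perform the click instead of WebDriver.
    browser.execute_script("arguments[0].click();", page_link)
    return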
Headless Firefox With Selenium Webdriver
I did not have a particular reason to choose one browser over the other. I switch browsers all the time, and I happened to be using Firefox as my main browser when I started the project, so I used the Firefox driver. Through all of the work I did, I never had a reason to switch, so the final project uses Firefox.
One of the sites I had to automate triggered a quirk where Firefox opened a new window for each downloaded file and never closed it, so I temporarily switched to Chrome (which worked fine). Then I realized that the production project would run headless anyway, so I moved it back to Firefox for the sake of consistency.
Running any selenium-supported browser in headless mode appears to be well-documented, but here’s the utility function I used to configure the driver for my projects:
import os

from selenium import webdriver
from selenium.webdriver import FirefoxProfile
from selenium.webdriver.firefox.options import Options

def get_firefox_driver(download_dir, log_dir, implicit_wait=60, headless=False):
    """
    Assembles a profile configuration for a Firefox browser and returns a driver that uses it.
    """
    mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
    options = Options()
    options.headless = headless
    fp = FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_dir)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    fp.set_preference("pdfjs.disabled", True)
    options.profile = fp
    service_log_path = os.path.join(log_dir, "geckodriver.log")
    # Written against an older Selenium API; recent Selenium 4 releases have changed some of these arguments.
    driver = webdriver.Firefox(options=options, service_log_path=service_log_path)
    driver.implicitly_wait(implicit_wait)
    return driver
For the record, I was able to do all of my early testing with a visible browser window, and change it to headless mode at the last minute before deployment. I implemented a toggle flag (`--headless`) in my script that, if present, passed `headless=True` to this function.
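Here is a sketch of how such a toggle could be wired up, assuming the script uses the standard library's argparse (the project's actual argument handling isn't shown here). The function name `parse_args` and the help text are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags; --headless is False unless the flag is given."""
    parser = argparse.ArgumentParser(description="Download documents with Selenium.")
    parser.add_argument(
        "--headless",
        action="store_true",
        help="run the browser without a visible window",
    )
    return parser.parse_args(argv)


# In the script's entry point the parsed value would be forwarded, for example:
#   args = parse_args()
#   driver = get_firefox_driver(download_dir, log_dir, headless=args.headless)
```

With `action="store_true"`, running the script with no flag keeps the visible browser window for testing, and adding `--headless` on the deployment command line is the only change needed.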
Select Dropdown Elements
Selenium Webdriver has a built-in “Select” object (do ‘from selenium.webdriver.support.ui import Select’ to get it). It makes dealing with select elements a breeze. Here’s some code from one of the scripts I wrote.
destination_select = Select(browser.find_element_by_id("Destination"))
destination_select.select_by_visible_text('Option Text')