Musings of an Anonymous Geek

Made with only the finest 1's and 0's

Python Selenium Webdriver Notes

Posted on December 22, 2020 by jonesy

I used Python Selenium Webdriver for a project in which a client needed a program that logs into around 25 different web sites and downloads a total of 750-1000 documents. Automating operations across so many different sites has been a huge learning opportunity for me. It’s a lot of fun!

I had a passing familiarity with Selenium at the start of this, but my knowledge was dated. I had used BeautifulSoup more recently, but not… recently. So, this was very slow going at first, but now that I’m over the hump of getting reacquainted with it, and learning quite a number of new things about it, I thought I’d post the notes here for anyone else who might find them useful (including future me – because my memory is terrible).

Please note that in this project, the point of the work is automating sites with no API to download data and documents. I was not constrained to emulating behaviors that actual end users are expected to take, as web QA automation often demands. If I wanted to do something really hacky that a real person would never do on the site, I was free to do that so long as I got the data as a result. These notes reflect that.

Enjoy!

Program Structure

This might be the last thing you need to hear when you’re trying to make forward progress fast, but I found that doing this kind of automation scripting seems to demand a different structure from, say, writing backend queue workers that update data or cache systems, or writing API endpoints that perform simple CRUD operations.

I wound up in a slightly foreign-looking structure through trial and error. Because a selenium program controls a browser that is meant to be used by a human, the flow can feel pretty strange when you’re used to the usual back end coding constructs. The general pattern I’ve found success with looks like this: 

go_to_listing_page()
items = find_items_I_care_about()
for item in items:
    try:
        go_from_listing_to_detail_page(driver, item)
        go_from_detail_to_invoice_page(driver)
        download_invoice(driver)
        go_from_invoice_page_to_listing_page(driver)
    except NoSuchElementException:
        # the expected element is missing, usually because the session expired
        handle_session_timeout(driver)
    except ElementNotInteractableException:
        # something (typically a feedback modal) is covering the element
        handle_feedback_modal(driver)

So, at a high level, there is a function for each possible navigation path, unless there are static URLs. If there are static URLs, you don’t need the ‘go_from_x_to_y’ functions: they can just be ‘go_to_y’ functions instead, which is great if it’s feasible. In my experience, it is NOT always feasible.
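
The try/except dispatch in that loop can also be written once and reused across sites. Here’s a minimal sketch in plain Python. `process_items`, `steps`, `recoveries`, and `SessionTimedOut` are names invented for this illustration (they are not from the project), and in real use the exception classes would be Selenium’s.

```python
def process_items(items, steps, recoveries):
    """Run every step for every item; on a known exception, run its recovery.

    steps:      list of callables taking the item (one per navigation hop)
    recoveries: dict mapping exception class -> callable taking the item
    Returns the list of items that completed all steps.
    """
    completed = []
    for item in items:
        try:
            for step in steps:
                step(item)
        except tuple(recoveries) as exc:
            # Run the first registered recovery whose class matches.
            for exc_type, recover in recoveries.items():
                if isinstance(exc, exc_type):
                    recover(item)
                    break
        else:
            completed.append(item)
    return completed


# Stand-in exception; with Selenium this would be NoSuchElementException etc.
class SessionTimedOut(Exception):
    pass


def flaky_step(item):
    if item == "invoice-2":
        raise SessionTimedOut(item)


recovered = []
done = process_items(
    ["invoice-1", "invoice-2", "invoice-3"],
    steps=[flaky_step],
    recoveries={SessionTimedOut: recovered.append},
)
print(done)       # ['invoice-1', 'invoice-3']
print(recovered)  # ['invoice-2']
```

Items that hit a recovery are not retried here. Whether to retry or skip them is a per-site decision.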

File Downloads With Selenium Webdriver

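In order to control the browser so it doesn’t pop up a download dialog or open a PDF in a new tab by default, I configured my browser driver with the settings that tell it where to put downloaded files. I also disabled the settings that cause it to open new tabs, etc. This is lightly adapted from my code:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.options import Options

def get_driver():
    # download_destination and mime_types are module-level settings defined elsewhere
    fp = FirefoxProfile()
    # 2 = save to the custom directory named in browser.download.dir
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_destination)
    # save these MIME types straight to disk without asking
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    # turn off the built-in PDF viewer so PDFs download instead of opening
    fp.set_preference("pdfjs.disabled", True)

    # attach the profile through Options; the firefox_profile= keyword is deprecated
    options = Options()
    options.profile = fp
    driver = webdriver.Firefox(options=options)
    driver.implicitly_wait(45)
    return driver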

Once this browser profile is in place, it’s a matter of finding and clicking the download link or button. I’m sure you can find similar profiles for Chrome or other browsers all over the web.
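
Clicking the link only starts the download, so the script still has to know when the file is complete before moving on. Here’s a rough sketch of one way to check. It assumes Firefox’s habit of keeping a temporary `.part` file in the download directory while a transfer is in progress (other browsers use different suffixes). The function name `wait_for_download` and its parameters are hypothetical, not something from the project or from Selenium.

```python
import os
import tempfile
import time


def wait_for_download(download_dir, known_files=(), timeout=60, poll_interval=0.5):
    """Wait until download_dir holds a new file and no '.part' files remain.

    known_files: names that were already present and should be ignored.
    Returns the sorted list of new file names, or raises TimeoutError.
    """
    known = set(known_files)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(download_dir)) - known
        in_progress = [n for n in names if n.endswith(".part")]
        if names and not in_progress:
            return sorted(names)
        time.sleep(poll_interval)
    raise TimeoutError(f"download did not finish within {timeout} seconds")


# Dry run against a temporary directory standing in for the download folder.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "invoice.pdf"), "w").close()
finished = wait_for_download(demo_dir, timeout=5, poll_interval=0.1)
print(finished)  # ['invoice.pdf']
```

In a real run you would snapshot `os.listdir(download_dir)` before clicking, then pass that snapshot as `known_files`.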

Login

I had no intention of writing this section, but since the sites all work so similarly with regard to login, it’s worth sharing. I created a sort of site login config dictionary the code could use to find, populate, and submit login forms.

So, here’s an example dictionary, a function that uses it, and a call to the function passing the site dict:

sites = {
    "ExampleSite": {
        "login_url": "https://example.com/login",
        "login_form_id": "login_form",
        "uname": "billing@mycompany.com",
        "uname_input_id": "input-Email",
        "passwd": "dTowNRu13z!",
        "passwd_input_id": "input-Password",
        "login_button_input_id": "login-submit",
    },
    "OtherSite": {
        "login_url": "https://othersite.com/login",
        "login_form_id": "fctl-login",
        "uname": "billing@mycompany.com",
        "uname_input_id": "uname",
        "passwd": "myPasswordRocks",
        "passwd_input_id": "passwd",
        "login_button_input_id": "Login"
    }

}

from selenium.webdriver.common.by import By

def do_login(browser, site):
    print("Getting login url")
    browser.get(site["login_url"])

    # fill out the fields
    # (find_element(By.ID, ...) replaces the older find_element_by_id helper)
    browser.find_element(By.ID, site["uname_input_id"]).send_keys(site["uname"])
    browser.find_element(By.ID, site["passwd_input_id"]).send_keys(site["passwd"])

    print("Submitting login form")
    login_button = site["login_button_input_id"]
    browser.find_element(By.XPATH, f"//button[@id='{login_button}']").click()
    print("Login form submitted")

if __name__ == "__main__":
    browser = get_driver()
    do_login(browser, sites["ExampleSite"])

There isn’t a lot happening here that’s special, but it’s worth noting the send_keys method, which types the given string into the element as simulated keypresses.

Also worth noting: Selenium’s WebElement has a submit() method, so you can call submit() on a form (or on an element inside one) much like you can call click() on a button element. I haven’t tried it in this project yet, so read the docs before relying on it.
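
For what it’s worth, here’s roughly what a submit()-based login could look like. It’s a sketch that hasn’t been run against real sites. `do_login_with_submit` is a hypothetical name, and it assumes the `login_form_id` entry in the site dict really is the id of the form element.

```python
def do_login_with_submit(browser, site):
    """Fill in the login fields, then submit the form element itself."""
    browser.get(site["login_url"])

    # "id" is the string behind selenium's By.ID constant. It is spelled out
    # here only so this sketch needs no imports; By.ID is the usual spelling.
    browser.find_element("id", site["uname_input_id"]).send_keys(site["uname"])
    browser.find_element("id", site["passwd_input_id"]).send_keys(site["passwd"])

    # Submit the form directly instead of locating and clicking the button.
    browser.find_element("id", site["login_form_id"]).submit()
```

A call would look the same as before: `do_login_with_submit(browser, sites["ExampleSite"])`.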

Handling Infinite Scroll With Selenium

Sometimes you need to get a list of all entries on a site before you process any of them, and if there’s an infinite scroll implementation, it’s not straightforward to do. The best solutions I’ve found consist of telling selenium to just execute javascript code to try to handle this in a reasonable way. Here’s what worked for me:

import time

def scroll(browser):
    # Get scroll height
    last_height = browser.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(3)

        # Calculate new scroll height and compare with last scroll height
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # If heights are the same, nothing new loaded, so stop scrolling
            break
        last_height = new_height

The above code will keep scrolling until the “infinite” scrolling is exhausted and you’ve exposed all of the items. It’s pretty easy to just scroll one time if you want to handle the scrolling differently:

def scroll_once(browser):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(3)
    return None
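
If a feed never stops growing, the `while True` loop in scroll() never returns. Here’s a sketch of a bounded variant, under the assumption that giving up after a fixed number of scrolls is acceptable. `scroll_until_stable`, `pause`, and `max_rounds` are names chosen for this example.

```python
import time


def scroll_until_stable(browser, pause=3.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that loaded new content. Stops after
    max_rounds even if the page keeps growing, so an endless feed cannot
    hang the script.
    """
    height_js = "return document.body.scrollHeight"
    last_height = browser.execute_script(height_js)
    loads = 0
    for _ in range(max_rounds):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render more items
        new_height = browser.execute_script(height_js)
        if new_height == last_height:
            break  # nothing new appeared, so the list is fully exposed
        last_height = new_height
        loads += 1
    return loads
```

The return value doubles as a cheap sanity check: a result equal to `max_rounds` suggests the cap was hit before the list was exhausted.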

Compound Attributes (Tag Attributes Containing Spaces)

I had to find an anchor tag that was a descendant of a div. I intended to use the class attribute of both tags to locate them. However, both tags had “compound” class attributes, meaning the class attribute was a space-delimited series of names instead of just one. Nothing was being found at all, so I was stuck until someone on Stack Overflow mentioned the phrase “compound classes”. I added that phrase to my Google search and presto!

Here’s some code ripped right from the project.

from selenium.webdriver.common.by import By

div_selector = "div[contains(concat(' ', normalize-space(@class), ' '), ' billing-account-header ')]"
link_selector = "a[contains(concat(' ', normalize-space(@class), ' '), ' toggle-link ')]"
detail_buttons = browser.find_elements(By.XPATH, f"//{div_selector}//{link_selector}")

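I wasn’t sure at first why this works, but the logic is this: a plain `@class='toggle-link'` test compares against the entire attribute string, so it fails as soon as the element carries more than one class. Here, normalize-space() trims and collapses the whitespace in the class attribute, concat() pads it with a space on each side, and contains() then looks for the class name with a space on each side, so it matches only a whole class name and never a fragment of a longer one. It works fantastically well for me when I need to locate things with compound classes.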
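
To see what that predicate is doing, it helps to mirror it in plain Python. In this sketch `has_class` and `class_token_match` are hypothetical helper names. The first builds the same XPath predicate used in the project code, and the second imitates it on an ordinary string so the whole-name matching is visible.

```python
def has_class(name):
    """Return an XPath predicate matching elements whose class list includes name."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def class_token_match(class_attr, name):
    """Plain-Python imitation of the predicate above, for reasoning about it."""
    normalized = " ".join(class_attr.split())  # like normalize-space(@class)
    padded = f" {normalized} "                 # like concat(' ', ..., ' ')
    return f" {name} " in padded               # like contains(..., ' name ')


xpath = f"//div[{has_class('billing-account-header')}]//a[{has_class('toggle-link')}]"
print(xpath)

print(class_token_match("btn  toggle-link active", "toggle-link"))  # True
print(class_token_match("toggle-link-disabled", "toggle-link"))     # False
# An exact comparison against the whole attribute fails for compound classes:
print("btn toggle-link active" == "toggle-link")                    # False
```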

Handling Modals With Selenium

So far, it’s been pretty easy to handle modals. The hardest part is that it’s become easier to create them, so it’s become an aspect of sites that changes frequently. As a result, I expect I’ll make updates for this a lot. The pattern is that you go to a page, you look for an element and, usually when you try to interact with it, you’ll get an ElementNotInteractableException or an ElementClickInterceptedException, and the message might even say that the element is being obscured. This is because a modal has popped up.

Depending on the situation, you can either trap that exception and deal with the modal then, or you can preemptively check to see if there’s a modal before you do anything else. 

Here’s a function for handling a modal: 

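from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def detect_and_dismiss_modal(browser):
    print("Looking for a modal...")
    try:
        # Note: with an implicit wait set, this lookup blocks for the full
        # wait period whenever there is no modal on the page.
        modal_dismissal_button = browser.find_element(By.XPATH, "//button[@aria-label='No, thanks']")
    except NoSuchElementException:
        print("Looks like there's no modal. Moving on.")
        return None
    else:
        print("There's a modal. Clicking to dismiss it now...")
        modal_dismissal_button.click()
        print("Modal should be gone now. Returning...")
    return None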

You can either call something like this at the top of another function that’s going to eventually do something on the page, or you can call it within a try/except, etc. Here’s an example using it in an except block:

for button in detail_buttons:
    try:
        button.click()
    except ElementClickInterceptedException:
        print("looks like there's a modal. Just a sec...")
        detect_and_dismiss_modal(browser)
        print("Ok, gonna try that detail expansion button click again...")
        button.click()

This code is a little hacky in that it doesn’t have any formal retry semantics, but you get the idea. Overall, the hardest part of dealing with modals in my experience so far is just detecting that they’re there. If you know there’s a modal there, dismissing it amounts to finding and clicking a button on the page, just like any other button on any other page.
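
For something closer to real retry semantics, the click can be wrapped in a small bounded loop. This is only a sketch: `click_with_retry` and its parameters are invented here, and the exception type is passed in so the helper itself doesn’t import Selenium.

```python
import time


def click_with_retry(element, recover, retry_on, attempts=3, pause=1.0):
    """Click element; if retry_on is raised, call recover() and try again.

    retry_on: an exception class (or tuple of classes) worth retrying.
    Returns the number of tries it took. Re-raises the last exception
    once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            element.click()
            return attempt
        except retry_on:
            if attempt == attempts:
                raise
            recover()          # e.g. dismiss the modal that blocked the click
            time.sleep(pause)  # let the page settle before the next try
```

In the loop above that would read `click_with_retry(button, lambda: detect_and_dismiss_modal(browser), ElementClickInterceptedException)`.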

Clicking With Javascript

It turns out that it’s far, far more reliable (at least in my experience so far) to ask JavaScript to click on something for you than to have Selenium Webdriver locate something, scroll it into view, move to the element, and then click on it. All of those operations are important if what you need is to emulate user behavior, but I don’t need that. I just need content! So, in the case where you only need content and do not need to emulate a human using a web site, you can click on something by asking JavaScript to do it for you, like this:

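# needs: from selenium.webdriver.common.by import By
print(f"Looking for the pagination link for page {page_number}")
page_link = browser.find_element(By.LINK_TEXT, str(page_number))

print("Found link. Clicking now.")
# arguments[0] is the element passed as the second argument to execute_script
browser.execute_script("arguments[0].click();", page_link)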

Headless Firefox With Selenium Webdriver

I did not have a particular reason to choose one browser over the other. I switch browsers all the time, and I happened to be using Firefox as my main browser when I started the project, so I used the Firefox driver. Through all of the work I did, I never had a reason to switch, so the final project uses Firefox. 

One of the sites I had to automate triggered a quirk where Firefox opened a new window for each downloaded file and never closed it, so I temporarily switched to Chrome (which worked fine). Then I realized that the production project would run headless anyway, so I moved it back to Firefox for the sake of consistency.

Running any selenium-supported browser in headless mode appears to be well-documented, but here’s the utility function I used to configure the driver for my projects: 

import os

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.options import Options

def get_firefox_driver(download_dir, log_dir, implicit_wait=60, headless=False):
    """
    Assembles a profile configuration for a Firefox browser and returns a driver that uses it.
    """
    mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
    # Note: options.headless and service_log_path are the spellings from the
    # Selenium release I used. Newer releases configure both differently, so
    # check the docs for your version.
    options = Options()
    options.headless = headless

    fp = FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_dir)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    fp.set_preference("pdfjs.disabled", True)

    options.profile = fp
    service_log_path = os.path.join(log_dir, "geckodriver.log")
    driver = webdriver.Firefox(options=options, service_log_path=service_log_path)
    driver.implicitly_wait(implicit_wait)
    return driver

For the record, I was able to do all of my early testing with a visible browser window, and change it to headless mode at the last minute before deployment. I implemented a toggle flag (`--headless`) in my script that, if present, passed `headless=True` to this function.
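
The toggle itself is a few lines of argparse. Here’s a minimal sketch: `--headless` is the flag described above, while `parse_args`, the `--download-dir` option, and the `logs` directory in the trailing comment are assumptions made up for the example.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Site automation script")
    # store_true makes the flag False unless --headless is given
    parser.add_argument("--headless", action="store_true",
                        help="run Firefox without a visible window")
    # hypothetical extra option, shown only to round out the example
    parser.add_argument("--download-dir", default="downloads",
                        help="where the browser should save files")
    return parser.parse_args(argv)


args = parse_args(["--headless"])
print(args.headless)      # True
print(args.download_dir)  # downloads

# The parsed values would then be handed to the driver factory, e.g.:
# driver = get_firefox_driver(args.download_dir, log_dir="logs", headless=args.headless)
```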

Select Dropdown Elements

Selenium Webdriver has a built-in “Select” object (do ‘from selenium.webdriver.support.ui import Select’ to get it). It makes dealing with select elements a breeze. Here’s some code from one of the scripts I wrote.

from selenium.webdriver.common.by import By

destination_select = Select(browser.find_element(By.ID, "Destination"))
destination_select.select_by_visible_text("Option Text")
