Scrape ESPN Fantasy Data with Selenium

When it comes to pulling data from your fantasy league into an analytical framework, ESPN does not make it easy. If you have found this article, you have probably also found that there used to be an ESPN API that made data collection easy, but today we need to do a lot more legwork. Normally, to scrape a website we could use something like the Requests library together with BeautifulSoup. These are great and simple to use in the normal case, but when we have to log into a website to get our data, we need some extra steps. The reason is that Requests sends bare HTTP requests rather than driving a real browser: none of your saved logins or cookies come along for the ride, so you would still need to log in, and a private league page simply comes back without your data.
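As a quick illustration of the problem, here is what a bare Requests call looks like; the league URL is a hypothetical placeholder, and the point is that the roster tables never come back because there is no logged-in session (and no JavaScript) behind the request.

import requests
from bs4 import BeautifulSoup

##A raw HTTP request carries no logged-in browser session, so a private
##league page returns without your data (hypothetical league URL)
resp = requests.get('https://fantasy.espn.com/football/league?leagueId=123456')
soup = BeautifulSoup(resp.text, 'html.parser')
print(len(soup.find_all('table')))  ##prints 0 -- no roster tables in the raw HTML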

Thus we introduce Selenium. For those who don't know, Selenium actually opens a real browser for Python to click through. Selenium can be a bit difficult to set up; the short version is a pip install plus a browser driver, which the webdriver-manager package can handle for you. From there, you can use Selenium's functions to navigate the site autonomously and collect the data.
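A minimal sanity check, assuming you install the packages with pip (pip install selenium webdriver-manager beautifulsoup4 pandas):

##Confirm the environment before going further
import selenium
print(selenium.__version__)  ##this walkthrough assumes Selenium 4.x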

As ESPN holds the fantasy data behind your password, we are going to need Selenium to log into your own account, so let's go through the steps to set up our script.

import time
from datetime import datetime

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

from bs4 import BeautifulSoup
import pandas as pd

import pywebcopy
from pywebcopy import save_website, WebPage
pywebcopy.config['bypass_robots'] = True

These imports cover the pieces of Selenium we need, plus BeautifulSoup and pandas for parsing the pages once we have them. I personally like to use datetime to test how long certain pieces of the process take, and since ESPN pages need time to load, we also bring in the time library for explicit pauses.

Once we have our libraries imported, we can open the browser we will use for navigation.

##webdriver-manager downloads and caches the matching ChromeDriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

This will open a Chrome browser, but you can point Selenium at a different browser by installing the matching driver; an example follows.
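For example, a Firefox session looks almost identical; this sketch assumes Firefox is installed and uses webdriver-manager's GeckoDriverManager rather than the Chrome manager.

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager

##Same pattern as Chrome, just with the Firefox driver
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))

Whichever browser you pick, once it is open we can start going through the log in and data collection process.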

driver.get('https://www.espn.com/')
time.sleep(8)
##Open Initial Log In Location
search_box = driver.find_element(By.ID, 'global-user-trigger')
search_box.click()
time.sleep(2)
print('Open Log In Tab')
##Click on the Log In location
nextbox = driver.find_element(By.XPATH, "//a[@data-affiliatename='espn']")
nextbox.click()
print('Click Login')
##Switch to iFrame to enter log in credentials
time.sleep(2)
driver.switch_to.frame("disneyid-iframe")
username = driver.find_element(By.XPATH, "//input[@placeholder='Username or Email Address']")
print('Switching to iFrame')
##Submit Username and Password
time.sleep(2)
username.send_keys('USERNAME')
password = driver.find_element(By.XPATH, "//input[@placeholder='Password (case sensitive)']")
password.send_keys('PASSWORD')
time.sleep(2)
print('Logging In')
##Submit credentials
button = driver.find_element(By.XPATH, "//button[@class='btn btn-primary btn-submit ng-isolate-scope']")
button.click()
##Step back out of the iFrame before touching the main page again
driver.switch_to.default_content()
##Open Link Page
time.sleep(8)
search_box = driver.find_element(By.ID, 'global-user-trigger')
search_box.click()
print('Going to Fantasy Link')
##Selecting Fantasy League
time.sleep(2)
leaguego = driver.find_element(By.PARTIAL_LINK_TEXT, 'FANTASY TEAM NAME')
leaguego.click()
print('Entering League')

Once you are here, you will need to actually collect the data from these pages. For this we turn to the BeautifulSoup library, which parses the HTML of the page so we can pull out its tables.

time.sleep(8)
page = driver.page_source
soup = BeautifulSoup(page, 'html.parser')
tables = soup.find_all("table")

This will grab the tables from your team's page, or whichever page you are currently viewing. Make sure you are on the correct week for your analysis: either time the automation to run when all scores are posted on the most recent page, or use Selenium to click over to the correct week before you scrape.
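Here is a hedged sketch of week selection; treat the locator as an assumption, since ESPN's markup changes and the week picker may not render as a plain link.

##Hypothetical week selection -- inspect ESPN's current markup and
##adjust the locator before relying on this
week_link = driver.find_element(By.PARTIAL_LINK_TEXT, 'Week 1')
week_link.click()
time.sleep(5)

Once the right week is loaded, I personally like to format the table and add a couple of my own columns.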

time.sleep(8)
table = tables[0]
##Pull the text out of every header and data cell, row by row
tab_data = [[cell.text for cell in row.find_all(["th", "td"])]
            for row in table.find_all("tr")]
df = pd.DataFrame(tab_data)
##Drop the spacer rows (indices specific to ESPN's layout at time of writing)
gdf = df.drop([0, 11, 19])
##Promote the first remaining row to column headers
gdf.columns = gdf.iloc[0]
gdf = gdf[1:]
#gdf = gdf.drop(columns=['action'])
gdf['Team'] = 'Fantasy Team Name'
gdf['Week'] = 1  ##note: a leading zero (01) is a syntax error in Python 3
gdf['Time'] = datetime.now()

Now that you have this, the analysis is up to you! Let me know what you are analyzing, or what other improvements could be added to this workflow!
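If it helps as a starting point, here is a minimal sketch for persisting each weekly scrape and closing the browser; the CSV filename is just a placeholder.

import os

csv_path = 'ffl_scores.csv'  ##hypothetical output file
##Write the header only on the first run, then append each weekly scrape
gdf.to_csv(csv_path, mode='a', header=not os.path.exists(csv_path), index=False)
##Close the browser session when the scrape is done
driver.quit()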

— —

Here is the repository for this code along with some extras:

https://github.com/dendi19/FFL
