Scraping Email Addresses Using Python – Ultimate Guide 2026

Owning a list of email prospects can help marketers to scale up their businesses. By scraping email addresses using Python scripts, business people can have better outreach to their audience.
Scraping Email Addresses Using Python
One of the easiest ways to build a client base is to collect as many business email addresses as possible and send them details of your services regularly. There are many scraping tools on the internet that offer this for free, but their free tiers limit how much data you can extract. Unlimited extraction is available too, but only on paid plans. Why pay when you can build a tool with your own hands? Let us discuss the steps to build a quality scraping tool using Python.
Steps To Scrape Email Addresses
Though it will be a very simple example for beginners, it will be a learning experience, especially for those who are new to web scraping. This will be a step-by-step tutorial that will help you get email addresses without any limits. Let’s start with the building process of our intelligent web scraper.
Step 1: Importing Modules
We will be using the following modules for our project.
// python
import re
import requests
from urllib.parse import urlsplit
from collections import deque
from bs4 import BeautifulSoup
import pandas as pd
from google.colab import files  # only available when running in Google Colab
The details of the imported modules are given below:
- re for regular expression matching.
- requests for sending HTTP requests.
- urlsplit for dividing URLs into their component parts.
- deque, a list-like container that supports fast appends and pops at either end.
- BeautifulSoup for pulling data out of the HTML of different web pages.
- pandas for putting the emails into a DataFrame for export and further operations.
- files (from google.colab) for downloading the resulting CSV file when the script runs in Google Colab.
Step 2: Initializing Variables
In this step, we will initialize a deque that holds the URLs still to be scraped, a set that holds the URLs already scraped, and a set that stores the emails scraped successfully from the websites.
// python
# read url from input
original_url = input("Enter the website url: ")
# to save urls to be scraped
unscraped = deque([original_url])
# to save scraped urls
scraped = set()
# to save fetched emails
emails = set()
Duplicate elements are not allowed in a set, so every scraped URL and every email is stored only once.
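To make the roles of the two containers concrete, here is a minimal sketch that can be run on its own. The URLs and addresses are hypothetical placeholders, not data from a real site.

```python
from collections import deque

# Hypothetical starting URL; the queue holds pages that still need a visit.
queue = deque(["https://example.com/"])
queue.append("https://example.com/about")  # new work goes on the right

first = queue.popleft()  # the oldest entry comes off the left
print(first)             # https://example.com/
print(list(queue))       # ['https://example.com/about']

# A set silently drops duplicates, so each address is kept once.
emails = set()
emails.update(["a@example.com", "a@example.com", "b@example.com"])
print(len(emails))       # 2
```

Taking URLs from the left and adding them on the right means pages are visited in the order they were discovered.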
Step 3: Starting the Scraping Process
- The first step is to distinguish between the scraped and unscraped URLs. The way to do this is to move a URL from unscraped to scraped.
// python
while len(unscraped):
    # move the next URL from the unscraped queue to the scraped set
    url = unscraped.popleft()  # popleft(): remove and return the leftmost element of the deque
    scraped.add(url)
- The next step is to extract data from different parts of the URL. For this purpose, we will use urlsplit.
// python
    parts = urlsplit(url)
urlsplit() returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).
I can’t show sample inputs and outputs for urlsplit() for confidentiality reasons, but when you run the code, it will ask you to enter a website address. If you print the result, you will see a SplitResult() with five attributes.
This will allow us to get the base and path part for the website URL.
// python
    base_url = "{0.scheme}://{0.netloc}".format(parts)
    if '/' in parts.path:
        path = url[:url.rfind('/')+1]
    else:
        path = url
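As an illustration of what these lines compute, the sketch below runs the same logic on a made-up address. The URL is a hypothetical placeholder chosen so that all five parts are non-empty.

```python
from urllib.parse import urlsplit

url = "https://example.com/blog/post.html?page=2#comments"  # hypothetical URL
parts = urlsplit(url)
print(parts)
# SplitResult(scheme='https', netloc='example.com', path='/blog/post.html',
#             query='page=2', fragment='comments')

# Same base/path logic as in the scraper loop
base_url = "{0.scheme}://{0.netloc}".format(parts)
if '/' in parts.path:
    path = url[:url.rfind('/') + 1]  # everything up to and including the last slash
else:
    path = url

print(base_url)  # https://example.com
print(path)      # https://example.com/blog/
```

Here base_url is what root-relative links such as /contact get attached to, while path is the prefix for links such as about.html.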
- This is the time to send the HTTP GET request to the website.
// python
    try:
        response = requests.get(url)
    except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
        # ignore pages with errors and continue with next url
        continue
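The request above waits indefinitely on a slow server and only ignores two kinds of errors. One possible hardening, sketched here as an assumption rather than as part of the original scraper, is a small helper (the name fetch_html and the 10-second timeout are hypothetical choices) that sets a timeout and treats any requests error as "skip this page".

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the page cannot be fetched."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Base class of the requests errors: missing scheme, connection
        # failures, timeouts, and similar problems all end up here.
        return None
    if response.status_code != 200:
        # Assumption: only plain "200 OK" pages are worth parsing.
        return None
    return response.text


# A URL without a scheme is rejected before any network traffic is sent.
result = fetch_html("not-a-url")
print(result)  # None
```

Inside the crawl loop, a None result would play the role of the continue statement: the page is skipped and the next URL is taken from the queue.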
- To extract the email addresses, we will use a regular expression and then add the matches to the emails set.
// python
    # This pattern only matches addresses ending in .com;
    # you may edit the regular expression as per your requirement
    new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.com",
                                response.text, re.I))  # re.I: ignore case
    emails.update(new_emails)
Regular expressions are of massive help when you want to extract the information of your own choice. If you are not comfortable with them, you can have a look at Python RegEx for more details.
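To see what the pattern does and does not catch, the following sketch runs it on a hypothetical sentence. It also tries a broader pattern, ANY_TLD, which is an illustrative assumption of ours and not part of the original scraper. The helper find_emails is likewise a made-up name.

```python
import re

# Pattern used in the scraper: only matches addresses ending in ".com".
COM_ONLY = re.compile(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.com", re.I)

# Hypothetical broader pattern: accepts any alphabetic top-level domain
# of two or more letters (.org, .net, .io, ...).
ANY_TLD = re.compile(r"[a-z0-9._+\-]+@[a-z0-9\-]+(?:\.[a-z0-9\-]+)*\.[a-z]{2,}", re.I)

sample_text = "Contact sales@example.com or support@example.org, or Info@Example.COM."


def find_emails(pattern, text):
    """Return the unique, lower-cased addresses that the pattern finds in text."""
    return {match.lower() for match in pattern.findall(text)}


com_emails = find_emails(COM_ONLY, sample_text)
all_emails = find_emails(ANY_TLD, sample_text)

print(sorted(com_emails))  # ['info@example.com', 'sales@example.com']
print(sorted(all_emails))  # ['info@example.com', 'sales@example.com', 'support@example.org']
```

Lower-casing the matches before they go into the set keeps Info@Example.COM and info@example.com from being counted as two different addresses.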
- The next step is to find all the URLs that the page links to.
// python
    # create a BeautifulSoup object for the HTML document
    soup = BeautifulSoup(response.text, 'lxml')
The <a href=""> tag indicates a hyperlink that can be used to find all the linked URLs in the document.
// python
    for anchor in soup.find_all("a"):
        # extract linked url from the anchor
        if "href" in anchor.attrs:
            link = anchor.attrs["href"]
        else:
            link = ''
        # resolve relative links (starting with /)
        if link.startswith('/'):
            link = base_url + link
        elif not link.startswith('http'):
            link = path + link
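The string concatenation above covers the common cases. As an alternative sketch, not the method used in this tutorial, the standard library's urljoin can resolve relative links, including ones such as ../team.html, and urlsplit can filter out non-web schemes such as mailto:. The page URL and href values below are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

page_url = "https://example.com/blog/post.html"  # hypothetical page being scraped
hrefs = ["/contact", "about.html", "../team.html",
         "https://other.example/x", "mailto:info@example.com"]  # hypothetical href values

resolved = []
for href in hrefs:
    link = urljoin(page_url, href)  # resolve the href against the page URL
    if urlsplit(link).scheme in ("http", "https"):  # skip mailto:, javascript:, etc.
        resolved.append(link)

for link in resolved:
    print(link)
# https://example.com/contact
# https://example.com/blog/about.html
# https://example.com/team.html
# https://other.example/x
```

With this approach the separate base_url and path variables would no longer be needed for link resolution.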
Then we will add each new URL to the unscraped queue if it is in neither the scraped set nor the unscraped queue.
When you try the code on your own, you will notice that not every link can be scraped, so we also need to exclude those. Here we skip links that end in .gz:
// python
        if not link.endswith(".gz"):
            if link not in unscraped and link not in scraped:
                unscraped.append(link)
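The queue-and-set bookkeeping can be tried without touching the network. The sketch below replaces real pages with a hypothetical in-memory dictionary, LINKS, that maps each URL to the links found on it. The function name crawl_order is also made up for illustration.

```python
from collections import deque

# Hypothetical "site": each URL maps to the links found on that page.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/b",
                              "https://example.com/data.gz"],
    "https://example.com/b": ["https://example.com/a"],
}


def crawl_order(start_url, links):
    """Return the URLs in the order the scraper loop would visit them."""
    unscraped = deque([start_url])
    scraped = set()
    order = []
    while unscraped:
        url = unscraped.popleft()
        scraped.add(url)
        order.append(url)
        for link in links.get(url, []):
            if link.endswith(".gz"):
                continue  # skip archives, as in the step above
            if link not in unscraped and link not in scraped:
                unscraped.append(link)
    return order


visit_order = crawl_order("https://example.com/", LINKS)
print(visit_order)
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Each page is visited exactly once even though the pages link back to each other, and the .gz link never enters the queue.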
Step 4: Exporting Emails to a CSV file
To analyze the results in a better way, we will export the emails to the CSV file.
// python
df = pd.DataFrame(sorted(emails), columns=["Email"])  # sorted() turns the set into an ordered list; rename the column if you prefer
df.to_csv('email.csv', index=False)
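If pandas is not installed, a similar file could be written with Python's built-in csv module. This is an alternative sketch, not the method used in this tutorial. The sample addresses, the helper name write_emails_csv, and the temporary file name are hypothetical.

```python
import csv
import os
import tempfile

emails = {"sales@example.com", "info@example.com"}  # hypothetical sample data


def write_emails_csv(emails, csv_path):
    """Write one address per row under an "Email" header, in sorted order."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Email"])
        for address in sorted(emails):
            writer.writerow([address])


# Write to the system's temporary directory so the sketch leaves no clutter.
csv_path = os.path.join(tempfile.gettempdir(), "email_sketch.csv")
write_emails_csv(emails, csv_path)

# Read the file back to confirm what was written.
with open(csv_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.reader(handle))
print(rows)  # [['Email'], ['info@example.com'], ['sales@example.com']]
```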
If you are using Google Colab, you can download the file to your local machine with:
// python
from google.colab import files
files.download("email.csv")
As already explained, I can’t show the scraped email addresses due to confidentiality issues.
[Disclaimer! Some websites don’t allow web scraping, and they use very intelligent bots that can permanently block your IP, so scrape at your own risk.]
Complete Code
// python
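import re
import requests
from urllib.parse import urlsplit
from collections import deque
from bs4 import BeautifulSoup
import pandas as pd

try:
    from google.colab import files  # only available inside Google Colab
except ImportError:
    files = None

# read url from input
original_url = input("Enter the website url: ")

# to save urls to be scraped
unscraped = deque([original_url])

# to save scraped urls
scraped = set()

# to save fetched emails
emails = set()

while len(unscraped):
    # move the next url from the unscraped queue to the scraped set
    url = unscraped.popleft()
    scraped.add(url)

    parts = urlsplit(url)
    base_url = "{0.scheme}://{0.netloc}".format(parts)
    if '/' in parts.path:
        path = url[:url.rfind('/')+1]
    else:
        path = url

    print("Crawling URL %s" % url)
    try:
        response = requests.get(url)
    except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
        # ignore pages with errors and continue with next url
        continue

    new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.com", response.text, re.I))
    emails.update(new_emails)

    soup = BeautifulSoup(response.text, 'lxml')
    for anchor in soup.find_all("a"):
        if "href" in anchor.attrs:
            link = anchor.attrs["href"]
        else:
            link = ''
        if link.startswith('/'):
            link = base_url + link
        elif not link.startswith('http'):
            link = path + link
        # queue links that have not been seen yet, skipping .gz archives
        if not link.endswith(".gz"):
            if link not in unscraped and link not in scraped:
                unscraped.append(link)

# export the collected emails to a CSV file
df = pd.DataFrame(sorted(emails), columns=["Email"])
df.to_csv('email.csv', index=False)

# download the file when running in Google Colab
if files is not None:
    files.download("email.csv")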
Proxies in Scraping Email Addresses
As businesses require numerous email addresses to build their contact lists, they need to collect data from multiple sources. Collecting it manually is tedious and time-consuming, so scrapers usually turn to proxies to speed up the process and bypass the restrictions that come their way. ProxyScrape provides high-bandwidth proxies that are capable of scraping unlimited data and work 24/7 to ensure uninterrupted functionality. Their proxy anonymity level is high enough to hide the identity of the scrapers.
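As a rough sketch of how a proxy could be plugged into the scraper, the requests library accepts a mapping of proxy addresses on a session. The proxy address below is a hypothetical placeholder from a documentation-only IP range, so replace it with the endpoint and credentials your provider gives you.

```python
import requests

# Hypothetical proxy endpoint; not a real, working proxy.
PROXY_URL = "http://203.0.113.10:8080"

session = requests.Session()
# Route both plain and TLS traffic through the same proxy.
session.proxies.update({"http": PROXY_URL, "https": PROXY_URL})

# Requests made through this session, for example session.get(url, timeout=10),
# would be sent via the configured proxy instead of directly.
print(session.proxies)
```

In the scraper, this would mean calling session.get(url) where the code currently calls requests.get(url).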
Frequently Asked Questions
Creating a contact list of qualified email addresses eases the process of reaching out to the target audience. As most people use email as their communication medium, it is often easier to reach them through their email addresses.
While scraping email addresses from multiple sources, scrapers may face challenges such as IP blocks or geographical barriers. In this case, proxies replace the user’s address with the proxy’s address, which helps in accessing websites that would otherwise be blocked.
Collecting publicly available data is generally permitted, though the rules vary by jurisdiction and by each site’s terms of service. So, scrapers must make sure the data they are collecting is available in the public domain. If it is not, they can collect the data with prior permission to maintain legality in scraping.
Wrapping Up
In this article, we have explored one more wonder of web scraping by showing a practical example of scraping email addresses. We have built our own web crawler using Python and its easy yet powerful library called BeautifulSoup. Web scraping can be of massive help if done properly with your requirements in mind. Although the code we have written for scraping email addresses is very simple, it is totally free of cost, and you don’t need to rely on other services for this. I tried my level best to simplify the code as much as possible and also left room for customization so you can optimize it according to your own requirements.
If you are looking for proxy services to use during your scraping projects, don’t forget to look at ProxyScrape residential and premium proxies.
