Jun 24 2013
 

As a freelance tech contractor, I get a lot of emails every day from recruiters about prospective jobs. Many of these are unsolicited, but that is a good thing: over time, they build up into quite a large list of contacts. I don’t have the time to reply to each one – particularly those that don’t suit my skillset, or that arrive while I’m happily indentured – but I do file these emails under a separate Gmail label, which I’ve named “Employment”.

I’m not going to add each recruiter to my Contacts as their email arrives – that would be time-consuming to start and difficult to maintain. But when the time comes to make a great big recruiter contact list, I’ve written the Python script below to scrape the entire mailbox and output each unique sender address.

#!/usr/bin/python

import imaplib
import sys
import email
from email.utils import parseaddr

#FOLDER=sys.argv[1]
FOLDER='Employment'
LOGIN='example.address@gmail.com'
PASSWORD='xxxxxxxxxxxx'
IMAP_HOST = 'imap.gmail.com'  # Change this according to your provider

email_list = []

mail = imaplib.IMAP4_SSL(IMAP_HOST)
mail.login(LOGIN, PASSWORD)
mail.select(FOLDER)

# Fetch the ID of every message in the folder
result, data = mail.search(None, 'ALL')
id_list = data[0].split()
for i in id_list:
	typ, data = mail.fetch(i,'(RFC822)')
	for response_part in data:
		if isinstance(response_part, tuple):
			raw = response_part[1]
			# Python 3's imaplib returns bytes; message_from_string needs text
			if not isinstance(raw, str):
				raw = raw.decode('utf-8', 'replace')
			msg = email.message_from_string(raw)
			# parseaddr splits 'Name <user@host>' into (name, address)
			address = parseaddr(msg['from'] or '')[1]
			# Ignore any occurrences of own email address and add to list
			if address and LOGIN not in address and address not in email_list:
				email_list.append(address)
				print(address)
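
The sender-extraction and de-duplication step can also be tried without an IMAP connection. The following is a minimal sketch, assuming Python 3; the function name unique_senders and the sample messages are made up for illustration and are not part of the script above. Unlike the script, it compares addresses case-insensitively, which is a choice of this sketch.

```python
from email import message_from_bytes
from email.utils import parseaddr


def unique_senders(raw_messages, own_address):
    """Return sender addresses in first-seen order, skipping our own."""
    seen = set()
    senders = []
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        # parseaddr splits 'Name <user@host>' into (name, address)
        address = parseaddr(str(msg.get('From', '')))[1]
        key = address.lower()
        # Skip missing senders, our own address, and repeats
        if not address or key == own_address.lower() or key in seen:
            continue
        seen.add(key)
        senders.append(address)
    return senders


# Hypothetical raw messages, standing in for what mail.fetch() would return
samples = [
    b'From: Alice Recruiter <alice@example.com>\r\nSubject: Role\r\n\r\nHi',
    b'From: ALICE@example.com\r\nSubject: Another role\r\n\r\nHi',
    b'From: me@example.org\r\nSubject: Re: Role\r\n\r\nThanks',
    b'From: bob@example.net (Bob)\r\nSubject: Contract\r\n\r\nHi',
]
for address in unique_senders(samples, 'me@example.org'):
    print(address)
```

Keeping a set alongside the list makes each duplicate check cheap while preserving the order in which senders were first seen.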

I’ve hard-coded my email login, password and the mailbox name, although it’s easy enough to modify the script to accept them as arguments (I’ve commented out a line demonstrating this).
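
One possible way to take those settings from the command line is sketched here, assuming Python 3 and the standard argparse and getpass modules. The names parse_args and read_password, and the --folder and --host options, are illustrative choices rather than anything in the original script. Prompting for the password keeps it out of the script file and out of the shell history.

```python
import argparse
import getpass


def parse_args(argv=None):
    """Parse the mailbox settings; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description='Print each unique sender address in a mailbox folder.')
    parser.add_argument('login', help='IMAP login, usually your email address')
    parser.add_argument('--folder', default='Employment',
                        help='mailbox or Gmail label to scan')
    parser.add_argument('--host', default='imap.gmail.com',
                        help='IMAP server for your provider')
    return parser.parse_args(argv)


def read_password(login):
    """Prompt for the password without echoing it to the terminal."""
    return getpass.getpass('Password for %s: ' % login)


# Example with an explicit argument list, so it runs without a terminal prompt
args = parse_args(['example.address@gmail.com', '--folder', 'Employment'])
print(args.login, args.folder, args.host)
```

In the script itself, args.login, args.folder and args.host would stand in for LOGIN, FOLDER and IMAP_HOST, and read_password(args.login) for PASSWORD.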

In a later post, I’m going to discuss how I use this script for job-seeking.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.

  9 Responses to “Extract all sender email addresses from a mailbox with Python”

  1. Hi Matt,
    Thank you for your post. I’m getting an invalid syntax error when I run the code. Would you please take a look at my code? I’m using Python 3.3.2. I don’t know if it makes a difference.

    >>> import imaplib
    >>> import sys
    >>> import email
    >>> import re
    >>> folder=’inbox’
    >>> user=’GoodmanJ@gmail.com’
    >>> password=’XXXXXXX’
    >>> IMAP_HOST = ‘imap.gmail.com’
    >>> email_list = []
    >>> email_unique=[]
    >>> mail=imaplib.IMAP4_SSL(IMAP_HOST)
    >>> mail.login(user,password)
    (‘OK’, [b’goodmanj@gmail.com Manix Kaplan authenticated (Success)’])
    >>> mail.select(folder)
    (‘OK’, [b’9′])
    >>> data = mail.search(None, ‘ALL’)
    >>> ids = data[0]
    >>> id_list = ids.split()
    >>> for i in id_list:
    typ, data = mail.fetch(i,'(RFC822)’)
    for response_part in data:
    if isinstance(response_part, tuple):
    msg = email.message_from_string(response_part[1])
    sender = msg[‘from’].split()[-1]
    address = re.sub(r'[]’,”,sender)
    print address

    • I’m not too sure as the formatting has been lost when you pasted your code into the comment. However, bear in mind that whitespace indentation in Python is used to demarcate blocks. Make sure that you’ve indented correctly.

  2. WOW
    Amazing

  3. Please help — I get the following error message (using Python 3.3.2)
    Thank you!
    Johannes

    —————–

    File “/Users/js2044/Desktop/area.py”, line 550, in mail_box
    msg = email.message_from_string(response_part[1])
    File “/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/__init__.py”, line 40, in message_from_string
    return Parser(*args, **kws).parsestr(s)
    File “/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/parser.py”, line 69, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
    TypeError: initial_value must be str or None, not bytes

  4. I am on Windows 7 (64-bit) and this works like a charm with Python 2.7. I have not tested it with the latest 3.4 version yet, but thanks for putting it out there, mate. If you are free for a moment, would you care to give an idea of how this could be modified to include the email addresses from the body of the email too?
    Cheers.

  5. I need to extract email IDs from all the fields: “to, from, cc, bcc”.
    How can I use the script in that case?

  6. Saved me some time. Thanks.

  7. I have another quick question: if I want to store the time each email was sent, how can I extract that piece of information?

    Thank you!

  8. I’m in a similar situation to yourself, and this saved me some time once I got the right path delimiter for the mailbox on my Dovecot IMAP server: ‘INBOX.dir1.dir2.dir3’
