Antoine Dahan

Ashley Madison Leak User's Email Domains

Published on

Looking through the Ashley Madison email dataset, I noticed a surprising number of users registered with employee email addresses. Here’s a list of the top 20 Fortune 50 companies ordered by number of employee emails registered on Ashley Madison.

Company Email Domain # Of Emails
General Motors 361
IBM 251
General Electric 217
Hewlett Packard 193
Ford 179
Wells Fargo 171
Proctor and Gamble 152
Boeing 117
UPS 74
Bank of America 72
Citi Bank 60
PepsiCo 59
DOW 59
Walmart 52
Metlife 43
State Farm 37
AIG 26
Archer Daniels Midland 19
Kroger 14
CVS 12

The overwhelming majority of users did not register with their employee email addresses. The overall distribution of email address domains is shown in the pie chart below:


Here’s the script used to sift through the dump:

# write domain counts to file after every 1,000,000 emails read
increment_to_save = 1000000

def write_to_file(file_name, domains):
    # sort the domains by count
    from operator import itemgetter
    domains = sorted(domains, key=itemgetter('count'))

    # write the domains to the results file
    results_file = open(file_name, "w")
    domains = reversed(domains)
    for domain in domains:
        results_file.write(domain["domain"] + "," + str(domain["count"]) + "\n")

emails_file = open("emails_dump.txt", "r")

# storage for domain counts
domains = list()

email_number = 1
for email in emails_file:
    # if not @ symbol in email, go onto next email
    if(email.find("@") != -1):
        # get the email's domain
        email_domain = email.split("@")[-1].rstrip('\n')
        # go through the existing domains checking for the email's domain
        domain_exists = False
        for domain in domains:
            if domain["domain"] == email_domain:
                domain["count"] += 1
                domain_exists = True
        # if the domain doesn't already exist in the counts, then add it     
        if not domain_exists:
            new_domain = {"domain": "", "count": 1}
            new_domain["domain"] = email_domain
        # for every 1 million email addresses read, write the current domain counts to a file
        if ((email_number % increment_to_save) == 0) and (email_number > 0):
            write_to_file("domain_counts_" + str(int(email_number/increment_to_save)) + ".csv", domains)

# write the final domain counts to a file
write_to_file("final_domain_counts.csv", domains)