Are statistics sexy? Visualising social networks certainly is! I wrote a little function that makes it extremely easy to produce beautiful plots of a mailbox with R. I find visualisations of ‘social graphs’ particularly appealing. They look like flowers.
I had to use a few Python functions, which can be executed within R with the rJython library. The function connects to an IMAP server and looks for the “To:” and “From:” fields in stored emails. It should not be difficult to adapt this script to work with POP3 too. I am really impressed by what R can do (with a little bit of help from Python). Can anyone suggest a more elegant way to do the same thing without executing Python?
As rJython depends on rJava, I had to install the Java Development Kit to launch it.
Warning: For me this function worked very well and did not do any harm to my mailbox. However, I am not an expert in IMAP, so if you run it, you do so at your own risk.
Here is the function:
mailSoc <- function(login,
                    pass,
                    serv = "imap.gmail.com", #specify IMAP server
                    ntore = 50,              #ignore if addressed to more than
                    todow = -1,              #how many to download
                    begin = -1){             #from which to start
  #load rJython and Python libraries
  require(rJython)
  rJython <- rJython(modules = "imaplib")
  rJython$exec("import imaplib")
  #connect to server
  rJython$exec(paste("mymail = imaplib.IMAP4_SSL('", serv, "')", sep = ""))
  rJython$exec(paste("mymail.login(\'", login, "\',\'", pass, "\')", sep = ""))
  #get number of available messages
  rJython$exec("sel = mymail.select()")
  rJython$exec("number = sel[1]")
  nofmsg <- .jstrVal(rJython$get("number"))
  nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2])
  #if 'begin' not specified begin from the newest
  if(begin == -1) {
    begin <- nofmsg
  }
  #if 'todow' not specified download all
  if(todow == -1) {
    end <- 1
  } else {
    end <- begin - todow
  }
  #give a little bit of information
  todownload <- begin - end
  print(paste("Found", nofmsg, "emails"))
  print(paste("I will download", todownload, "messages."))
  print("It can take a while")
  data <- data.frame()
  #fetching emails
  for (i in begin:end) {
    nr <- as.character(i)
    #get sender
    rJython$exec(paste("typ, fro = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (from)])\')", sep = ""))
    rJython$exec("fro = fro[0][1]")
    from <- .jstrVal(rJython$get("fro"))
    from <- unlist(strsplit(from, "[<>\r\n, \"]"))
    from <- sub("from: ", "", from, ignore.case = TRUE)
    from <- grep("@", from, value = TRUE)
    #get addressees
    rJython$exec(paste("typ, to = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (to)])\')", sep = ""))
    rJython$exec("to = to[0][1]")
    to <- .jstrVal(rJython$get("to"))
    to <- unlist(strsplit(to, "[<>\r\n, \"]"))
    to <- sub("to: ", "", to, ignore.case = TRUE)
    from <- sub("\"", "", from, ignore.case = TRUE)
    to <- grep("@", to, value = TRUE)
    #if reasonable number of addressees add to data frame
    if(length(to) <= ntore){
      vec <- rep(from, length(to))
      data <- rbind(data, data.frame(vec, to))
    }
    #give some information about progress
    if((i - begin) %% 100 == 0) {
      print(paste((i - begin)*(-1), "/", todownload, " Downloading...", sep = ""))
    }
  }
  names(data) <- c("from", "to")
  data$from <- tolower(data$from)
  data$to <- tolower(data$to)
  #close connection
  rJython$exec("mymail.shutdown()")
  return(data)
}
Now we can run, e.g.:
#download 200 most recent emails from gmail account
maild <- mailSoc("login", "password", serv = "imap.gmail.com", ntore = 40, todow = 200)
And to make a plot it is necessary to load the network library:
library(network)
mailnet <- network(maild)
plot(mailnet)
This is the result:
R provides many other social network analysis tools, such as the igraph library. For instance, it can be used to make an interactive ‘plot’:
library(igraph)
h <- graph.data.frame(maild, directed = FALSE)
tkplot(h, vertex.label = V(h)$name, layout = layout.fruchterman.reingold)
I would like to learn more about SNA, and I would also like to try out Gephi, which can produce visualisations that are even more attractive than those made in R. I think that I will write about my first impressions soon.
UPDATE: I tested it only with gmail. If anybody tries it with other email servers please let me know about the results.
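For comparison, the header-parsing step inside mailSoc can be sketched in plain Python using the standard library's address parser instead of splitting on punctuation. This is only a sketch under my own assumptions: the name header_edges and the max_recipients parameter (mirroring ntore) are hypothetical, and it takes the raw header text that a fetch would return rather than talking to a server.

```python
from email import message_from_string
from email.utils import getaddresses


def header_edges(raw_headers, max_recipients=50):
    """Turn raw 'From:'/'To:' header text into (sender, recipient) pairs.

    Hypothetical helper: max_recipients plays the role of ntore in mailSoc.
    """
    msg = message_from_string(raw_headers)
    # getaddresses copes with quoted display names such as "Doe, John"
    senders = [addr.lower()
               for _, addr in getaddresses(msg.get_all("From", []))
               if "@" in addr]
    recipients = [addr.lower()
                  for _, addr in getaddresses(msg.get_all("To", []))
                  if "@" in addr]
    # Assumption: messages with a blank sender or recipient are skipped,
    # as are messages addressed to more than max_recipients people
    if not senders or not recipients or len(recipients) > max_recipients:
        return []
    return [(s, r) for s in senders for r in recipients]


if __name__ == "__main__":
    sample = ('From: "Doe, John" <John@Example.com>\r\n'
              'To: Bob <bob@example.com>, carol@example.com\r\n\r\n')
    for edge in header_edges(sample):
        print(edge)
```

Each returned pair corresponds to one row of the data frame that mailSoc builds, so the result could be written out and read into R as an edge list.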

I think you have a typo at the end: it should be plot(mailnet), not plot(maild).
Oops! You are right. Corrected. Thanks!
plot(maild) => plot(mailnet) : This line is still wrong in the version of the blog-post on the r-bloggers web site.
I think there is nothing I can do about it now.
Probably a stupid question, but what exactly are you visualising here?
I think you could take a look at these pages:
http://en.wikipedia.org/wiki/Social_network_analysis#Visualization_of_networks
and/or
http://svitsrv25.epfl.ch/R-doc/library/network/html/plot.network.html
http://cneurocvs.rmki.kfki.hu/igraph/doc/R/tkplot.html
First of all, I think that’s a great idea, but I’m confused about the dot in the bottom left corner. Neither a “from” nor a “to”?
Could you please keep us informed whether the article at r-bloggers automagically pulls an update concerning your typo?
Btw, the shocking pink background of marked words drives me insane.
This is an email that somebody sent to himself and just put me in the “CC:” field. I know that it is strange but it happens from time to time.
I am trying to learn a little bit about R, so I tried to do this on my Mac using R.app. After stumbling around for a while, I think I have all the dependent packages installed, and I copied and pasted the main function into a file. Then I used the source(“path.to.mailsoc”) command in the interactive terminal and pasted the “maild <- mailSoc… [snip]” line with my correct credentials. After thinking for a while, R fails with the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 437, in fetch
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 1055, in _simple_command
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 892, in _command_complete
imaplib.error: FETCH command error: BAD ['Could not parse command']
>
> library(network)
> mailnet plot(maild)
Error in plot(maild) : object ‘maild’ not found
I guess I need to declare maild first? How?
Thanks for any help!!
OK, I just tried this in a Linux virtual environment and got a segfault in the same IMAP library.
Hello there, really good blog, and a decent understanding! Definitely one for my bookmarks.
I can confirm that your beautiful function works on my institution's server (postal.uv.es) and on aim.com (imap.aim.com).
Thanks
Now that I have this working somewhat, I thought I would point out some limitations that I think I have found.
1: I think that domains which use Google services and Gmail fail to authenticate. In these cases the username for Google’s IMAP server is the complete email address, including the domain (User.Name@domain.com), while for email addresses using the gmail.com domain, “@gmail.com” is omitted. I only had access to one account like this, so I am not 100% sure.
2: Passwords with symbols can cause errors. The specific case I saw was a password which included “\”, which I have been using for months with no other problems.
3: It only fetches messages with the label “inbox”; messages with other labels are ignored. This causes the mailboxes of fastidious organizers to be pretty much ignored.
4: If the number of available messages (with the label “inbox”) is less than the specified sample size (todow), the process fails with the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 437, in fetch
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 1055, in _simple_command
  File "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/rJython/jython.jar/Lib/imaplib.py", line 892, in _command_complete
imaplib.error: FETCH command error: BAD ['Could not parse command']
5: If the specified sample size (todow) is greater than about 425, the process fails with the error:
[1] "Found 16881 emails"
[1] "I will download 500 messages."
[1] "It can take a while"
[1] "0/500 Downloading..."
[1] "100/500 Downloading..."
[1] "200/500 Downloading..."
[1] "300/500 Downloading..."
[1] "400/500 Downloading..."
Error in data.frame(vec, to) :
  arguments imply differing number of rows: 0, 1
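Limitations 2 and 4 both look avoidable to me, so here is a rough sketch of how they might be handled, written in plain Python for brevity. The function names login_statement and message_range are made up for illustration, and I am assuming the problem in point 2 comes from pasting the raw password into a quoted Python string, which I have not verified.

```python
def login_statement(login, password):
    """Build the Python statement that mailSoc sends through exec.

    repr() escapes backslashes and quotes, so a password containing
    "\\" or "'" still yields a valid string literal. (Hypothetical helper.)
    """
    return "mymail.login(%s, %s)" % (repr(login), repr(password))


def message_range(n_available, todow=-1, begin=-1):
    """Return message numbers to fetch, newest first, never below 1.

    Assumption: unlike the original, this fetches exactly `todow`
    messages and clamps the range to what the mailbox holds.
    """
    if n_available < 1:
        return []
    if begin == -1 or begin > n_available:
        begin = n_available
    if todow == -1:
        end = 1
    else:
        end = max(1, begin - todow + 1)
    return list(range(begin, end - 1, -1))


if __name__ == "__main__":
    print(login_statement("user", "pa\\ss'word"))
    print(message_range(10, todow=200))
```

With the range clamped like this, asking for 200 messages from a mailbox that holds 10 would just fetch those 10 instead of sending a FETCH for message 0 or a negative number.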
Cheers!
When I ask it to download 2000+ emails, R issues a warning. It seems this code cannot deal efficiently with larger data sets.
Tweaked it a little. This allows selection of a folder, and also returns the date stamp on each message (for other interesting analyses, e.g. what time of day / day of week do you get most email?)
mailSoc <- function(login,
                    pass,
                    serv = "imap.gmail.com", #specify IMAP server
                    #ntore = 50,             #ignore if addressed to more than
                    todow = -1,              #how many to download
                    begin = -1,              #from which to start
                    folder = "INBOX"){       #folder to download (default: inbox)
  #load rJython and Python libraries
  require(rJython)
  rJython <- rJython(modules = "imaplib")
  rJython$exec("import imaplib")
  #connect to server
  rJython$exec(paste("mymail = imaplib.IMAP4_SSL('",
                     serv, "')", sep = ""))
  rJython$exec(paste("mymail.login(\'",
                     login, "\',\'",
                     pass, "\')", sep = ""))
  #get number of available messages
  rJython$exec(paste("sel = mymail.select(\"", folder, "\")", sep = ""))
  rJython$exec("number = sel[1]")
  nofmsg <- .jstrVal(rJython$get("number"))
  nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2])
  #if 'begin' not specified begin from the newest
  if(begin == -1) {
    begin <- nofmsg
  }
  #if 'todow' not specified download all
  if(todow == -1) {
    end <- 1
  } else {
    end <- begin - todow
  }
  #give a little bit of information
  todownload <- begin - end
  print(paste("Found", nofmsg, "emails"))
  print(paste("I will download", todownload, "messages."))
  print("It can take a while")
  data <- data.frame()
  #fetching emails
  for (i in begin:end) {
    nr <- as.character(i)
    #get sender
    rJython$exec(paste("typ, fro = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (from)])\')", sep = ""))
    rJython$exec("fro = fro[0][1]")
    from <- .jstrVal(rJython$get("fro"))
    from <- unlist(strsplit(from, "[<>\r\n, \"]"))
    from <- sub("from: ", "", from, ignore.case = TRUE)
    from <- grep("@", from, value = TRUE)
    #get addressees
    rJython$exec(paste("typ, to = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (to)])\')", sep = ""))
    rJython$exec("to = to[0][1]")
    to <- .jstrVal(rJython$get("to"))
    to <- unlist(strsplit(to, "[<>\r\n, \"]"))
    to <- sub("to: ", "", to, ignore.case = TRUE)
    from <- sub("\"", "", from, ignore.case = TRUE)
    to <- grep("@", to, value = TRUE)
    #get dates:
    rJython$exec(paste("typ, date = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (date)])\')", sep = ""))
    rJython$exec("date = date[0][1]")
    date <- .jstrVal(rJython$get("date"))
    date <- strptime(date, format = "Date: %a, %d %b %Y %H:%M:%S %z")
    #add to data frame, using 'NA' for a blank to or from field
    #vec <- rep(from, length(to))
    if(length(to) == 0)
      to <- 'NA'
    if(length(from) == 0)
      from <- 'NA'
    data <- rbind(data, data.frame(from, to, date))
    #give some information about progress
    print(i)
    if((i - begin) %% 100 == 0) {
      print(paste((i - begin)*(-1), "/", todownload,
                  " Downloading...", sep = ""))
    }
  }
  names(data) <- c("from", "to", "date")
  data$from <- tolower(data$from)
  data$to <- tolower(data$to)
  #close connection
  rJython$exec("mymail.shutdown()")
  return(data)
}
Oh, and your “arguments imply differing number of rows: 0, 1” error is due to a blank to or from field, which causes data.frame to choke. I’ve added a fix for that.
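As a rough idea of what the date stamps could be used for, here is a small sketch that counts messages per weekday and per hour. It is plain Python rather than R, the function name weekday_hour_counts is made up, and it assumes you already have the “Date:” header values as strings. Hours are taken in the sender's own time zone, exactly as written in the header.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_hour_counts(date_headers):
    """Count messages by weekday name and by hour of day.

    Hypothetical helper: `date_headers` is a list of RFC 2822 date strings,
    i.e. the text after 'Date: ' in each message header.
    """
    by_weekday = Counter()
    by_hour = Counter()
    for value in date_headers:
        try:
            stamp = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Unparseable or blank date headers are skipped
            continue
        if stamp is None:
            continue
        by_weekday[WEEKDAYS[stamp.weekday()]] += 1
        by_hour[stamp.hour] += 1
    return by_weekday, by_hour


if __name__ == "__main__":
    days, hours = weekday_hour_counts([
        "Mon, 04 Jul 2011 09:15:00 +0200",
        "Tue, 05 Jul 2011 23:40:10 +0000",
    ])
    print(days.most_common())
    print(sorted(hours.items()))
```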