Sample of 50 emails
Description
This is a subsample of the email
data set.
Usage
email50
Format
A data frame with 50 observations on the following 21 variables.
- spam
Indicator for whether the email was spam.
- to_multiple
Indicator for whether the email was addressed to more than one recipient.
- from
Whether the message was listed as from anyone (this is usually set by default for regular outgoing email).
- cc
Number of people cc'ed.
- sent_email
Indicator for whether the sender had been sent an email in the last 30 days.
- time
Time at which email was sent.
- image
The number of images attached.
- attach
The number of attached files.
- dollar
The number of times a dollar sign or the word “dollar” appeared in the email.
- winner
Indicates whether “winner” appeared in the email.
- inherit
The number of times “inherit” (or an extension, such as “inheritance”) appeared in the email.
- viagra
The number of times “viagra” appeared in the email.
- password
The number of times “password” appeared in the email.
- num_char
The number of characters in the email, in thousands.
- line_breaks
The number of line breaks in the email (does not count text wrapping).
- format
Indicates whether the email was written using HTML (e.g. may have included bolding or active links).
- re_subj
Whether the subject started with “Re:”, “RE:”, “re:”, or “rE:”
- exclaim_subj
Whether there was an exclamation point in the subject.
- urgent_subj
Whether the word “urgent” was in the email subject.
- exclaim_mess
The number of exclamation points in the email message.
- number
Factor variable saying whether there was no number, a small number (under 1 million), or a big number.
Source
David Diez's Gmail Account, early months of 2012. All personally identifiable information has been removed.