Note: This metadata was prepared by the CRAWDAD team and verified by the data set (or tool) authors. We have made every effort to ensure its accuracy, but urge all users to consider the metadata and data carefully and be sure that their use in research is consistent with the nature and limitations of the data. We welcome any corrections.
|
version
| v. 2005-07-01 |
|
changes
| the initial version |
|
bibtex
|
@MISC{mit-reality-2005-07-01,
author = {Nathan Eagle and Alex (Sandy) Pentland},
title = {{CRAWDAD} data set mit/reality (v. 2005-07-01)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/mit/reality},
month = jul,
year = 2005
}
|
| metadata last modified | 2006-11-09 |
| summary | The authors have captured communication, proximity, location, and activity information from 100 subjects at MIT over the course of the 2004-2005 academic year. This data represents over 350,000 hours (~40 years) of continuous data on human behavior. Such rich data on complex social systems have implications for a variety of fields. |
| release date | 2005-07-01 |
| measurement start | 2004-07-26 |
| measurement end | 2005-05-05 |
| authors | Nathan Eagle, Alex (Sandy) Pentland
|
|
web site
| http://reality.media.mit.edu/ |
|
| keyword | Bluetooth, cellular network, social network, DTN, location |
| measurement purposes | Social Network Analysis, Human Behavior Modeling
|
| network type | cellular network |
| network type | bluetooth |
| environment | Our study consists of one hundred Nokia 6600 smart phones pre-installed
with several pieces of software we have developed as well as a version of
the Context application from the University of Helsinki.
Seventy-five users are either students or faculty in the MIT Media Laboratory,
while the remaining twenty-five are incoming students at the MIT Sloan business
school adjacent to the laboratory. Of the seventy-five users at the lab,
twenty are incoming master's students and five are incoming MIT freshmen. |
| network | We exploit the fact that modern phones use both a short-range RF network
(e.g., Bluetooth) and a long-range RF network (e.g., GSM), and that
the two networks can augment each other for location and activity inference.
We logged cell tower ID to determine approximate location and at the same
time we logged Bluetooth devices.
Bluetooth is a wireless protocol in the 2.40-2.48 GHz range, developed
by Ericsson in 1994 and released in 1998 as a serial-cable replacement
to connect different devices. |
| collection | The information we are collecting includes call logs, Bluetooth devices in proximity,
cell tower IDs, application usage, and phone status (such as charging and idle),
which comes primarily from the Context application. The study will generate
data collected from one hundred human subjects over the course of nine months and
will represent approximately 500,000 hours of data on users' location, communication,
and device usage behavior. |
|
tracesets included
| mit/reality/blueaware (v. 2005-07-01)
|
|
version
| v. 2005-07-01 |
|
changes
| the initial version |
|
bibtex
|
@MISC{mit-reality-blueaware-2005-07-01,
author = {Nathan Eagle and Alex (Sandy) Pentland},
title = {{CRAWDAD} trace set mit/reality/blueaware (v. 2005-07-01)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/mit/reality/blueaware},
month = jul,
year = 2005
}
|
| metadata last modified | 2006-10-17 |
| summary | The authors have captured communication, proximity, location, and activity information from 100 subjects at MIT over the course of the 2004-2005 academic year. This data represents over 350,000 hours (~40 years) of continuous data on human behavior. |
| release date | 2005-06-01 |
| measurement start | 2004-07-26 |
| measurement end | 2005-05-05 |
| measurement purposes | Social Network Analysis, Human Behavior Modeling
|
| methodology | Every Bluetooth device is capable of device discovery, which allows it
to collect information on other Bluetooth devices within 5-10 meters.
This information includes the Bluetooth MAC address (BTID), device name, and
device type. The BTID is a hex number unique to the particular device.
The device name can be set at the user's discretion;
e.g., Tony's Nokia. Finally, the device type is a set of three integers
that correspond to the device discovered; e.g., Nokia mobile phone, or IBM laptop.
To log BTIDs we designed a software application, BlueAware, that runs passively
in the background on MIDP2-enabled mobile phones. Bluetooth was primarily designed
to enable wireless headsets or laptops to connect to phones, but as a byproduct,
devices are becoming aware of other Bluetooth devices carried by people
nearby. Our application records and timestamps the BTIDs encountered in a proximity
log and makes them available to other applications. BlueAware is automatically
run in the background when the phone is turned on, making it essentially invisible
to the user.
Bluedar was developed to be placed in a social setting and continuously scan
for visible devices, wirelessly transmitting detected BTIDs to a server
over an 802.11b network. The heart of the device is a Bluetooth
beacon designed by Mat Laibowitz incorporating a class 2 Bluetooth chipset that can
be controlled by an XPort web server. We integrated this beacon with an
802.11b wireless bridge and packaged them in an unobtrusive box. An application
was written to continuously telnet into multiple Bluedar systems, repeatedly scan for
Bluetooth devices, and transmit the discovered proximate BTIDs to our server. Because
the Bluetooth chipset is a class 2 device, it is able to detect any visible Bluetooth
device within a working range of up to twenty-five meters. |
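The proximity log described above can be pictured as a list of timestamped sightings. The following is a minimal Python sketch, not the BlueAware implementation (which runs on MIDP2 phones). The record layout, the field names, the timestamp unit, and the sample BTIDs are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Sighting:
    """One device-discovery result (hypothetical layout, not the trace format)."""
    timestamp: int                      # seconds since an arbitrary start (assumed unit)
    btid: str                           # Bluetooth MAC address as a hex string
    device_name: str                    # user-chosen name, e.g. "Tony's Nokia"
    device_type: Tuple[int, int, int]   # three integers describing the device


def sightings_per_device(log: Iterable[Sighting]) -> Counter:
    """Count how many logged scans saw each BTID."""
    return Counter(s.btid for s in log)


# Made-up records; the BTIDs and device-type triples are placeholders.
example_log = [
    Sighting(0, "0012d1aabbcc", "Tony's Nokia", (1, 2, 4)),
    Sighting(300, "0012d1aabbcc", "Tony's Nokia", (1, 2, 4)),
    Sighting(300, "000e6d112233", "laptop", (1, 1, 12)),
]
print(sightings_per_device(example_log))
```

Counting sightings per BTID is only one possible summary; the same records could equally be grouped by time window or by device type.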
| hole | 1. All the data from a phone are stored on a flash memory card, which has a finite
number of read-write cycles. Initial versions of our application wrote over the same
cells of the memory card. This led to failure of a new card after about a month of data
collection, resulting in the complete loss of data. When the application was changed
to store the incremental logs in RAM and subsequently write each complete log
to the flash memory, our data corruption issues virtually vanished.
However, ten cards were lost before this problem was identified, destroying portions
of the data collected during the months of September and October for six Sloan
students and four Media Lab students.
2. Another source of missing data is powered-off devices.
On average we have logs accounting for approximately 85.3% of the time
since the phones have been deployed. Less than 5% of this is due to data corruption,
while the majority of the missing 14.7% is due to almost one fifth of the subjects
turning off their phones at night.
3. There is a small probability (between 1% and 3%, depending on the phone)
that a proximate, visible device will not be discovered during a scan.
Typically this is due to either a low-level Symbian crash of an application
called the "BTServer" or a lapse in the device discovery protocol. The BTServer
crashes and restarts approximately once every three days
(at a 5 minute scanning interval) and accounts for a small fraction of the total error.
However, to detect other subjects, we can leverage the redundancy implicit in the system.
Because both of the subjects' phones are actually scanning, the probability of
a simultaneous crash or device discovery error is less than 1 in 1000 scans. |
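The text does not say how the "less than 1 in 1000" figure was obtained. One reading, sketched here in Python, is that the two phones miss each other independently, so the joint miss probability is the product of the per-phone rates. Independence is an assumption of this sketch, not a claim made by the authors, and the function name is hypothetical.

```python
def joint_miss_probability(p_a: float, p_b: float) -> float:
    """Probability that both phones miss each other in the same scan,
    ASSUMING the two misses are independent events."""
    if not (0.0 <= p_a <= 1.0 and 0.0 <= p_b <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_a * p_b


# Per-phone miss rates quoted in the text range from 1% to 3%.
for p in (0.01, 0.02, 0.03):
    joint = joint_miss_probability(p, p)
    print(f"per-phone miss {p:.2f} -> joint miss {joint:.4f}")
```

Under this assumption even the worst case (3% on both phones) stays below 1 in 1000, which is consistent with the figure quoted above.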
| limitation | 1. Continually scanning and logging BTIDs can expend an older mobile phone battery
in about 18 hours. While continuous scans provide a rich depiction of a user's
dynamic environment, most individuals expect phones to have standby times exceeding
48 hours. Therefore BlueAware was modified to only scan the environment once every
five minutes, providing at least 36 hours of standby time.
2. While the custom logging application on the phone crashes occasionally
(approximately once every week), these crashes fortunately do not result
in significant data loss. An additional small application was written to start
on boot and continually review the running processes on the phone,
verifying that our logging application is always running. Should there be a time
where this is not the case, the application is immediately restarted.
This functionality also ensures that logging begins immediately once the phone
is turned on. However, while this logging application is now fairly robust
and can be assumed to be running anytime the phone is on, the dataset generated is
certainly not without noise.
3. Because scans occur only once every five minutes, shorter proximity events
may be missed. |
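To give a rough sense of how often a short proximity event falls between two scans, the following Python sketch uses a simplified model. It assumes scans are instantaneous, occur exactly every 300 seconds, always detect a device that is in range, and that the event starts at a uniformly random moment relative to the scan schedule. None of these assumptions comes from the study, and the function name is hypothetical.

```python
def detection_probability(event_seconds: float,
                          scan_interval_seconds: float = 300.0) -> float:
    """Chance that at least one scan falls inside a proximity event,
    under the simplified model described in the lead-in."""
    if event_seconds < 0 or scan_interval_seconds <= 0:
        raise ValueError("durations must be non-negative and the interval positive")
    # An event longer than the interval always contains a scan; a shorter
    # one contains a scan with probability duration / interval.
    return min(1.0, event_seconds / scan_interval_seconds)


for seconds in (30, 60, 150, 300, 600):
    print(f"{seconds:>4} s event -> P(detected) = {detection_probability(seconds):.2f}")
```

In this model a one-minute encounter is seen by a single phone only about one time in five, which is why brief encounters are under-represented in the logs.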
| error | 1. The ten-meter range of Bluetooth, along with the fact that it can penetrate
some types of walls, means that people who are not physically proximate may incorrectly
be logged as such.
2. Another source of error is the phone being either explicitly turned off by the user
or running out of battery. According to our collected survey data, users report
exhausting the batteries approximately 2.5 times each month. One fifth of our subjects
manually turn the phone off on a regular basis during specific contexts such
as classes, movies, and (most frequently) when sleeping. Immediately before the phone
powers down, the event is timestamped and the most recent log is closed. A new log
is created when the phone is restarted and again a timestamp is associated with the event.
3. A more critical source of error occurs when the phone is left on, but not carried
by the user. From surveys, we have found that 30% of our subjects claim to never
forget their phones, while 40% report forgetting it about once each month, and
the remaining 30% state that they forget the phone approximately once each week.
Identifying the times where the phone is on, but left at home or in the office presents
a significant challenge when working with the dataset. To grapple with the problem,
we have created a 'forgotten phone' classifier. Features included staying in the same
location for an extended period of time, charging, and remaining idle through missed
phone calls, text messages and alarms. When applied to a subsection of the dataset
which had corresponding diary text labels, the classifier was able to identify
the day where the phone was forgotten, but also mislabeled a day when the user
stayed home sick. By ignoring both days, we risk throwing out data on outlying days,
but have greater certainty that the phone is actually with the user. A significantly
harder problem is to determine whether the user has temporarily moved beyond ten meters
of his or her office without taking the phone. Empirically, this appears to happen
with many subjects on a regular basis, and there do not seem to be enough unique
features of the event to classify it accurately. However, this phenomenon does not
diminish the extremely strong correlation between detected proximity and self-reported
interactions. Lastly, while frequency of proximity within the workplace can be useful,
the most salient data comes from detecting a proximity event outside MIT,
where temporarily forgetting the phone is less likely to repeatedly occur. |
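The text names the features of the 'forgotten phone' classifier but not its form. The following Python sketch is a simple voting rule over those three features. It is not the authors' classifier: the class name, the thresholds, and the two-of-three vote are hypothetical choices made only to show how such features could be combined.

```python
from dataclasses import dataclass


@dataclass
class DaySummary:
    """Per-day features (hypothetical names and units)."""
    hours_at_one_location: float   # longest stretch without a change of location
    hours_charging: float          # total time spent on the charger
    ignored_events: int            # missed calls, texts and alarms with no reaction


def looks_forgotten(day: DaySummary,
                    min_stationary_hours: float = 12.0,
                    min_charging_hours: float = 8.0,
                    min_ignored_events: int = 3,
                    votes_needed: int = 2) -> bool:
    """Flag a day when enough features suggest the phone was left behind.
    All thresholds are illustrative guesses, not values from the study."""
    votes = sum([
        day.hours_at_one_location >= min_stationary_hours,
        day.hours_charging >= min_charging_hours,
        day.ignored_events >= min_ignored_events,
    ])
    return votes >= votes_needed


print(looks_forgotten(DaySummary(20.0, 10.0, 5)))   # stationary, charging, unanswered
print(looks_forgotten(DaySummary(2.0, 1.0, 0)))     # an ordinary mobile day
```

A rule like this would probably also flag a day spent at home sick, which matches the mislabeling reported above.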
| note | In return for the use of the Nokia 6600 phones, students have been asked to
fill out web-based surveys regarding their social activities and the people
they interact with throughout the day. Comparison of the logs with survey data
has given us insight into our dataset's ability to accurately map social network
dynamics. Through surveys of approximately forty senior students, we have validated
that the reported frequency of (self-report) interaction is strongly correlated
with the number of logged BTIDs (R=.78, p=.003), and that the dyadic self-report
data has a similar correlation with the dyadic proximity data (R=.74, p~=.0001).
Additionally, a subset of subjects kept detailed activity diaries over several months.
Comparisons revealed no systematic errors with respect to proximity and location,
except for omissions due to the phone being turned off. |
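The R values quoted above are correlation coefficients between survey responses and logged proximity. As a reminder of what such a coefficient measures, here is a minimal Python sketch of a Pearson correlation. The source does not state which correlation measure or software was used, so treating R as Pearson's r is an assumption, and the sample numbers are toy values that are not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy numbers only; they do not reproduce the study's R values.
reported_interactions = [1, 3, 2, 5, 4]
logged_btid_counts = [12, 30, 25, 48, 33]
print(round(pearson_r(reported_interactions, logged_btid_counts), 3))
```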
| download url | Download (39 MB tar.gz) from the US or UK mirror |
| parent data | mit/reality (v. 2005-07-01)
|
|
traces included
| mit/reality/blueaware/activityspan (v. 2005-07-01), mit/reality/blueaware/callspan (v. 2005-07-01), mit/reality/blueaware/cellspan (v. 2005-07-01), mit/reality/blueaware/coverspan (v. 2005-07-01), mit/reality/blueaware/devicespan (v. 2005-07-01)
|
|
version
| v. 2005-07-01 |
|
changes
| The initial version |
|
bibtex
|
@MISC{mit-reality-blueaware-callspan-2005-07-01,
author = {Nathan Eagle and Alex (Sandy) Pentland},
title = {{CRAWDAD} trace mit/reality/blueaware/callspan (v. 2005-07-01)},
howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/mit/reality/blueaware/callspan},
month = jul,
year = 2005
}
|
| metadata last modified | 2006-10-17 |
| summary | Call span logs |
| derived | true |
| release date | 2005-07-01 |
| measurement start | 2004-08-03 |
| measurement end | 2004-12-25 |
| format | oid, endtime, starttime, person_oid, phonenumber_oid, callid, contact, description, direction, duration, number, status, remote
"person_oid" refers to the person running the software on their phone,
for which this call was logged. It is who this callspan is 'attached'
to, and will always be attached to some person_oid.
"direction" refers to the direction of the call from the perspective of
this particular person/cellphone that recorded this callspan (the same
as the person referred to by person_oid). Can be Incoming, Missed Call,
or Outgoing.
"phonenumber_oid" refers to the number 'on the other end' of the
network, which may be a landline, a cell phone line, or even that phone
network's voicemail.
So in other words, person_oid and phonenumber_oid represent the two ends
of the phone call, with the direction of the phone call represented in
the direction field. If you want to utilize all 897921 callspan records,
you might want to define these "calls" as between two phonenumbers,
instead of as between two persons. So the call would exist between
callspan.person_oid's phonenumber_oid, and the callspan.phonenumber_oid.
In addition, if the callspan records a call between two people that were
running the software and part of the study (they both are part of the
study), then there are a few additional properties that will hold about
the callspan:
For some person src: src.oid = callspan.person_oid (for all calls)
For some person dst: dst.phonenumber_oid = callspan.phonenumber_oid (only for in-network calls)
There should also be a symmetric callspan going in the other direction.
For some callspan Y:
  Y.person_oid = dst.oid
  Y.phonenumber_oid = src.phonenumber_oid |
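The symmetry property above can be checked mechanically. The following Python sketch pairs each in-network callspan with its mirror record. It assumes callspans are available as dictionaries keyed by the field names listed above, and that a separate lookup from each person's oid to that person's own phonenumber_oid exists; the lookup, the function name, and the sample records are hypothetical. The sketch matches on the two oids only, so a real analysis would probably also compare starttime to tell repeated calls between the same pair apart.

```python
from typing import Dict, List, Tuple

Callspan = Dict[str, int]  # only person_oid and phonenumber_oid are used here


def symmetric_pairs(callspans: List[Callspan],
                    person_to_phone: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, where callspan j mirrors callspan i."""
    phone_to_person = {phone: person for person, phone in person_to_phone.items()}

    # Index records by (person_oid, phonenumber_oid) for constant-time lookup.
    index: Dict[Tuple[int, int], List[int]] = {}
    for i, span in enumerate(callspans):
        index.setdefault((span["person_oid"], span["phonenumber_oid"]), []).append(i)

    pairs = []
    for i, span in enumerate(callspans):
        dst = phone_to_person.get(span["phonenumber_oid"])    # None if out of network
        src_phone = person_to_phone.get(span["person_oid"])
        if dst is None or src_phone is None:
            continue
        # The mirror record Y has Y.person_oid = dst.oid and
        # Y.phonenumber_oid = src.phonenumber_oid.
        for j in index.get((dst, src_phone), []):
            if i < j:
                pairs.append((i, j))
    return pairs


# Made-up oids: persons 10 and 20 own phone numbers 100 and 200.
example_phones = {10: 100, 20: 200}
example_spans = [
    {"oid": 1, "person_oid": 10, "phonenumber_oid": 200},  # logged on person 10's phone
    {"oid": 2, "person_oid": 20, "phonenumber_oid": 100},  # mirror on person 20's phone
    {"oid": 3, "person_oid": 10, "phonenumber_oid": 999},  # other end is outside the study
]
print(symmetric_pairs(example_spans, example_phones))
```

Records whose other end is not a study participant, such as the third one here, have no mirror and are skipped.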
| configuration | call span logs |
| parent data | mit/reality/blueaware (v. 2005-07-01)
|
|
category
| techreport |
| authors | Pan Hui, Jon Crowcroft
|
| title | Bubble Rap: Forwarding in small world DTNs in ever decreasing circles |
| month | May |
| year | 2007 |
| institution | University of Cambridge Computer Laboratory |
| download url | http://www.cl.cam.ac.uk/TechReports/UCAM-CL-TR-684.pdf |
| abstract | In this paper we seek to improve understanding of the structure of human
mobility, and to use this in the design of forwarding algorithms for Delay
Tolerant Networks for the dissemination of data amongst mobile users.
Cooperation binds but also divides human society into communities. Members of
the same community interact with each other preferentially. There is structure
in human society. Within society and its communities, individuals have varying
popularity. Some people are more popular and interact with more people than
others; we may call them hubs. Popularity ranking is one facet of the
population. In many physical networks, some nodes are more highly connected to
each other than to the rest of the network. The set of such nodes are usually
called clusters, communities, cohesive groups or modules. There is structure to
social networking. Different metrics can be used such as information flow,
Freeman betweenness, closeness and inference power, but for all of them, each
node in the network can be assigned a global centrality value. What can be
inferred about individual popularity, and the structure of human society from
measurements within a network? How can the local and global characteristics of
the network be used practically for information dissemination? We present and
evaluate a sequence of designs for forwarding algorithms for Pocket Switched
Networks, culminating in Bubble, which exploit increasing levels of information
about mobility and interaction. |
| keywords | measurement |
| keywords | wireless |
| keywords | cambridge/haggle |
| keywords | mit/reality |
| keywords | upmc/content |
| keywords | crawdad |
| related data/tools | cambridge/haggle, mit/reality, upmc/content
|
|
category
| inproceedings |
| authors | Thomas Karagiannis, Jean-Yves Le Boudec, Milan Vojnovic
|
| title | Power law and exponential decay of inter contact times between mobile devices |
| booktitle | MobiCom '07: Proceedings of the 13th annual ACM international conference on Mobile computing and networking |
| year | 2007 |
| pages | 183-194 |
| address | Montreal, Quebec, Canada |
| keywords | measurement |
| keywords | wireless |
| keywords | mit/reality |
| keywords | cambridge/haggle |
| keywords | crawdad |
| download url | http://doi.acm.org/10.1145/1287853.1287875 |
| publisher | ACM Press |
| abstract | We examine the fundamental properties that determine the basic performance
metrics for opportunistic communications. We first consider the distribution of
inter-contact times between mobile devices. Using a diverse set of measured
mobility traces, we find as an invariant property that there is a
characteristic time, order of half a day, beyond which the distribution decays
exponentially. Up to this value, the distribution in many cases follows a power
law, as shown in recent work. This power law finding was previously used to
support the hypothesis that inter-contact time has a power law tail, and that
common mobility models are not adequate. However, we observe that the time
scale of interest for opportunistic forwarding may be of the same order as the
characteristic time, and thus the exponential tail is important. We further
show that already simple models such as random walk and random waypoint can
exhibit the same dichotomy in the distribution of inter-contact time as in
empirical traces. Finally, we perform an extensive analysis of several
properties of human mobility patterns across several dimensions, and we present
empirical evidence that the return time of a mobile device to its favorite
location site may already explain the observed dichotomy. Our findings suggest
that existing results on the performance of forwarding schemes based on
power-law tails might be overly pessimistic. |
| related data/tools | mit/reality, cambridge/haggle
|