CRAWDAD metadata: umich/virgil (v. 2008-03-28)

We collected the data set through war walking, i.e., collecting Wi-Fi beacons by walking around the neighborhoods in different cities in the United States, for the field study and evaluation of Virgil, an access point selection system.

Note: This metadata was prepared by the CRAWDAD team and verified by the data set (or tool) authors. We have made every effort to ensure its accuracy, but urge all users to consider the metadata and data carefully and be sure that their use in research is consistent with the nature and limitations of the data. We welcome any corrections. This metadata was prepared based on the following reference(s):




[Dataset] umich/virgil (v. 2008-03-28)

version v. 2008-03-28
changes
the initial version
bibtex
@MISC{umich-virgil-2008-03-28,
  author = {Anthony J. Nicholson and Yatin Chawathe and Mike Chen and Brian Noble and David Wetherall},
  title = {{CRAWDAD} data set umich/virgil (v. 2008-03-28)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/umich/virgil},
  month = mar,  
  year = 2008
}
					
metadata last modified 2008-03-31
summary
We collected the data set through war walking, i.e., collecting Wi-Fi beacons 
by walking around the neighborhoods in different cities in the United States, 
for the field study and evaluation of Virgil, an access point selection system.
release date 2008-03-28
measurement start 2005-04-28
measurement end 2005-09-16
authors Anthony J. Nicholson
Yatin Chawathe
Mike Chen
Brian Noble
David Wetherall
web site http://www.crawdad.org/umich/virgil
keyword 802.11, Wi-Fi hotspot, signal strength, wardriving
measurement purposes Opportunistic Connectivity
network type 802.11 infrastructure
environment
802.11 wireless LAN access points (APs) are increasingly widespread 
in urban areas, with users commonly finding multiple APs on each scan. 
Therefore, access point selection - determining which AP will provide 
the best quality of service - is a critical problem. We conducted a 
small field study to determine the scope of the problem with existing 
access point discovery and selection systems. Armed with the lessons 
from the field study, we designed a new AP selection system, which we 
named Virgil. 

We release two main tracesets: results from a field study 
(umich/virgil/field_study), and results from the evaluation of our 
prototype implementation of Virgil (umich/virgil/eval_data).
network
For the field study, we walked a 1/2 square mile (1.3 square kilometer)
grid of city streets with a PDA containing an 802.11 wireless card. 
The PDA ran Familiar Linux, a distribution targeted for handheld devices. 
We used a Compaq iPAQ handheld with an 802.11b wireless LAN card to collect 
data on the density and properties of different access points in an urban 
environment. 

For the evaluation study, we used different hardware (a different iPAQ) 
than for the field study, due to equipment failure.
collection
For the field study portion of our traces, data was collected in three
different neighborhoods of Chicago, Illinois. For the evaluation, data 
was collected in five different neighborhoods, of three different cities 
in the United States. Please see the tracesets /umich/virgil/field_study 
and /umich/virgil/eval_data for the methodology details.
sanitization
Only the SSIDs and MAC addresses have been altered. Each MAC address
has been mapped to a string of the form mac:<neighborhood><uniqid> 
where uniqid is an increasing value (starting at 0) for each neighborhood, 
determined by the order of appearance in the trace of a given AP.
Each SSID has been replaced with the 32-bit MD5 hash of the SSID string.
download url Download (180KB gz)
(MD5 Hash: 90bfe33878f8fc2c978c0ea581d5d8fa)
tracesets included umich/virgil/field_study (v. 2008-03-28)
umich/virgil/eval_data (v. 2008-03-28)

[Traceset] umich/virgil/field_study (v. 2008-03-28)

version v. 2008-03-28
changes
the initial version.
bibtex
@MISC{umich-virgil-field_study-2008-03-28,
  author = {Anthony J. Nicholson and Yatin Chawathe and Mike Chen and Brian Noble and David Wetherall},
  title = {{CRAWDAD} trace set umich/virgil/field_study (v. 2008-03-28)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/umich/virgil/field_study},
  month = mar,  
  year = 2008
}
					
metadata last modified 2008-03-31
summary
We collected the traceset through war walking, i.e., collecting Wi-Fi beacons 
by walking around the neighborhoods of Chicago, Illinois for the field study of 
Virgil, an access point selection system.
release date 2008-03-28
measurement start 2005-04-28
measurement end 2005-05-05
measurement purposes Opportunistic Connectivity
methodology
We briefly summarize our methodology here---for full details, please
refer to our paper: A.J. Nicholson et al., "Improved Access Point
Selection", in proceedings of MobiSys 2006.

We walked each neighborhood with a Compaq iPAQ that contained an
802.11 wireless card. Our field study script repeatedly performed the
following process. First, it scanned for AP beacons, then processed
each detected AP in turn. For all APs, parameters such as AP MAC
address, channel, encryption status, signal strength, et cetera were
logged. Next, for each AP not using encryption, the script attempted to
receive an IP address from the AP via DHCP. The success of this
operation was also logged.

Finally, for APs that granted a DHCP address, the script ran a series
of tests designed to probe the application-visible quality of the
Internet connection provided by the AP. The script connected to a
reference server at the University of Michigan to estimate the 
bandwidth and latency to an arbitrary Internet server, and to determine
the status (open, closed, or redirected) of 37 common TCP ports.
parent data umich/virgil (v. 2008-03-28)
traces included umich/virgil/field_study/warwalk (v. 2008-03-28)

[Traceset] umich/virgil/eval_data (v. 2008-03-28)

version v. 2008-03-28
changes
the initial version.
bibtex
@MISC{umich-virgil-eval_data-2008-03-28,
  author = {Anthony J. Nicholson and Yatin Chawathe and Mike Chen and Brian Noble and David Wetherall},
  title = {{CRAWDAD} trace set umich/virgil/eval_data (v. 2008-03-28)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/umich/virgil/eval_data},
  month = mar,  
  year = 2008
}
					
metadata last modified 2008-03-31
summary
We collected the trace set through war walking, i.e., collecting Wi-Fi beacons 
by walking around the neighborhoods in different cities in the United States, 
for the evaluation of Virgil, an access point selection system.
release date 2008-03-28
measurement start 2005-07-18
measurement end 2005-09-16
measurement purposes Opportunistic Connectivity
methodology
We briefly summarize our methodology here---for full details, please
refer to our paper: A.J. Nicholson et al., "Improved Access Point
Selection", in proceedings of MobiSys 2006.

We walked each neighborhood with a Compaq iPAQ that contained an
802.11 wireless card. Unlike for the field study data set, we did not
just periodically scan for available APs and test their
capabilities. The Virgil AP selection daemon periodically scanned and
tested APs to locate a usable AP, but once one was found, it stuck
with it until the iPAQ passed out of its radio range.

As a result, users of this dataset will notice significant gaps in
between scan sets. This is the time during which the device was
associated with an access point.
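
The gaps described above can be located from the scan-set timestamps alone. The following is a minimal sketch, not part of the released tools: it assumes the timestamp layout documented for the scansets.<neighborhood> files (for example 2005-07-21_02:19:16.808727), and both the function name scan_gaps and the 60-second threshold are hypothetical choices made for illustration.

```python
from datetime import datetime

# Timestamp layout assumed from the SCAN_SET header example in the format notes.
STAMP_FORMAT = "%Y-%m-%d_%H:%M:%S.%f"


def scan_gaps(stamps, threshold_s=60.0):
    """Return (previous, current, seconds) for each long pause between scan sets.

    `stamps` is an iterable of scan-set timestamp strings in file order.
    `threshold_s` is an arbitrary cut-off (hypothetical) separating routine
    rescans from periods in which the device stayed associated with one AP.
    """
    times = [datetime.strptime(s, STAMP_FORMAT) for s in stamps]
    gaps = []
    for previous, current in zip(times, times[1:]):
        seconds = (current - previous).total_seconds()
        if seconds >= threshold_s:
            gaps.append((previous, current, seconds))
    return gaps
```

Each returned tuple brackets one candidate association period; the threshold should be tuned against the actual spacing of scan sets in a given neighborhood file.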

Note: due to a bug, all test results (AP frequency, signal strength,
et cetera) for APs using WEP encryption were mistakenly set to 0 when
Virgil wrote out the logs. This did not affect our results because
none of the algorithms in the evaluation attempted to use these 
encrypted, inaccessible APs. We regret, however, that this data on 
the link-layer properties of these encrypted APs is unavailable to the
user. We recommend the field study dataset for those who require such
data.
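
One way to keep these zeroed values from skewing an analysis is to mark them as missing before computing any statistics. The sketch below is an illustration only: it assumes each AP line has already been parsed into a dictionary keyed by the field names of the evaluation schema, and the helper name mask_wep_zeros is hypothetical.

```python
# Link-layer fields that the logging bug zeroed out for WEP-protected APs.
# Names follow the eval_data schema; the choice of fields is an assumption.
LINK_FIELDS = ("linkquality", "signallevel", "noiselevel", "channel")


def mask_wep_zeros(record):
    """Return a copy of `record` with link-layer fields set to None for WEP APs.

    `record` is a dict for one AP line. Records for unencrypted APs are
    returned unchanged (as a copy), so real zero values are preserved there.
    """
    cleaned = dict(record)
    if cleaned.get("encryption") == "ON":
        for name in LINK_FIELDS:
            cleaned[name] = None
    return cleaned
```

Using None rather than dropping the record keeps the AP visible for density counts while excluding it from signal-strength aggregates.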

Also note that, unlike in the field study, the Virgil daemon caches 
test results for performance. Therefore, once a given AP has been seen in a
neighborhood trace, the application-level tests are not re-run when it is
detected again; instead, the cached test results are written out to the
log.
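
Because of this caching, only the first appearance of an AP in a neighborhood trace reflects freshly measured application-level results. A minimal sketch for telling the two cases apart is shown below; it assumes the anonymized MAC string identifies an AP within one neighborhood file, and the function name first_sightings is hypothetical.

```python
def first_sightings(mac_sequence):
    """Flag each AP line as a first sighting (True) or a repeat (False).

    `mac_sequence` lists the mac_addr value of every AP line in one
    neighborhood trace, in file order. Repeats carry cached test results.
    """
    seen = set()
    flags = []
    for mac in mac_sequence:
        flags.append(mac not in seen)
        seen.add(mac)
    return flags
```

Filtering on these flags before aggregating bandwidth or RTT avoids counting one measurement several times.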
parent data umich/virgil (v. 2008-03-28)
traces included umich/virgil/eval_data/warwalk (v. 2008-03-28)

[Trace] umich/virgil/field_study/warwalk (v. 2008-03-28)

version v. 2008-03-28
changes
the initial version
bibtex
@MISC{umich-virgil-field_study-warwalk-2008-03-28,
  author = {Anthony J. Nicholson and Yatin Chawathe and Mike Chen and Brian Noble and David Wetherall},
  title = {{CRAWDAD} trace umich/virgil/field_study/warwalk (v. 2008-03-28)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/umich/virgil/field_study/warwalk},
  month = mar,  
  year = 2008
}
					
metadata last modified 2008-03-31
summary
War-walking traces collected in Chicago, Illinois for the field study of 
an access point selection system.
derived false
release date 2008-03-28
measurement start 2005-04-28
measurement end 2005-05-05
configuration
For the field study portion of our traces, data was collected in three
different neighborhoods of Chicago, Illinois:

The Loop (loop): the central business district. Data was collected
during the day on a busy workday (Wednesday 4 May 2005, from 10:32 am
to 3:47 pm).

Wicker Park (wkpk): a high-density residential neighborhood northwest
of downtown. Due to inclement weather, data was collected in two
different sessions: Thursday 28 April 2005, from 3:40 pm to 5:04 pm,
and Monday 2 May 2005, from 10:32 am to 11:40 am. Different areas of
the neighborhood were probed on the two collection days, but the user
will notice a small number of duplicate APs. We have kept the two
traces separate simply for the benefit of CRAWDAD users who may want
to draw conclusions from the timestamps in the traces.

Evanston (evanston): a suburb and college town, north of the city
limits. Data was collected on Thursday 5 May 2005, from 11:02 am to
12:49 pm.

For all three neighborhoods, we walked a roughly 1/2 square-mile (1.3
square-kilometer) area on the sidewalk (following the street grid
pattern). For the case of Wicker Park, both days of mapping together
cover a 1/2 square-mile area.

This methodology was in no way intended to duplicate a realistic
mobility pattern, but rather to simply "map-out" each neighborhood so
one can draw conclusions in aggregate concerning the quality,
availability and deployment of wireless connectivity in each instance.
format
The field study data are in the field_study subdirectory. Inside each 
directory, the user will find a schema file, which describes in detail 
the format and proper interpretation of the data files in each dataset.

In the field_study directory, there are two data files for each collection run:

trace.<location>:      human-readable trace output of the field study script.
              Keywords "APSCAN begin" and "APSCAN end" delineate the
              results of a new scan for AP beacons. The test results
              of each AP are then presented in turn, separated by
              lines of "#####".

ap_info.<location>:   comma-separated value file, one line per AP encountered
              in the data collection session. The only information
              in the trace.* file that is missing from this file is
              the timestamps of all operations and the grouping of
              APs discovered at the same time (in the same beacon
              scan set).
              Each line of data in the file maps to these values:
         
         struct ap_db_entry {
        ssid,           AP SSID
        mac_addr,       AP MAC address
        encryption,     is WEP enabled? {ON,OFF}
        bitrate,        Bitrate, in Mb/s, from iwconfig
        linkquality,    link quality, from iwconfig
        signallevel,    signal level, from iwconfig
        noiselevel,     noise level, from iwconfig
        channel,        frequency (GHz) of the AP
        dhcpsuccess,    did AP grant DHCP address? (yes=1, no=0)
        
        /* Note: everything past this point is always 0 if
         * dhcpsuccess is 0 */
        
        rtt,        round-trip-time to reference server (ms)
        /* port tests: 0=closed, 1=open, 2=redirected */
        port_21,        ftp
        port_22,        ssh
        port_23,        telnet
        port_25,        smtp
        port_79,        finger
        port_80,        http
        port_88,        kerberos
        port_115,       sftp
        port_119,       nntp
        port_123,       ntp
        port_135,       rpc
        port_109,       pop-2
        port_110,       pop-3
        port_143,       imap2
        port_194,       irc
        port_201,       appletalk
        port_369,       coda
        port_443,       https
        port_445,       samba
        port_389,       ldap
        port_636,       secure ldap
        port_750,       kerberos
        port_993,       secure imap
        port_994,       secure irc
        port_995,       secure pop3
        port_1080,      socks proxy
        port_1214,      kazaa
        port_1434,      ms sql server
        port_2049,      nfs
        port_2430,      venus (Coda)
        port_3306,      mysql 
        port_5010,      yahoo messenger
        port_5190,      AOL instant messenger
        port_5680,      canna
        port_5800,      vnc
        port_6346,      gnutella
        port_7000,      afs
        /* finally, the bandwidth value */ 
        bw,         bandwidth to reference server (bytes/s)
     };
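
The schema above can be turned into a small reader. The following is a minimal sketch, not the authors' tooling: it assumes each ap_info.* line holds exactly the 48 values listed above, in that order, with no header row, and it keeps every value as a string. The names FIELDS, PORTS, PORT_STATUS and parse_ap_info are hypothetical.

```python
import csv
import io

# TCP ports in the order they appear in the schema listing.
PORTS = [21, 22, 23, 25, 79, 80, 88, 115, 119, 123, 135, 109, 110, 143,
         194, 201, 369, 443, 445, 389, 636, 750, 993, 994, 995, 1080,
         1214, 1434, 2049, 2430, 3306, 5010, 5190, 5680, 5800, 6346, 7000]

# Column names: nine link-layer fields, rtt, 37 port results, then bw.
FIELDS = (["ssid", "mac_addr", "encryption", "bitrate", "linkquality",
           "signallevel", "noiselevel", "channel", "dhcpsuccess", "rtt"]
          + [f"port_{port}" for port in PORTS]
          + ["bw"])

# Field-study port codes, as documented in the schema comment.
PORT_STATUS = {"0": "closed", "1": "open", "2": "redirected"}


def parse_ap_info(text):
    """Parse the contents of an ap_info.* file into a list of dicts."""
    rows = []
    for values in csv.reader(io.StringIO(text)):
        if not values:
            continue  # skip blank lines
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
        rows.append(dict(zip(FIELDS, (value.strip() for value in values))))
    return rows
```

Remember that every field after dhcpsuccess is 0 when dhcpsuccess is 0, so a port value of 0 means "closed" only for APs that granted a DHCP address.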

The trace.* file can be considered the "primary source". We provide the ap_info.* files as a convenience to the user, so that each user of the data need not write the same script to parse out the most useful information.
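
For users who do want to work from the trace.* files, the delimiters described above are enough to split a trace into scans. The sketch below makes several assumptions that should be checked against the schema file: that the keywords appear literally on their own lines, that lines between "APSCAN begin" and "APSCAN end" are the beacon scan results, and that the per-AP test blocks follow until the next "APSCAN begin". The function name split_trace is hypothetical.

```python
def split_trace(lines):
    """Group trace lines into scans, each with beacon lines and AP test blocks."""
    scans = []
    scan = None          # scan currently being filled
    in_beacons = False   # True between "APSCAN begin" and "APSCAN end"
    block = []           # lines of the AP test block being collected

    for raw in lines:
        line = raw.rstrip("\n")
        if "APSCAN begin" in line:
            if scan is not None and block:
                scan["ap_tests"].append(block)
            scan = {"beacons": [], "ap_tests": []}
            scans.append(scan)
            in_beacons, block = True, []
        elif "APSCAN end" in line:
            in_beacons = False
        elif scan is None:
            continue  # ignore anything before the first scan
        elif in_beacons:
            scan["beacons"].append(line)
        elif line.startswith("#####"):
            if block:
                scan["ap_tests"].append(block)
            block = []
        elif line.strip():
            block.append(line)

    if scan is not None and block:
        scan["ap_tests"].append(block)
    return scans
```

The function accepts any iterable of lines, so an open file object can be passed directly.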
sanitization
Only the SSIDs and MAC addresses have been altered. Each MAC address
has been mapped from xx:xx:xx:xx:xx:xx format to a string of the form
mac:<neighborhood><uniqid> where uniqid is an increasing value
(starting at 0) for each neighborhood, determined by the order of
appearance in the trace of a given AP.

For example, if the 17th AP seen in the Wicker Park neighborhood had
the MAC address 01:02:03:04:05:06, wherever this value appears in all
log files, it would be replaced with the string "mac:wkpk:0016". The
uniqid namespace is unified across both runs for Wicker Park, despite
the logs remaining in separate files.
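
The mapping can be illustrated with a short sketch. This is not the script used to sanitize the traces; it only reproduces the numbering rule described above, and it assumes the colon-separated, four-digit layout shown in the "mac:wkpk:0016" example. The function name anonymize_macs is hypothetical.

```python
def anonymize_macs(macs, neighborhood):
    """Replace each MAC address with mac:<neighborhood>:<uniqid>.

    `macs` lists the real MAC addresses in order of appearance in the trace.
    Identifiers start at 0 and grow with each previously unseen address.
    Returns the anonymized sequence and the mapping that produced it.
    """
    mapping = {}
    anonymized = []
    for mac in macs:
        if mac not in mapping:
            # Zero-padded width of 4 is inferred from the example above.
            mapping[mac] = f"mac:{neighborhood}:{len(mapping):04d}"
        anonymized.append(mapping[mac])
    return anonymized, mapping
```

Passing the concatenated address sequences of both Wicker Park runs in one call mirrors the unified namespace described above.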

If you are interested in the actual MAC addresses (to determine the
popularity of various manufacturers' APs, for example) please contact
us. We can provide such aggregate information without disclosing
individual AP identities.

The anonymized SSIDs are not tied to the AP MAC address, because many
different APs often use the same SSID (linksys comes to
mind). Instead we use the 32-bit MD5 hash of each SSID string.

We did not anonymize the IP addresses granted by APs. Because these APs are wireless routers performing NAT, all IP addresses our device was given were private addresses in the 192.168.x.x or 10.x.x.x ranges, which leak no information about the actual network endpoint of the AP.
parent data umich/virgil/field_study (v. 2008-03-28)

[Trace] umich/virgil/eval_data/warwalk (v. 2008-03-28)

version v. 2008-03-28
changes
the initial version
bibtex
@MISC{umich-virgil-eval_data-warwalk-2008-03-28,
  author = {Anthony J. Nicholson and Yatin Chawathe and Mike Chen and Brian Noble and David Wetherall},
  title = {{CRAWDAD} trace umich/virgil/eval_data/warwalk (v. 2008-03-28)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/umich/virgil/eval_data/warwalk},
  month = mar,  
  year = 2008
}
					
metadata last modified 2008-03-31
summary
War-walking traceset collected in different cities in the United States for
the evaluation of an access point selection system.
derived false
release date 2008-03-28
measurement start 2005-07-18
measurement end 2005-09-16
configuration
For our evaluation, data was collected in five different
neighborhoods, of three different cities in the United States. All
timestamps in the datafiles are UTC, so the local times must be
calculated accordingly. Because daylight saving time was in effect,
Ann Arbor was UTC-4, Chicago UTC-5, and Seattle UTC-7.
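
The conversion to local time can be sketched as follows. The offsets are the ones stated above; the timestamp layout is the one used in the SCAN_SET header lines of the scansets files. The key "annarbor" is a hypothetical label (no short label is given for Ann Arbor here), as is the function name to_local.

```python
from datetime import datetime, timedelta, timezone

# UTC offsets in hours during the collection period, as stated above.
# "annarbor" is an assumed key; the other keys are the neighborhood labels.
UTC_OFFSET_HOURS = {
    "annarbor": -4,
    "loop": -5,
    "wkpk": -5,
    "evanston": -5,
    "seattle": -7,
}


def to_local(stamp, neighborhood):
    """Convert a UTC trace timestamp string to local time for a neighborhood."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S.%f")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=UTC_OFFSET_HOURS[neighborhood]))
    return utc_time.astimezone(local_zone)
```

Fixed offsets are used instead of a time zone database because all collection dates fall within the daylight saving period.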

Neighborhoods:

Chicago Loop (loop): the central business district. Data was collected
during the day on a busy workday (Tuesday, 19 July 2005, 3:30-4:35 pm
local time).

Chicago, Wicker Park (wkpk): a high-density residential neighborhood
northwest of downtown. Data collected on Monday, 18 July 2005,
7:40-9:13 am local time.

Chicago, Evanston (evanston): a suburb and college town, north of the
city limits. Data collected on Monday, 18 July 2005, 11:44 am to 3:20
pm.

Downtown Seattle (seattle): the central business district. Data was
collected on Wednesday, 20 July 2005, 7:18pm until 12:03am on July
21st (five hours later).

Ann Arbor, Michigan: the downtown area. Friday, 16 September 2005,
9:41-10:44 am.

In each neighborhood, we walked a roughly 1/2 square-mile (1.3
square-kilometer) area on the sidewalk (following the street grid
pattern).
format
The evaluation data are in the eval_data directory. Inside each directory, 
the user will find a schema file, which describes in detail the format 
and proper interpretation of the data files in each dataset.

In the eval_data directory, for each of the five neighborhoods, we provide a
scansets.<neighborhood> file. This file consists of a series of scan
sets. A scan set is defined as the test results for a given set of
APs, whose AP beacons the Virgil daemon detected when searching for a
new AP at a given physical spot.

The first line of each scan set is of the form:

SCAN_SET 3 |2005-07-21_02:19:16.808727

where the "3" denotes this is the third scan set in the neighborhood's
trace, and the remainder of the line is the time instant (in UTC) at
which the scan occurred.
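
A header line of this form can be recognized with a short pattern. The sketch below is an illustration based only on the single example above; the exact whitespace around the "|" separator is an assumption, and the name parse_scan_header is hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one header example; whitespace is matched loosely.
HEADER = re.compile(r"^SCAN_SET\s+(\d+)\s*\|\s*(\S+)\s*$")


def parse_scan_header(line):
    """Return (scan_index, utc_datetime) for a SCAN_SET line, else None."""
    match = HEADER.match(line)
    if match is None:
        return None
    index = int(match.group(1))
    when = datetime.strptime(match.group(2), "%Y-%m-%d_%H:%M:%S.%f")
    return index, when.replace(tzinfo=timezone.utc)
```

Lines for which the function returns None can be treated as AP result lines belonging to the most recent header.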

The remainder of each scan set consists of a series of lines, where
each line corresponds to an AP in the scan set. Each line is a series
of comma-separated values comprising the test result for the AP in
question:

struct ap_db_entry {
       ssid,        AP SSID
       mac_addr,    AP MAC address
       encryption,  is WEP enabled? {ON,OFF} 
       linkquality, link quality, x/92, from iwconfig
       signallevel, signal level, -x dBm, from iwconfig
       noiselevel,  noise level, -x dBm, from iwconfig
       channel,     frequency (GHz) of the AP
       dhcpsuccess, did AP grant DHCP address? (yes=1, no=0)
       test_results optional test results (described below)
};

If the AP did not grant a DHCP address (dhcpsuccess==0), then the line
terminates with the dhcpsuccess parameter. Otherwise, the next item is
the round-trip-time (RTT) estimate in ms, then the bandwidth estimate
in bytes/sec. Finally, there is a sequence of tuples (port,status),
where port is a TCP port number, and status is one of {CLOSED=1,
OPEN=2, REDIRECTED=3}. Note that these constants differ from
those defined in the field study dataset (0=closed, 1=open, 2=redirected).
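
The variable-length AP lines can be read as sketched below. How the (port,status) tuples are serialized is not spelled out here, so the sketch assumes they appear as further comma-separated values, with or without surrounding parentheses; check this against the schema file before relying on it. The names BASE_FIELDS, STATUS and parse_eval_ap_line are hypothetical.

```python
# The eight fixed fields of every AP line, in schema order.
BASE_FIELDS = ["ssid", "mac_addr", "encryption", "linkquality",
               "signallevel", "noiselevel", "channel", "dhcpsuccess"]

# Evaluation-trace port codes (these differ from the field-study codes).
STATUS = {1: "CLOSED", 2: "OPEN", 3: "REDIRECTED"}


def parse_eval_ap_line(line):
    """Parse one AP line of a scan set into a dict."""
    # Assumption: tuples are flattened into the CSV; parentheses are dropped.
    text = line.strip().replace("(", "").replace(")", "")
    parts = [part.strip() for part in text.split(",")]
    if len(parts) < len(BASE_FIELDS):
        raise ValueError("line has fewer than 8 fields")

    record = dict(zip(BASE_FIELDS, parts))
    record["ports"] = {}
    rest = parts[len(BASE_FIELDS):]
    if record["dhcpsuccess"] == "0" or not rest:
        return record  # line ends at dhcpsuccess

    if len(rest) < 2 or len(rest) % 2 != 0:
        raise ValueError("expected rtt, bandwidth, then port/status pairs")
    record["rtt_ms"] = float(rest[0])
    record["bw_bytes_per_s"] = float(rest[1])
    pairs = rest[2:]
    for i in range(0, len(pairs), 2):
        port, code = int(pairs[i]), int(pairs[i + 1])
        record["ports"][port] = STATUS.get(code, "UNKNOWN")
    return record
```

Mapping the numeric codes to names at parse time makes it harder to confuse them with the field-study codes when the two trace sets are combined.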
sanitization
Only the SSIDs and MAC addresses have been altered. Each MAC address
has been mapped from xx:xx:xx:xx:xx:xx format to a string of the form
mac:<neighborhood><uniqid> where uniqid is an increasing value
(starting at 0) for each neighborhood, determined by the order of
appearance in the trace of a given AP.

For example, if the 17th AP seen in the Wicker Park neighborhood had
the MAC address 01:02:03:04:05:06, wherever this value appears in all
log files, it would be replaced with the string "mac:wkpk:0016".

If you are interested in the actual MAC addresses (to determine the
popularity of various manufacturers' APs, for example) please contact
us. We can provide such aggregate information without disclosing
individual AP identities.

The anonymized SSIDs are not tied to the AP MAC address, because many
different APs often use the same SSID (linksys comes to
mind). Instead we use the 32-bit MD5 hash of each SSID string.
parent data umich/virgil/eval_data (v. 2008-03-28)

[Author] Anthony J. Nicholson

email tonynich@eecs.umich.edu
institution University of Michigan, Ann Arbor
department Department of Electrical Engineering and Computer Science
position Ph.D. student
address Software Systems Laboratory, 2260 Hayward, Ann Arbor, MI 48109-2121, USA
web site http://www.eecs.umich.edu/~tonynich/
related data/tools umich/virgil (v. 2008-03-28)

[Author] Yatin Chawathe

email
institution Intel Research Seattle
position Researcher
related data/tools umich/virgil (v. 2008-03-28)

[Author] Mike Chen

email mike@ludic-labs.com
institution Ludic Labs
web site http://www.mikechen.com/
related data/tools umich/virgil (v. 2008-03-28)

[Author] Brian Noble

email bnoble@eecs.umich.edu
institution University of Michigan, Ann Arbor
department Department of Electrical Engineering and Computer Science
position Associate Professor
address University of Michigan, 1301 Beal Ave -- EECS 2245, Ann Arbor, MI 48109-2122
web site http://www.eecs.umich.edu/~bnoble/
related data/tools umich/virgil (v. 2008-03-28)

[Author] David Wetherall

email djw@cs.washington.edu
institution University of Washington
department Department of Computer Science and Engineering
position Associate Professor
address Department of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350
phone 206-616-4367
fax 206-616-3804
web site http://www.cs.washington.edu/homes/djw/
related data/tools uw/sigcomm2004 (v. 2006-10-17)
umich/virgil (v. 2008-03-28)

[Paper] nicholson-access-point

category inproceedings
authors Anthony J. Nicholson
Yatin Chawathe
Mike Y. Chen
Brian D. Noble
David Wetherall
title Improved access point selection
booktitle MobiSys 2006: Proceedings of the 4th international conference on Mobile systems, applications and services
year 2006
pages 233-245
location Uppsala, Sweden
download url http://doi.acm.org/10.1145/1134680.1134705
publisher ACM Press
address New York, NY, USA
keywords measurement, wireless, umich/virgil, crawdad
related data/tools umich/virgil