CRAWDAD metadata: nus/contact (v. 2006-08-01)

The authors obtained the contact patterns among 22341 students, inferred from class schedules and class rosters for the Spring 2006 semester at the National University of Singapore.

Note: This metadata was prepared by the CRAWDAD team and verified by the data set (or tool) authors. We have made every effort to ensure its accuracy, but urge all users to consider the metadata and data carefully and be sure that their use in research is consistent with the nature and limitations of the data. We welcome any corrections. This metadata was prepared based on the following reference(s):


CRAWDAD metadata structure


[Dataset] nus/contact (v. 2006-08-01)

version v. 2006-08-01
changes
the initial version
bibtex
@MISC{nus-contact-2006-08-01,
  author = {Vikram Srinivasan and Mehul Motani and Wei Tsang Ooi},
  title = {{CRAWDAD} data set nus/contact (v. 2006-08-01)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/nus/contact},
  month = aug,  
  year = 2006
}
					
metadata last modified 2006-11-09
summary
The authors obtained the contact patterns among 22341 students, inferred from class schedules and class rosters for the Spring 2006 semester at the National University of Singapore.
release date 2006-08-01
measurement start 2006-01-09
measurement end 2006-05-06
authors Vikram Srinivasan
Mehul Motani
Wei Tsang Ooi
web site http://www.comp.nus.edu.sg/~ooiwt/papers/contact-mobicom06-data/
keyword social network, DTN
measurement purposes User Mobility Characterization
network type DTN (Delay Tolerant Network)
network type social network
environment
We collected information on class schedules and class rosters
for the Spring 2006 semester, in which 22341 students were enrolled.
Our university, the National University of Singapore, has different
colleges (e.g., Engineering, Science, Law) and within each college
there are departments (e.g., Electrical and Computer Engineering,
Computer Science). Every department offers graduate and undergraduate
degrees, and face-to-face classes are an integral part of
these programs. Many classes also have labs and recitations associated
with them. For large classes, there are several recitation sessions
offered and students sign up for the recitation session which
is most convenient to them. The same goes for the labs. At the
time of writing, all lessons are conducted on the main campus of
the university at Kent Ridge, spanning an area of 146 hectares.
network
Our insight is that accurate information of human contact patterns
is available in several special scenarios such as university
campuses. Knowing the class schedules and the enrollment of
students for each class on a campus gives us extremely accurate
information about contact patterns between students over large
time scales. We obtain this information about student enrollment
and class schedules from our university. 

We can now describe how we infer the contact patterns among
students inside classrooms. The rule is simple - 
two students are in contact with each other if and only if they are
in the same venue at the same time. In other words, we assume
that as long as two students are in the same classroom, they are
within Bluetooth range of each other. This assumption has been
validated inside large classrooms on our campus. We also assume
that two students who are in different classrooms are out of range
of each other, even if one classroom is just next door to the other.
We further assume that contacts take place only during business
hours, and ignore the fact that students hang around campus for
various activities after hours. We note that the last two assumptions
are conservative - the number of contacts we obtained is a lower
bound of the actual contacts that take place on campus.
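
As an illustration only, the contact rule above can be sketched in Python. The record layout (student, venue, hour) and all names below are hypothetical and are not taken from the authors' scripts; the sketch simply treats two students as being in contact when they share a venue during the same hour.

```python
from collections import defaultdict
from itertools import combinations


def infer_contacts(attendance):
    """Return {(a, b): [hours]} for every pair of students sharing a venue and hour.

    attendance: iterable of (student, venue, hour) tuples (hypothetical layout).
    """
    # Group students by (venue, hour): contact requires same venue AND same time.
    present = defaultdict(set)
    for student, venue, hour in attendance:
        present[(venue, hour)].add(student)

    # Every pair of students found in the same group is in contact for that hour.
    contacts = defaultdict(set)
    for (_venue, hour), students in present.items():
        for a, b in combinations(sorted(students), 2):
            contacts[(a, b)].add(hour)
    return {pair: sorted(hours) for pair, hours in contacts.items()}


attendance = [
    ("s1", "LT1", 0), ("s2", "LT1", 0),  # same venue, same hour: contact
    ("s3", "LT2", 0),                    # different venue, same hour: no contact
    ("s1", "LT2", 1), ("s3", "LT2", 1),  # same venue one hour later: contact
]
contacts = infer_contacts(attendance)
print(contacts)
```

Grouping by (venue, hour) first keeps the pairwise enumeration local to each classroom, which matters when the population is in the tens of thousands.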

The procedure above gives us the contact patterns among students,
that is, human contact patterns. From these
contact patterns, we can infer contact patterns between mobile devices
and explore hypothetical questions about the performance of
algorithms.
collection
Our university has a central Intranet portal for teaching, called
Integrated Virtual Learning Environment (IVLE). The Intranet portal
hosts a web site for every class that is taught on campus. Professors
manage the web site for their respective classes and post lecture notes, 
quizzes, solutions etc. on their class web site. Information
about students enrolled and the schedule for the class is
posted on the web site for each class. We wrote a Perl script to
harvest this data.
For each student we stored information about the classes he
was registered for, the start and end time of the class and its venue.
sanitization
We anonymize the identity of the students using MD5.
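
A minimal sketch of MD5-based anonymization follows. The source does not say which identifier was hashed, whether a salt was used, or how the hashes relate to the integer student ids in the released trace, so the input string and the function name here are assumptions.

```python
import hashlib


def anonymize(student_id: str) -> str:
    """Return the hexadecimal MD5 digest of a (hypothetical) student identifier."""
    return hashlib.md5(student_id.encode("utf-8")).hexdigest()


# The same input always maps to the same 32-character digest, so enrollment
# records for one student remain linkable after hashing.
digest = anonymize("abc")
print(digest)
```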
tracesets included nus/contact/sessions (v. 2006-08-01)

[Traceset] nus/contact/sessions (v. 2006-08-01)

version v. 2006-08-01
changes
the initial version
bibtex
@MISC{nus-contact-sessions-2006-08-01,
  author = {Vikram Srinivasan and Mehul Motani and Wei Tsang Ooi},
  title = {{CRAWDAD} trace set nus/contact/sessions (v. 2006-08-01)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/nus/contact/sessions},
  month = aug,  
  year = 2006
}
					
metadata last modified 2006-10-17
summary
The authors obtained the contact patterns among 22341 students, inferred from class schedules and class rosters for the Spring 2006 semester at the National University of Singapore.
release date 2006-08-01
measurement start 2006-01-09
measurement end 2006-05-06
measurement purposes User Mobility Characterization
methodology
For each class, we obtained the sessions associated with the class, 
and the students enrolled in the class. A session can be of a certain 
type, for instance, a lecture session, a recitation session or 
a laboratory session. A class can have multiple sessions of each type. 
Sessions of the same type can be grouped into a session group. 
For instance, a class may hold two lecture sessions (delivering different content) 
in a week for the same set of students. Both these lecture sessions are 
said to belong to the same session group. On the other hand, a class
with a large number of students may hold two lecture sessions
(delivering the same content) in a week for different batches of students.
These lecture sessions are considered to be in different session groups. 
A student signs up for a session group for each type of session in a
class he is enrolled in, and is expected to attend all sessions within
that session group.

Our Intranet portal does not provide detailed information about
which session group a student has signed up for. To fill in these
details, we randomly assign a student to a session group. To be
more specific, given a student s, for each class c that s has enrolled
in, for each session type t of c, s randomly and independently signs
up for a session group of type t, and attends all sessions of that
session group.

Our random assignment of students to session groups might result
in conflicts - that is, a student might have signed up for two
sessions which are held at the same time. We adopt a simple approach
to deal with such conflicts. If a session group assigned to
a student leads to a conflict, the student is randomly assigned to
another session group of the same type. If it is impossible to resolve
a conflict, the student will not be attending any session group
of that type. In our trace, only 3% of all assignments resulted in
unresolved conflicts.
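
The random assignment and conflict handling described above could be sketched as follows. This is one possible reading, not the authors' implementation: it assumes a hypothetical catalogue in which each session group is a set of weekly hour slots, tries the groups of a type in random order, and processes classes in the order given, none of which is specified in the source.

```python
import random


def assign_session_groups(enrolled_classes, catalogue, rng):
    """Pick one session group per (class, session type) for a single student.

    catalogue[class_id][session_type] is a list of session groups, each a set
    of weekly hour slots (hypothetical layout). Returns (chosen, unresolved).
    """
    busy = set()      # hour slots already occupied by this student
    chosen = {}       # (class_id, session_type) -> index of the chosen group
    unresolved = []   # types for which every group conflicts
    for class_id in enrolled_classes:
        for session_type, groups in catalogue[class_id].items():
            candidates = list(range(len(groups)))
            rng.shuffle(candidates)  # random choice, then random retries
            for g in candidates:
                if busy.isdisjoint(groups[g]):
                    chosen[(class_id, session_type)] = g
                    busy |= groups[g]
                    break
            else:
                # No conflict-free group: the student attends none of this type.
                unresolved.append((class_id, session_type))
    return chosen, unresolved


catalogue = {
    "CS1": {"lecture": [{0, 24}], "lab": [{1}, {2}]},
    "EE1": {"lecture": [{1, 25}], "recitation": [{0}]},
}
chosen, unresolved = assign_session_groups(["CS1", "EE1"], catalogue, random.Random(0))
print(chosen, unresolved)
```

In this toy catalogue the only EE1 recitation group always clashes with the CS1 lecture, so it ends up unresolved regardless of the random seed.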

After both screen scraping and session assignment, we have a
view of which student is attending which session at what time. This
data provides us with in-class activity of a student for a week. We
further simplify the model in several ways. Firstly, most sessions
start on the hour and end on the hour. For the few sessions that
do not, we round up the starting time and ending time of the sessions
to the nearest hour. This simplification allows us to use one
hour as one unit time. Secondly, we "compress" the time by removing
any idle time slots without any active sessions. For example,
suppose the last session of Monday ends at 9pm, and the first session
of Tuesday starts at 8am. If Monday 8pm to 9pm corresponds
to the 10th hour, then Tuesday 8am to 9am is the 11th hour in our
model. This concept is similar to business days, which count the
number of days excluding weekends and public holidays. We refer
to our compressed time unit as a business hour. By compressing
the time, we can remove any effects introduced by idle hours during
the night and during weekends. For the rest of this paper, when
we use the unit hours, we are referring to business hours. Finally,
class activities which are held every fortnight are assumed to be
held weekly for simplicity.
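
The time compression can be illustrated with a short Python sketch. It assumes sessions are given as hypothetical (start_hour, duration) pairs with integer hours counted from Monday midnight, and it numbers business hours from 0, so the "10th hour" in the example above corresponds to index 9.

```python
def compress_to_business_hours(sessions):
    """Map wall-clock hour slots to consecutive business hours.

    sessions: list of (start_hour, duration) pairs in whole hours
    (hypothetical layout). Returns (mapping, compressed_sessions).
    """
    # An hour slot is active if at least one session covers it.
    active = set()
    for start, duration in sessions:
        active.update(range(start, start + duration))

    # Number the active slots consecutively, dropping every idle slot.
    mapping = {hour: index for index, hour in enumerate(sorted(active))}

    # All slots inside a session are active, so durations are unchanged.
    compressed = [(mapping[start], duration) for start, duration in sessions]
    return mapping, compressed


# Monday sessions cover 11am to 9pm (hours 11..20); Tuesday starts at 8am (hour 32).
sessions = [(11, 2), (13, 3), (16, 4), (20, 1), (32, 2)]
mapping, compressed = compress_to_business_hours(sessions)
print(mapping[20], mapping[32])
print(compressed)
```

Here Monday 8pm to 9pm maps to business hour 9 and Tuesday 8am to 9am maps to business hour 10, with the idle night in between removed.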
hole
For a few classes, there are inconsistencies in the way data is stored 
on the class web sites. For example, the schedule information is not 
available. Large classes (e.g., > 500 students) have different lecture 
sessions and we do not have information on which lecture sessions 
these students have signed up for. Also, for a given class, we do not 
have information on which students have signed up for which recitation 
and laboratory. We dealt with these issues by defining "session type" 
and "session group" and applying "random assignment" when the information
is not sufficient (see the methodology description above for details).
limitation
The data we obtained from the Intranet portal gives us the session
schedule of students, from which we can infer the contact patterns
of students inside the classrooms. Students, however, are likely
to come into contact with each other outside of class as well, for
instance at dining halls or libraries. The class schedules and rosters
do not provide us with such information.
parent data nus/contact (v. 2006-08-01)
traces included nus/contact/sessions/spring06 (v. 2006-08-01)

[Trace] nus/contact/sessions/spring06 (v. 2006-08-01)

version v. 2006-08-01
changes
the initial version
bibtex
@MISC{nus-contact-sessions-spring06-2006-08-01,
  author = {Vikram Srinivasan and Mehul Motani and Wei Tsang Ooi},
  title = {{CRAWDAD} trace nus/contact/sessions/spring06 (v. 2006-08-01)}, 
  howpublished = {Downloaded from http://crawdad.cs.dartmouth.edu/nus/contact/sessions/spring06},
  month = aug,  
  year = 2006
}
					
metadata last modified 2006-10-17
summary
The authors obtained the contact patterns among 22341 students, inferred from class schedules and class rosters for the Spring 2006 semester at the National University of Singapore.
derived false
release date 2006-08-01
measurement start 2006-01-09
measurement end 2006-05-06
configuration
We obtain the class schedules and class rosters from a university-wide 
Intranet learning portal, and use this information to infer contacts made between
students. This trace contains the contact patterns among 22341 students. 
See the methodology description of the traceset nus/contact/sessions
for more details.
format
The trace data is stored as a text file containing a series of integers.

The first line in the text file consists of two integers. The first integer 
gives the total number of sessions n, and the second integer gives the total 
number of students k.

The subsequent 2n lines give information about sessions, sorted by the start time 
of sessions. Each session is described in two lines. The first line gives four 
integers, giving the start time of a session (in business hours, starting from hour 0), 
the session id i (numbered from 0 to n-1), the number of students in the session s, 
and the duration (in hours). The next line lists the s student ids (numbered from 0 to k-1) 
of students who registered for session i.
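
A minimal reader for this format might look like the sketch below. The `Session` type and `parse_trace` function are illustrative names, not part of the data set; the parser treats the file as a stream of whitespace-separated integers, which follows the description above but has not been checked against the released file.

```python
from typing import NamedTuple


class Session(NamedTuple):
    start: int          # start time in business hours, from hour 0
    session_id: int     # numbered 0 to n-1
    duration: int       # in hours
    students: list      # student ids, numbered 0 to k-1


def parse_trace(text):
    """Parse the trace text into (n, k, sessions)."""
    tokens = (int(tok) for tok in text.split())
    n, k = next(tokens), next(tokens)
    sessions = []
    for _ in range(n):
        # Header line: start time, session id, number of students, duration.
        start, session_id, count, duration = [next(tokens) for _ in range(4)]
        # Roster line: exactly `count` student ids.
        students = [next(tokens) for _ in range(count)]
        sessions.append(Session(start, session_id, duration, students))
    return n, k, sessions


# Tiny made-up sample in the described layout (not real trace data).
sample = """2 3
0 0 2 1
0 1
1 1 3 2
0 1 2
"""
n, k, sessions = parse_trace(sample)
print(n, k, sessions)
```

Since the download is gzip-compressed, the text could be read with `gzip.open(path, "rt")` before being passed to the parser, where `path` is wherever the file was saved.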
download url Download (500 KB txt.gz) from US UK
parent data nus/contact/sessions (v. 2006-08-01)

[Author] Vikram Srinivasan

email elevs@nus.edu.sg
institution National University of Singapore
department Department of Electrical and Computer Engineering
position Assistant Professor
address E4-05-11, 4 Engineering Drive 3, Singapore 117576
phone +65-6874-5569
fax +65-6875-1103
web site http://www.ece.nus.edu.sg/stfpage/elevs/index.htm
related data/tools nus/contact (v. 2006-08-01)
nus/bluetooth (v. 2007-09-03)

[Author] Mehul Motani

email motani@nus.edu.sg
institution National University of Singapore
department Department of Electrical and Computer Engineering
position Assistant Professor
phone +65-6874-6918
fax +65-6779-1103
web site http://www.ee.nus.edu.sg/ee/view1.asp?user=elemm
related data/tools nus/contact (v. 2006-08-01)
nus/bluetooth (v. 2007-09-03)

[Author] Wei Tsang Ooi

email ooiwt@comp.nus.edu.sg
institution National University of Singapore
department Department of Computer Science
position Assistant Professor
address SOC1-04-20, National University of Singapore, 3 Science Drive 2, Singapore 117543.
phone +65-6516-4463
fax +65-6779-1610
web site http://www.comp.nus.edu.sg/~ooiwt/
related data/tools nus/contact (v. 2006-08-01)

[Paper] srinivasan-contact

category inproceedings
authors Vikram Srinivasan
Mehul Motani
Wei Tsang Ooi
title Analysis and Implications of Student Contact Patterns Derived from Campus Schedules
booktitle Proceedings of the Twelfth Annual International Conference on Mobile Computing and Networking (MobiCom)
month 09
year 2006
address Los Angeles, CA
download url http://www.comp.nus.edu.sg/~ooiwt/papers/contact-mobicom06-final.pdf
abstract
Characterizing mobility or contact patterns in a campus environment is of 
interest for a variety of reasons. Existing studies of these patterns can be 
classified into two basic approaches - model based and measurement based. The 
model based approach involves constructing a mathematical model to generate 
movement patterns while the measurement based approach measures locations and 
proximity of wireless devices to infer mobility patterns. In this paper, we 
take a completely different approach. First we obtain the class schedules and 
class rosters from a university-wide Intranet learning portal, and use this 
information to infer contacts made between students. The value of our approach 
is in the population size involved in the study, where contact patterns among 
22341 students are analyzed. This paper presents the characteristics of these 
contact patterns, and explores how these patterns affect three scenarios. We 
first look at the characteristics from the DTN perspective, where we study 
inter-contact time and time distance between pairs of students. Next, we 
present how these characteristics impact the spread of mobile computer viruses, 
and show that viruses can spread to virtually the entire student population 
within a day. Finally, we consider aggregation of information from a large 
number of mobile, distributed sources, and demonstrate that the contact 
patterns can be exploited to design efficient aggregation algorithms, in which 
only a small number of nodes (less than 0.5%) is needed to aggregate a large 
fraction (over 90%) of the data.
keywords measurement
keywords wireless
keywords nus/contact
keywords crawdad
related data/tools nus/contact