June 1, 2007
Generating topic detection training corpora from social bookmarking sites

Fall 2006 with Chris Harman. Advisor: Rich Wicentowski

This project focused on generating training/testing documents (corpora, for those in the know) for automated tagging using the social bookmarking site del.icio.us.

Social bookmarking sites provide a wealth of heavily cross-referenced tagging data. However, only a very particular subset of the Web gets added to these sites. Our motivation was to build an engine that leverages this human-verified data (namely, tagged sites) to tag novel text (the rest of the internet). We generated a sizable corpus of about 19,000 documents covering about 1,000 different tags and trained a topic detection algorithm that uses latent semantic analysis.
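
As a rough illustration of how latent semantic analysis can assign tags, the sketch below builds a term-document matrix from a tiny made-up corpus, reduces it with a truncated SVD, and tags new text with the tag of the nearest training document in the reduced space. The toy documents, the tags, the function names and the choice of k are all hypothetical; none of them come from the paper, and the nearest-neighbor step is just one simple way to use the reduced space.

```python
import numpy as np

# Hypothetical toy corpus: (text, tag) pairs standing in for tagged bookmarks.
TRAIN = [
    ("python code function compiler", "programming"),
    ("compiler code bug function", "programming"),
    ("recipe oven flour sugar", "cooking"),
    ("sugar flour cake recipe", "cooking"),
]

def build_vocab(texts):
    """Map every distinct word to a row index of the term-document matrix."""
    words = sorted({w for text in texts for w in text.split()})
    return {w: i for i, w in enumerate(words)}

def vectorize(text, vocab):
    """Raw term-count vector; words outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def train_lsa(train, k=2):
    """Keep the top-k left singular vectors and project the training docs."""
    texts = [text for text, _ in train]
    vocab = build_vocab(texts)
    A = np.column_stack([vectorize(t, vocab) for t in texts])  # terms x docs
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    doc_vecs = (Uk.T @ A).T  # one row per document, in latent space
    return vocab, Uk, doc_vecs, [tag for _, tag in train]

def predict_tag(text, model):
    """Tag new text with the tag of the most similar training document."""
    vocab, Uk, doc_vecs, tags = model
    q = Uk.T @ vectorize(text, vocab)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    sims = doc_vecs @ q / np.where(norms == 0, 1.0, norms)  # cosine similarity
    return tags[int(np.argmax(sims))]
```

Calling `predict_tag("flour oven cake", train_lsa(TRAIN))` picks a cooking document as the nearest neighbor, even though "cake" and "oven" never appear together in one training document.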

Download: paper (PDF)

8:00pm  |   URL: http://tmblr.co/Zn_4by9Tph0
Filed under: nlp academic 
June 1, 2007
Parallel interpolation of elevation grids

Fall 2006 with Scott Blaha. Advisor: Andy Danner

Real-life geographic elevation data comes as three-dimensional point clouds, meaning the data is neither aligned to a grid nor uniformly distributed. Geographic Information Systems take elevation grids as input for most processing tasks (viewshed computation, watershed computation, flow routing, etc.). The natural way of getting grids from point clouds is interpolation. However, interpolation is extremely computationally intensive and GIS data sets are getting bigger by the second.

Thankfully, interpolation is RP (ridiculously parallelizable), which can theoretically give us an n-fold speedup with n-way parallelization. We implemented a few interpolation algorithms, parallelized them, and observed the improvement in performance. The code was written in C using LAM/MPI.
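
To show why gridding parallelizes so easily, here is a minimal sketch in Python rather than the C/MPI of the actual project. It assumes inverse distance weighting as the interpolation method (the paper's algorithms are not named above) and uses a thread pool as a stand-in for MPI ranks. All function names and the cell-sampling convention are hypothetical. The point is only that each block of grid rows can be computed without talking to any other block.

```python
from concurrent.futures import ThreadPoolExecutor

def idw(points, x, y, power=2.0):
    """Inverse distance weighted elevation at (x, y) from (px, py, pz) samples."""
    num = den = 0.0
    for px, py, pz in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 == 0.0:
            return pz  # query coincides with a sample point
        w = 1.0 / d2 ** (power / 2.0)
        num += w * pz
        den += w
    return num / den

def interpolate_rows(points, rows, ncols):
    """Compute the given grid rows; cell (r, c) is sampled at x=c, y=r."""
    return [[idw(points, float(c), float(r)) for c in range(ncols)] for r in rows]

def interpolate_grid(points, nrows, ncols, workers=1):
    """Split the grid into one contiguous block of rows per worker."""
    bounds = [nrows * i // workers for i in range(workers + 1)]
    blocks = [range(bounds[i], bounds[i + 1]) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Blocks are independent, so the only coordination is the final gather.
        parts = pool.map(lambda rows: interpolate_rows(points, rows, ncols), blocks)
        return [row for part in parts for row in part]
```

In CPython the threads will not actually speed up this pure-Python loop; the sketch illustrates the decomposition, which would map onto one row block per process followed by a gather of the finished blocks.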

Download: paper (PDF)

8:00pm  |   URL: http://tmblr.co/Zn_4by9TpOZ
Filed under: academic parallel mpi 
June 1, 2007
Audit logs for computer security monitoring

Summer ‘05 with Ben Kuperman

During an undergraduate research fellowship at Swarthmore College I worked with Professor Kuperman on expanding a preliminary system written for Sun Solaris. We first ported it over to Debian Linux. Afterwards, I redesigned and reimplemented the log recording mechanism with easily synchronizable log entry commit semantics, and wrote a highly modular log reader/processor from scratch.

I wrote and presented a poster on this project for Sigma Xi in September ‘05.

Download (PDFs): abstract and summary or poster

5:53pm  |   URL: http://tmblr.co/Zn_4by9TdKI
Filed under: academic security 
June 1, 2007
Integrating Moodle with Marratech Video Conferencing

Summer ‘06 at Southwestern University for NITLE

This work was done with a team of nine other undergraduates. We wrote a module for the Moodle Course Management System that interfaces with the Marratech Video Conferencing Suite to allow seamless video conferencing from within Moodle. This project involved Java EE programming on the Marratech side and PHP and MySQL on the Moodle side.

I presented a poster on this project for Sigma Xi in September ‘06.

Download (PDF): abstract and summary or poster.

June 1, 2007
Distributed execution systems: to centralize or not to centralize

Spring ‘06 with Javier Prado. Advisor: Tia Newhall

We wrote two distributed execution systems, one centralized and one decentralized. We tested the performance of these systems under different experimentally induced conditions.

Download: final paper (PDF)

June 1, 2007
The ultimate RISC computer

Spring ‘05 with Adem Kader. Advisor: Bruce Maxwell

We wrote a One Instruction Set Computer (OISC) in VHDL and developed a simple assembly language for it, along with a virtual machine and an assembler written in Java 5. Adem keeps the final write-up on his site. Documentation for the compiler and virtual machine, generated with Javadoc, can be found here.
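
For readers who have not met a one-instruction machine, the following is a minimal virtual machine sketch. It assumes the single instruction is subleq (subtract and branch if less than or equal to zero), a common choice for OISC designs; this post does not say which instruction our machine used. The memory layout, the halting convention and the function names are hypothetical, and the sketch is in Python rather than the Java of the original virtual machine.

```python
def run_subleq(mem, max_steps=10_000):
    """Run a subleq program in place and return the memory.

    Each instruction is three cells a, b, c:
        mem[b] -= mem[a]; if mem[b] <= 0 jump to c, else fall through.
    A negative jump target halts the machine (a convention chosen here).
    """
    pc = 0
    for _ in range(max_steps):
        if pc < 0:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")

def add_program(x, y):
    """Build a program that leaves x + y in cell 10, using three subleq steps."""
    X, Y, Z = 9, 10, 11  # data cells placed right after the code
    return [
        X, Z, 3,    # Z = Z - X  ->  -x
        Z, Y, 6,    # Y = Y - Z  ->  y + x
        Z, Z, -1,   # Z = 0, then halt via the negative target
        x, y, 0,    # cells 9, 10, 11: X, Y, Z
    ]
```

For example, `run_subleq(add_program(2, 3))[10]` evaluates to 5: addition is built from two subtractions through a scratch cell, which is the kind of trick an assembler for such a machine wraps in friendlier mnemonics.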
