Friday, December 12, 2008

Database Design Final Project

Yesterday, I took my last final exam for my Bachelor's of Science in Computer Science from Utah State University. I have been working at Novell for the last 2 years (that went quickly) and I'm glad they were willing to hire the unlearned me and reimburse my tuition and text book costs for the last bit of my degree. Anyway, that's not what this blog is about...

I finished my "Database Design" final project this Monday. Normally I don't blog about things like this, but I thought this particular case was kind of neat. The project requirements were to provide an application and database that allows the following use case using some provided USGS text-file data (and some other data):
The user searches the database by college/university name or zip code. Then, the database returns the information on the nearest five nearest features for each feature type (e.g., colleges/universities, towns, airports, hospitals, streams, etc.) selected by the user with each feature's name, rank, city, state, zip code, and distance.
I liked the open-endedness of the problem; the only restriction I can think of was that we were restricted to (PostgreSQL, MySQL, or SQL Server).

My idea was to first create a Python command-line interface that prints output to a terminal and optionally writes output in XML, and then create a front-end GUI using PyGTK that executes the CLI program, parses its XML, and displays it in a nice GUI environment.

The point is that I was able to do a major final project for a major university using 100% free open source software. Here is a list of the software that I used and what it was used for. Thanks to everyone who made it possible! Behold the power of open source!
  • mdbtools - to extract some needed data from a Microsoft Access database file
  • PostgreSQL (including psql and pgadmin3) - to store and manage the data
  • PostGIS - support for geographic objects to PostgreSQL
  • openSUSE 11.0 - to operate my system
  • PostgreSQL Autodoc - to generate documentation and diagrams for my database
  • OpenOffice - to write my final report and summary.
  • Python - to code
  • Gtk+ and PyGTK - to code the GUI
  • The Gimp - to create a GUI model and take screenshots for my write-up
  • Fluxbox - to manage my windows
  • GNOME - duh!
  • Glade - to design my GUI
  • Firefox - to display my XML using XSLT and CSS
  • PyGreSQL - to interface with Postgres
  • vim - to write the code
  • Evince - to read the assignment
  • Others I'm sure, including all of the GNU core utils we always take for granted ;)
I never had to turn on a Windows machine. I didn't have to port it to Windows or even test it on Windows like I have had to do several times in the past. We were even asked to submit the write-up in OpenOffice format. I'm sure many of you have done this before and I'm happy for you, but having been a Linux user for 11 years now, I am very pleased to see its evolution (in most respects) and its acceptance. The future is definitely open.

Being a Database Design class, the bulk of the work involved getting the appropriate data into the database (I used PostgreSQL), getting the database normalized, adding constraints, etc. This left me with basically two days to hack together the application. With the limited time I had, I didn't quite get the CLI and the GUI as integrated as I would have liked, and there is some weirdness in the GUI code, but that's okay; I was pretty happy with the results:



Here are a couple of other screenshots:
And here's the code and database diagram if anyone cares. It's only useful for educational purposes, because we weren't required to determine which zip code the features are actually *in*, we only had to determined the nearest zipcode centroid to the latitude/longitude point of a feature. This means zip codes, cities, and possibly states can be wrong. The didn't get the GUI as complete as I had hoped, the idea was to parse the XML results and display the actual results in the right pane of the GUI; however, I ran out of time so I simply dumped the path to the XML output file to the GUI instead. I'm sure there are lots of other bugs and hacks, etc. etc. I don't think I normalized the entire database either.