2.25.2007

Now for Something Completely Different

I am starting a new blog, "#tail -f findings.out", which can be found here. I have collected various gotchas and helpful tips during my time working in information technology, and I hope to share the more useful such items on the new blog. I wanted to separate out such things from more theoretical ponderings found in The Future of Possibilities, as they may not be of interest to all.

12.31.2006

Personal Photography 2.0

I have been considering solutions to the data overload problem that I mentioned in my last post. For now, I will hold off on dealing with paper documents and the issue of multiple online content repositories. Concerning digital images, however, here is an example of what I would consider a highly optimized system which could be created using current technology.

The image capture device itself need not be any different than current digital cameras except in a single essential and several optional aspects. The essential aspect is wireless data transfer capability. As soon as a picture is taken (and perhaps pending a 5-20 second waiting period to allow for deletion of images locally that are not wanted), it should be transferred wirelessly and automatically through the internet to either a personal computer or, perhaps preferably, to a hosted server or service. The latter would likely be better since a hosted service would have a higher percentage of uptime (unless the individual has a very stable machine running with a UPS and their ISP never goes down). Now even at this point, before any of the "magical", i.e. extremely helpful and cool, elements come into play, a great benefit has been realized. Viz.:
  • If anything happens to the camera or memory stick while out, the data is preserved. From dropping a camera to the memory becoming corrupted, things can happen.
The camera and the memory stick are no longer single points of failure. The data captured will be preserved almost immediately after capture such that it cannot easily be lost.

In addition to this, however, one can imagine many other potential features that could be added to the system. The camera could be fitted with additional sensors, or additional data could be associated with the images captured. Normally, digital cameras simply associate the time of the shot with the file created. But the temperature, say, or weather conditions, or altitude, could also be added as metadata. This would allow for a more coherent whole of the events captured to be constructed later. Of greatest assistance, I think, would be GPS-based location data for each image. This could be incorporated into the wireless data transfer system added to the camera. In this way, a real "trip" could be reconstructed from the images, via a mashup with Google Maps, for instance.
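As a rough sketch of how a "trip" might be rebuilt from that metadata, the snippet below orders photos by capture time and returns the sequence of GPS points, which could then be handed to a mapping service. The Photo class, the trip_route function, the filenames, and the coordinates are all made up for illustration, and it is written for present-day Python 3.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Photo:
    """One captured image plus the metadata the camera attached to it (hypothetical)."""
    filename: str
    taken_at: datetime
    latitude: float
    longitude: float


def trip_route(photos):
    """Return (latitude, longitude) points ordered by capture time."""
    ordered = sorted(photos, key=lambda p: p.taken_at)
    return [(p.latitude, p.longitude) for p in ordered]


# Illustrative values only; the list is deliberately out of order.
photos = [
    Photo("img_0002.jpg", datetime(2006, 12, 24, 14, 5), 37.2707, -76.7075),
    Photo("img_0001.jpg", datetime(2006, 12, 24, 9, 30), 37.2090, -76.7780),
]
route = trip_route(photos)
print(route)
```

Sorting on the timestamp rather than the filename means the route stays correct even if images arrive at the server out of order.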

The final elements of this system would be performed by the user's computer or the hosted service, for performance reasons. The simplest step at this point would be to display the images captured in a web portal, associated and organized by their metadata, time, location, etc. In this fashion, one could make images from a vacation or outing available while still out on the trip! And that without having to download them to a laptop and have internet access on it at the time. Or if one wished the images to be kept private, one would minimally already have a photo album waiting upon returning from the trip.

The more complicated, but also more interesting, step now would be to analyze the images for content and provide them with appropriate names. This could utilize preferences set by the user concerning the naming, length, title versus tags, etc. Automated image recognition technology has not been deployed for individual use in many capacities as of yet, but this feature would provide for great strides in personal organization. Instead of sorting through hundreds of images and coming up with names for them such that they can be searched, imagine an automated process which would scan the images, find and assign likely tags or titles, and organize the images into albums divided by topic or by location!
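The naming and album steps might look something like the sketch below, assuming the image recognizer (not shown, and not something I can point to) has already produced tags with confidence scores. The function names, the preference parameters, and the sample tags are hypothetical.

```python
def build_title(tags, max_tags=3, separator="_"):
    """Join the highest-confidence tags into a short, searchable title.

    `tags` is a list of (label, confidence) pairs from an assumed recognizer.
    `max_tags` and `separator` stand in for user naming preferences.
    """
    # Highest confidence first; sorted() is stable, so ties keep their order.
    ranked = sorted(tags, key=lambda pair: pair[1], reverse=True)
    chosen = [label for label, _score in ranked[:max_tags]]
    return separator.join(chosen)


def group_into_albums(photos):
    """Map each location name to the list of photo titles taken there."""
    albums = {}
    for location, title in photos:
        albums.setdefault(location, []).append(title)
    return albums


tags = [("church", 0.62), ("williamsburg", 0.91), ("snow", 0.15), ("carriage", 0.70)]
title = build_title(tags)
albums = group_into_albums([("Williamsburg", title), ("Jamestown", "ship_river")])
print(title)
print(albums)
```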

I currently lack the skill to implement the steps of this process, but this entire system could be realized based on presently-existing commercial technology.

12.30.2006

Data Overload

Month by month, I have increasing difficulty with an overload of data. Case in point: I just purchased a digital camera for myself and Anna. It will be great to have a photo record of our life at this time, trips we take, etc. But getting that record is one thing (a rather simple thing with the amazing features of current digital cameras). Organizing that record into a form that is at all useful is quite another. We just finished a trip to visit my grandparents over Christmas (along with touring Williamsburg, VA and nearby historic sites), and came back with around 400 images. All dated and of great quality, yes, but with useless names and completely unorganized.

I imported them all onto my computer quickly, no longer worrying about storage space, as I have a relatively immense amount and can obtain more easily. Having gotten even this far is much better than nothing. But I still need to rename all of the images with useful titles and organize them thematically into folders (or do so with an organizational program) before I can repeatedly view and use those images with ease.
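Part of the renaming chore could be automated from the dates alone. The sketch below only builds a plan (old name to new path) and moves nothing on disk; the plan_renames function, the trip name, and the sample filenames and dates are assumptions for illustration, written for present-day Python 3.

```python
from datetime import datetime
from pathlib import PurePosixPath


def plan_renames(files, trip_name):
    """Return {old_name: new_path} using capture time for folder and order.

    `files` maps each original filename to its capture datetime.
    Nothing is moved on disk; this only builds the plan.
    """
    plan = {}
    # Number the images in the order they were taken.
    ordered = sorted(files.items(), key=lambda item: item[1])
    for index, (old_name, taken_at) in enumerate(ordered, start=1):
        folder = taken_at.strftime("%Y-%m-%d")
        suffix = PurePosixPath(old_name).suffix.lower()
        new_name = "%s_%03d%s" % (trip_name, index, suffix)
        plan[old_name] = str(PurePosixPath(folder) / new_name)
    return plan


# Hypothetical camera filenames and capture times.
files = {
    "DSC00412.JPG": datetime(2006, 12, 26, 11, 15),
    "DSC00398.JPG": datetime(2006, 12, 25, 16, 40),
}
plan = plan_renames(files, "williamsburg")
print(plan)
```

This gets the images into dated folders with a common prefix; giving each one a meaningful title would still be manual work.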

This is similar to the problem I face with the piles of papers and notebooks that I have collected over the years and now wish to digitize. Some of these contain thoughts and ideas that either I believe to still hold value in themselves, or which would minimally be instructive in developing an understanding of how my thought progressed to its current state. But to be such, I would need to normalize and order them.

Additionally, the same conundrum arises on an everyday basis, with an annoying additional issue. I now record thoughts and observations in a variety of digital locations, from Google Docs to Yahoo! Notebook, etc. These too must be unified and ordered to be of use. Now, however, that I fully recognize this problem, my thought is hampered by approaching its solution. I spend time thinking about how to better organize and catalog my records, and this takes away from the time I would otherwise spend thinking of and creating those records.

More on possible solutions next.

9.05.2006

Feedorific design, part 1: Feedreader

Here are my thoughts on the design of Feedorific, Django-version:

The various apps will be:
  • Feedreader -
    Accepts and stores feeds from users, gets the xml, parses out and displays entries and descriptions. Feeds entered will also be stored to DB.
  • Structureparser -
    Parses stored feeds for their structure.
  • Contentparser -
    Parses stored feeds for their content.
  • Organization -
    Very unsure about this last section's design. It will display the fully-parsed feeds, allow searching, tagging, etc. This may also be integrated with a visually-organized display of feed articles by content.
Here is what I have on the design of the feedreader element so far:

Feedreader design

Django to the rescue

I had a lot of difficulty getting what I had of the feedparser program set up in such a way that I could return html code that looked good. Also, there are no real tutorials, since no one seems to use just Python and html. It seems a lot more common to use a templating language or some other framework to link everything.

So, I decided to rewrite it all with Django. I don't know Django very well, but it seems powerful enough without having to be another enterprise level CMS that I don't need. It is also well-documented, Python-centric, and designed for fast-paced development.

It's (a)live! Feedorific!

Since the old immortalcuriosity.com was all in Plone, after switching to a different server, I had a blank slate to work with. What better to fill this void with than my feedparsing project! So with a little Apache, a little Python, and a little help from my friends, the current form of the feedparsing system is accessible at www.immortalcuriosity.com. A warning: It looks like junk and barely works in IE. It's perfect in Firefox. Sorry, but I just don't care about a bad browser enough to put the hacks in yet. Maybe later.

Pythonic Discoveries, Part 2

While working on the feed parsing program, I wanted to display a numbered list. In C++, I would have used an incrementing variable and thought nothing of it. But I found this a little more complicated in Python. When I tried to simply set a variable and then display it along with an item in the list I was iterating through, I got an error that an int and a string cannot be concatenated. So I ended up doing it thus:

entrylist = []
for entry in current.entries:
    entrylist.append(entry.title)
bullet = 1
for x in entrylist:
    print str(bullet) + "- " + x
    bullet+=1
I don't know if this was a bad hack, or if there is an easier way to do this, but it worked. It seems interesting that an increment operator like ++ is not built into Python.
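For what it is worth, one easier way in current Python is the built-in enumerate, which hands back a counter alongside each item so no separate variable has to be incremented. This is a sketch written for Python 3 (where print is a function, and where enumerate accepts a start argument, which the Python of 2006 did not yet have); the numbered_lines function and the sample titles are made up for illustration.

```python
def numbered_lines(titles):
    """Return lines such as '1- First title', numbering from 1."""
    lines = []
    for number, title in enumerate(titles, start=1):
        # str() is still needed: Python will not join an int to a string.
        lines.append(str(number) + "- " + title)
    return lines


for line in numbered_lines(["First entry", "Second entry"]):
    print(line)
```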

Feed Parsing, Attempt 1

I wrote a basic program tonight which accepts an RSS feed, displays its entries, and gives the user the option to view a given entry's description. It's not very complex, but it allowed me to figure out some basic Python stuff, and I am pleased with how well it works.
Things to add:
  • Escape out html in the description or display it differently.
  • Make it work on immortalcuriosity.com instead of just the command line.
  • Store the feeds to a database and give the option to refresh data for a feed previously entered.
I think once I get these 3 completed, I will have a good start on the first component of my system.
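The first and third items might be approached roughly as below, using only the Python 3 standard library. This is a sketch under assumptions, not the eventual design: the strip_html, save_feed, and stored_feeds names, the table layout, and the example.com URL are all hypothetical, and it shows the "display it differently" option by dropping tags rather than escaping them.

```python
import sqlite3
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(description):
    """Return the description with tags removed and entities decoded."""
    parser = _TextOnly()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


def save_feed(conn, url, title):
    """Insert the feed, or refresh its title if the URL is already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feeds (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO feeds (url, title) VALUES (?, ?)", (url, title)
    )
    conn.commit()


def stored_feeds(conn):
    """Return all stored (url, title) rows, ordered by URL."""
    return conn.execute("SELECT url, title FROM feeds ORDER BY url").fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
save_feed(conn, "http://example.com/rss", "Old title")
save_feed(conn, "http://example.com/rss", "New title")  # same URL: a refresh
print(stored_feeds(conn))
print(strip_html("<p>Fish &amp; <b>chips</b></p>"))
```

Using the feed URL as the primary key is what lets a second save act as a refresh instead of creating a duplicate row.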

Oh, here's the code:
# A test program to learn about feedparser. It accepts a
# feed, displays its entries, and gives the option to
# display a given entry's description. Works at least with
# Slashdot and KurzweilAI feeds.

import feedparser

# Get the feed to parse
uri = raw_input("Please enter the feed to be parsed: ")

# Grab the feed
current = feedparser.parse(uri)

# Parse the feed
title = current.feed.title
description = current.feed.description

# Print data on the feed
print
print
print uri + " aka " + title + " is described by its owner as: "
print description + "."
print

# Store entry titles and print them
print "The current items are: "
entrylist = []
for entry in current.entries:
    entrylist.append(entry.title)
bullet = 1
for x in entrylist:
    print str(bullet) + "- " + x
    bullet+=1
print

# Store item descriptions
entrydescs = []
for desc in current.entries:
    entrydescs.append(desc.description)

# See if any additional data is desired
contin = raw_input("Would you like to view any of those (Y or N)?: ")

# Find the item and print its description
if contin == "Y":
    checkme = raw_input("Ok, which item do you want to view? ")
    print
    print entrylist[int(checkme)-1] + ": "
    print entrydescs[int(checkme)-1]

Helpful Components

I discovered 2 helpful Python modules that will most likely prove very useful in my parsing project.
  • Feedparser - This is a Python module which can parse a wide variety of the most common syndication formats. It is well documented, and seems well suited for the component I will need to take a feed and parse it according to fixed components.
  • pyparsing - This module allows for the creation of grammars directly in Python code.

I also came across a powerful new technique for extracting information from text: text mining. Instead of tediously building grammars and topics through supervised learning, this technique uses "topic modeling" to form topics and appropriate divisions based on which words commonly occur together.
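To make "words which are common together" a little more concrete, here is a toy sketch that counts how often pairs of words show up in the same document. This is only the raw co-occurrence signal, not topic modeling itself, which fits a statistical model on top of counts like these. The cooccurrence function and the sample documents are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count how many documents each unordered word pair appears in."""
    counts = Counter()
    for text in documents:
        # Unique, sorted words so each pair has one canonical form.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


docs = ["python feed parser", "python feed reader", "camera photo album"]
pairs = cooccurrence(docs)
print(pairs.most_common(1))
```

Pairs that recur across many documents (here "feed" and "python") hint at a shared topic, while the camera words never mix with them.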

Picture!

Although there may be different components added later, here is a basic diagram of the major components of the feed parsing system I aim to create.