The Future of Possibilities: feedorific

Feedorific design, part 1: Feedreader

Here are my thoughts on the design of Feedorific, Django-version:

The various apps will be:

Feedreader -
Accepts feed URLs from users, fetches the XML, and parses out and displays the entries and their descriptions. Every feed entered is also stored in the database.
Structureparser -
Parses stored feeds for their structure.
Contentparser -
Parses stored feeds for their content.
Organization -
I am very unsure about the design of this last app. It will display the fully parsed feeds and allow searching, tagging, and so on. It may also be integrated with a visual display that organizes feed articles by content.
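As a rough sketch of how the four apps above might be registered in a Django project's settings.py: the lowercase app labels are hypothetical (the design only names the components), and the two django.contrib entries simply stand in for whatever bundled apps the project ends up using.

```python
# Hypothetical app labels, one per component listed above; the design
# does not say what the Django apps will actually be called.
FEEDORIFIC_APPS = [
    "feedreader",       # accepts, stores, fetches, and displays feeds
    "structureparser",  # parses stored feeds for their structure
    "contentparser",    # parses stored feeds for their content
    "organization",     # display, search, tagging (design still undecided)
]

# Django's bundled apps usually come first in settings.py; these two
# are only placeholders for whichever ones the project needs.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
] + FEEDORIFIC_APPS
```

Keeping each component in its own app would let the parsers change without touching the code that accepts and displays feeds.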

Here is what I have on the design of the feedreader element so far:

Django to the rescue

I had a lot of difficulty setting up what I had of the feedparser code so that it could return good-looking HTML. I also could not find any real tutorials, since almost no one seems to use just Python and HTML. It seems far more common to use a templating language or a framework to link everything together.

So I decided to rewrite it all with Django. I don't know Django very well, but it seems powerful enough without being yet another enterprise-level CMS that I don't need. It is also well documented, Python-centric, and designed for fast-paced development.

Feed Parsing, Attempt 1

I wrote a basic program tonight which accepts an RSS feed, displays its entries, and gives the user the option to view a given entry's description. It's not very complex, but it allowed me to figure out some basic Python stuff, and I am pleased with how well it works.
Things to add:

Escape the HTML in the description, or display it differently.
Make it work on immortalcuriosity.com instead of just the command line.
Store the feeds in a database, and give the option to refresh the data for a previously entered feed.

I think once I get these 3 completed, I will have a good start on the first component of my system.
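Two of those items, escaping the HTML and storing feeds so they can be refreshed, could look something like the sketch below. It is written for Python 3 and uses only the standard library. The table layout, the function names, and the choice of SQLite are all assumptions made for illustration, not a settled design.

```python
import html
import sqlite3


def escape_description(description):
    """Return the description with HTML special characters escaped."""
    return html.escape(description)


def open_store(path=":memory:"):
    """Open a feed store, creating the (assumed) feeds table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feeds ("
        "uri TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    return conn


def save_feed(conn, uri, title, description):
    """Insert a feed, or overwrite its stored data when it is refreshed."""
    conn.execute(
        "INSERT OR REPLACE INTO feeds (uri, title, description) "
        "VALUES (?, ?, ?)",
        (uri, title, description),
    )
    conn.commit()


def load_feed(conn, uri):
    """Return (title, description) for a stored feed, or None if unknown."""
    return conn.execute(
        "SELECT title, description FROM feeds WHERE uri = ?", (uri,)
    ).fetchone()
```

Using the feed's address as the primary key means that saving the same feed a second time acts as a refresh rather than creating a duplicate row.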

Oh, here's the code:

# A test program to learn about feedparser. It accepts a
# feed, displays its entries, and gives the option to
# display a given entry's description. At least works with
# Slashdot and KurzweilAI feeds.
# Written for Python 2 (raw_input and print statements).

import feedparser

# Get the feed to parse
uri = raw_input("Please enter the feed to be parsed: ")

# Grab the feed
current = feedparser.parse(uri)

# Pull out the feed's own title and description
title = current.feed.title
description = current.feed.description

# Print data on the feed
print
print
print uri + " aka " + title + " is described by its owner as: "
print description + "."
print

# Store entry titles and print them as a numbered list
print "The current items are: "
entrylist = []
for entry in current.entries:
    entrylist.append(entry.title)
bullet = 1
for x in entrylist:
    print str(bullet) + "- " + x
    bullet += 1
print

# Store item descriptions, in the same order as the titles
entrydescs = []
for desc in current.entries:
    entrydescs.append(desc.description)

# See if any additional data is desired
contin = raw_input("Would you like to view any of those (Y or N)?: ")

# Find the item and print its description
if contin.strip().upper() == "Y":
    checkme = raw_input("Ok, which item do you want to view? ")
    print
    print entrylist[int(checkme) - 1] + ": "
    print entrydescs[int(checkme) - 1]
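One weak spot in the script above is the last prompt: the text typed there goes straight to int(), so a non-numeric answer raises ValueError, and an answer of 0 becomes index -1 and quietly shows the last entry. Here is a small sketch of how the numbering and the selection could be pulled into helper functions and checked. It is written for Python 3, and the names format_entries and pick_index are made up for this example.

```python
def format_entries(titles):
    """Return the numbered lines the script prints, such as '1- Title'."""
    return ["%d- %s" % (number, title)
            for number, title in enumerate(titles, 1)]


def pick_index(choice, count):
    """Turn the 1-based number typed by the user into a 0-based index.

    Returns None when the text is not a whole number or is out of range.
    """
    try:
        number = int(choice.strip())
    except ValueError:
        return None
    if 1 <= number <= count:
        return number - 1
    return None
```

With these in place, the script would only print a description when pick_index returns something other than None, and could ask again otherwise.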

Helpful Components

I discovered two Python modules that will most likely prove very useful in my parsing project.

Feedparser - A Python module that can parse a wide variety of the most common syndication formats. It is well documented, and seems well suited to the component that will take a feed and parse it into its fixed components.
pyparsing - A module that allows grammars to be created directly in Python code.

I also came across a powerful new technique for extracting information from text: text mining. Instead of tediously forming grammars and topics through supervised learning, this technique uses "topic modeling" to form topics, and the divisions between them, from combinations of words that commonly occur together.
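To make the "combinations of words which are common" idea a little more concrete, here is a small sketch that counts how often pairs of words turn up in the same document. This is plain co-occurrence counting, the kind of raw signal topic models build on, and not a topic model itself. It is written for Python 3, and the function name and the whitespace-based tokenizing are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for text in documents:
        # Lowercase, split on whitespace, and drop repeats within a
        # document; sorting makes each pair come out in a fixed order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts
```

For example, given a handful of feed entry titles, counts.most_common(5) would list the five word pairs that appear together in the most titles.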

Picture!

Although there may be different components added later, here is a basic diagram of the major components of the feed parsing system I aim to create.

9.05.2006
