
9.05.2006

Feedorific design, part 1: Feedreader

Here are my thoughts on the design of Feedorific, Django version:

The various apps will be:
  • Feedreader -
    Accepts feeds from users, fetches their XML, and parses out and displays the entries and their descriptions. Feeds entered will also be stored in a database.
  • Structureparser -
    Parses stored feeds for their structure.
  • Contentparser -
    Parses stored feeds for their content.
  • Organization -
    I am very unsure about this last component's design. It will display the fully parsed feeds and allow searching, tagging, etc. It may also be integrated with a display that organizes feed articles visually by content.
Here is what I have so far on the design of the Feedreader component:

Feedreader design

Django to the rescue

I had a lot of difficulty arranging my feedparser code so that it returned HTML that looked good. There are also no real tutorials for this, since few people seem to generate HTML from plain Python alone. It is much more common to use a templating language or a framework to tie everything together.

So, I decided to rewrite it all with Django. I don't know Django very well, but it seems powerful enough without being yet another enterprise-level CMS that I don't need. It is also well documented, Python-centric, and designed for fast-paced development.

Feed Parsing, Attempt 1

I wrote a basic program tonight which accepts an RSS feed, displays its entries, and gives the user the option to view a given entry's description. It's not very complex, but it allowed me to figure out some basic Python stuff, and I am pleased with how well it works.
Things to add:
  • Escape the HTML in each description, or display it differently.
  • Make it work on immortalcuriosity.com instead of just the command line.
  • Store the feeds to a database and give the option to refresh data for a feed previously entered.
I think once I get these three completed, I will have a good start on the first component of my system.
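The third item, storing feeds and refreshing one that was entered earlier, could start out something like the minimal sketch below. It uses Python's built-in sqlite3 module rather than Django's ORM so that it runs on its own; the table layout and the names open_store, save_feed, and load_entries are hypothetical, and it assumes an entry's title is unique within its feed.

```python
import sqlite3

# Hypothetical schema: one row per feed, one row per entry in that feed
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id    INTEGER PRIMARY KEY,
    uri   TEXT UNIQUE NOT NULL,
    title TEXT
);
CREATE TABLE IF NOT EXISTS entries (
    id          INTEGER PRIMARY KEY,
    feed_id     INTEGER NOT NULL REFERENCES feeds(id),
    title       TEXT NOT NULL,
    description TEXT,
    UNIQUE (feed_id, title)  -- assumption: titles identify entries
);
"""

def open_store(path=":memory:"):
    """Open (or create) the feed database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def save_feed(conn, uri, title, entries):
    """Store a feed and its (title, description) pairs; re-saving refreshes it."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO feeds (uri, title) VALUES (?, ?)", (uri, title))
        conn.execute("UPDATE feeds SET title = ? WHERE uri = ?", (title, uri))
        feed_id = conn.execute("SELECT id FROM feeds WHERE uri = ?", (uri,)).fetchone()[0]
        # An entry whose title already exists for this feed is replaced
        conn.executemany(
            "INSERT OR REPLACE INTO entries (feed_id, title, description) VALUES (?, ?, ?)",
            [(feed_id, t, d) for t, d in entries],
        )
    return feed_id

def load_entries(conn, uri):
    """Return the stored (title, description) pairs for a previously entered feed."""
    return conn.execute(
        "SELECT e.title, e.description FROM entries AS e "
        "JOIN feeds AS f ON f.id = e.feed_id WHERE f.uri = ? ORDER BY e.id",
        (uri,),
    ).fetchall()

# Example: save a feed, then refresh it with newer data
conn = open_store()
first_id = save_feed(conn, "http://example.com/rss", "Example", [("A", "first"), ("B", "second")])
second_id = save_feed(conn, "http://example.com/rss", "Example (renamed)", [("A", "updated")])
```

With feedparser, the entries argument could be built as `[(e.get("title", ""), e.get("description", "")) for e in current.entries]`. Saving the same URI again updates the feed's title and overwrites entries with matching titles, while entries that have since dropped out of the feed are kept.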

Oh, here's the code:
```python
# A test program to learn about feedparser. It accepts a
# feed, displays its entries, and gives the option to
# display a given entry's description. It works at least with
# the Slashdot and KurzweilAI feeds.

import feedparser

# Get the feed to parse
uri = input("Please enter the feed to be parsed: ")

# Grab and parse the feed
current = feedparser.parse(uri)

# Pull out feed-level data; .get() avoids an AttributeError
# when a feed omits its title or description
title = current.feed.get("title", "(untitled)")
description = current.feed.get("description", "(no description)")

# Print data on the feed
print()
print()
print(uri + " aka " + title + " is described by its owner as: ")
print(description + ".")
print()

# Store entry titles and descriptions side by side
entrylist = [entry.get("title", "(untitled)") for entry in current.entries]
entrydescs = [entry.get("description", "") for entry in current.entries]

# Print a numbered list of the entry titles
print("The current items are: ")
for bullet, entry_title in enumerate(entrylist, start=1):
    print(str(bullet) + "- " + entry_title)
print()

# See if any additional data is desired
contin = input("Would you like to view any of those (Y or N)?: ")

# Find the item and print its description
if contin.strip().upper() == "Y":
    checkme = int(input("Ok, which item do you want to view? "))
    if 1 <= checkme <= len(entrylist):
        print()
        print(entrylist[checkme - 1] + ": ")
        print(entrydescs[checkme - 1])
    else:
        print("There is no item number " + str(checkme) + ".")
```
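The program prints each description exactly as the feed supplies it, tags and all. For the first item on the list of things to add, Python's standard library offers two simple options: html.escape to show the markup literally, or an html.parser.HTMLParser subclass that keeps only the text. A minimal sketch; TextExtractor, strip_tags, and the sample description are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text of an HTML fragment, dropping every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(fragment):
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any buffered text
    return "".join(parser.parts)

desc = "<p>New <b>Django</b> release &amp; notes</p>"
escaped = escape(desc)   # → '&lt;p&gt;New &lt;b&gt;Django&lt;/b&gt; release &amp;amp; notes&lt;/p&gt;'
plain = strip_tags(desc) # → 'New Django release & notes'
```

Escaping suits a web page that should show the markup as text; stripping suits the command-line version, where tags are just noise.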

Helpful Components

I discovered two Python modules that will most likely prove very useful in my parsing project.
  • Feedparser - This is a Python module that can parse a wide variety of the most common syndication formats. It is well documented and seems well suited to the component that will take a feed and parse it according to its fixed components.
  • pyparsing - This module allows for the creation of grammars directly in Python code.
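As a small illustration of a grammar written directly in Python code, here is a sketch of a pyparsing grammar for a tag line of the kind the Organization component might use. It assumes pyparsing is installed; the "tags:" format and the names tag and tag_line are made up for this example.

```python
from pyparsing import Suppress, Word, ZeroOrMore, alphanums

# A tag is a run of letters, digits, or hyphens
tag = Word(alphanums + "-")
# The literal "tags:" (dropped from the results), then comma-separated tags
tag_line = Suppress("tags:") + tag + ZeroOrMore(Suppress(",") + tag)

tags = tag_line.parseString("tags: python, django, rss").asList()
print(tags)  # → ['python', 'django', 'rss']
```

The grammar is ordinary Python objects combined with + and ZeroOrMore, so it can be built up and tested piece by piece; whitespace between tokens is skipped by default.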

I also came across a powerful approach to extracting information from text that is new to me: text mining. Instead of tediously building grammars and defining topics through supervised learning, one of its techniques, "topic modeling", forms topics and divides documents among them based on combinations of words that commonly occur together.
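Topic modeling covers several methods; one widely used one is latent Dirichlet allocation (LDA). Below is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python, assuming documents are already split into word lists. The post does not say which topic model it has in mind, and the function name, the hyperparameters alpha and beta, and the toy documents are all illustrative.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, top_words): per-document topic counts and the
    three words most often assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Start every word occurrence in a random topic
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [sorted(tw, key=tw.get, reverse=True)[:3] for tw in topic_word]
    return doc_topic, top_words

docs = [s.split() for s in [
    "django python template view",
    "python django model view",
    "rss feed entry xml",
    "feed rss xml parser",
]]
doc_topic, top_words = lda_gibbs(docs, n_topics=2)
```

No topics are labeled in advance: the sampler only sees which words appear in the same documents, and groups such as the web-framework words and the feed words can emerge from that co-occurrence. On real feeds the corpus would be far larger and stop words would need removing first.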

Picture!

Other components may be added later, but here is a basic diagram of the major components of the feed parsing system I aim to create.