Showing posts with label programming. Show all posts

9.05.2006

Feedorific design, part 1: Feedreader

Here are my thoughts on the design of Feedorific, Django-version:

The various apps will be:
  • Feedreader -
    Accepts and stores feeds from users, gets the xml, parses out and displays entries and descriptions. Feeds entered will also be stored to DB.
  • Structureparser -
    Parses stored feeds for their structure.
  • Contentparser -
    Parses stored feeds for their content.
  • Organization -
    Very unsure about this last section's design. It will display the fully-parsed feeds, allow searching, tagging, etc. This may also be integrated with a visually-organized display of feed articles by content.
Here is what I have on the design of the feedreader element so far:

Feedreader design

Django to the rescue

I had a lot of trouble setting up what I had of the feed parser so that it returned decent-looking HTML. There are also no real tutorials, since no one seems to use just Python and HTML. It seems much more common to use a templating language or a framework to tie everything together.

So, I decided to rewrite it all with Django. I don't know Django very well, but it seems powerful enough without being yet another enterprise-level CMS that I don't need. It is also well-documented, Python-centric, and designed for fast-paced development.

Pythonic Discoveries, Part 2

While working on the feed parsing program, I wanted to display a numbered list. In C++, I would have used an incrementing variable and thought nothing of it. But I found this a little more complicated in Python. When I tried to simply set a variable and then display it along with an item in the list I was iterating through, I got an error that an int and a string cannot be concatenated. So I ended up doing it thus:

entrylist = []
for entry in current.entries:
    entrylist.append(entry.title)
bullet = 1
for x in entrylist:
    print str(bullet) + "- " + x
    bullet += 1
I don't know if this was a bad hack or whether there is an easier way to do it, but it worked. It seems interesting that a C++-style "++" increment operator is not built into Python.
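
For what it's worth, the built-in enumerate() can hand back a running count alongside each item, which avoids the separate counter variable. The following is only a minimal sketch: it is written in Python 3 syntax (print as a function), the list of titles is a made-up stand-in for the real feed entries, and the helper name numbered() is hypothetical.

```python
# Hypothetical stand-in for the titles pulled from current.entries.
entrylist = ["First post", "Second post", "Third post"]


def numbered(items):
    """Return each item prefixed with its 1-based position."""
    # %d formats the integer count, so no explicit str() call is needed.
    return ["%d- %s" % (n, item) for n, item in enumerate(items, start=1)]


for line in numbered(entrylist):
    print(line)
```

The error about concatenating an int and a string comes from the "+" operator, which will not mix the two types; string formatting does the conversion itself.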

Feed Parsing, Attempt 1

I wrote a basic program tonight which accepts an RSS feed, displays its entries, and gives the user the option to view a given entry's description. It's not very complex, but it allowed me to figure out some basic Python stuff, and I am pleased with how well it works.
Things to add:
  • Escape out html in the description or display it differently.
  • Make it work on immortalcuriosity.com instead of just the command line.
  • Store the feeds to a database and give the option to refresh data for a feed previously entered.
I think once I get these 3 completed, I will have a good start on the first component of my system.

Oh, here's the code:
# A test program to learn about feedparser. It accepts a
# feed, displays its entries, and gives the option to
# display a given entry's description. At least works with
# Slashdot and KurzweilAI feeds.

import feedparser

# Get the feed to parse
uri = raw_input("Please enter the feed to be parsed: ")

# Grab the feed
current = feedparser.parse(uri)

# Parse the feed
title = current.feed.title
description = current.feed.description

# Print data on the feed
print
print
print uri + " aka " + title + " is described by its owner as: "
print description + "."
print

# Store entry titles and print them
print "The current items are: "
entrylist = []
for entry in current.entries:
    entrylist.append(entry.title)
bullet = 1
for x in entrylist:
    print str(bullet) + "- " + x
    bullet += 1
print

# Store item descriptions
entrydescs = []
for desc in current.entries:
    entrydescs.append(desc.description)

# See if any additional data is desired
contin = raw_input("Would you like to view any of those (Y or N)?: ")

# Find the item and print its description
if contin == "Y":
    checkme = raw_input("Ok, which item do you want to view? ")
    print
    print entrylist[int(checkme)-1] + ": "
    print entrydescs[int(checkme)-1]
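
The first item on the to-do list above, dealing with HTML in the descriptions, could go one of two ways: escape the markup so it prints literally, or strip it out. Here is a rough sketch of both, assuming Python 3 and only the standard library. The function names and the sample string are made up for illustration, and the regular expression is a crude approximation rather than a real HTML parser.

```python
import html
import re


def escape_description(text):
    """Escape markup so it is shown literally instead of being interpreted."""
    return html.escape(text)


def strip_tags(text):
    """Crudely drop anything that looks like a tag, then decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


# Made-up description, standing in for entry.description from a real feed.
sample = '<p>Fish &amp; chips at <a href="http://example.com">this place</a></p>'

print(escape_description(sample))
print(strip_tags(sample))
```

Escaping is the safer choice when the text ends up on a web page; stripping reads better on the command line.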

Helpful Components

I discovered two Python modules that will most likely prove very useful in my parsing project.
  • Feedparser - This is a Python module which can parse a wide variety of the most common syndication formats. It is well documented, and seems well suited to the component that will take a feed and parse it into its fixed parts.
  • pyparsing - This module allows for the creation of grammars directly in Python code.

I also came across a powerful technique for extracting information from text: text mining. Instead of tediously building grammars and topics through supervised learning, this technique uses "topic modeling" to form topics and appropriate divisions based on combinations of words that commonly occur together.

Picture!

Although there may be different components added later, here is a basic diagram of the major components of the feed parsing system I aim to create.

Pythonic discoveries, Part 1

The "item1.function(variable)" form in Python was confusing me, until I got something to work in the interpreter. I made a list:
>>> countries = ['USA', 'Russia', 'Cuba', 'Iceland', 'Greenland', 'Atlantis']
and wanted to add 'France' to it. Trying "append('France')" did not work, as the append() function did not know where to act. However, "countries.append('France')" did work. I realized I had been taking the "x.y" notation as something more complex than it actually is. It can be more easily understood after thinking about list comprehensions. A list comprehension consists of an expression followed by a for clause, then zero or more additional for or if clauses. Thus in:
>>> num = [2, 4, 6]
>>> [3*x for x in num]
[6, 12, 18]
>>> [3*x for x in num if x > 3]
[12, 18]
the expression "3*x" is applied to each term x in the list "num".
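
To make the mechanics concrete, the second comprehension above can be unrolled into an ordinary loop. This is just an illustrative sketch in Python 3 syntax; the function name triple_large() and its threshold parameter are made up for the example.

```python
num = [2, 4, 6]


def triple_large(values, threshold=3):
    """Loop equivalent of [3*x for x in values if x > threshold]."""
    result = []
    for x in values:              # the "for" clause
        if x > threshold:         # the optional "if" clause
            result.append(3 * x)  # the leading expression
    return result


print(triple_large(num))
print([3 * x for x in num if x > 3])
```

Both lines print the same list, [12, 18], matching the interpreter session above.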

The same x.y() form shows up with imports: after importing a module, say one called "fruit", with "import fruit", one can call "fruit.peel()" (assuming that peel was defined in "fruit"). The "x.y" means: "do, or look for, y in x, or in the context of x."
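
Here is a small sketch of that "look for y in x" idea, in Python 3 syntax. Since "fruit" is only a hypothetical module, the standard library's math module stands in for it here.

```python
import math

countries = ['USA', 'Russia', 'Cuba']

# "countries.append" looks up append in the context of the list object.
countries.append('France')

# "math.sqrt" looks up sqrt in the context of the math module.
root = math.sqrt(16)

# getattr() performs the same lookup with the name given as a string.
same_function = getattr(math, 'sqrt')

print(countries)
print(root)
print(same_function is math.sqrt)
```

The getattr() call is there only to show that the dot is a name lookup on the object to its left, whether that object is a list or a module.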