January 28, 2011

Synchronizing youtube user's videos filtered by tags

Today I will write about synchronizing your app with YouTube and then filtering a specific user's videos by tags. But let's start from the beginning. I assume you already have django-syncr installed. I also assume that the video is stored as an attribute of some other object. Basic usage is pretty straightforward, so I'll proceed to some examples.

First the easiest one: we provide the video id as a string and receive a Video instance as the result:

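# django-syncr installs as the "syncr" package (a hyphenated name cannot be imported);
# adjust the path if your install differs
from syncr.app.youtube import YoutubeSyncr

def sync_video_id(id):
    """ Synchronize single video by given YoutubeID    
    Takes youtube id as a string. Returns Video instance.   
    """
    
    # any error raised by syncVideo propagates to the caller
    y = YoutubeSyncr()
    return y.syncVideo(id)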

That was easy. Let's try something more complicated. Next comes synchronization from a video URL. Using Python's built-in urlparse module we split the URL into components. Because my actual code needed to be compatible with Python 2.4, I fall back to the cgi module there (parse_qs only moved into urlparse in Python 2.6; it is the same function, just stored in a different place). Here's how this looks step by step. Everything is rather self-explanatory, so I will not go through it line by line:

>>> url = 'http://www.youtube.com/watch?v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM'
>>> url
'http://www.youtube.com/watch?v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM'
>>> import urlparse
>>> url_data = urlparse.urlparse(url)
>>> url_data
ParseResult(scheme='http', netloc='www.youtube.com', path='/watch', params='', query='v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM', fragment='')
>>> query = urlparse.parse_qs(url_data[4])
>>> query
{'feature': ['rec-LGOUT-exp_fresh div-1r-3-HM'], 'v': ['mPXxI1uyVEE']}
>>> id = query["v"][0]
>>> id
'mPXxI1uyVEE'

And this is how we will do it in Django:

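import urlparse

# django-syncr installs as the "syncr" package; adjust the path if your install differs
from syncr.app.youtube import YoutubeSyncr

try:
    from urlparse import parse_qs  # Python 2.6+
except ImportError:
    from cgi import parse_qs  # Python 2.4 / 2.5

def sync_video_url(url):
    """ Synchronize single video by its url    
    Takes video url as a string, returns Video object (or None on failure).

    """
    url_data = urlparse.urlparse(url)
    # index 4 of the parse result is the query string
    query = parse_qs(url_data[4])
    id = query["v"][0]
    
    y = YoutubeSyncr()
    try:
        return y.syncVideo(id)
    except Exception:
        return None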
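One thing to watch in the code above: query["v"][0] raises a KeyError when the URL has no v parameter (a channel page, for example). Here is a minimal sketch of a safer lookup. The name extract_video_id is my own hypothetical helper, not part of django-syncr, and the imports are arranged so that it also runs on Python 3:

```python
try:  # Python 3
    from urllib.parse import urlparse, parse_qs
except ImportError:  # Python 2.6+
    from urlparse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the 'v' query parameter, or None if it is absent.

    Hypothetical helper for illustration; not part of django-syncr.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


# a watch url yields the id, a url without 'v' yields None
print(extract_video_id('http://www.youtube.com/watch?v=mPXxI1uyVEE&feature=related'))
print(extract_video_id('http://www.youtube.com/user/example'))
```

Returning None lets the caller decide whether a missing id is an error or just something to skip.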

With the warm-up finished, it's time for more complex functionality. Now we will sync the videos of a specific user that match the given tags. First the core function that does the work. Because I tried to make it reusable, Django's contenttypes framework is used:

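from django.contrib.contenttypes.models import ContentType
# django-syncr installs as the "syncr" package; adjust the path if your install differs
from syncr.app.youtube import YoutubeSyncr
# try_slug and unique_slugify are the helpers from the unique slugs post (January 8, 2011);
# import them from wherever you keep them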

def sync_video_user(parent,model_name,user,tags):
    """ Sync all videos of user matching given tags pattern .
    
    Takes parent object, name of the model holding the videos (as a string), string with username and string with tags.
    Returns list of synchronized videos. Takes care of already existing videos and prevents duplicate slugs.
    """
    
    from django.template.defaultfilters import slugify
    
    #first we need to get the actual model for object holding video.
    model_class = ContentType.objects.get(model=model_name).model_class()
    
    #we create a queryset for searching for duplicates of the video we're currently working on
    queryset = model_class._default_manager.all()
    
    #parse string of tags to get proper validating url string
    fmt_tags = parse_tags(tags)

    #request for youtube feed
    feed = get_youtube_feed_url(user, fmt_tags)
    
    #search for videos parsing returned feed
    sync = search_youtube(feed)

    #create a temporary instance (notice not using save() anywhere here) to get object methods available as well as queryset
    instance = model_class()

    for vid in sync:
        
        #if our queryset does not already contain object with this video id
        if not queryset.filter(**{"yt_video_id": vid.video_id}):

            #check if slug is free
            free = try_slug(instance, slugify(vid.title))
            if not free:
                
                #if not, use the function which we already discussed some time ago - unique_slugify
                free = unique_slugify(instance, slugify(vid.title))
            else:
                free = slugify(vid.title)

            #create new object
            new = model_class(parent=parent,
                               slug=free,
                               name = vid.title,
                               yt_video_id=vid.video_id,
                               video=vid,
                               publication_date=vid.published, active=True,
                              )
            new.save()

    return sync

Now that we have the main part of the code, we can look at the smaller pieces:

import re

def parse_tags(tags):
    """
    Parse received string of tags into a list of tags, then format them
    as query parameters for the yt search API.

    Returns the query string fragment (it starts with '&category=').
    """

    tags_list = [tag.strip() for tag in tags.split(',')]
    #whitespace inside a multi-word tag becomes '+'
    tags_list = [re.sub(r"\s+", "+", tag) for tag in tags_list]
    parse_string = '&category=%s&v=2' % '%2C+'.join(tags_list)

    return parse_string

Not much magic going on here. We split the string of tags on commas, strip the surrounding whitespace from each tag and use a simple regex to replace whitespace inside a tag with '+' (so multi-word tags are allowed). We then put all the tags into the query string in the proper form.

The next function appends our tags to the rest of the query string, which contains the YouTube username:

def get_youtube_feed_url(user, url):
    """ Add username to youtube feed address.
    """
    
    feed = '/feeds/api/videos?author=%s&alt=rss' % user
    feed += url
    return feed
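To make the string handling concrete, here is a standalone run of the two helpers. They are condensed copies so the snippet runs on its own, and 'someuser' and the tags are made-up sample values:

```python
import re


def parse_tags(tags):
    # split on commas, trim each tag, join the words of a multi-word tag with '+'
    tags_list = [re.sub(r"\s+", "+", tag.strip()) for tag in tags.split(',')]
    return '&category=%s&v=2' % '%2C+'.join(tags_list)


def get_youtube_feed_url(user, url):
    # prepend the author part of the query to the tags fragment
    return '/feeds/api/videos?author=%s&alt=rss' % user + url


fragment = parse_tags('django,  python   tutorial')
print(fragment)
print(get_youtube_feed_url('someuser', fragment))
```

The first print gives &category=django%2C+python+tutorial&v=2, and the second one gives the full path /feeds/api/videos?author=someuser&alt=rss&category=django%2C+python+tutorial&v=2.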

Finally, when we have the whole query URL, we can perform the search using the YouTube API:

def search_youtube(path):
    """ Search videos on youtube
    
    If path is given, connects to youtube feed and returns any videos matching query.
    All videos are then synched.
    
    """
    
    if path:
        import feedparser
        import urlparse
        try:
            from urlparse import parse_qs  # Python 2.6+
        except ImportError:
            from cgi import parse_qs  # Python 2.4 / 2.5
        
        YTSearchFeed = feedparser.parse("http://gdata.youtube.com" + path)
        videos = []
        for yt in YTSearchFeed.entries:

            #the only new part is here. feedparser turns the returned RSS feed into dict-like entries, and the 'link' key of each entry stores the url of a video
            url_data = urlparse.urlparse(yt['link'])
            query = parse_qs(url_data[4])
            id = query["v"][0]
            videos.append(id)
        
        synched = []
        for video in videos:
            try:
                sync = sync_video_id(video)
                synched.append(sync)
            except Exception:
                #skip videos that failed to sync
                pass
        return synched
    else:
        return []

And basically we're finished.

January 9, 2011

Using ProxyPass with django project

Once again I need to postpone my post about django-syncr and YouTube, since I came across an interesting problem recently. Here's the deal: a user has his Django app running on a server under one URL, while the application itself is visible under a different one. So the Apache conf got 'ProxyPass' and 'ProxyPassReverse' added. All seemed to work fine apart from links created with the '{% url %}' template tag (and from what I was able to test, get_absolute_url methods could also cause problems, depending on how they build the URL). The url tag was adding the original Django application name to the link.

Because I'm no good at configuring Apache, I came up with a different solution. After skimming through django/template/defaulttags and django/core/urlresolvers I realized that rewriting both the 'reverse' function and the 'url' tag would be tedious and probably unnecessary. Instead I copied the 'url' template tag, added the 'try' block shown here, which replaces the first unwanted occurrence of our app name (defined in settings as PROJECT_NAME) with an empty string, and saved the result as a custom tag named 'urlc'. Not sure if overriding default tags works, gotta check it some day.

            else:
                if self.asvar is None:
                    raise e
        
        #code starts here
        try:
            url = url.replace(settings.PROJECT_NAME+'/', '', 1)
        except:
            pass
        #code ends here    

        if self.asvar:
            context[self.asvar] = url
            return ''

Not sure if it works for the OP, but it worked on my test config without any bigger problems.
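The replacement itself is plain string handling, so it can be tried outside of Django. A minimal sketch, where strip_project_prefix and 'myproject' are made-up names for illustration:

```python
def strip_project_prefix(url, project_name):
    """Remove the first 'project_name/' occurrence from url, if there is one."""
    return url.replace(project_name + '/', '', 1)


# only the first occurrence goes away, later ones are kept
print(strip_project_prefix('/myproject/blog/2011/', 'myproject'))
print(strip_project_prefix('/myproject/docs/myproject/intro/', 'myproject'))
print(strip_project_prefix('/blog/2011/', 'myproject'))
```

Keep in mind that replace matches the substring anywhere in the path, so '/notmyproject/x' would be mangled as well. A stricter version would only strip the name when it directly follows the leading slash.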

January 8, 2011

Provide unique slugs for model class

Sorry, dear readers, for the long break between posts; I have had lots of work lately finishing previous jobs. Recently I showed you how to synchronize your app with Twitter and how to parse a list of tweets, turning hash tags into proper links. Today I wanted to show you how to synchronize an application with YouTube, but for that I first need a way to provide unique slugs for models. So here's the whole code, which I'll comment on below:

import re

from django.template.defaultfilters import slugify

def try_slug(instance, slug):
    """Check if the slug is free for corresponding model instance
    """
    queryset = instance.__class__._default_manager.all()
    if not queryset.filter(**{"slug": slug}):
        return True
    else:
        return False

def unique_slugify(instance, value, slug_field_name='slug', queryset=None, slug_separator='-'):
    """Create unique slug
    
    Creates unique slug across model class.
    
    """
    slug_field = instance._meta.get_field(slug_field_name)
    slug_len = slug_field.max_length

    slug = slugify(value)
    slug = slug[:slug_len]
    slug = _slug_strip(slug, slug_separator)
    original_slug = slug
    
    if queryset is None:
        queryset = instance.__class__._default_manager.all()
    if instance.pk:
        queryset = queryset.exclude(pk=instance.pk)
 
    next = 2
    while not slug or queryset.filter(**{slug_field_name: slug}):
        slug = original_slug
        end = '%s%s' % (slug_separator, next)
        if slug_len and len(slug) + len(end) > slug_len:
            slug = slug[:slug_len-len(end)]
            slug = _slug_strip(slug, slug_separator)
        slug = '%s%s' % (slug, end)
        next += 1
    return slug

def _slug_strip(value, separator='-'):
    """
    Cleans up a slug by removing slug separator characters that occur at the
    beginning or end of a slug.
    """
    separator = separator or ''
    if separator == '-' or not separator:
        re_sep = '-'
    else:
        re_sep = '(?:-|%s)' % re.escape(separator)

    if separator != re_sep:
        value = re.sub('%s+' % re_sep, separator, value)

    if separator:
        if separator != '-':
            re_sep = re.escape(separator)
        value = re.sub(r'^%s+|%s+$' % (re_sep, re_sep), '', value)
    return value

So now some explanations of the code. The first function lets us check whether a slug is free for a specific model class. It takes an object instance and a slug and returns a boolean. Inside, we take the model's default manager and filter its queryset by the slug.
The second function is the key to solving our problem. unique_slugify takes the parameters shown above. First we read the max length of the slug field, so that we won't exceed it. Then we slugify the value with Django's built-in function, cut it if needed and strip separators from its beginning and end. The next step is to get the queryset we will be working on, excluding our own model instance if it has already been saved. Finally the fun begins. The while loop keeps running as long as the slug is empty or the queryset already contains it. On each pass we take the separator and append the next integer to the original slug.
If the slug concatenated with this suffix is too long, we cut the slug so that both fit. The resulting slug is then checked again by the while loop condition.
The last function strips separators appearing at the beginning and end of the slug. Because it uses several regexes, I will leave the work of deciphering it in your hands :)

And now for some short sample usage :

free = try_slug(self.instance, self.cleaned_data['slug'])
if not free:
    self.cleaned_data['slug'] = unique_slugify(self.instance, self.cleaned_data['slug'])
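If you want to play with the suffix logic without a database, the loop from unique_slugify can be reduced to a few lines. This is only a sketch under my own assumptions: a plain set called taken stands in for the queryset, and unique_slug is a hypothetical name.

```python
def unique_slug(base, taken, max_length=50, separator='-'):
    """Return base, or base plus a numeric suffix, so that it is not in taken."""
    base = base[:max_length]
    slug = base
    counter = 2
    # same condition as in unique_slugify: loop while the slug is empty or taken
    while not slug or slug in taken:
        suffix = '%s%d' % (separator, counter)
        # trim the base so that base + suffix still fits in max_length
        slug = base[:max_length - len(suffix)] + suffix
        counter += 1
    return slug


print(unique_slug('my-video', set()))
print(unique_slug('my-video', set(['my-video', 'my-video-2'])))
print(unique_slug('abcdefgh', set(['abcdefgh']), max_length=8))
```

The three calls print my-video, my-video-3 and abcdef-2. The last one shows the base being shortened to make room for the suffix.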

December 9, 2010

Parsing tweets links, users and hash tags with django and python-twitter.

Hello again. Today I'm going to continue my previous post about Django and Twitter. Last time we finished after synchronizing with Twitter. This time I'm going to show you how to parse the list of received tweets for rendering in our template, so that links, hash tags and user mentions become clickable links. So here's the code; I've added some comments to clarify what we're doing:

def parse_tweets(tweets):
    import re
    
    posts = []
    #compile regexes as objects
    hash_regex = re.compile(r'#[0-9a-zA-Z+_]*',re.IGNORECASE) 
    user_regex = re.compile(r'@[0-9a-zA-Z+_]*',re.IGNORECASE)

    for tw in tweets:
        tweet = tw.text
        
        #first deal with links. Any http://... string change to a proper link
        tweet = re.sub('http://[^ ,]*', lambda t: '<a href="%s">%s</a>' % (t.group(0), t.group(0)), tweet)
        
        #for all elements matching our pattern...
        for usr in user_regex.finditer(tweet):
            
            #for each whole match replace '@' with ''
            url_tweet = usr.group(0).replace('@','')
            if not url_tweet:
                #a lone '@' is not a mention
                continue

            #in tweet's text replace text with proper link, now without '@'
            #(the twitter.com address is an assumed link target)
            tweet = tweet.replace(usr.group(0),
                '<a href="http://twitter.com/'+url_tweet+'">'+usr.group(0)+'</a>')

        #do the same for hash tags (the search address is an assumed link target)
        for hash in hash_regex.finditer(tweet):
            url_hash = hash.group(0).replace('#','%23')
            if len ( hash.group(0) ) > 2:
                tweet = tweet.replace(hash.group(0),
                    '<a href="http://twitter.com/search?q='+url_hash+'">'+hash.group(0)+'</a>')

        #append each tweet to our list. Here I'm creating a dict containing tweet's text and date
        posts.append({'tweet': tweet, 'date': tw.pub_time})   
    return posts

And now to show it in template :

    <ul>
    {% for t in tweets %}
    <li>
    {% for k, v in t.items %}
    {% ifequal k 'tweet' %}
    {{ v|safe }}
    {% else %}
    {% ifequal k 'date' %}
    {{v|timesince}} ago
    {% endifequal %}
    {% endifequal %}
    {% endfor %}
    </li>
    {% endfor %}
    </ul>
Probably there are better ways of doing this, but this one works for me. If you have any questions, just write a comment or follow me on Twitter.
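One of those better ways might be a single re.sub pass with a callback. Calling str.replace in a loop rewrites every occurrence of the matched text, so '@al' would also be replaced inside '@alice'. The sketch below is my own variation, not the code I run on the site: the combined pattern, the linkify name and the twitter.com addresses are all assumptions.

```python
import re

# one pattern with three named alternatives; urls are tried first so that
# a '#' or '@' inside a url is not treated as a hash tag or a mention
TOKEN = re.compile(r'(?P<url>https?://[^\s,]+)|@(?P<user>\w+)|#(?P<tag>\w+)')


def linkify(text):
    """Wrap urls, @mentions and #tags found in text in anchor tags."""
    def repl(match):
        if match.group('url'):
            return '<a href="%s">%s</a>' % (match.group('url'), match.group('url'))
        if match.group('user'):
            name = match.group('user')
            return '<a href="http://twitter.com/%s">@%s</a>' % (name, name)
        tag = match.group('tag')
        # '%%23' in the format string produces a literal '%23' (url-encoded '#')
        return '<a href="http://twitter.com/search?q=%%23%s">#%s</a>' % (tag, tag)
    return TOKEN.sub(repl, text)


print(linkify('hi @bob, check #django at http://example.com/a#b'))
```

Because every token is matched exactly once, markup inserted for one token is never touched again by a later replacement. The sketch does not escape HTML in the tweet text, so that would have to happen before linkifying if the output is rendered with the safe filter.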

December 8, 2010

Check object's parameter value in django template

I'm just finishing probably the most complex project I'll ever build in Django, at least by myself. During the development I've encountered numerous problems, some of them without satisfying solutions on the web. So for the next few posts I will share the results of my research, thinking and developing. Topic for today is:

How to check value of object's parameters in a template ?

Why would I ever need to do this kind of thing, you may think. Yeah, that's what I thought until now. But just think of my exact example: we have a site with some objects that people can vote on. If someone has already voted we show him the result; if not, we render a voting button. Things are easy when rendering a static page that is refreshed after each vote. But what if we need it to be dynamic (just as I did)? I'm pretty sure that this problem can be solved using a custom template tag. But I thought of a simpler solution using filters. So let's start.
I have created a context processor that stores the user's IP address. In my template I'm going through a list of objects in a for loop, and on each object I'm using my filter, with the IP as parameter:

{{ submission|check_voting:IP_ADDRESS|safe }}
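The context processor itself is not shown in this post. A minimal sketch could look like the one below: the function name ip_address is made up, and IP_ADDRESS matches the variable used in the template above. In the Django versions I'm using it would have to be listed in the TEMPLATE_CONTEXT_PROCESSORS setting, and the view has to render with a RequestContext.

```python
def ip_address(request):
    """Context processor: expose the client's IP address to every template."""
    # REMOTE_ADDR is the standard key for the client address in request.META
    return {'IP_ADDRESS': request.META.get('REMOTE_ADDR')}


class FakeRequest(object):
    """Stand-in for a Django request, only here to try the function out."""
    META = {'REMOTE_ADDR': '10.0.0.1'}


print(ip_address(FakeRequest()))
```

Behind a reverse proxy REMOTE_ADDR holds the proxy's address, so in that setup the real client IP would have to come from a header set by the proxy.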

And now the filter part :

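from django import template
from django.conf import settings

register = template.Library()
@register.filter
def check_voting(obj, ip):
    result = obj.check_vote(ip)
    if result:
        result = "%s votes" % obj.votes
    else:
        #like button used for voting; the fb:like markup is an assumed stand-in
        result = '<fb:like href="%s%s"></fb:like>' % (settings.SITE_URL, obj.get_absolute_url())
    return result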

As you can see, my filter takes two arguments: the object it was called on, and the IP. It calls the object's 'check_vote' method, which returns a boolean depending on whether a vote from this IP already exists. According to its value I'm showing either a Facebook like button used for voting, or just a plain vote counter.
Hope you'll find this useful.