From zero to code hero: Synchronizing a YouTube user's videos filtered by tags

Today I will write about synchronizing your app with YouTube and then filtering a specific user's videos by tags. But let's start at the beginning. I assume you already have django-syncr installed, and that you store the video as an attribute of some other object. Basic usage is pretty straightforward, so I'll go straight to some examples.

First, the easiest one: we pass the ID of a video as a string and get a Video instance back:

# django-syncr installs as the `syncr` package; the syncr classes live in syncr.app
from syncr.app.youtube import YoutubeSyncr

def sync_video_id(video_id):
    """ Synchronize a single video by its YouTube ID.
    Takes the YouTube ID as a string. Returns a Video instance.
    Any error raised by syncVideo() propagates to the caller.
    """
    y = YoutubeSyncr()
    return y.syncVideo(video_id)

That was easy, so let's try something more complicated: synchronizing a video when we're given its URL. Using Python's built-in urlparse module we split the URL into its components and then parse the query string to get the video ID. Python 2.6 and later provide parse_qs in urlparse; because my actual code had to be compatible with Python 2.4, it falls back to the cgi module, which offers the same function in a different place. Here's how this looks step by step. Everything is rather self-explanatory, so I won't go through it line by line:

>>> url = 'http://www.youtube.com/watch?v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM'
>>> url
'http://www.youtube.com/watch?v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM'
>>> import urlparse
>>> url_data = urlparse.urlparse(url)
>>> url_data
ParseResult(scheme='http', netloc='www.youtube.com', path='/watch', params='', query='v=mPXxI1uyVEE&feature=rec-LGOUT-exp_fresh+div-1r-3-HM', fragment='')
>>> query = urlparse.parse_qs(url_data[4])
>>> query
{'feature': ['rec-LGOUT-exp_fresh div-1r-3-HM'], 'v': ['mPXxI1uyVEE']}
>>> id = query["v"][0]
>>> id
'mPXxI1uyVEE'
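On Python 3 the urlparse module became urllib.parse, and recent Python 3 releases no longer provide `cgi.parse_qs`. If one snippet has to run on old and new interpreters alike, a chained import fallback is one way to handle it. This is a minimal sketch; `extract_video_id` is a hypothetical helper name, and unlike the session above it returns `None` instead of raising `KeyError` when the URL has no `v` parameter:

```python
# Try the newest location first, then fall back to the older ones.
try:
    from urllib.parse import urlparse, parse_qs   # Python 3
except ImportError:
    from urlparse import urlparse                  # Python 2
    try:
        from urlparse import parse_qs              # Python 2.6+
    except ImportError:
        from cgi import parse_qs                   # Python 2.4 / 2.5


def extract_video_id(url):
    """Return the 'v' parameter of a YouTube watch URL, or None if it is missing."""
    query = parse_qs(urlparse(url)[4])  # index 4 is the query component
    values = query.get("v")
    return values[0] if values else None


if __name__ == "__main__":
    url = ("http://www.youtube.com/watch?v=mPXxI1uyVEE"
           "&feature=rec-LGOUT-exp_fresh+div-1r-3-HM")
    print(extract_video_id(url))  # mPXxI1uyVEE
```

Note that parse_qs also decodes `+` to a space, which is why the `feature` value in the session above shows `exp_fresh div` rather than `exp_fresh+div`.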

And here is how we will do it in Django:

import urlparse
try:
    from urlparse import parse_qs   # Python 2.6+
except ImportError:
    from cgi import parse_qs        # Python 2.4 / 2.5

from syncr.app.youtube import YoutubeSyncr

def sync_video_url(url):
    """ Synchronize a single video by its URL.
    Takes the video URL as a string, returns a Video object (or None on failure).
    """
    url_data = urlparse.urlparse(url)
    query = parse_qs(url_data[4])   # index 4 is the query string
    video_id = query["v"][0]

    y = YoutubeSyncr()
    try:
        return y.syncVideo(video_id)
    except Exception:
        return None

Once you've warmed up, it's time for more complex functionality: syncing the videos of a specific user that match the given tags. First, the core function that does the work. Because I wanted it to be reusable, it uses Django's ContentType framework to look up the model that holds the videos:

from django.contrib.contenttypes.models import ContentType
from django.template.defaultfilters import slugify
from syncr.app.youtube import YoutubeSyncr

def sync_video_user(parent, model_name, user, tags):
    """ Sync all videos of a user that match the given tags.

    Takes the parent object, the name of the model that holds videos (as a string),
    a YouTube username and a comma-separated string of tags.
    Returns a list of the newly created objects. Videos that already exist are
    skipped, and slugs are kept unique.
    """

    # first we need the actual model class of the object holding the video
    model_class = ContentType.objects.get(model=model_name).model_class()

    # queryset used to look for duplicates of the video we're currently working on
    queryset = model_class._default_manager.all()

    # turn the string of tags into a query-string fragment for the API
    fmt_tags = parse_tags(tags)

    # build the YouTube feed path for this user
    feed = get_youtube_feed_url(user, fmt_tags)

    # fetch the feed and sync every video it lists
    sync = search_youtube(feed)

    # temporary instance (note: never saved) that gives the slug helpers
    # access to the model's methods and queryset
    instance = model_class()

    created = []
    for vid in sync:
        # skip videos that are already stored
        if queryset.filter(yt_video_id=vid.video_id):
            continue

        slug = slugify(vid.title)
        # if the slug is taken, use unique_slugify, which we discussed some time ago
        if not try_slug(instance, slug):
            slug = unique_slugify(instance, slug)

        # create the new object
        new = model_class(parent=parent,
                          slug=slug,
                          name=vid.title,
                          yt_video_id=vid.video_id,
                          video=vid,
                          publication_date=vid.published,
                          active=True)
        new.save()
        created.append(new)

    return created
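The heart of that loop is two checks: skip video IDs that are already stored, and give every new object a free slug. Below is a database-free sketch of the same logic, assuming plain sets stand in for the queryset. `simple_slugify` is a rough stand-in for Django's `slugify` (without its Unicode-to-ASCII step), and `unique_slug` and `plan_new_objects` are hypothetical helpers, not the `try_slug`/`unique_slugify` functions from the earlier post:

```python
import re


def simple_slugify(value):
    """Rough stand-in for Django's slugify(): drop punctuation, lowercase, hyphenate."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def unique_slug(base, taken):
    """Hypothetical helper: append -2, -3, ... until the slug is not in `taken`."""
    slug, n = base, 2
    while slug in taken:
        slug = "%s-%d" % (base, n)
        n += 1
    return slug


def plan_new_objects(videos, existing_ids, existing_slugs):
    """Return (video_id, slug, title) for each video that is not stored yet.

    videos         -- iterable of (video_id, title) pairs
    existing_ids   -- video IDs already in the database (the queryset check)
    existing_slugs -- slugs already in use
    """
    seen = set(existing_ids)
    taken = set(existing_slugs)
    planned = []
    for video_id, title in videos:
        if video_id in seen:
            continue                      # already stored: skip it
        slug = unique_slug(simple_slugify(title), taken)
        seen.add(video_id)
        taken.add(slug)                   # later videos in this batch see the slug
        planned.append((video_id, slug, title))
    return planned


if __name__ == "__main__":
    videos = [("abc", "My Video!"), ("def", "My Video"), ("xyz", "Other")]
    for row in plan_new_objects(videos, {"xyz"}, {"my-video"}):
        print(row)
```

Running it prints `('abc', 'my-video-2', 'My Video!')` and `('def', 'my-video-3', 'My Video')`; `xyz` is skipped because it is already stored. In the Django version the "batch sees its own slugs" part happens because each new object is saved before the next slug check.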

Now that we have the main part of the code, we can look at the smaller pieces:

import re

def parse_tags(tags):
    """ Turn a comma-separated string of tags into a query-string fragment
    for the YouTube search API.

    parse_tags("django, web development") -> '&category=django%2C+web+development&v=2'
    """
    tags_list = [tag.strip() for tag in tags.split(',')]
    # multi-word tags: replace the whitespace inside a tag with '+'
    tags_list = [re.sub(r"\s+", "+", tag) for tag in tags_list]
    parse_string = '&category=%s&v=2' % '%2C+'.join(tags_list)

    return parse_string

Not much magic going on here. We split the string of tags on commas, strip the surrounding whitespace from each tag and use a simple regex to replace whitespace inside a tag with '+', so multi-word tags are allowed. We then join the tags with an encoded comma (%2C) and put them into the query string in the form the API expects.
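Building the fragment by hand works, but a tag containing `&` or `#` would end up unescaped in the URL. A sketch of an alternative lets `urllib.parse.urlencode` (Python 3; `urllib.urlencode` on Python 2) do the escaping. `parse_tags_encoded` is a hypothetical name, and its output differs slightly from `parse_tags`: the encoded comma is not followed by `+`. I'm assuming the API treats both forms the same:

```python
from urllib.parse import urlencode


def parse_tags_encoded(tags):
    """Hypothetical variant of parse_tags() that delegates escaping to urlencode."""
    # collapse runs of whitespace and strip each tag
    tags_list = [" ".join(tag.split()) for tag in tags.split(",")]
    tags_list = [tag for tag in tags_list if tag]   # drop empty entries, e.g. "a,,b"
    # urlencode encodes ',' as %2C, spaces as '+', and '&' as %26
    return "&" + urlencode([("category", ",".join(tags_list)), ("v", "2")])


if __name__ == "__main__":
    print(parse_tags_encoded("django, web development,python"))
    # &category=django%2Cweb+development%2Cpython&v=2
```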

The next function appends our tag fragment to the rest of the query string, which contains the YouTube username:

def get_youtube_feed_url(user, tag_query):
    """ Build the YouTube feed path for the given username,
    followed by the tag fragment returned by parse_tags().
    """
    feed = '/feeds/api/videos?author=%s&alt=rss' % user
    feed += tag_query
    return feed
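The username goes into the URL exactly as given. If it could contain characters that are special in URLs, percent-encoding it first is safer. A sketch for Python 3 (`urllib.quote` on Python 2); `build_feed_path` is a hypothetical name that otherwise mirrors `get_youtube_feed_url`:

```python
from urllib.parse import quote


def build_feed_path(user, tag_query):
    """Like get_youtube_feed_url(), but percent-encodes the username first."""
    # safe="" makes quote() encode '/' as well
    return "/feeds/api/videos?author=%s&alt=rss%s" % (quote(user, safe=""), tag_query)


if __name__ == "__main__":
    print(build_feed_path("some user", "&category=django&v=2"))
    # /feeds/api/videos?author=some%20user&alt=rss&category=django&v=2
```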

Finally, with the whole query path in hand, we can run the search using the YouTube API:

import urlparse
try:
    from urlparse import parse_qs   # Python 2.6+
except ImportError:
    from cgi import parse_qs        # Python 2.4 / 2.5

import feedparser

def search_youtube(path):
    """ Search videos on YouTube.

    If a path is given, fetches the YouTube feed, collects the IDs of all
    videos matching the query and syncs each of them.
    Returns a list of synced Video objects.
    """
    if not path:
        return []

    YTSearchFeed = feedparser.parse("http://gdata.youtube.com" + path)
    videos = []
    for yt in YTSearchFeed.entries:
        # the only new part is here: feedparser turns the RSS feed into a list
        # of entries, and each entry's 'link' holds the watch URL of a video
        url_data = urlparse.urlparse(yt['link'])
        query = parse_qs(url_data[4])
        videos.append(query["v"][0])

    synched = []
    for video_id in videos:
        try:
            synched.append(sync_video_id(video_id))
        except Exception:
            # skip videos that fail to sync
            pass
    return synched
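Silently passing on every failure gives no hint which video didn't sync. Below is a sketch of the same two steps with the network and django-syncr replaced by plain inputs, assuming entries are dicts shaped like feedparser entries (only the `'link'` key is used) and `sync_one` is any callable standing in for `sync_video_id`. Both `ids_from_entries` and `sync_all` are hypothetical names:

```python
from urllib.parse import urlparse, parse_qs


def ids_from_entries(entries):
    """Collect the 'v' parameter of each entry's 'link', skipping entries without one."""
    ids = []
    for entry in entries:
        values = parse_qs(urlparse(entry.get("link", "")).query).get("v")
        if values and values[0] not in ids:   # also drop duplicate IDs
            ids.append(values[0])
    return ids


def sync_all(video_ids, sync_one):
    """Call sync_one on every ID; return (synced results, IDs that raised)."""
    synced, failed = [], []
    for video_id in video_ids:
        try:
            synced.append(sync_one(video_id))
        except Exception:
            failed.append(video_id)       # remember which video failed
    return synced, failed
```

For example, `sync_all(ids_from_entries(feed.entries), sync_video_id)` would return the synced videos together with the IDs worth logging or retrying.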

And basically we're finished.

January 28, 2011
