Ben++

January 2007


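Back in primary school I used to bring home three award certificates every year, so we ended up with a wall of certificates at home, easily the brightest sight in the house. When the house was renovated during my high-school years, though, the certificates had been glued directly onto the wall, so the certificate wall was lost.

Today I went to look at my grandmother's old house (where I grew up) and unexpectedly found one still there, from second grade, 15 years ago. It is probably the last survivor.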









Hello I am a Mac, and I was a PC. :)

Well, I guess everybody likes Apple's “Get A Mac” ad series, and so do I, but watching them online is painful because the network link between China and the US is not very good. So I wrote a Python program, “Get The Ads”, to collect the URLs of all the “Get A Mac” .mov files.


1. First version


At first I didn’t know where Apple stored the file information, so my program worked like this:


# Python 2 only: sgmllib was removed in Python 3
from sgmllib import SGMLParser

class AdsParser(SGMLParser):

    def reset(self):
        # extend (called from __init__ in ancestor)
        # Reset all data attributes                        
        SGMLParser.reset(self)
        self.urls = {}

    def start_a(self, attrs):
        # called for every <a> tag in HTML source
        # Find the links
        href = [v for k, v in attrs if k=='href']
        if href and href[0].rfind('.mov')!=-1 :
            l = href[0].rfind('/')+1
            r = href[0].rfind('_')
            name = href[0][l:r]
            self.urls[name] = href[0]

This is the parser class, which extends SGMLParser. The HTML pages are downloaded and fed to this class, which parses them; whenever an <a> start tag is found, the start_a(attrs) method is called. attrs is a list that stores the tag's attributes like this:

[('href', '/getamac/works.html'), ('id', 'navmoreswap')]

start_a(attrs) filters the attrs list, picks out the links that contain “.mov”, and saves them into a dictionary named urls. A dictionary is used here to avoid duplicate URLs.
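For anyone who wants to try the same idea today, here is a minimal sketch using html.parser.HTMLParser from the Python 3 standard library in place of SGMLParser. This is a swapped-in technique, not the original program; the class name MovLinkParser and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class MovLinkParser(HTMLParser):
    """Collect .mov links from <a> tags, keyed by ad name (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.urls = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, as with SGMLParser
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and ".mov" in href:
            # ad name = text between the last '/' and the last '_'
            left = href.rfind("/") + 1
            right = href.rfind("_")
            # same key twice simply overwrites, so duplicates collapse
            self.urls[href[left:right]] = href


# Invented sample markup; the paths are illustrative only
sample = (
    '<a href="/getamac/works.html" id="navmoreswap">Works</a>'
    '<a href="/movies/demo_480x376.mov">Demo</a>'
    '<a href="/movies/demo_480x376.mov">Demo again</a>'
)
parser = MovLinkParser()
parser.feed(sample)
print(parser.urls)
```

The second link to the same file overwrites the first entry, which is the duplicate-avoiding effect of the dictionary.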


2. Second version


But eventually I found that Apple stores the “.mov” information in a single XML file. Wow, that makes my program much simpler, so here is my second version:


#!/usr/bin/env python

__author__ = "Ben Feng(benplusplus#gmail.com)"
__copyright__ = "Copyright (c) 2007 Ben Feng"

import urllib
import sys
import os
import re  # module-level import: output() needs it as well as start()

class AdsParser:
  
    def __init__(self):
        self.site = ""
        self.urls = []

    def getfile(self):
        # Return the xml source
        try:
            sock = urllib.urlopen(self.site)
            source = sock.read()
            sock.close()
        except IOError:
            # urllib.urlopen raises IOError when the connection fails
            print ("Cannot connect to Apple.com, "
                   "please check the internet connection.")
            sys.exit(2)
        return source
          
    def start(self, site):
        # parse the resource from getfile() method
        self.site = site
        source = self.getfile()
        import re
        self.urls = re.findall('http(?:[^ \n\r\"]+)[.]mov',source)
          
def output(urls):  
    lsize = [ ("HD", "848x496"), ("Large", "640x496"), ("Medium", "480x376"), ("Small", "320x256") ]

    default = "480x376"
    outfile = "output.html"

    fsock = open(outfile, 'w')
    fsock.write("""
    <html>
    <head>
    <title>Get The Ads</title>
    </head>
   
    <body>
    <p>Get The Ads<br>-Ben Feng @ 2007<br>-benplusplus#gmail.com</p>
    """)
    fsock.write("%d Ads" % len(urls))
    for (k, v) in lsize:
        fsock.write("<p><br><br>%s resolution (%s) :<br></p>" % (k, v))
        for link in urls:
            link = re.sub('_(?:[^ /]+)\.', '_'+v+'.', link)
            fsock.write("<a href=%s target=_blank>%s</a><br />" % (link, link))
    fsock.write("</body></html>")
    fsock.close()
    import webbrowser
    s = "file://"+os.getcwd()+"/"+outfile
    webbrowser.open(s)
  
def main():
    parser = AdsParser()
    print "Connecting...Just a second"
    parser.start("http://www.apple.com/getamac/ads.xml")
    output(parser.urls)
    print "Finished.\n"
          
if __name__ == "__main__":
  
    s = '\nGet The Ads\n-Ben Feng @ 2007\n'
    print s
    main()


So you can see that the AdsParser class has become much slimmer. All I do is use this regular expression:

self.urls = re.findall('http(?:[^ \n\r\"]+)[.]mov', source)

There is no loop, yet every link I am looking for gets picked out. It just works!
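To see what the pattern picks out, here is a small sketch in Python 3 run against a made-up snippet. The XML structure and the example.com URLs are assumptions for illustration only, not the real contents of Apple's file. The pattern is the same one, written as a raw string.

```python
import re

# Hypothetical snippet standing in for the XML file; URLs are invented
source = '''<ads>
  <ad title="First" src="http://example.com/movies/first_480x376.mov"/>
  <ad title="Second" src="http://example.com/movies/second_480x376.mov"/>
</ads>'''

# "http", then any run of characters that are not a space, newline,
# carriage return or double quote, ending in ".mov"
pattern = r'http(?:[^ \n\r"]+)[.]mov'
urls = re.findall(pattern, source)
print(urls)
```

Because the character class excludes the double quote, each match stops inside its own attribute value and never runs on into the next link.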

As you can see, the output(urls) function now has even more code than the AdsParser class. urls is the list of links returned by AdsParser, but they are only the medium-resolution ones.

output(urls) takes care of all the output.

The two for loops generate the links for the other resolutions and write the output to an HTML file. Then these lines display the HTML file in the browser:

    import webbrowser
    s = "file://"+os.getcwd()+"/"+outfile
    webbrowser.open(s)
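The resolution rewrite done inside the two loops can also be tried on its own. This is a rough sketch in Python 3; the helper name links_for_all_sizes and the example URL are hypothetical, while the size list and the substitution pattern come from the program above.

```python
import re

# Sizes copied from the lsize list in the program
sizes = [("HD", "848x496"), ("Large", "640x496"),
         ("Medium", "480x376"), ("Small", "320x256")]


def links_for_all_sizes(url):
    """Return (label, url) pairs, one per size (hypothetical helper)."""
    # Swap the size token between '_' and '.' for each target size
    return [(label, re.sub(r'_(?:[^ /]+)\.', '_' + size + '.', url))
            for label, size in sizes]


# Invented example URL in the medium resolution
result = links_for_all_sizes("http://example.com/movies/demo_480x376.mov")
for label, link in result:
    print(label, link)
```

Note that the pattern starts matching at the first underscore in the URL, so this sketch assumes the size token is the only underscore-separated part of the file name.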

OK, that's the whole program. I have to say Python is really good at this!

If you want the file links, you can run this program yourself, or use the post below, where I have attached all the Get A Mac links in every resolution. I like the HD ones. Enjoy! :)


http://www.elesson.com.cn/modules/ipboard/index.php?s=&showtopic=42313


 




Running Python programs on an S60 phone: no kidding, Nokia has finally done something that excites me...

Check this out~
Nokia Python for S60

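Python for S60 is a tool from Nokia for running Python scripts on S60. It is open source, the current version is V1.2, and the supported features include:

  • Networking over GPRS and Bluetooth
  • Local and remote execution of Python programs
  • Native GUI widgets
  • Sending SMS messages
  • A tool for building installation packages
  • 2D graphics and images, and full-screen applications
  • Camera and screenshot APIs
  • Contacts and calendar APIs
  • Sound recording and playback
  • Reading system information such as the IMEI number, storage space, and free memory
  • Rich text display (fonts, colors, styles)
  • Scalable (vector) UI support
  • Extended keyboard events
  • Dialing phone numbers
  • ZIP module
PS: I am not a big fan of Nokia phones, but now there is one more reason to buy one... I used to have a 7650 (which should be the first S60 phone). Nokia's phones are too crude, but Siemens is gone now, so there is not much choice~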

