2009-02-04

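The previous post, "Second-hand housing price trends in four Beijing districts" (北京四个区二手房走势), ran its statistics over whole districts of Beijing. On inspection the fit was poor: the R-squared value was about 0.3 for every district except Tongzhou, where it was 0.6. This directly undermines the reliability of the regression equations. The main reasons are that Haidian, Chaoyang and Changping cover large areas, mix old and new developments, and have complicated property-rights situations.
A more reliable approach would be to compute the statistics per sub-area, per floor plan, or even per residential compound. Such a fine-grained breakdown would take far more work than I can do alone.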

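So I looked at just one sub-area: a little over 20,000 second-hand listings (mostly posted by agencies) in the Wangjing area of Chaoyang District. For reasons everyone knows, prices in Wangjing have fallen quite sharply of late.
The data was also preprocessed to some extent, for example by trimming off its head and tail.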

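Here the one-variable linear regression model has an R-squared value of 0.8166834, which is within the acceptable range.
The coefficients of the regression equation (intercept, slope per day) are
1.2653403387, -0.0009800008
I won't make a prediction this time; it would be embarrassing if it turned out wrong.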

2009-02-01

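I have the day off at home today and was looking into second-hand housing, when it occurred to me that statistics could be used to study where second-hand prices are heading. There is a lot of data on the Internet; if you know how to mine it, there should be value to find.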

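The idea is to take the historical asking prices of second-hand homes in four Beijing districts (Haidian, Chaoyang, Changping, Tongzhou) from the Internet, compute the mean price for each day, and run a linear regression on the daily means in an attempt to predict future prices.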

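The data was scraped from the second-hand home sale pages of Ganji.com (赶集网) with Python and stored locally as CSV files. Thanks to Ganji.com: its data is both rich and well structured, which greatly reduced the analysis work and let the whole job be finished in the better part of a day. The statistical analysis uses GNU R, covering basic data import, plotting (plot), and a simple linear model.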
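The daily-mean and regression step can be sketched in a few lines. The original analysis was done in GNU R; the version below is only a pure-Python illustration of the same idea. The helper names `daily_means` and `fit_line` are hypothetical, and the listings are synthetic values made up for the example, not the scraped data.

```python
from datetime import date

def daily_means(listings):
    """Group (date, price) listings by day; return (days_since_start, mean_price) pairs."""
    by_day = {}
    for day, price in listings:
        by_day.setdefault(day, []).append(price)
    start = min(by_day)
    return [((day - start).days, sum(prices) / len(prices))
            for day, prices in sorted(by_day.items())]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x, plus R squared."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return intercept, slope, 1 - ss_res / ss_tot

# Synthetic listings, made up for illustration only (not the scraped data).
listings = [
    (date(2008, 6, 20), 1.6), (date(2008, 6, 20), 1.4),
    (date(2008, 6, 21), 1.4),
    (date(2008, 6, 22), 1.2), (date(2008, 6, 22), 1.4),
]
intercept, slope, r_squared = fit_line(daily_means(listings))
print(intercept, slope, r_squared)
```

The intercept and slope play the same roles as the two coefficients reported per district, and an R-squared value close to 1 means the straight line explains most of the day-to-day variation.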

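Enough talk; on to the data and charts.
Second-hand housing prices in Haidian District
Linear regression coefficients for Haidian District

     1.5440157415,     -0.0008758534
These two coefficients define a linear equation. Its intuitive meaning is:
Counting from the sampling start date of June 20, 2008, the mean price on a later date t is 1.544 - 0.0008758534 * (number of days from June 20, 2008 to t), in units of 10,000 yuan (万元). So the mean price on February 1, 2009 is about 1.3451 (10,000 yuan) = 1.544 - 0.0008758534 * 227.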

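Second-hand housing prices in Chaoyang District
Linear regression coefficients for Chaoyang District
     1.3147709213,     -0.0005277809

Second-hand housing prices in Changping District
Linear regression coefficients for Changping District
0.965554124,     -0.000528159
Second-hand housing prices in Tongzhou District
Linear regression coefficients for Tongzhou District
0.7966942687,   -0.0006909318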

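Now let's predict second-hand prices half a year from now, taking the day count as 400 (prices in units of 10,000 yuan).
Haidian:  1.19
Chaoyang:  1.10
Changping:  0.754
Tongzhou:  0.520

And one year from now, taking the day count as 580:
Haidian:  1.03
Chaoyang:  1.00
Changping:  0.659
Tongzhou:  0.396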
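The predictions can be recomputed directly from the published coefficients. The sketch below only evaluates intercept + slope * days for each district; the names `COEFFICIENTS` and `predict` are chosen for this illustration, and the results are in units of 10,000 yuan.

```python
# (intercept, slope per day) for each district, as listed in the post.
COEFFICIENTS = {
    "Haidian": (1.5440157415, -0.0008758534),
    "Chaoyang": (1.3147709213, -0.0005277809),
    "Changping": (0.965554124, -0.000528159),
    "Tongzhou": (0.7966942687, -0.0006909318),
}

def predict(district, days):
    """Mean price `days` days after June 20, 2008, from the fitted line."""
    intercept, slope = COEFFICIENTS[district]
    return intercept + slope * days

# 227 days is February 1, 2009 as counted in the post; 400 and 580 are the forecast horizons.
for days in (227, 400, 580):
    for district in COEFFICIENTS:
        print(f"day {days:3d}  {district:<10s} {predict(district, days):.3f}")
```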

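As can be seen, Haidian is falling fastest (-0.0008758534 per day), while Changping and Chaoyang are falling more slowly. A year from now prices may come down to a relatively reasonable level, but personally I still find the rate of decline rather slow, which does not quite satisfy me.

Of course, a linear regression is rather crude and the factors that drive real prices are very complex, so these conclusions are for reference only. If reality does not match the predictions here, please don't come after me.

Splitting by district is still rather coarse. The system also supports subdividing a district for statistics and analysis; Haidian, for example, can be split into Mudanyuan, Shangdi, and so on. That is too much trouble, though, so I won't do it here.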

2005-11-01

  Today I got a primitive Ruby MAS (multi-agent system) running! The tool is modelled on Java's JADE framework. Tired of Java's verbose syntax, I decided to build yet another MAS platform in a dynamic language, and chose Ruby for its strong support for distributed and concurrent programming. The actual working time spent on this still-primitive MAS so far is about two working days, and much more work is needed to complete it. The main goal is for the platform to become a real system on which various kinds of work can be built. If it proves suitable, my future work will be added to the framework. The initial time estimate is about one month.
  Here is some sample code in which two agents, Jake and Mary, say they love each other.

require "agent"
require "behaviour"
require "container"

class JakeAgent < Agent
  def initialize
    super("Jake")
  end
  def setup
    msg = ACLMessage.new(ACLMessage.REQUEST)
    msg['sender'] = AID.new @name
    msg['receivers'] << AID.new("Mary")
    msg['language'] = "English"
    msg['ontology'] = "Loving"
    msg['content'] = "Hello, I love u"
    puts "Jake>>>Mary"
    puts msg
    send_msg(msg)
  end
end

class MaryBehaviour < Behaviour
  def initialize
    @skip = false
  end
  def skip
    @skip
  end

  def action
    msg = @agent.receive
    if msg
      reply = ACLMessage.new(ACLMessage.INFORM)
      reply['sender'] = AID.new @agent.name
      reply['receivers'] << msg['sender']
      reply['language'] = "English"
      reply['ontology'] = "Loving"
      reply['content'] = "I love u too"
      puts "Mary>>>Jake"
      puts reply
      @agent.send_msg(reply)
      @skip = true
    end 
  end
end

class MaryAgent < Agent
  def initialize
    super("Mary")
  end
  def setup
    add_behaviour(MaryBehaviour.new)
  end
end

Thread.abort_on_exception = true
p = Platform.new("LovingRoom", 7777)
s = p.start_service
p.add_agent(MaryAgent.new)
p.add_agent(JakeAgent.new)

t = Thread.new(s) { |tid|
  gets
  Thread.kill(tid)
  puts "finished" 
}

t.join
s.join

2005-10-11

Maxima is a really good tool for helping me with mathematical derivations. The following few lines compute the remainder of one polynomial modulo another:

h(x) := x^5 + x^3 + 1;
t(x) := x^8 + x^5 + x^3 + 1;
expand(t(x) - quotient(t(x), h(x), x) * h(x));
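For comparison, the same remainder can be worked out by plain polynomial long division. The sketch below is an illustration in Python rather than Maxima; `poly_divmod` is a hypothetical helper that takes coefficient lists ordered from the highest degree down and uses exact rational arithmetic.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide two polynomials given as coefficient lists, highest degree first."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    # Keep cancelling the leading term while the dividend is long enough.
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num

t = [1, 0, 0, 1, 0, 1, 0, 0, 1]  # x^8 + x^5 + x^3 + 1
h = [1, 0, 1, 0, 0, 1]           # x^5 + x^3 + 1
quotient, remainder = poly_divmod(t, h)
print(quotient, remainder)
```

Working this through by hand gives the quotient x^3 - x + 1 and the remainder x^4 - x^3 + x, which is what the `expand` line should reduce to.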

2005-08-16

A Naive Bayes classifier assigns a document to one of several categories, so it can be used for email classification, document classification, stock prediction, etc. There are more effective classification algorithms, but they are much harder to understand.

So I wrote a Python script as a programming exercise. First, some documents are fed in to train the classifier; here the classifier is an in-memory dictionary, whereas a realistic implementation would store it on some larger persistent medium such as a disk file. Second, a document is fed in to be categorized.

This is a very primitive implementation of the algorithm, so more work is needed to make it robust.

usage:

     %python bayes.py <train 1> <train 2> … <train n> <document>

import math
import re
import sys

# Common words that carry little information about the category.
STOPWORDS = ('is', 'a', 'and', 'or', 'not', 'if', 'while', 'at')

def fil(word):
    return word.lower() not in STOPWORDS

def wordseg(filename):
    # Split the file into words and drop the stopwords.
    with open(filename) as f:
        return [w for w in re.findall(r'\w+', f.read()) if fil(w)]

def trainCategory(words):
    # Count how often each word occurs in this category's training document.
    db = {}
    for word in words:
        db[word] = db.get(word, 0) + 1
    return db, len(words)

def categorizeDoc(cat, total_len_w, words):
    db, len_w = cat
    # Log prior: this category's share of all training words.
    a = math.log(float(len_w) / total_len_w)
    for word in words:
        # Log likelihood: relative frequency of the word in the category
        # (divided by the word total, so the estimates stay below 1);
        # unseen words get a small pseudo-count of 0.01 to avoid log(0).
        a += math.log(db.get(word, 0.01) / len_w)
    return a

def train(wordsegs):
    return [trainCategory(words) for words in wordsegs]

def categorize(total_len_w, doc, trains):
    # Score the document against every category as (score, index) pairs.
    lresult = [(categorizeDoc(cat, total_len_w, doc), i)
               for i, cat in enumerate(trains)]
    print(lresult)
    return max(lresult)

if __name__ == '__main__':
    trains = train([wordseg(trainname) for trainname in sys.argv[1:-1]])
    total_len_w = sum(len_w for _, len_w in trains)
    print(categorize(total_len_w, wordseg(sys.argv[-1]), trains))

2005-08-15

JADE (Java Agent DEvelopment Framework) is an agent development platform that complies with the FIPA standards. It is written in Java and relies on many Java facilities, such as RMI and threads. The corresponding facilities are easy to find in Ruby: RMI maps to DRb, and Ruby has an excellent Thread library and language mechanics.
Furthermore, Ruby's features as a scripting language make Ruby agents easier to develop than their Java counterparts. Ruby's performance may even be better than Java's in a distributed environment; at the least it uses much less memory than Java.
Compared with other scripting languages such as Python and Perl, Ruby is more convenient for concurrent applications. Its built-in concurrency mechanisms just feel good, while neither Python nor Perl is thread-friendly.
So this will be my next interest unless something unexpected happens. I'd like to study the JADE code and the FIPA SL specifications to learn what an agent platform is and how agents react and communicate. UML diagrams should be drawn first to illustrate JADE/FIPA.

2005-06-25

Petri net modelling is a marvellous technique for simulating the concurrent, discrete-event systems that are so common in the real world. In the logistics project I am taking part in, I found that many processes can be modelled with Petri nets, so I decided to build a tool for creating and manipulating Petri net models easily.

The tool I have built is far from mature, but it allows basic control of a Petri net model: creating a new Petri net, loading an existing one from a file, adding a transition, removing a place, advancing the net by one or several steps, and so on. These are the basic operations of Petri net modelling, and with them a model can be visualized to convince customers that their systems are correct.

An editor is provided for convenient editing of Petri nets; a rough screenshot follows.

The language I used to implement the system is, of course, the almighty Python (I am becoming more and more dependent on Python; is that good news or bad?). A Python function can be hooked to a transition; when the transition fires, the hook is called back, which makes for a dynamically running system. Hooks can be written not only in Python but also in any language that supports XML-RPC or SOAP.
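To make the idea of transition hooks concrete, here is a minimal sketch of a Petri net with places, transitions, single stepping and a callback per transition. It is not the tool's actual code: the `PetriNet` class and its method names are hypothetical, every arc is assumed to have weight 1, and the input places of a transition are assumed to be distinct.

```python
class PetriNet:
    def __init__(self):
        self.marking = {}      # place name -> number of tokens
        self.transitions = {}  # transition name -> (inputs, outputs, hook)

    def add_place(self, name, tokens=0):
        self.marking[name] = tokens

    def add_transition(self, name, inputs, outputs, hook=None):
        self.transitions[name] = (list(inputs), list(outputs), hook)

    def enabled(self, name):
        # A transition is enabled when every input place holds a token.
        inputs, _, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def step(self):
        """Fire the first enabled transition; return its name, or None."""
        for name, (inputs, outputs, hook) in self.transitions.items():
            if self.enabled(name):
                for p in inputs:
                    self.marking[p] -= 1
                for p in outputs:
                    self.marking[p] += 1
                if hook:
                    hook(name, dict(self.marking))  # call back with a copy
                return name
        return None

fired = []
net = PetriNet()
net.add_place("waiting", tokens=2)
net.add_place("loaded")
net.add_transition("load", ["waiting"], ["loaded"],
                   hook=lambda name, marking: fired.append(name))
while net.step():  # run until no transition is enabled
    pass
print(net.marking, fired)
```

Passing the hook a copy of the marking keeps callbacks from changing the net's state by accident; a hook that should change state would do so through the net's own methods.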

An interface is also provided via SOAP/XML-RPC for adding a token to a place or retrieving a token from a place, so that the Petri net's state can be changed from outside. That may lead to different behaviour and different results. This interface serves as the basic external interface; with these methods, multiple Petri nets can communicate with each other.

2005-06-14

An agent has actions, communication and knowledge, so handling these is the key task in agent development.
  The actions of a typical agent system are defined in programming languages such as Java, Agent Tcl, …. Those languages are mostly imperative and are made for execution rather than expression. Conversely, expressive languages are not suited to doing real work. My idea is that the behaviours of agents can be defined with a Petri net system. A Petri net is a modelling language that describes concurrent systems using direct images. I have already written software that draws a Petri net and makes it runnable, so integrating it seems easy.
  The knowledge an agent owns can be held in the form of KIF (Knowledge Interchange Format); my rete system is just right for this task.
  The communication subsystem of an agent is something I have not studied fully yet; ACL seems OK. It should be the next job I take on.
  The current task is to combine action definition and knowledge expression into a real system, the highland transport simulation, so that my boss can be convinced to invest more in the MAS.
 

2005-06-07

Typical intelligent systems, such as the CLIPS system, the Lisp language and the Prolog language, support advanced data structures; at the very least a LIST type is built in at the grammar and semantics level. That is what my rete expert system lacks, so I need to add list support to it. The management of the rule database need not be touched, but the match function has to be rewritten, and so does the parser for the language interface. List support will be added within several days once sourceforge.net's review decision comes out, whether the result is rejection or acceptance.

2005-06-06

  After reading last Friday (3 June 2005) about the RETE algorithm, which improves the performance of forward-chaining rule reasoning systems, I suddenly had the idea of writing a little expert system using RETE. Yes, there are other expert systems, such as CLIPS, Jess… But I think the tool you know best is the one you wrote yourself.
  I spent last weekend (4 and 5 June) writing it, and everything went smoothly. Of course the language I chose is the powerful Python. Today a simple CLI-based rule engine emerged as version 0.1.0. You can add facts, define rules and execute queries. It has a simple CLIPS-like syntax, so the learning curve should be gentle.
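  To show what adding facts and defining rules means in a forward-chaining engine, here is a small sketch. It is deliberately naive: it re-matches every rule against every fact on each pass, which is exactly the repeated work a RETE network is designed to avoid, and it does not reproduce zerorule's syntax or code. The function names and the "?x" variable convention are assumptions made for this illustration.

```python
def match(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            # A variable must bind to the same value everywhere it appears.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings

def match_all(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        b = match(patterns[0], fact, bindings)
        if b is not None:
            yield from match_all(patterns[1:], facts, b)

def forward_chain(facts, rules):
    """Apply the rules until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Collect all matches first, then add the derived facts.
            for b in list(match_all(conditions, facts)):
                new_fact = tuple(b.get(term, term) for term in conclusion)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

facts = {("parent", "tom", "bob"), ("parent", "bob", "ann")}
rules = [([("parent", "?x", "?y"), ("parent", "?y", "?z")],
          ("grandparent", "?x", "?z"))]
derived = forward_chain(facts, rules)
print(sorted(derived))
```

  A query is then just a pattern matched against the derived facts, for example `list(match_all([("grandparent", "?who", "ann")], derived))`.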
  I registered a project named zerorule at sourceforge.net. The registration has been submitted; I hope it passes the review, otherwise I'll have to host it elsewhere.
  I hope the software will be put to full use in small-business reasoning and personal knowledge management (the same role as KIF). I'll try to apply it in my logistics projects to see what role a rule engine can play.
  For now the tool is hard to apply to large reasoning systems without optimization, because of the large memory cost of the RETE algorithm. The next steps are:
  1: Performance optimization, especially memory usage.
  2: A rich user interface.
  3: Library functionality.