Concurrency Programming: A Report
I. How I came to Erlang
1.RFID Middleware
2.jabber (xml::stream http://zh.wikipedia.org/wiki/Jabber)
3.ejabberd (http://www.process-one.net/en/ )
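II. Problems facing today's commercial environment (web servers)
1. The number of connections keeps climbing
2. Connections stay open for a long time
Traditionally, httpd has used prefork to handle bursts of many short-lived connections, but this approach faces serious challenges today. Architectures such as HTTP streaming, server push and COMET need long-lived connections, which reduces the number of connections httpd can serve. The biggest problem with forking processes is that each one takes up too much memory, so other server architectures have emerged (lighttpd, ghttpd …).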
The C10K problem( http://www.kegel.com/c10k.html )
It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.
And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and an 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.08 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck???
III. Concurrency Programming
1. fork
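The original program:
(program + data) --fork (make a copy)--> (program + data)
After fork, the child inherits a copy of the parent's data, and from then on the two are independent. To exchange information they must use a pipe, shared memory, or a Unix socket.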
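Erlang has no fork, but spawn follows a similar data model: the data a new process needs is copied into that process's own heap, and from then on the two processes share nothing and communicate only by message passing, which plays the role the pipe plays after fork. A minimal sketch; the module name fork_like is hypothetical:

```erlang
-module(fork_like).
-export([run/0]).

%% Hypothetical example: the parent builds some data, spawns a child that
%% works on its own copy, and gets the child's result back as a message.
run() ->
    Parent = self(),
    Data = [1, 2, 3],
    %% The fun's free variables (Parent, Data) are copied into the child.
    Child = spawn(fun() ->
                          %% The child builds a new list from its copy.
                          Doubled = [X * 2 || X <- Data],
                          Parent ! {self(), Doubled}
                  end),
    receive
        {Child, Result} ->
            %% The parent's Data is untouched; only the message carries the result.
            {Data, Result}
    end.
```

Calling fork_like:run() returns {[1,2,3],[2,4,6]}: the parent keeps its original list and learns about the child's work only through the message.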
2. thread
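In fact, on Linux a thread is just a lightweight process: the kernel creates a new process with the fork() system call, while a thread is created with the clone() system call. The only difference between fork() and clone() is that clone() lets you specify which resources are shared with the parent; when every resource is shared, the result is effectively a thread. Because threads share resources with their parent and with each other, they very easily lead to problems such as deadlocks and race conditions.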
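In Erlang the usual way to avoid this kind of race is to let a single process own the state and have other processes ask it for changes by message; its mailbox serializes the updates, so no lock is needed. A minimal sketch; the module counter and its message formats are hypothetical:

```erlang
-module(counter).
-export([start/0, incr/1, value/1, demo/0]).

%% Hypothetical counter server: only loop/1 ever reads or updates N,
%% and the mailbox serializes requests, so no lock is needed.
start() -> spawn(fun() -> loop(0) end).

loop(N) ->
    receive
        {incr, From} ->
            From ! {self(), ok},
            loop(N + 1);
        {value, From} ->
            From ! {self(), value, N},
            loop(N)
    end.

%% Synchronous increment: returns once the server has applied it.
incr(Pid) ->
    Pid ! {incr, self()},
    receive {Pid, ok} -> ok end.

value(Pid) ->
    Pid ! {value, self()},
    receive {Pid, value, N} -> N end.

%% 100 concurrent clients each increment the counter 10 times.
demo() ->
    C = start(),
    Parent = self(),
    Workers = [spawn(fun() ->
                             [incr(C) || _ <- lists:seq(1, 10)],
                             Parent ! {done, self()}
                     end) || _ <- lists:seq(1, 100)],
    %% Wait for every worker before reading the total.
    [receive {done, W} -> ok end || W <- Workers],
    N = value(C),
    exit(C, kill),
    N.
```

counter:demo() always returns 1000: no update is lost, because the increments never touch shared memory.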
3. lightweight Threads ( http://www.defmacro.org/ramblings/concurrency.html)
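An Erlang process is a lightweight thread, so it is very cheap to create and terminate, and switching between processes is fast. Under the hood a process is little more than a function with its own small stack, heap and mailbox. Switching between processes is done by the Erlang VM's scheduler rather than the OS kernel, which avoids much of the cost of OS context switches (Erlang processes are VM-level threads).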
This document gives a brief overview of Erlang:
http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/252.pdf
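To make "cheap to create" concrete, the sketch below spawns N processes that each send one message back to the parent and then exit. The module name lightweight is hypothetical. If you want to time it on your own machine, timer:tc(lightweight, run, [100000]) in the shell returns the elapsed microseconds together with the result.

```erlang
-module(lightweight).
-export([run/1]).

%% Hypothetical example: spawn N processes; each sends its index to the
%% parent and exits. The parent sums the replies.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), I} end) || I <- lists:seq(1, N)],
    %% Collect exactly one reply per process, matching on each Pid.
    lists:sum([receive {Pid, V} -> V end || Pid <- Pids]).
```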
IV. Erlang ( http://www.erlang.org/ )
1. Here is how Erlang describes itself in "About Erlang":
Erlang is a programming language which has many features more commonly associated with an operating system than with a programming language: concurrent processes, scheduling, memory management, distribution, networking, etc.
The initial open-source Erlang release contains the implementation of Erlang, as well as a large part of Ericsson's middleware for building distributed high-availability systems.
Erlang is characterized by the following features:
Concurrency - Erlang has extremely lightweight processes whose memory requirements can vary dynamically. Processes have no shared memory and communicate by asynchronous message passing. Erlang supports applications with very large numbers of concurrent processes. No requirements for concurrency are placed on the host operating system.
Distribution - Erlang is designed to be run in a distributed environment. An Erlang virtual machine is called an Erlang node. A distributed Erlang system is a network of Erlang nodes (typically one per processor). An Erlang node can create parallel processes running on other nodes, which perhaps use other operating systems. Processes residing on different nodes communicate in exactly the same way as processes residing on the same node.
Soft real-time - Erlang supports programming "soft" real-time systems, which require response times in the order of milliseconds. Long garbage collection delays in such systems are unacceptable, so Erlang uses incremental garbage collection techniques.
Hot code upgrade - Many systems cannot be stopped for software maintenance. Erlang allows program code to be changed in a running system. Old code can be phased out and replaced by new code. During the transition, both old code and new code can coexist. It is thus possible to install bug fixes and upgrades in a running system without disturbing its operation.
Incremental code loading - Users can control in detail how code is loaded. In embedded systems, all code is usually loaded at boot time. In development systems, code is loaded when it is needed, even when the system is running. If testing uncovers bugs, only the buggy code need be replaced.
External interfaces - Erlang processes communicate with the outside world using the same message passing mechanism as used between Erlang processes. This mechanism is used for communication with the host operating system and for interaction with programs written in other languages. If required for reasons of efficiency, a special version of this concept allows e.g. C programs to be directly linked into the Erlang runtime system.
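The hot code upgrade feature rests on a simple convention: a server loop that calls itself through a fully qualified call (?MODULE:loop(...)) jumps to the newest loaded version of the module on its next iteration, while a local call stays in the version it is already running. The runtime keeps at most two versions of a module (current and old), which is how old and new code coexist during the transition. A minimal sketch with a hypothetical module hot; after editing and reloading it (for example with c(hot) in the shell), the running process picks up the new code on its next message:

```erlang
-module(hot).
-export([start/0, loop/1]).

start() -> spawn(fun() -> loop(0) end).

%% loop/1 must be exported so the fully qualified call below works.
loop(N) ->
    receive
        {bump, From} ->
            From ! {self(), N + 1},
            %% Fully qualified call: after a new version of hot is loaded,
            %% the next iteration runs the new code with the same state.
            ?MODULE:loop(N + 1);
        stop ->
            ok
    end.
```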
2. An overview of the Erlang language
Book: ( http://pragmaticprogrammer.com/titles/jaerlang/index.html )
[ Sequential Erlang ]
Example 1:
Consider the factorial function N! defined by:
N! = N * (N-1)!   when N > 0
N! = 1            when N = 0
-module(math1).
-export([fac/1]).
%% fac(N) returns N! for any non-negative integer N.
fac(N) when N > 0 -> N * fac(N - 1);
fac(0) -> 1.
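Example 2:
-module(math2).
-export([sum1/1, sum2/1]).
%% sum1/1: plain recursion; the addition happens after the recursive call returns.
sum1([H | T]) -> H + sum1(T);
sum1([]) -> 0.
%% sum2/1: tail recursion; sum2/2 carries the running total in the accumulator N.
sum2(L) -> sum2(L, 0).
sum2([], N) -> N;
sum2([H | T], N) -> sum2(T, H + N).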
[ Concurrency Programming ]
Example 3:
-module(concurrency).
-export([start/0, say/2]).
%% Print What, Times times, then stop.
say(_What, 0) ->
    done;
say(What, Times) ->
    io:format("~p~n", [What]),
    say(What, Times - 1).
%% Start two processes that run say/2 concurrently.
start() ->
    spawn(concurrency, say, [hello, 3]),
    spawn(concurrency, say, [goodbye, 3]).
Example 4:
-module(area_server).
-export([loop/0]).
%% Wait for a message, print the area of the shape it describes,
%% then call loop/0 again to wait for the next message.
loop() ->
    receive
        {rectangle, Width, Ht} ->
            io:format("Area of rectangle is ~p~n", [Width * Ht]),
            loop();
        {circle, R} ->
            io:format("Area of circle is ~p~n", [3.14159 * R * R]),
            loop();
        Other ->
            io:format("I don't know what the area of a ~p is ~n", [Other]),
            loop()
    end.
We can create a process which evaluates loop/0 in the shell:
Pid = spawn(area_server,loop,[]).
Pid ! {rectangle, 6, 10}.
Pid ! {circle, 23}.
Pid ! {triangle,2,4,5}.
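The area_server loop above only prints its answers. A common next step, sketched here under the assumption that the caller wants the result back, is to tag each request with the sender's Pid and have the server reply with a message tagged with its own Pid, so the client can match exactly that reply. The module name area_rpc and the {error, Other} reply are hypothetical:

```erlang
-module(area_rpc).
-export([start/0, area/2, loop/0]).

start() -> spawn(?MODULE, loop, []).

%% Client side: send the request with our Pid, then wait for the reply
%% tagged with the server's Pid.
area(Pid, Shape) ->
    Pid ! {self(), Shape},
    receive
        {Pid, Response} -> Response
    end.

%% Server side: same shapes as area_server, but the result is sent back.
loop() ->
    receive
        {From, {rectangle, Width, Ht}} ->
            From ! {self(), Width * Ht},
            loop();
        {From, {circle, R}} ->
            From ! {self(), 3.14159 * R * R},
            loop();
        {From, Other} ->
            From ! {self(), {error, Other}},
            loop()
    end.
```

In the shell: Pid = area_rpc:start(), then area_rpc:area(Pid, {rectangle, 6, 10}) returns 60.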
3. Erlang-style processes or an event-based model for actors ( http://lambda-the-ultimate.org/node/1615 )
( http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf )
Message passing
Each process has its own input queue for messages it receives. New messages received are put at the end of the queue. When a process executes a receive, the first message in the queue is matched against the first pattern in the receive; if this matches, the message is removed from the queue and the actions corresponding to the pattern are executed.
However, if the first pattern does not match, the second pattern is tested; if this matches, the message is removed from the queue and the actions corresponding to the second pattern are executed. If the second pattern does not match, the third is tried, and so on until there are no more patterns to test. If there are no more patterns to test, the first message is kept in the queue and we try the second message instead. If this matches any pattern, the appropriate actions are executed and the second message is removed from the queue (keeping the first message and any other messages in the queue). If the second message does not match, we try the third message, and so on until we reach the end of the queue. If we reach the end of the queue, the process blocks (stops execution) and waits until a new message is received, and this procedure is repeated.
Of course the Erlang implementation is "clever" and minimizes the number of times each message is tested against the patterns in each receive.
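The queue behaviour described above can be checked directly: a receive whose patterns match only a later message leaves the earlier, non-matching message in the queue for a later receive. A minimal sketch; the module name selective is hypothetical:

```erlang
-module(selective).
-export([demo/0]).

demo() ->
    %% Queue now holds {low, 1} followed by {high, 2}.
    self() ! {low, 1},
    self() ! {high, 2},
    %% {low, 1} does not match, so it is skipped and stays in the queue;
    %% {high, 2} matches and is removed.
    First = receive {high, H} -> H end,
    %% The skipped message is still there for the next receive.
    Second = receive {low, L} -> L end,
    {First, Second}.
```

selective:demo() returns {2,1}: the second message was consumed first.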
V. Erlang resources
Websites:
Open Source Erlang
http://www.erlang.org
http://www.process-one.net/en/projects/
Mailing list:
Erlang-questions -- Erlang/OTP discussions
http://www.erlang.org/mailman/listinfo/erlang-questions
Books:
Concurrent Programming in Erlang
http://www.erlang.org/download/erlang-book-part1.pdf
Programming Erlang: Software for a Concurrent World
http://pragmaticprogrammer.com/titles/jaerlang/index.html
My blog: http://rd-program.blogspot.com
- 07:07
- Category: Concurrency-Programming, erlang
Comments
Continuation is really a way of calling functions, or a coding style:
g(result) { ... }
f(..., g)
{
    ...
    result = ...;
    g(result);
}
This is the simplest form of continuation style.
call/cc is just syntactic sugar for writing in continuation style.
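The same continuation style in Erlang, as a minimal sketch (the module cps and its functions are hypothetical): instead of returning a value, fac_cps/2 passes its result to the continuation K, the role g plays in the pseudocode.

```erlang
-module(cps).
-export([fac/1, fac_cps/2]).

%% Continuation-passing factorial: never returns N! directly,
%% it hands the result to K.
fac_cps(0, K) -> K(1);
fac_cps(N, K) when N > 0 ->
    %% Build a new continuation that multiplies by N, then hands off to K.
    fac_cps(N - 1, fun(R) -> K(N * R) end).

%% Ordinary interface: the identity function as the final continuation.
fac(N) -> fac_cps(N, fun(R) -> R end).
```

cps:fac(5) returns 120, and cps:fac_cps(3, fun(R) -> {done, R} end) returns {done, 6}.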
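ucontext / fiber / callcc / yield (Python) are all techniques that can be used to implement coroutines.
Erlang is different from coroutines: with a coroutine, the code itself decides when to switch, whereas Erlang processes are time-sliced by the Erlang scheduler.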
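The difference from coroutines shows up in a small experiment: a process that loops forever and never calls receive does not stop other processes from running, because the scheduler preempts it after a budget of reductions. A minimal sketch; the module preempt is hypothetical, and to rule out the two processes simply running on different cores you can start the VM with a single scheduler (erl +S 1):

```erlang
-module(preempt).
-export([demo/0]).

%% Loops forever and never yields on purpose (no receive, no yield call).
spin() -> spin().

demo() ->
    Busy = spawn(fun spin/0),
    Parent = self(),
    %% A second process that only wants to send one message.
    spawn(fun() -> Parent ! pong end),
    %% With coroutine-style scheduling this could wait forever;
    %% the Erlang scheduler preempts spin/0, so pong arrives.
    Result = receive pong -> got_pong after 5000 -> timeout end,
    exit(Busy, kill),
    Result.
```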
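A subroutine is really just a function, one that does not return a result; in Pascal it is called a procedure.
qiezi wrote:
Thanks! You always manage to point people in the right direction.
Should ucontext / fiber / callcc / yield (Python) and Erlang's process scheduling be counted as lightweight (user-level) threads?
Could you also explain whether continuation / coroutine / subroutine describe the same thing? These terms are really confusing...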
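[quote="Trustno1"]You can implement coroutines with fibers on Windows or ptx_switch_context on Linux.[/quote]
Where can I find documentation on this ptx_switch_context? I can't find anything; every search result points back to this thread...
(Trustno1 later corrected the API name: it should be makecontext from ucontext.h.)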
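As for share-nothing, Ruby/Rails really does share NOTHING. That is, if on a multi-core CPU you run one Rails instance per core, each instance has to load and compile its own copy of the code, so memory consumption is much higher.
In my view, simply running one Rails instance per core is a problematic way to solve Rails's future parallelism problems on multi-core CPUs.
As for Erlang, since SMP support was added in Erlang/OTP R11B, you don't have to worry about multi-core versus single-core yourself; OTP takes care of it all behind the scenes, and takes care of it simply.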
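I'm mostly concerned about the efficiency of Erlang's process scheduling. Managing it in user space still means saving context, so I'd guess it must be less efficient than OS scheduling.
If so, Erlang's concurrency performance should be lower than Java's (NIO, native threads), and the only reason to use Erlang would be its built-in clustering support. Could someone explain how that is implemented?
For concurrent IO, the best solution I have seen so far is still IO Completion Ports + Fibers on Windows. It comes close to getting the efficiency of asynchronous IO while letting the programmers who write the upper-layer code treat IO as a synchronous operation. That is already quite ideal; the main problems are 1. the amount of work, 2. portability, and 3. it cannot naturally scale out to a cluster of machines.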
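dcaoyuan wrote:
lists:flatten([$a, $b, "A String", $z])
avindev ran a benchmark, and flatten turned out to be slower than ++. I was being subjective; I had assumed otherwise because ejabberd likes to use flatten. Link:
http://avindev.javaeye.com/blog/82560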
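Having used Erlang more recently, I understand lists better, so here is a correction to what I said about joining characters into a list:
1. lists:reverse([A|Acc]) is the most efficient, but it is not very readable when joining many characters;
2. lists:flatten(DeepList) should be used to flatten lists of depth > 1;
3. for a list of depth = 1, use lists:append, which is more efficient than lists:flatten;
4. lists:append/2 and ++ are equivalent.
This is covered in the Efficiency Guide and in the lists documentation.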
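A small check of the points above, using the string from the quoted example. The module list_join is hypothetical; the functions are the standard lists:flatten/1, lists:append/1, lists:foldl/3 and lists:reverse/1:

```erlang
-module(list_join).
-export([demo/0]).

demo() ->
    %% Deep list (depth > 1): characters mixed with a nested string.
    Flat = lists:flatten([$a, $b, "A String", $z]),
    %% Depth-1 input, i.e. a list of lists: lists:append/1 is enough.
    Appended = lists:append([[$a, $b], "A String", [$z]]),
    %% The same join written with ++.
    Plus = [$a, $b] ++ "A String" ++ [$z],
    %% Accumulate by prepending, then reverse once at the end.
    Acc = lists:foldl(fun(C, A) -> [C | A] end, [], "abc"),
    Reversed = lists:reverse(Acc),
    {Flat, Appended, Plus, Reversed}.
```

All three joins produce "abA Stringz", and the accumulator version yields "abc".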
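[quote="qiezi"]When writing multithreaded C++, to keep the number of threads down you have to use asynchronous code heavily, which is also hard to write, and algorithms are even harder to partition, especially when the work has to be spread evenly across several CPUs and the results merged at the end. I don't know whether Erlang has an advantage here; I guess you still have to spawn processes yourself, but judging from that pmap implementation, splitting a task and merging the results should be easy. I read an article in Programmer magazine saying that when Erlang sends a message it schedules the process onto the same thread. I wonder whether it gets moved back afterwards; wouldn't everything end up on one thread after a round of messages?[/quote]
Details like this should be weighed and resolved by the application itself.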
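For the split-and-merge question, here is a minimal pmap sketch in the spirit of the one in Programming Erlang (not that book's exact code; the module par is hypothetical). It spawns one process per element and collects replies by unique reference, so results come back in input order whatever order the workers finish in. Unlike a production version, it does not link to the workers, so a crash in F leaves the caller waiting forever:

```erlang
-module(par).
-export([pmap/2]).

%% Parallel map: one process per element of L.
pmap(F, L) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- L],
    %% Receiving by Ref in list order keeps results aligned with L.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

par:pmap(fun(X) -> X * X end, [1, 2, 3]) returns [1,4,9].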
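[quote="AvinDev"]My colleague thinks the one-queue-per-job approach has thread-switching and locking overhead, and is also hard to debug.[/quote]
Generally speaking, for one queue per job, writing your own scheduler on Linux is a fairly good way to avoid both context switches and lock overhead. My previous company built softswitches, and that is exactly what they did. Of course, this amounts to writing your own Erlang scheduler.
That is the first point. The second is that how you split the flow depends on each state's response time: if an operation in one state takes too long, you have to split what was a single piece of logic into two. This kind of split usually does not come up during system design or coding, but in the final performance-tuning stage: you always find some state that responds slowly under high concurrency and overflows the message queue, and then you have to split that state too, which is a great deal of work.
Of course, you can implement coroutines with fibers on Windows or ptx_switch_context on Linux, but that reduces the complexity only a little.
btw, if potian has time, maybe potian could share some experience and thoughts on using Erlang in company projects :)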
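Each task does not necessarily map to a thread; a thread pool or a single thread both work. What I mean is one queue per task, holding the data that task needs to process.
When you need decoupling, and tasks need to communicate and synchronize with each other, such queues are very likely necessary.
My colleague thinks the one-queue-per-job approach has thread-switching and locking overhead, and is also hard to debug.
This inevitably involves maintaining queues and copying data, or something like Erlang's references to large data, and once you do that you have to deal with a whole series of problems: maintaining reference counts (or the copying algorithm), synchronizing the queues, and so on. This is exactly Erlang's strength. Of course, you can special-case a particular application, but we usually prefer to abstract step by step and build up an internal framework over time, and I very much doubt that most people can do this better than Erlang.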
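Complex network applications can be built in many ways. For example, besides supporting TCP-based "multicast", our streaming media system has to handle clients that frequently cycle between sources, that is, the video source behind the same connection changes. Modeling this with processes and messages was very easy, and the result is efficient.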
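A minimal sketch of that idea under stated assumptions: each video source is a process that forwards frames to its current subscribers, and switching a connection's source is just unsubscribing from one source and subscribing to another. The module stream_switch and its message formats are hypothetical, not the commenter's actual code:

```erlang
-module(stream_switch).
-export([demo/0]).

%% Hypothetical video source: forwards each frame to its current subscribers.
source(Name, Subs) ->
    receive
        {subscribe, Pid} ->
            source(Name, [Pid | Subs]);
        {unsubscribe, Pid} ->
            source(Name, lists:delete(Pid, Subs));
        {frame, F} ->
            lists:foreach(fun(S) -> S ! {frame, Name, F} end, Subs),
            source(Name, Subs)
    end.

demo() ->
    Cam1 = spawn(fun() -> source(cam1, []) end),
    Cam2 = spawn(fun() -> source(cam2, []) end),
    Cam1 ! {subscribe, self()},
    Cam1 ! {frame, 1},
    F1 = receive {frame, cam1, X} -> X end,
    %% Switch this connection's source: leave cam1, join cam2.
    Cam1 ! {unsubscribe, self()},
    Cam2 ! {subscribe, self()},
    Cam1 ! {frame, 2},
    Cam2 ! {frame, 3},
    F2 = receive {frame, cam2, Y} -> Y end,
    %% cam1 handled the unsubscribe before frame 2, so nothing stale arrives.
    Stale = receive {frame, cam1, Z} -> Z after 100 -> none end,
    exit(Cam1, kill),
    exit(Cam2, kill),
    {F1, F2, Stale}.
```

stream_switch:demo() returns {1,3,none}. Messages between one pair of processes arrive in the order they were sent, so each source sees the subscription change before the next frame.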
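[quote="potian"]I don't think C++ still has a performance advantage for complex network applications under heavy concurrency. If the system logic is fairly simple, for example when the clients connected to the server have little to do with each other, C++ networking may have an edge. But in complex network applications, network processing speed, logical complexity and synchronization all significantly affect performance. Programs that handle network IO asynchronously find complex logic very hard: they are difficult to debug and extend, and this itself degrades performance. From the concurrency standpoint, message passing can greatly improve a system's concurrency. Moreover, when it comes to scaling parallelism, Erlang has a natural advantage.[/quote]
I disagree.
From the IO standpoint, calling the asynchronous APIs provided by the OS kernel directly is the most efficient approach (provided the kernel implementation isn't too bad), and that is what C/C++ does best. On the other hand, data IO and the logical processing of the data can be decoupled, so I don't think complex logic is an unsolvable problem. The real issues here are two: first, the amount of work, since achieving the same result takes considerably more work in C/C++; second, a C/C++ implementation cannot naturally scale out to a cluster.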
- From: Taipei