Windreamer Is Not a DREAMER
main(){main(puts("Hello,stranger!"));}

2006年3月21日

发件人: Andrei Alexandrescu (See Website For Email)  
日期: 2006年3月18日(星期六) 下午12时13分
电子邮件: "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEm...@erdani.org>
论坛: comp.lang.c++.moderated

The recent thread "Can GC be beneficial" was quite beneficial :o) - to
me at least. I've reached a number of conclusions that allow me to
better place the conciliation between garbage collection and
deterministic finalization in the language design space - in C++ and in
general.

The following discussion focuses on C++-centric considerations, with
occasional escapes into "the right thing to do if we could break away
with the past.

Basic Tenets, Constraints, and Desiderata
=========================================

Garbage collection is desirable because:

(1) It automates a routine and error-prone task

(2) Reduces client code

(3) Improves type safety

(3) Can improve performance, particularly in multithreaded environments

On the other hand, C++ idioms based on constructors and destructors,
including, but not limited to, scoped resource management, have shown to
be highly useful. The large applicability of such idioms might actually
be the single most important reason for which C++ programmers shy away
from migrating to a garbage-collected C++ environment.

It follows that a set of principled methods that reconcile C++-style
programming based on object lifetime, with garbage collection, would be
highly desirable for fully exploiting garbage collection's advantages
within C++. This article discusses the challenges and to suggests
possible designs to address the challenges.

The constraints include compatibility with existing C++ code and styles
of coding, a preference for type safety at least when it doesn't
adversely incur a performance hit, and the functioning of today's
garbage collection algorithms.

A Causal Design
===============

Claim #1: The lifetime management of objects of a class is a decision of
the class implementer, not of the class user.

In support of this claim we come with the following examples:

a) A class such as complex<double> is oblivious to destruction
timeliness because it does not allocate scarce resource that need timely
release;

b) A class such as string doesn't need to worry about destruction
timeliness within a GC (Garbage Collected) environment;

c) A class such as temporary_file does need to worry about destruction
timeliness because it allocates scarce resources that transcend both the
lifetime of the object (a file handle) and the lifetime of the program
(the file on disk that presumably temporary_file needs to delete after
usage).

In all of these examples, the context in which the objects are used is
largely irrelevant (barring ill-designed types that employ logical
coupling to do entirely different actions depending on their state).
There is, therefore, a strong argument that the implementer of a class
decides entirely what the destruction regime of the class shall be. This
claim will guide design considerations below.

We'll therefore assume a C++ extension that allows a class definition to
include its destruction regime:

 

//  garbage collected  
 
class [collected] Widget {...}
// deterministically destroyed  
 
class [deterministic] Midget {...}


 

These two possible choices could be naturally complemented by the other
allowed storage classes of a class:

 

//  garbage collected or on stack  
  
class [collected, auto] Widget {...}
// deterministically destroyed, stack, or static storage  
  
class [deterministic, auto, static] Midget {...}

It is illegal, however, that a class specifies both collected and
deterministic regime:

 

//  illegal  
  
class [collected, deterministic] Wrong {...}


 

Claim #2: Collected types cannot define a destruction-time action.

This proposal makes this claim in wake of negative experience with
Java's finalizers.

Claim #3: Collected types can transitively only embed fields of
collected types (or pointers thereof of any depth), and can only derive
from such types.

If a collected type would have a field of a non-collected type, that
type could not be destroyed (as per Claim #2).

If a collected type would have a field of pointer to a non-collected
type, one of two things happens:

a) A dangling pointer access might occur;

b) The resource is kept alive indeterminately and as such cannot be
destroyed (as per claim #2).

If a collected type would have a field of pointer to pointer to (notice
the double indirection) deterministic type, inevitably that pointer's
destination would have to be somehow accessible to the garbage-collected
object. This implies that at the some place in the points-to chain, a
"jump" must exist from the collected realm to the uncollected realm (be
it automatic, static, or deterministic) that would trigger either
post-destruction access, or would prevent the destructor to be called.

Design fork #1: Weak pointers could be supported. A collected type could
hold fields of type weak pointer to non-collected types. The weak
pointers are tracked and are zeroed automatically during destruction of
the resource that they point to. Further dereference attempts accesses
from the collected realm become hard errors.

Claim #4: Deterministic types must track all pointers to their
respective objects (via a precise mechanism such as reference counting
or reference linking).

If deterministic types did allow untracked pointer copying, then
post-destruction access via dangling pointers might occur. The recent
discussion in the thread "Can GC be beneficial" has shown that it is
undesirable to define post-destruction access, and it's best to leave it
as a hard run-time error.

Design branch #2: For type safety reasons, disallow type-erasing
conversions from/to pointers to deterministic types:

 

   
   class [deterministic] Widget {...}
   Widget 
* p = new Widget; 
   void * p1 = p; // error  
   
= static_cast<Widget *>(p1); // error, too 

Or: For compatibility reasons, allow type-erasing conversion and incur
the risk of dangling pointer access.

Design branch #3: For purpose of having a type that stands in as a
pointer to any deterministic type (a sort of "deterministic void*"), all
deterministic classes could be thought as (or required to) inherit a
class std::deterministic.

Design branch #3.1: std::deterministic may or may not define virtuals,
and as such confines or not all deterministic classes to have virtuals
(and be suitable for dynamic_cast among other things).

Claim #5: When an object of deterministic type is constructed in
automatic or static storage, its destructor will automatically issue a
hard error if there are any outstanding pointers to it (e.g., the
reference count is greater than one).

If that didn't happen, dangling accesses to expired stack variables
might occur:

 

 class [deterministic] Widget {...}
 Widget 
* p; 
int Fun() 
    Widget w; 
    p 
= &w; 
    
// hard runtime error upon exiting this scope 



}
 



 

Discussion of the basic design
==============================

The desiderata set up and the constraints of the current C++ language
created a causal chain that narrowly guided the possible design of an
integrated garbage collection + deterministic destruction in C++:

* The class author decides whether the class is deterministic or garbage
collected

* As a natural extension, the class author can decide whether objects of
that type are allowed to sit on the stack or in static storage. (The
regime of automatic and static storage will be discussed below.)

* Depending on whether a type is deterministic versus collected, the
compiler generates different code for copying pointers to the object.
Basically the compiler automates usage of smart pointers, a
widely-followed semiautomatic discipline in C++.

* The heap is conceptually segregated into two realms. You can hold
unrestricted pointers to objects in the garbage-collected realm, but the
garbage-collected realm cannot hold pointers outside of itself.

* The operations allowed on pointers to deterministic objects are
restricted.

Regime of Automatic Storage
===========================

Claim 6: Pointers to either deterministic or collected objects that are
actually stack allocated should not escape the scope in which their
pointee object exists.

This obvious claim prompts a search in look for an efficient solution to
a class of problems. Here is an example:

 

 class [auto, collected] Widget {...}
void Midgetize(Widget & obj) 
    obj.Midgetize(); 


}
 


void Foo() 
    Widget giantWidget; 
    Midgetize(giantWidget); 


}
 



 

To make the example above work, Foo is forced to heap-allocate the
Widget object even though the Midgetize function works on it
transitorily and stack allocation would suffice.

To address this problem a pointer/reference modifier, "auto", can be
defined. Its semantics allow only "downward copying": an
pointer/reference to auto can only be copied to lesser scope, never to
object of larger scope. Examples:

 

void foo() 
    Widget w; 
    Widget 
*auto p1 = &w1; // fine, p1 has lesser scope 
    
      Widget 
*auto p2 = &w; // fine 
      p2 = p1; // fine 
      p1 = p2; // error! Escaping assignment! 
    }
 



}
 



 

Then the example above can be made modularly typesafe and efficient like
this:

 

 class [auto, collected] Widget {...}
void Midgetize(Widget &auto obj) 
    obj.Midgetize(); 


}
 


void Foo() 
    Widget giantWidget; 
    Midgetize(giantWidget);  
// fine 


}
 


 

Claim #6: "auto"-modified pointers cannot be initialized or assigned
from heap-allocated deterministic objects.

If "auto"-modified pointers manipulated the reference count, their
efficiency advantage would be lost. If they didn't, a type-unsafe
situation can easily occur.

Does operator delete still exist?
=================================

For collected objects, delete is inoperant, as is for static or
automatic objects. On a heap-allocated deterministic object, delete can
simply check if the reference count is 1, and if so, reassign zero to
the pointer. If the reference count is greater than one, issue a hard  
error.

Note that this makes delete entirely secure. There is no way to have a
working program that issues a dangling access after delete has been  
invoked.

Regime of Static Storage
========================

Static storage has the peculiarity that it can easily engender
post-destruction access. This is because the order of module
initialization is not defined, and therefore cross-module dependencies
among objects of static duration are problematic.

This article delays discussion of the regime of static storage.
Hopefully with help from the community, a workable solution to the
cross-module initialization would ensue.

Templates
=========

Claim #7: The collection regime of any type must be accessible during
compilation to templated code.

Here's a simple question: is vector<T> deterministic or collected?

If it were collected, it couldn't hold deterministic types (because at
the end of the day vector<T> must embed a T*). If it were deterministic,
collected types couldn't hold vectors of pointers to collected types,
which would be a major and gratuitous restriction.

So the right answer is: vector<T> has the same regime as T.

 

template <class T, class A> 
class [T::collection_regime] vector // or some other syntax 
   ... 

}


 

The New World: How Does it Look Like?
=====================================

After this design almost happening as a natural consequence of an
initial set of constraints, the natural question arises: how would
programs look like in a C++ with these amenities?

Below are some considerations:

* Pointer arithmetic, unions, and casts must be reconsidered (a source
of unsafety not thoroughly discussed)

* Most types would be [collected]. Only a minority of types, those that
manage non-memory resources, would live in the deterministic realm.

* Efficiency of the system will not degrade compared to today's C++. The
reduced need for reference-counted resources would allow free and fast
pointer copying for many objects; the minority that need care in
lifetime management will stay tracked by the compiler, the way they
likely were manipulated (by hand) anyway.

* Given that the compiler can apply advanced analysis to eliminate
reference count manipulation in many cases, it is likely that the
quality of built-in reference counting would be superior to
manually-implemented reference counting, and on a par with advanced
manual careful manipulation of a mix of raw and smart pointers.

----------------------

Whew! Please send any comments you have to this group. Thanks!

Andrei

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

posted @ 2006-03-21 10:01 Windreamer Is Not DREAMER 阅读(564) | 评论 (1)编辑 收藏

2005年12月16日

Is it a mistake in TAOCP  
Maggie McLoughlin <mam@theory.stanford.edu> to Windreamer

Sequences with n=0 are empty. It's important in mathematics
to deal with empty sets and strings etc in a meaningful way.
If n = 0 and you're supposed to do something for 1 <= j <= n,
you don't have to do anything.

Thanks for your interest in my book! -- Don Knuth

呵呵,原来是我年少无知了,再次赞一下Knuth爷爷写书的精致
posted @ 2005-12-16 10:05 Windreamer Is Not DREAMER 阅读(535) | 评论 (1)编辑 收藏

2005年12月12日

要说的话好多,列个提纲先


TAOCP初读感受

        《The Art of Computer Programming》的第一卷,大理石花纹的封皮,拿在手里沉甸甸的,这部书给我的第一印象就是这样--"厚重"--带有着神秘感和历史感。

        其实这部书的中文版前言,我早就有幸拜读过,不过和英文原文相比较,在中文翻译的味道真的是差了很多,我觉得只有读原文才能感到Knuth略带诙谐的而又同是不是严谨的风格,他写文章的风格其实真的挺天马行空的,从写程序扯到做饭,从算法这个词聊起,追着这个词的来历,竟然还带出了莱布尼茨?真晕,开句玩笑,Knuth绝对是那种老顽童型的人物,他这本书达到如此厚度估计此类"废话"功不可没。

        从Algorithm到Euclid's Algorithm也就是我们熟悉的辗转相除求最大公约数法,我这个算法小白开始进入了他打开的算法世界......

        Knuth行文很喜欢比较、比喻、对比,这让读者看起来很轻松愉悦,不过当他真的玩起数学来,我就有点吃不消了,最后面对算法的一个形式化描述,消耗了我不少精力,不过目前看来还是大致明白了

         总之,这本盛名之下的书,也的确有很多独到的地方,作为计算机科学领域的史诗,它给我的第一印象的确很棒。希望我能坚持着看下去,从中吸收营养。




今天的收获

           虽然只看了一节,不过也消耗了我不少的时间和精力(看来别的一些事情也不能太耽误,也要抓紧了)

            今天的收获很多,首先对算法这个名词有了更多一些的感性认识,Knuth提出的“有限、明确定义、有输入、有输出、有效率”这几个原则总结得真是不错,尤其最前面的两点和效率问题,往往构成了很多复杂的问题,著名的图灵机停机问题大概就是在说这个问题吧……

            另外对于辗转相除法的一些数学上的推导也给了我不错的感觉,虽然书上没有明确的给一个严格的证明,但是根据他的叙述我马上就体会到了用比较严格的方法如何写这个证明,以及这个证明的关键点(我觉得证明中其实用到了通过双包含来争相等的手法,这个是关键)

            算法的形式化描述应起了我大的兴趣,回来的路上想,貌似这个好像形成了某种数学结构,而其上的f映射,构成了某种代数结构,没有仔细想过,不过好像是这样子的哦,我觉得貌似算法的本质就是某种自动状态机,只不过不一定是有限状态的吧,至少从他的意思上看是这样的

            开始没有理解第二个,加上了效率约束的的形式化表达方法的意思,后来花了点时间看了下Ex1.1.8,我觉得我似乎明白了点

我认为Ex1.1.8是这样的一个状态表

            

j Theta_j Phi_j a_j b_j
0 a a 5 1
1 ab c 3 2
2 bc cb 1 2
3 b a 4 3
4 c b 0 4
5 c c 5 5

        为了验证,我写了个简单的程序来试验我的状态表(真是不行了,好多东西要翻看手册,写程序的速度总是上不来)

 1#include    <iostream>
 2#include    <string>
 3
 4using namespace std;
 5int main ( int argc, char *argv[] )
 6{
 7    //                   0,     1,     2,     3,     4,     5
 8    string theta[]={   "a",  "ab",  "cb",      "b",   "c",   "c"};
 9    string phi  []={   "a",   "c",  "bc",    "a",   "b",   "c"};
10    int    a    []={     5,     3,     1,     4,     0,     5};
11    int    b    []={     1,     2,     2,     3,     4,     5};
12
13    int j=0;
14    int i=0;
15    string stat;
16    getline (cin,stat);
17    while(true)
18    {
19        unsigned int loc=stat.find(theta[j],0);
20        if (loc==string::npos)
21        {
22            j=a[j];
23        }

24        else
25        {
26            string temp=stat.substr(0,loc)+phi[j]+stat.substr(loc+theta[j].length());
27            stat=temp;
28            j=b[j];
29        }

30        cout<<i++<<":\tj("<<j<<")\tloc("<<loc<<")\t"<<stat<<endl;
31        cin.get();
32    }

33    return EXIT_SUCCESS;
34}
                /* ----------  end of function main  ---------- */
35


         最后一定要提的是,我好像发现了书里的一处小Bug,而且好像官方网站里的Errata里面没有这个(中文版同样有这个问题),我已经写信给Knuth了,希望我是真的找到了一个没人发现的Bug啊(其实我知道这个不可能)




关于Galgo库的"瞎想"

         念叨做一个泛型的算法库已经有好长时间了,我觉得这个事情与其一直这么YY,还不如高兴了就写一点,不高兴,就扔着,

         其实,这个世界是不缺泛型算法库的,STL,Boost,Blitz++中的泛型算法很全面了,我的计划是把他们中间缺少的部分补起来,不能互操作的地方粘合起来,再有就是增加对MetaProgramming的支持
         呵呵,应该还算是一个比较雄伟的计划吧
         我希望这套库能尽可能的高效率、容易使用、同事保证安全,理想的境地是能够代替ACM集训队使用的模块

         目前我的设想是整个库放在Galgo这个namespace里,这个namespace分为两个子namespace,分别是泛型算法Generic和元编程算法Meta

          我觉得这样一个库的建立与维护,任重而道远不说,没准前人已经作过360遍了,不过没关系,权当娱乐了。



First Step——Euclid GCD的一个实现


           不说什么废话了,先贴代码:
 1//-------------------------------BEGIN:GAlgo_Euclid_GCD.hpp--------------------------//
 2#ifndef _GAlgo_Euclid_GCD_H_
 3#define _GAlgo_Euclid_GCD_H_
 4namespace GAlgo
 5{
 6    namespace Generic
 7    {
 8        template <typename T>
 9        T Euclid_GCD(const T& a,const T& b)
10        {
11            return ((a%b)==0)?b:Euclid_GCD(b,a%b);
12        }

13    }

14    namespace Meta
15    {
16        template <int A,int B>
17        struct Euclid_GCD
18        {
19            static const int value=Euclid_GCD<B,A%B>::value;
20        }
;
21
22        template <int A>
23        struct Euclid_GCD<A,0>
24        {
25            static const int value=A;
26        }
;
27    }

28}

29#endif
30
31//-------------------------------END:GAlgo_Euclid_GCD.hpp--------------------------//

         应该没什么好说的,比较中规中矩,常规手法,不过根据TAOCP上的说法,可能在某些m,n的取值上需要很多重的递归这时候Meta的方法可能会遇到困难(其实第一种也有运行时堆栈溢出的危险),所以说……说什么好呢,就这样了

下面是个简单的测试
 1#include "GAlgo_Euclid_GCD.hpp" 
 2#include <iostream>
 3using namespace std;
 4int main()
 5{
 6    cout<<GAlgo::Generic::Euclid_GCD(6,9)<<endl;
 7    cout<<GAlgo::Meta::Euclid_GCD<6,9>::value<<endl;
 8    return 0;
 9}

10



个人觉得今后有研究价值的方向

         我觉得对于算法描述和图灵机、有限状态机、以及隐隐约约我看到的马尔科夫的某些工作(马尔科夫链)之间的关系深入挖掘一下应该会有不少收获,那个我对这个问题可能会有一个数学结构的猜想估计也可能可以在这个方向上证实或证伪……
         突然想去向偶像黄兆镇请教一下……还是等我把胆子先练大再去吧……
posted @ 2005-12-12 21:48 Windreamer Is Not DREAMER 阅读(1392) | 评论 (4)编辑 收藏

2005年12月10日

终于无聊到来写书评,最近的项目一直都没和C++有什么关系,不过看的书却都是C++方面的,而最近看到的几本书中感觉最好的莫过于这本《C++ Templates》

Nicolai M. Josuttis的书我很喜欢,从他的那本《The C++ Standard Template Library》就看出了他很多独特的风格,令我爱不释手,所以这本《C++ Template》   也进入了我的必看书单。粗读之后,感觉整本书绝对将成为C++泛型领域的圣经级著作

  1. 这本书角度选得很好,全书分三个部分,分别介绍模板基础、模版的编译器实现、模板的高级技巧,三个部分相辅相成、相互照应,由浅入深而又自然而然,还方便分开阅读(比如我就重点看了第一第三部分,模版实现被我略过了)却又全面覆盖了这一领域
  2. 这本书英文很浅显(比《Modern C++ Design》浅显了不知多少倍),语言严谨而又不晦涩,尤其要赞的就是废话尤其地少!
  3. 章节安排很合理,很方别作为工具书应急查阅(《C++STL》就有这个优点,与这本书科学家+工程师的组合不无关系)
  4. 书中好多技术,我是闻所未闻,惊为天人,尤其第三部分,可以算得上眼花缭乱,而且给出的实现感觉既符合标准、实用、而且没有炫技的成分

同类书籍据我所知没有可以达到这个高度的,大部分C++泛型方面的专著只局限于怎么用STL,将模板基础的书,也仅限于最表面的语法,像模版参数推导这种问题鲜有涉及,更不用提关于Metaprogramming,这本书圣经的地位估计后人也是难以企及了。

下面是我看书时画下来的一些觉得自己平时应该注意的地方,放在这里做备忘好了

  1. (P12) [Argument Deducion] If we pass two ints to the parameter type T const&  the C++ compiler must conclude that T must be int. Note that no automatic type conversion is allowed here,Each T must match exactly.

    template <typename T>
    inline T 
    const& max (T const& a,T const& b);

    max(
    4,7)//OK:T is int for both arguments
    max(4,4.2)//ERROR:first T is int,second T is double

  2. (P13)[Template Parameters] In function templates(unlike class template) no default template arguments can be specified
  3. (P14)[Template Parameters]Deducation can be seen as part of  overlaod resolution-a process tha is not based on selection of return type either.The sole exception is the return type of conversion operator members.
  4. (P18)[Overloading Function Template] The fact that not all overloaded functions are visible when a corresponding function call is made may or may not matter.
  5. (P39)[Nontype Function Template Parameters] Function templates are considered to name a set of overloaded function.However,according to the current standard,sets of overload functions cannot be used for template parameter deducation.Thus you have to cast to the exactly type of the function template arguments

    template <typename T,int VAL>
    T addValue (T 
    const& x)
    {
        
    return x+VAL
    }


    std::transform(source.begin(),source.end(),
    //start and end of source
    dest.begin(),//start of destination
    (int(*)(int  const&))addValue<int,5>);//operation

  6. (P40)[Restrictions for Nontype Template Parameters] 太长了,略过
  7. (P44)[The .template Construct]

    template <int N>
    void printBitset (std::bitset<N> const& bs)
    {
        std::cout
    <<bs.to_string<char,char_traits<char>,allacator<char> >();//ERROR:can't recogonize the template
    }


    template 
    <int N>
    void printBitset (std::bitset<N> const& bs)
    {
        std::cout
    <<bs.template to_string<char,char_traits<char>,allacator<char> >();//OK
    }

  8. (P45)[Using this->]

    template <typename T>
    class Base
    {
    public:
        
    void bar();
    }
    ;

    template 
    <typename T>
    class Derived : Base<T>
    {
    public:
        
    void foo()
        
    {
            bar();
    //call external bar() or error
        }

    }


    template 
    <typename T>
    class Derived : Base<T>
    {
    public:
        
    void foo()
        
    {
            
    this->bar();//OK
        }

    }

  9. 同样精彩的还有(P57)[Using String Literals as Arguments for Function Templates]
  10. 令我惊异的SFINE技术(substitution-failure-is-not-an-error)

    template <typename T>
    class IsClassT
    {
    private:
        typedef 
    char One;
        typedef 
    struct {char a[2];} Two;
        template 
    <typename C> static One test (int::C*);
        template 
    <typename C> static Two test();
    public:
        
    enum {Yes=sizeof(IsClassT<T>::test<T>(0))==1};
        
    enum {No=!Yes};
    }
    ;

总而言之,此书带给了我前所未有的阅读享受......我今年震撼大奖一定会投它一票
posted @ 2005-12-10 12:36 Windreamer Is Not DREAMER 阅读(558) | 评论 (3)编辑 收藏

2005年12月5日

主要喜欢他的语法着色功能,真的很方便,RSS等方面的功能也很全面......

测试一下:

//////////////////////////////
//Prime.cpp
//////////////////////////////

template
<int Val>
struct IntType
{
 
const static int value = Val ;
}
;
template
<bool flag, typename T, typename U>
struct Select
{
 typedef T Result;
}
;

template
<typename T, typename U>
struct Select<false, T, U>
{
 typedef U Result;
}
;
template 
<unsigned int N,unsigned int x>
struct FindRoot
{
 
const static int value=Select<(N/x)==x||((N/x+x)/2==x),IntType<x>,FindRoot<N,(N/x+x)/2> >::Result::value;
}
;

template 
<unsigned int N>
struct Sqrt
{
 
const static int value=FindRoot<N,N/2>::value;
}
;

template 
<>
struct Sqrt<0> ;

template 
<int N,int divider>
struct TestPrime
{
 
const static int value=Select<(N%divider)==0,IntType<0>,TestPrime<N,divider-1> >::Result::value;
}
;

template 
<int N>
struct TestPrime<N,1>
{
 
const static int value=1;
}
;

template 
<unsigned int N>
struct IsPrime
{
 
const static int value=TestPrime<N,Sqrt<N>::value+1>::value;
}
;

template 
<>
struct IsPrime<2>
{
 
const static int value=1;
}
;

template 
<>
struct IsPrime<1>;

int printf(const char*,);

int main()
{
 
const int yes=IsPrime<123127>::value;
 printf(
"%d\n",yes);
}

posted @ 2005-12-05 09:45 Windreamer Is Not DREAMER 阅读(342) | 评论 (1)编辑 收藏
仅列出标题