2016 年 5 月 – 每天进步一点点

Channel types

channel 类型通过发送和接受特定类型的值，为并行执行函数提供一种通信机制。 channel是先进先出类型的管道，默认情况下，在另一端准备好之前，发送和接收都会阻塞。这使得 goroutine 可以在没有明确的锁或竞态变量的情况下进行同步。

ChannelType = ( "chan" | "chan" "<-" | "<-" "chan" )

箭头是管道的方向，发送或接收，如果没有指定方向，管道是双向的。一个channel可能受限于只能发送或接收。

chan T          // can be used to send and receive values of type T
chan<- float64  // can only be used to send float64s
<-chan int      // can only be used to receive ints

The <- operator associates with the leftmost chan possible:

chan<- chan int    // same as chan<- (chan int)
chan<- <-chan int  // same as chan<- (<-chan int)
<-chan <-chan int  // same as <-chan (<-chan int)
chan (<-chan int)

channel可以使用内置的make函数创建并初始化，make参数指定类型和可选的容量。未初始化的channel值为nil。

make(chan int, 100)

容量，指定channel的buffer大小，元素个数。如果容量未指定或者为0，channel是没有buffer的，只有发送和接收都准备好的时候，通信成功。否则，channel是可缓冲的，通信不阻塞，只要空间不为空（接受者），或者空间不满（发送者）。但是值为nil的channel会一直阻塞。

channel可以使用内置函数close关闭。接受者可以使用多值赋值的方法判断channel是否关闭。当channel关闭时，ok值为false。

x, ok = <-ch
x, ok := <-ch
var x, ok = <-ch

注意

给一个 nil channel 发送数据，造成永远阻塞
从一个 nil channel 接收数据，造成永远阻塞
给一个已经关闭的 channel 发送数据，引起 panic
从一个已经关闭的 channel 接收数据，立即返回一个零值

给一个 nil channel 发送数据，造成永远阻塞

这第一个例子对于新来者是有点小惊奇的，它给一个 nil channel 发送数据，造成永远阻塞。

以下这个程序将在第5行造成死锁，因为未初始化的 channel 是 nil 的，其值是零

package main

func main() {
        var c chan string
        c <- "let's get started" // deadlock
}

从一个 nil channel 接收数据，造成永远阻塞

类似的，从一个 nil channel 接收数据，会造成接受者永远阻塞。

package main

import "fmt"

func main() {
        var c chan string
        fmt.Println(<-c) // deadlock
}

为什么会发生这样的情况？下面是一个可能的解释

channel 的 buffer 的大小不是类型声明的一部分，因此它必须是 channel 的值的一部分
如果 channel 未被初始化，它的 buffer 的大小将是0
如果 channel 的 buffer 大小是0，那么它将没有 buffer
如果 channel 没有 buffer，一个发送将会被阻塞，直到另外一个 goroutine 为接收做好了准备
如果 channel 是 nil 的，并且接收者和发送者没有任何交互，他们都会阻塞然后在各自的 channel 中等待以及不再被解除阻塞状态

给一个已经关闭的 channel 发送数据，引起 panic

以下程序将有可能 panic，因为在它的兄弟姐妹有时间完成发送他们的值之前，这第一个 goroutine 在达到10的时候将关闭 channel。

package main
import "fmt"
func main() {
        var c = make(chan int, 100)
        for i := 0; i < 10; i++ {
                go func() {
                        for j := 0; j < 10; j++ {
                                c <- j
                        }
                        close(c)
                }()
        }
        for i := range c {
                fmt.Println(i)
        }
}

因此为什么没有一个 close() 版本能让你检测 channel 是否关闭？

if !isClosed(c) {
        // c isn't closed, send the value
        c <- v
}

但是这个函数有一个内在的竞争，某个人可能在我们检查完 isClosed(c) 之后，但是代码获取 c <- v 之前关闭这个 channel。处理这个问题的方法参考Go Concurrency Patterns: Pipelines and cancellation 。

从一个已经关闭的 channel 接收数据，立即返回一个零值

这最后一个示例与前一个是相反的，一旦一个 channel 被关闭，它的所有的值都会从 buffer 中流失，channel 将立即返回0值。

package main

import "fmt"

func main() {
            c := make(chan int, 3)
            c <- 1
            c <- 2
            c <- 3
            close(c)
            for i := 0; i < 4; i++ {
                        fmt.Printf("%d ", <-c) // prints 1 2 3 0
            }
}

针对这个问题的正确的解决办法是使用 range 循环处理：

for v := range c {
            // do something with v
}

for v, ok := <- c; ok ; v, ok = <- c {
            // do something with v
}

这两个语句在函数中是相等的，展示 range 是做什么。

参考文献：

[1]. Go 指南

[2]. Go – Channel 原理

[3]. The Go Programming Language Specification

一致性哈希算法

在分布式缓存和分布式存储系统上经常用到一致性哈希算法，今天特地来学习一下。

一致哈希 是一种特殊的哈希算法。在使用一致哈希算法后，哈希表槽位数（大小）的改变平均只需要对 $K/n$ 个关键字重新映射，其中 $K$ 是关键字的数量， $n$ 是槽位数量。然而在传统的哈希表中，添加或删除一个槽位的几乎需要对所有关键字进行重新映射。

历史

一致哈希由MIT的Karger及其合作者提出，现在这一思想已经扩展到其它领域。在这篇1997年发表的学术论文中介绍了“一致哈希”如何应用于用户易变的分布式Web服务中[1]。哈希表中的每一个代表分布式系统中一个节点，在系统添加或删除节点只需要移动 $K/n$ 项。

一致哈希也可用于实现健壮缓存来减少大型Web应用中系统部分失效带来的负面影响。

一致哈希的概念还被应用于分布式散列表（DHT）的设计。DHT使用一致哈希来划分分布式系统的节点。所有关键字都可以通过一个连接所有节点的覆盖网络高效地定位到某个节点。

需求

在使用n台缓存服务器时，一种常用的负载均衡方式是，对资源o的请求使用 $\mbox{hash}(o) = o \mod n$ 来映射到某一台缓存服务器。当增加或减少一台缓存服务器时这种方式可能会改变所有资源对应的hash值，也就是所有的缓存都失效了，这会使得缓存服务器大量集中地向原始内容服务器更新缓存。因些需要一致哈希算法来避免这样的问题。一致哈希尽可能使同一个资源映射到同一台缓存服务器。这种方式要求增加一台缓存服务器时，新的服务器尽量分担存储其他所有服务器的缓存资源。减少一台缓存服务器时，其他所有服务器也可以尽量分担存储它的缓存资源。一致哈希算法的主要思想是将每个缓存服务器与一个或多个哈希值域区间关联起来，其中区间边界通过计算缓存服务器对应的哈希值来决定。（定义区间的哈希函数不一定和计算缓存服务器哈希值的函数相同，但是两个函数的返回值的范围需要匹配。）如果一个缓存服务器被移除，则它会从对应的区间会被并入到邻近的区间，其他的缓存服务器不需要任何改变。

实现

一致哈希将每个对象映射到圆环边上的一个点，系统再将可用的节点机器映射到圆环的不同位置。查找某个对象对应的机器时，需要用一致哈希算法计算得到对象对应圆环边上位置，沿着圆环边上查找直到遇到某个节点机器，这台机器即为对象应该保存的位置。当删除一台节点机器时，这台机器上保存的所有对象都要移动到下一台机器。添加一台机器到圆环边上某个点时，这个点的下一台机器需要将这个节点前对应的对象移动到新机器上。更改对象在节点机器上的分布可以通过调整节点机器的位置来实现。

环形Hash空间

按照常用的hash算法来将对应的key哈希到一个具有2^32次方个桶的空间中，即0~(2^32)-1的数字空间中。现在我们可以将这些数字头尾相连，想象成一个闭合的环形。如下图

把数据通过一定的hash算法处理后映射到环上

现在我们将object1、object2、object3、object4四个对象通过特定的Hash函数计算出对应的key值，然后散列到Hash环上。如下图：

Hash(object1) = key1；

Hash(object2) = key2；

Hash(object3) = key3；

Hash(object4) = key4；

将机器通过hash算法映射到环上

在采用一致性哈希算法的分布式集群中将新的机器加入，其原理是通过使用与对象存储一样的Hash算法将机器也映射到环中（一般情况下对机器的hash计算是采用机器的IP或者机器唯一的别名作为输入值），然后以顺时针的方向计算，将所有对象存储到离自己最近的机器中。

假设现在有NODE1，NODE2，NODE3三台机器，通过Hash算法得到对应的KEY值，映射到环中，其示意图如下：

Hash(NODE1) = KEY1;

Hash(NODE2) = KEY2;

Hash(NODE3) = KEY3;

通过上图可以看出对象与机器处于同一哈希空间中，这样按顺时针转动object1存储到了NODE1中，object3存储到了NODE2中，object2、object4存储到了NODE3中。在这样的部署环境中，hash环是不会变更的，因此，通过算出对象的hash值就能快速的定位到对应的机器中，这样就能找到对象真正的存储位置了。

机器的删除与添加

普通hash求余算法最为不妥的地方就是在有机器的添加或者删除之后会照成大量的对象存储位置失效，这样就大大的不满足单调性了。下面来分析一下一致性哈希算法是如何处理的。

1. 节点（机器）的删除

以上面的分布为例，如果NODE2出现故障被删除了，那么按照顺时针迁移的方法，object3将会被迁移到NODE3中，这样仅仅是object3的映射位置发生了变化，其它的对象没有任何的改动。如下图：

2. 节点（机器）的添加

如果往集群中添加一个新的节点NODE4，通过对应的哈希算法得到KEY4，并映射到环中，如下图：

通过按顺时针迁移的规则，那么object2被迁移到了NODE4中，其它对象还保持这原有的存储位置。通过对节点的添加和删除的分析，一致性哈希算法在保持了单调性的同时，还是数据的迁移达到了最小，这样的算法对分布式集群来说是非常合适的，避免了大量数据迁移，减小了服务器的的压力。

template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
    typedef std::map<size_t, Node> NodeMap;

    HashRing(unsigned int replicas)
        : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
    {
    }

    HashRing(unsigned int replicas, const Hash& hash)
        : replicas_(replicas), hash_(hash)
    {
    }

    size_t AddNode(const Node& node);
    void RemoveNode(const Node& node);
    const Node& GetNode(const Data& data) const;

private:
    NodeMap ring_;
    const unsigned int replicas_;
    Hash hash_;
};

template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
    size_t hash;
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        hash = hash_((nodestr + Stringify(r)).c_str());
        ring_[hash] = node;
    }
    return hash;
}

template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        size_t hash = hash_((nodestr + Stringify(r)).c_str());
        ring_.erase(hash);
    }
}

template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
    if (ring_.empty()) {
        throw EmptyRingException();
    }
    size_t hash = hash_(Stringify(data).c_str());
    typename NodeMap::const_iterator it;
    // Look for the first node >= hash
    it = ring_.lower_bound(hash);
    if (it == ring_.end()) {
        // Wrapped around; get the first node
        it = ring_.begin();
    }
    return it->second;
}

特性

David Karger及其合作者列出了使得一致哈希在互联网分布式缓存中非常有用的几个特性：

冗余少
负载均衡
过渡平滑
存储均衡
关键词单调

问题

上边描述的一致性Hash算法有个潜在的问题是:
将节点hash后会不均匀地分布在环上，这样大量key在寻找节点时，会存在key命中各个节点的概率差别较大，无法实现有效的负载均衡。可以使用虚拟节点的思想，为每个物理节点（服务器）在圆上分配100～200个点。这样就能抑制分布不均匀，最大限度地减小服务器增减时的缓存重新分布。用户数据映射在虚拟节点上，就表示用户数据真正存储位置是在该虚拟节点代表的实际物理服务器上。

参考文献

Saving a read-only file edited in vi / vim

We’ve all done it…opened a file using vi or vim to inspect the contents, and realize we need to alter it. Of course we totally ignored the message informing we didn’t have permission to edit, so we’re only allowed to view it as “read-only”. Then after we find the troublesome spot we hit “i” and happily edit the place needing changed, only to “face-palm” when we realize we cannot save the wonderful edit we just made.

In the past I handled this one of three ways: I either copied and pasted the change after reopening the file using sudo, or I reopened and retyped everything once again, or I save the file as a temp file and then rename it using sudo. Very stupid, stressful and time consuming.

However, now I know a better way. Using a combination of ‘tee’ and ‘sudo’ commands I can now save the read-only file rather than jumping through the hoops in the previous paragraph. Here is how:

Open a file as normal, forgetting to use “sudo”, and therefore viewing a read-only file.

Then mistakenly try to edit the read-only file in the traditional manner.

But when we try to save using ‘:w!’, SHIFT+ZZ, or :qw!, or whatever combination we normally fail with as an alternative. Here is sample output of what we see:

At this point is where the new magic can happen. Instead of the normal “face-palm” we do not “ENTER” and move on. We can enter a new command and successfully save the file after entering the sudo password:

At this point we will be presented with the content of the file and a prompt to press ENTER or type another command. To simply save the file and move on we just press ENTER, and then press the letter “O” (oh). (NOTE: “L” seems to do pretty much the same thing.) The file will be saved but remains open in vi/vim for more editing or reading. We can now exit normally by typing “:q!” since the file is still open as “read-only”.

What the command does:

:w = Write a file.
!sudo = Call shell sudo command.
tee = The output of the vi/vim write command is redirected using tee.
% = Triggers the use of the current filename.
Simply put, the ‘tee’ command is run as sudo and follows the vi/vim command on the current filename given.

[转载]http://www.geekyboy.com/archives/629

月度归档： 2016 年 5 月

Go channel