c plus plus11新特性

Veröffentlicht am 2016-12-21

1.新的关键字

auto:为了自动类型推导（在编译时对类型进行类型推导，不影响运行效率）
例：auto i=1;

decltype:可以从一个变量或表达式中得到类型
例：int x=3;
decltype(x) y=x;

nullptr:为解决原来c++中null的二义性问题而引进的一种新的类型，因为null实际代表0

void F(int a)
{
  cout<<a<<endl;
}
void F(int *p)
{
  assert(p!=NULL);
  cout<<p<<endl;
}
int main()
{
  int *p=nullptr;
  int *q=NULL;
  bool equal=(p==q);//equal的值为true,说明p和q都是空指针
  int a=nullptr;//编译失败，nullptr不能转型为int
  F(0);//在c++98中编译失败，有二义性；在c++11中调用F(int)
  F(nullptr);
  return 0;
}

2.更优雅的初始化方法

在引入c++11之前，只有数组能使用初始化列表，其他容器想要使用初始化列表，只能用以下方法：

int arr[3]={1,2,3};
vector<int> v(arr,arr+3);

在c++11中，我们可以使用以下语法来进行替换：

int arr[3]{1,2,3};
vector<int> iv{1,2,3};
map <int,string>{{1,"a"},{2,"b"}};
string str{"Hello World"};

3.可变模板参数

(1)
template<typename...T>
void fun(T...args)
{
  cout<<sizeof...(args)<<endl;
}
(2)
template<typename...Element> class tuple;
tuple<int,string> a;
(3)
template <typename T,unsigned PrimaryDimesion,unsign...Dimesions>
class array{/**/};
array<double,3,3> rotation_matrix;
(4)
template<typename...Elements> class tuple;//利用递归
template<typename Head,typename...Tail>
class tuple<Head,Tail...>:private tuple<Tail...>
{
    Head head;
  public:
    /*implementation*/
}
template<> class tuple<>
{
  /*zero-tuple implementation*/
};

4.模板

(1)通常有两种形式：函数模板和类模板；
函数模板针对仅参数类型不同的函数；
类模板针对仅数据成员和成员函数类型不同的类；
例：swap的模板函数形式为
template void swap(T &a,T &b)
例：类模板声明
template class A{public:T a;T b;T hy(T c,T &d);};
例：类模板外部定义成员函数的方法为
template void A::h(){}

(2)模板的特化
针对某个特定的类型，在定义的时候给出不同一般数据类型的逻辑实现。而在使用的时候，
这个特殊性完全被屏蔽，你仍然只需要按照模板来使用，但是编译器会根据你之前的设定，
给特别的数据类型以特定的代码逻辑。
例：
template class stack{};
template<> class stack{};//bool型比较节省空间
例：
template
T mymax(const T t1,const T t2)
{
return t1
const charmymax(const char t1,const char *t2)
{
return(strcmp(t1,t2)<0)?t2:t1;
}

(3)模板的偏特化

类模板的偏特化
例：c++标准库中的类vector的定义
template
class vector{/…/};
template
class vector{/…/};

函数模板的偏特化
严格的来说，函数模板不支持偏特化，但可以对函数进行重载，达到类似效果。
template void f(T); //(a)
对基模板(a)重载
template void f(*T);

最大熵马尔科夫模型

Veröffentlicht am 2016-12-14

In the new model,we redefine $\alpha_t(s)$ to be the probability of being in state s at time t
given the observation sequence up to time t.The recursive Viterbi step is then

$\alpha_{t+1}$

未完待续，敬请期待

各种优化方法总结比较

Veröffentlicht am 2016-12-13

1.SGD

SGD指stochastic gradient decent,即随机梯度下降，是梯度下降的batch版本。每次更新都利用
一个batch的数据，而非整个训练集。
$x_{t+1}=x_t+\Delta x_t$
$\Delta x_t=-\eta g_t$
$\mbox{其中，}\eta\mbox{为学习率，}g_t\mbox{为}x\mbox{在}t\mbox{时刻的梯度。}$

2.Momentum

SGD方法的一个缺点是，其更新方向完全依赖于当前的batch，因而其更新十分不稳定。解决这一问题
的一个简单的方法便是引入momentum。momentum即动量，它模拟的是物体运动时的惯性，即更新的时候
在一定程度上保留之前更新的方向，同时利用当前batch的梯度微调最终的更新方向。这样一来，可以在
一定程度上增加稳定性，从而学习得更快，并且还有一定摆脱局部最优的能力：
$k=x_{t-1}$
$\Delta x_t=\rho\Delta k-\eta g_t$
$\mbox{其中，}\rho\mbox{即momentum,表示要多大程度上保留原来的更新方向，这个值在0-1之间，}$
在训练开始时，由于梯度可能会很大，所以初始值一般选0.5；当梯度不那么大时，改为0.9。
$\eta\mbox{是学习率，即当前batch的梯度多大程度上影响最终更新方向，与普通SGD含义相同。}$

3.Nesterov Momentum

进行了一下修正，达到准确的梯度值
$k=x_{t-1}$

$t(x)=f(x_{t-1}+\rho\Delta k)$

$\Delta x_t=\rho\Delta k-\eta\Delta t(x)$

4.Adagrad

上面提到的方法对于所有参数都使用了同一更新速率，但是同一个更新速率不一定适合所有参数。
比如有的参数可能已经到了仅需要微调的阶段，但又有些参数由于对应样本少等原因，还需要较大
幅度的调动。Adagrad就是针对这一问题提出的，自适应地为各个参数分配不同学习率的算法。

$k=\sqrt{\sum_{r=1}^t g_r+\xi}$

$\Delta x_t=-\frac{\eta}{k}g_t$

$\mbox{其中}g_t\mbox{同样是当前的梯度，连加和开根号都是元素级别的运算。}$

$\eta\mbox{是初始学习率，由于之后会自动调整学习率，所以初始值就不像之前的算法那样重要了，}$

$\mbox{而}\xi\mbox{是一个比较小的数，用来保证分母非0。}$

$\mbox{其含义是，对于每个参数，随着其更新的总距离增多，其学习速率也随之变慢。}$

5.Adadelta

Adagrad算法存在三个问题
（1）其学习率是单调递减的，训练后期学习率非常小。
（2）其需要手工设置一个全局的初始学习率。
（3）更新$x_t$时，左右两边的单位不统一
Adadelta针对上述三个问题提出了比较漂亮的解决方案。
首先，针对第一个问题，我们可以只使用adagrad的分母中的累计项离当前时间点比较近的项，如下式：

$k=E[g^2]_{k-1}$

$E[g^2]_t=\rho k+(1-\rho)g_t^2$

$\Delta x_t=-\frac{\eta}{\sqrt{E[g^2]_t+\xi}} g_t$

这里$\rho$是衰减系数，通过这个衰减系数，我们令每一个时刻的$g_t$随着时间按照$\rho$指数衰减，
这样就相当于我们仅使用离当前时刻比较近的$g_t$信息，从而使得很长时间以后，参数仍然可以得到
更新。

6.Adam (Adaptive Moment Estimation)

Adam的思路和Adagrad相似，都使用梯度平方根归一化学习率。
维护一个一阶momentum，等价于梯度：

$a=m_{t-1}$

$m_t=\alpha \cdot a+(1-\alpha)\cdot \Delta Q(w)$

另一个二阶moentum，等价于梯度平方：

$b=v_{t-1}$

$v_t=\beta\cdot b+(1-\beta)\cdot\Delta Q(w)^2$

由于m,v都初始化为0，使用t次幂让其在头几次迭代中更大一些：

$\widehat{m_t}=\frac{m_t}{1-\alpha^t}$

$\widehat{v_t}=\frac{v_t}{1-\beta^t}$

使用梯度平方v归一化学习率，更新幅度为梯度m:

$w_{t+1}=w_t-\eta\frac{1}{\sqrt{\widehat{v_t}}}\widehat{m_t}$

effective c plus plus

Veröffentlicht am 2016-12-13

1.尽量以const,enum,inline替换#define

例如：
# define ASPECT_RATIO 1.653
const double AspectRatio=1.653
宏定义在编译前被预处理，编译器看不见

2.尽量使用const

在实际使用中,const对象大多用于passed by pointer-to-const或passed by 
reference-to-const的传递结果

const对象调用const方法,non-const对象调用non-const方法，这种效率太低，
需要non-const对象调用const方法(转型)

例：
class TextBlock{
    public:/*...*/
    const char & operator[](std::size_t position) const//一如既往
    {/*...*/ return text[position];}
    char & operator[](std::size_t position)//现在只调用const op[]
    {
        return const_cast<char &>(static_cast<const TextBlock &>(*this)[position]);
    }
}

3.确定对象被使用前已先被初始化

c++规定，对象的成员变量的初始化动作发生在进入构造函数本体之前，构造函数的一个
较佳写法是，使用所谓的member initialization list（成员初值列）替换赋值动作。
例：
ABEntry::ABEntry(const std::string &name,const std::string &address,
                 const std::list<PhoneNumber> & phones)
                 :theName(name),theAddress(address),thePhones(phones),numTimeConsulted(0)
{}//构造函数本体不必有任何动作，通常效率高

4.如果不想使用编译器提供的函数

为驳回编译器自动提供的机能，可将相应的成员函数声明为private并且不予实现。使用像Uncopyable
这样的base class 也是一种做法。
class Uncopyable{
    protected://允许derived对象构造和析构
        Uncopyable(){};
        ~Uncopyable(){};
    private:
        Uncopyable(const Uncopyable&);//阻止copy
        Uncopyable & operator=(const Uncopyable &));
};//只需继承这个类

5.为多态基类声明virtual析构函数

1.当derived class对象经由一个base class指针被删除，而该base class带着一个non-virtual析构函数，
其结果未有定义——实际上执行时通常发生的是对象的derived成分没被销毁。
2.如果class不含virtual函数，通常表示它并不意图被用做一个base class。如果class带有任何virtual
函数，它就应该拥有一个virtual析构函数。

6.析构函数绝对不要吐出异常

1.如果一个被析构函数调用的函数可能抛出异常，析构函数应该捕捉任何异常，然后吞下它们（不传播）
或结束程序。
2.如果客户需要对一个操作函数运行期间抛出的异常做出反应，那么class应该提供一个普通函数
（而非在析构函数中）执行该操作。

7.绝不在构造和析构过程中调用virtual函数

由于base class构造函数的执行更早于derived class构造函数，当base class构造函数执行时，
derived class的成员变量尚未初始化。

8.令operator=返回一个reference to *this

class Widget{
    public:
        Widget & operator=(const Widget & rhs){
            /*...*/
            return *this;
        }
};

9.在operator=中处理“自我赋值”

class Widget{
    /*...*/
    public:
        void swap(Widget &rhs);//交换*this和rhs是数据
};
Widget &Widget::operator=(const Widget &rhs)
{
    Widget temp(rhs);//为rhs数据制作一份副本
    swap(temp);//将*this数据和上述副本的数据交换
    return *this;
}

10.让接口容易被正确使用，不易被误用

比较安全的写法是预先定义所有有效的Months
class Month{
    public:
        static Month Jan(){return Month(1);}
        /*...*/
        static Month Dec(){return Month(12);}
    private:
        explicit Month(int m);//阻止生成新的月份
        /*...*/
};

Date d(Month::Mar(),Day(30),Year(1995));

11.宁以pass-by-reference-to-const替换pass-by-value

减少复制，提高效率

12.必须返回对象时，别妄想返回其reference

错误1：
const Rational & operator*(const Rational &lhs,const Rational &rhs)
{
    Rational result(lhs.n*rhs.n,lhs.d*rhs.d);
    return result;//引用的是已被销毁的local对象
}
错误2：
const Rational & operator*(const Rational &lhs,const Rational &rhs)
{
    Rational *result=new Rational(lhs.n*rhs.n,lhs.d*rhs.d);
    return *result; 
}//问题：必须付出一个“构造函数调用”代价，同时如何对new出来的对象实施delect?
正确写法：
inline const Rational operator*(const Rational &lhs,const Rational &rhs)
{
    return Rational(lhs.n*rhs.n,lhs.d*rhs.d);
}
绝不要返回pointer或reference指向一个local stack对象，或返回reference 指向
一个heap-allocated对象，或返回pointer或reference指向一个local static对象
而有可能同时需要多个这样的对象。

13.将成员变量声明为private,protected并不比public更具有封装性

14.宁以non-member，non-friend替换member函数

15.使用非成员函数（non-member）处理函数的所有参数都需要类型转换

因为成员函数的发起者必须是类，若出现double类型在前的情况，则无法执行乘法，
所以应使用非成员函数。
例：
class Rational{
    public:
        Rational(double numerator=0,double denominator=1)
                :m_n(numerator),m_d(denominator){}
        double numerator() const{return m_n;}
        double denominator() const{return m_d;}
        double value(){return (m_n/m_d);}
    private:
        double m_n;
        double m_d;
};
const Rational operator *(const Rational &lhs,const Rational &rhs)
{
    return Rational(lhs.numerator()*rhs.numerator(),lhs.denominator()*rhs.denominator());
}

16.inline函数

inline函数背后的整体观念是，将“对此函数的每一个调用”都以函数本体替换之。

17.绝对不要重新定义继承而来的non-virtual函数

18.如果派生类的函数与基类函数同名，但是参数不同。

此时，无论其有无virtual关键字，基类的函数将被隐藏。

19.如果派生类的函数与基类函数同名，而且参数也相同。

基类没有关键字，此时基类的函数被隐藏。

20.c++纯虚函数

virtual void function()=0;

21.new 和 malloc的区别

(1)new是c++中的操作符，malloc是c中的一个函数
(2)new不止是分配内存，而且会调用类的构造函数，同理delect会调用类的析构函数，
而malloc则只分配内存，不会进行初始化类成员的工作，同样free也不会调用析构函数
(3)new出来的指针是直接带类型信息的，而malloc返回的都是void指针

linux命令小结

Veröffentlicht am 2016-12-12

mkdir 创建目录
touch 创建空文件
cat 查看文件内容
find 在文件系统中搜索某文件
wc 统计文本中行数，字数，字符数
grep 在文本文件中查找字符串
pwd 显示当前目录
head 显示文件头部内容
tail 显示文件尾部内容
top 动态显示当前耗费资源最多进程信息
ps 显示瞬间进程状态
ifconfig 查看网络信息
man 查手册
chown 改变文件权限
gcc 把c语言的源程序文件编译成可执行文件

tensorflow学习总结

Veröffentlicht am 2016-12-12

embedding 都是随机初始化，然后在训练过程中不断更新embedding矩阵的值。
因为embedding matrix 是通过tf.Variables()产生，如果trainable flag设为true，就可以训练。
在LSTM中，forget_gate 加上一个forget_bias (默认是1)，可以防止开始训练时遗忘量过大。
BasicLSTM和LSTM的区别：BasicLSTM没有clipping,projection layer,peep-hole等部分。
Peep-hole把c的内容也加入到gate的生成中
Clipping:防止梯度爆炸，对梯度进行修剪
tf.nn.dynamic_rnn的input不是list of tensors，而是整个tensor.

EM算法

Veröffentlicht am 2016-10-29

EM算法的主要思想

E步就是估计隐含类别y的期望，M步调整其他参数使得在给定类别y的情况下，极大似然估计p(x,y)能够达到极大值。
然后在其他参数确定的情况下，重新估计y，周而复始，直至收敛。
举个EM算法中比较常见的例子，有100个男同学和100个女同学，这200个同学的身高信息已知，但是性别信息丢失。
让我们分别求男同学和女同学的平均身高和方差。这个问题，我们先假设100个男同学的平均身高为175cm，方差为10；
100个女同学的平均身高为160cm，方差为10（假设肯定是不对的）；在这个假设下，默认身高分布服从高斯分布，求出
令这个均值和方差极大释然的高斯分布。然后求出此时高斯分布的均值和方差，不断循环，直至均值和方差不在发生变化。

EM算法的数学推导

$ \mbox{令 }L(x)=\prod_{i=1}^Np(x_i;\theta),\theta=[\mu,\sigma]\mbox{表示高斯分布参数}$

$ \mbox{此时的极大似然估计：}H(\theta)=\ln L(\theta)=\sum_{i=1}^N \ln p(x_i;\theta)$

$ \mbox{我们可以将采样得到的数据表示成一个二元组:}(x_i,z_i),\mbox{其中}x_i\mbox{表示第i个人的身高，}z_i$

$ \mbox{表示这个被采样的人是男还是女,男用1表示，女用0表示，根据联合分布概率的概率和，我们有}$

$p(x_i;\theta)=\sum p(x_i,z_i;\theta)$

$H(\theta)=\sum_{i=1}^N\ln p(x_i;\theta)$

$H(\theta)=\sum_{i=1}^N\ln \sum p(x_i,z_i;\theta)$

$ 令Q_i表示变量Z_i的某种分布，且Q_i满足$

$\sum Q_i(Z_i)=1$

$Q_i(Z_i)\geq 0$

$可以将等式改写为：$

$H(\theta)=\sum_{i=1}^N\ln\sum Q_i(Z_i)\cdot \frac {p(x_i,z_i;\theta)}{Q_i(Z_i)} $

$H(\theta)\geq \sum_{i=1}^N \sum Q_i(Z_i)\ln \frac {p(x_i,z_i;\theta)}{Q_i(Z_i)}$

$等号成立条件:\frac {p(x_i,z_i;\theta)}{Q_i(Z_i)}=C$

$所以：p(x_i;\theta)=C$

$Q_i(Z_i)=\frac {p(x_i,z_i;\theta)}{p(x_i;\theta)}$

$Q_i(Z_i)=P(z_i|x_i;\theta)$

$E步:对于每一个i，计算Q_i(z_i)$

$M步：计算\theta=arg max \sum_{i=1}^N \sum Q_i(z_i) \ln \frac{p(x_i,z_i;\theta)}{Q_i(z_i)}$

Spark优化总结

Veröffentlicht am 2016-10-29

1.groupByKey

1
2
3

Avoid groupByKey when performing an associative reductive operation.
例如将rdd.groupByKey().mapValues(_.sum)改为rdd.reduceByKey()
尽量少使用，因为相当于少个combiner

2.reduceByKey

Avoid reduceByKey when the input and output value types are different.
例如将rdd.map(kv=>(kv._1,new Set[String]()+kv._2)).reduceByKey(_+_)转变为
Val zero=new collection.mutable.Set[String]()
Rdd.aggregateByKey(zero)((set,v)=>set+=v,(set1,set2)=>set1++=set2)

3.flatMap-join-groupBy pattern

1
2
3

Avoid the flatMap-join-groupBy pattern.
when two datasets are already grouped by key and you want to join them and keep them grouped,
you can just use cogroup.This avoids all the overhead associated with unpacking and repacking the groups.

4.suffles

When more suffles are better:
This trick is especially useful when the aggregation is already grouped by a key.
For example,consider an app that wants to count the occurences of each word 
in a corpus and pull the results into the driver as a map.
One approach,which can be accomolished with the aggregate action,is to compute a local map 
at each partition  and then merge the maps at the driver.
The alternative approach ,which can be accomplished with aggregateByKey ,
is to perform the count in a fully distributed way ,and them simply collectAsMap the results to driver.

5.repartitionAndSortWithInPartitions

1 2	This is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.

6.flatmap

flatmap函数是两个操作的集合——先映射后扁平化。
操作1：同map函数一样，对每一条输入进行指定操作，然后为每一条输入返回一个对象。
操作2：最后为每一条输入返回一个对象。

flatmap会将字符串看成是一个字符串数组。

7.foldByKey

在使用flodByKey算子时，需特别注意映射函数及zeroValue的取值

8.reduceByKeyLocally

该函数将RDD[k,v]中每个k对应的v值，根据映射函数来运算，运算结果映射到一个Map[k,v]中，而不是RDD[k,v]

9.foreach

如果对RDD执行foreach，只会在Executor端有效，而不是Driver端

10.persist()

If we also wanted to use lineLengths again later,we could add :lineLengths.persist() before the reduce.

11.checkpoint

RDD需要加检查点的原因：
1.DAG中Lineage过长，如果重算，则开销太大（如在PageRank中）
2.在shuffle dependency上做checkpoint 获得的收益更大

12.HashPartitioner和RangePartitioner

使用HashPartitioner和RangePartitioner来减少网络通信开销

Hello World

Veröffentlicht am 2016-10-29

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

1	$ hexo new "My New Post"

More info: Writing

Run server

1	$ hexo server

More info: Server

Generate static files

1	$ hexo generate

More info: Generating

Deploy to remote sites

1	$ hexo deploy

More info: Deployment