哈夫曼树解压与压缩

哈夫曼树的压缩与解压

1.算法简要描述

1.哈夫曼算法

1.哈弗曼算法是根据给定的n个权值{w1，w2，w3.......wn}，构造由n棵

二叉树构成的深林F={T1,T2,。。。。Tn}，其中每个二叉树Ti分别都是只

含有一个权值wi的根结点，其左右子树为空（i=1，，，，，,2）。

2.在深林F中选取其根结点的权值最小的两棵二叉树，分别作其左右子树

构造一颗新的二叉树，并置这棵新的二叉树根结点的权值为其左右子树

的根结点之和。

3.从F中删去这两棵二叉树，同时刚新生成的二叉树加入到深林F中。

4.重复2,3,步骤，直至深林F中只含有一颗二叉树为止。

2.哈夫曼树的实现

函数String EnCode(Char Type ch)：表示哈夫曼树已存在，返回字符ch的编码。

函数LinkListUnCode(String strCode)：表示对哈夫曼树进行译码，返回编码前的字符序列。根据算法可以看出，在具有n个结点权值的哈夫曼树的构造过程中，每次都是从F中删去两棵树，增加一棵树，即每次结束后减少一棵树，经过n-1次处理后，F中就只剩下一棵树了。另外，每次合并都要产生一个新的结点，合并n-1次后共产生了n-1个新结点，并且这n-1个新节点都是具有左右子树的分支结点。则最终得到的哈夫曼树中共有2n-1个结点，并且其中没有度为1的分支结点，最后一次产生的新结点就是哈夫曼树的根结点。

源代码中创建了一个哈夫曼树结点类，其中有数据成员weight，parent，leftChild，rightChild分别代表了权值，双亲，左孩子，右孩子。

在哈夫曼树类中有数据成员*nodes，*LeafChars，*LeafCharCodes，curPos，num，分别用来存储结点信息，叶结点字符信息，叶结点字符编码信息，译码时从根结点到叶结点路径的当前结点，叶结点个数。哈夫曼树类中含有多个函数，有构造函数，析构函数等。由函数HuffmanTree(CharType ch[],WeightType w[],int n)来构造由字符，权值，和字符个数构造哈夫曼树，在根据哈夫曼算法很容易实现哈夫曼类的函数以及构造函数。在在算法中，求叶结点字符的编码时，需要从叶结点出发走一条从高叶结点到根结点的路径，而编码却是从根结点出发到叶结点的路径，由左分支为编码0，右分支为编码1，得到的编码，因此从叶结点出发到根结点的路径得到的编码是实际编码的逆序，并且编码长度不确定，又由于可以再线性链表中构造串，因此将编码的信息储存在一个线性立案标准，每得到一位编码都将其插入在线性链表的最前面。

在求某个字符的编码是由函数EnCode(CharType ch)来求，返回字符编码。在进行译码时，用一个线性链表存储字符序列，由函数Decode(String strCode)来求，对编码串strCode进行译码，返回编码前的字符序列。函数Compress()用哈夫曼编码压缩文件。函数Decompress()解压缩用哈夫曼编码压缩的文件。

在主函数中有两个选项，一个是选择编码压缩，一个是解压。在函数中使用了文件输入输出流，我们可以选择要压缩的文件名输入，在选出压缩文件保存的地方和文件类型，将压缩所得到的文件存储在另一个文件中，解压也是如此。

2.源代码

#ifndef __HUFFMAN_TREE_NODE_H__

#define __HUFFMAN_TREE_NODE_H__

// 哈夫曼树结点类模板

template

struct HuffmanTreeNode

{

WeightType weight; // 权数据域

unsigned int parent, leftChild, rightChild; // 双亲,左右孩子域

HuffmanTreeNode(); // 无参数的构造函数模板

HuffmanTreeNode(WeightType w, int p = 0, int lChild = 0, int rChild = 0);

// 已知权,双亲及左右孩子构造结构

};

// 哈夫曼树结点类模板的实现部分

template

HuffmanTreeNode::HuffmanTreeNode()// 操作结果：构造空结点

{

parent = leftChild = rightChild = 0;

}

template

HuffmanTreeNode::HuffmanTreeNode(WeightType w, int p, int lChild, int rChild) // 操作结果：构造一个权为w,双亲为p,左孩子为lChild,右孩子为rChild的结点

{

weight = w; // 权

parent = p; // 双亲

leftChild = lChild; // 左孩子

rightChild = rChild; // 右孩子

}

#endif

#ifndef __HUFFMAN_TREE_H__

#define __HUFFMAN_TREE_H__

#include"string.h"// 串类

#include"huffman_tree_node.h"// 哈夫曼树结点类模板

// 哈夫曼树类模板

template

class HuffmanTree

{

protected:

HuffmanTreeNode *nodes; // 存储结点信息,nodes[0]未用

CharType *LeafChars; // 叶结点字符信息,LeafChars[0]未用

String *LeafCharCodes; // 叶结点字符编码信息,LeafCharCodes[0]未用

int curPos; // 译码时从根结点到叶结点路径的当前结点int num; // 叶结点个数

// 辅助函数模板:

void Select(int cur, int &r1, int &r2); // nodes[1 ~ cur]中选择双亲为,权值最小的两个结点r1,r2

void CreatHuffmanTree(CharType ch[], WeightType w[], int n);

// 由字符,权值和字符个数构造哈夫曼树

public:

// 哈夫曼树方法声明及重载编译系统默认方法声明:

HuffmanTree(CharType ch[], WeightType w[], int n); // 由字符,权值和字符个数构造哈夫曼树virtual ~HuffmanTree(); // 析构函数模板

String Encode(CharType ch); // 编码

LinkList Decode(String strCode); // 译码

HuffmanTree(const HuffmanTree ©); // 复制构造函数模板HuffmanTree &operator=(const HuffmanTree& copy); // 重载赋值运算符

};

// 孩子兄弟表示哈夫曼树类模板的实现部分

template

void HuffmanTree::Select(int cur, int &r1, int &r2)

// 操作结果：nodes[1 ~ cur]中选择双亲为,权值最小的两个结点r1,r2

{

r1 = r2 = 0; // 0表示空结点

for (int pos = 1; pos <= cur; pos++)

{ // 查找树值最小的两个结点

if (nodes[pos].parent != 0) continue; // 只处理双亲不为的结点

if (r1 == 0)

{

r1 = pos; // r1为空,将pos赋值给r1

}

else if (r2 == 0)

{

r2 = pos; // r2为空,将pos赋值给r2

}

else if(nodes[pos].weight < nodes[r1].weight)

{

r1 = pos; // nodes[pos]权值比nodes[r1]更小,将pos赋为r1

}

else if (nodes[pos].weight < nodes[r2].weight)

{

r2 = pos; // nodes[pos]权值比nodes[r2]更小,将pos赋为r2

}

void HuffmanTree::CreatHuffmanTree(CharType ch[], WeightType w[], int n)

// 操作结果：由字符,权值和字符个数构造哈夫曼树

{

num = n; // 叶结点个数

int m = 2 * n - 1; // 结点个数

nodes = new HuffmanTreeNode[m + 1]; // nodes[0]未用

LeafChars = new CharType[n + 1]; // LeafChars[0]未用

LeafCharCodes = new String[n + 1]; // LeafCharCodes[0]未用

int pos; // 临时变量

for (pos = 1; pos <= n; pos++)

{ // 存储叶结点信息

nodes[pos].weight = w[pos - 1]; // 权值

LeafChars[pos] = ch[pos - 1]; // 字符

}

for (pos = n + 1; pos <= m; pos++)

{ // 建立哈夫曼树

int r1, r2;

Select(pos - 1, r1, r2); // nodes[1 ~ pos - 1]中选择双亲为,权值最小的两个结点r1,r2

// 合并以r1,r2为根的树

nodes[r1].parent = nodes[r2].parent = pos; // r1,r2双亲为pos

nodes[pos].leftChild = r1; // r1为pos的左孩子

nodes[pos].rightChild = r2; // r2为pos的右孩子

nodes[pos].weight = nodes[r1].weight + nodes[r2].weight;//pos的权为r1,r2的权值之和}

for (pos = 1; pos <= n; pos++)

{ // 求n个叶结点字符的编码

LinkList charCode; // 暂存叶结点字符编码信息

for (unsigned int child = pos, parent = nodes[child].parent; parent != 0;

child = parent, parent = nodes[child].parent)

{ // 从叶结点到根结点逆向求编码

if (nodes[parent].leftChild == child) charCode.Insert(1, '0');// 左分支编码为'0'

else charCode.Insert(1, '1'); // 右分支编码为'1'

}

LeafCharCodes[pos] = charCode; // charCode中存储字符编码

}

curPos = m; // 译码时从根结点开始,m为根

}

HuffmanTree::HuffmanTree(CharType ch[], WeightType w[], int n) // 操作结果：由字符,权值和字符个数构造哈夫曼树

{

CreatHuffmanTree(ch, w, n); // 由字符,权值和字符个数构造哈夫曼树

}

template

HuffmanTree::~HuffmanTree()

// 操作结果：销毁哈夫曼树

{

if (nodes != NULL) delete []nodes; // 释放结点信息

if (LeafChars != NULL) delete []LeafChars; // 释放叶结点字符信息

if (LeafCharCodes != NULL) delete []LeafCharCodes; // 释放叶结点字符编码信息

}

template

String HuffmanTree::Encode(CharType ch)

// 操作结果：返回字符编码

{

for (int pos = 1; pos <= num; pos++)

{ // 查找字符的位置

if (LeafChars[pos] == ch) return LeafCharCodes[pos];// 找到字符,得到编码}

throw Error("非法字符,无法编码!"); // 抛出异常

}

template

LinkList HuffmanTree::Decode(String strCode)

// 操作结果：对编码串strCode进行译码,返回编码前的字符序列

{

LinkList charList; // 编码前的字符序列

for (int pos = 0; pos < strCode.Length(); pos++)

{ // 处理每位编码

if (strCode[pos] == '0') curPos = nodes[curPos].leftChild; // '0'表示左分支

else curPos = nodes[curPos].rightChild; // '1'表示左分支

if (nodes[curPos].leftChild == 0 && nodes[curPos].rightChild == 0)

{ // 译码时从根结点到叶结点路径的当前结点为叶结点

charList.Insert(charList.Length() + 1, LeafChars[curPos]);

curPos = 2 * num - 1; // curPos回归根结点}

}

return charList; // 返回编码前的字符序列

}

template

HuffmanTree::HuffmanTree(const HuffmanTree ©)

// 操作结果：由哈夫曼树copy构造新哈夫曼树——复制构造函数模板

{

num = copy.num; // 叶结点个数

curPos = copy.curPos; // 译码时从根结点到叶结点路径的当前结点

int m = 2 * num - 1; // 结点总数

nodes = new HuffmanTreeNode[m + 1]; // nodes[0]未用

LeafChars = new CharType[num + 1]; // LeafChars[0]未用

LeafCharCodes = new String[num + 1]; // LeafCharCodes[0]未用

int pos; // 临时变量

for (pos = 1; pos <= m; pos++)

{ // 复制结点信息

nodes[pos] = copy.nodes[pos]; // 结点信息

}

for (pos = 1; pos <= num; pos++)

{ // 复制叶结点字符信息与叶结点字符编码信息

LeafChars[pos] = copy.LeafChars[pos]; // 叶结点字符信息

LeafCharCodes[pos] = copy.LeafCharCodes[pos];// 叶结点字符编码信息}

}

template

HuffmanTree &HuffmanTree

WeightType>::operator=(const HuffmanTree& copy)

// 操作结果：将哈夫曼树copy赋值给当前哈夫曼树——重载赋值运算符

{

if (© != this)

{

if (nodes != NULL) delete []nodes; // 释放结点信息

if (LeafChars != NULL) delete []LeafChars; // 释放叶结点字符信息

if (LeafCharCodes != NULL) delete []LeafCharCodes; // 释放叶结点字符编码信息

num = copy.num; // 叶结点个数

curPos = copy.curPos; // 译码时从根结点到叶结点路径的当前结点

int m = 2 * num - 1; // 结点总数

nodes = new HuffmanTreeNode[m + 1]; // nodes[0]未用

LeafChars = new CharType[num + 1]; // LeafChars[0]未用

LeafCharCodes = new String[num + 1]; // LeafCharCodes[0]未用

int pos; // 临时变量

for (pos = 1; pos <= m; pos++)

{ // 复制结点信息

nodes[pos] = copy.nodes[pos]; // 结点信息

}

for (pos = 1; pos <= num; pos++)

{ // 复制叶结点字符信息与叶结点字符编码信息

LeafChars[pos] = copy.LeafChars[pos]; // 叶结点字符信息

LeafCharCodes[pos] = copy.LeafCharCodes[pos];// 叶结点字符编码信息}

}

return *this;

}

#endif

#ifndef __HUFFMAN_COMPRESS_H__

#define __HUFFMAN_COMPRESS_H__

#include"huffman_tree.h"// 哈夫曼树类

struct BufferType// 字符缓存器

{

char ch; // 字符

unsigned int bits; // 实际比特数

};

class HuffmanCompress// 哈夫曼压缩类

{

protected:

HuffmanTree *pHuffmanTree;

FILE *infp,*outfp; // 输入/出文件

BufferType buf; // 字符缓存

void Write(unsigned int bit); // 向目标文件中写入一个比特

void WriteToOutfp(); // 强行将字符缓存写入目标文件public:

HuffmanCompress(); // 无参数的构造函数

~HuffmanCompress(); // 析构函数

void Compress(); // 压缩算法

void Decompress(); // 解压缩算法

HuffmanCompress(const HuffmanCompress ©); // 复制构造函数

HuffmanCompress &operator=(const HuffmanCompress& copy);// 赋值运算符

};

HuffmanCompress::HuffmanCompress()

{

pHuffmanTree = NULL; // 空哈夫曼树

}

HuffmanCompress::~HuffmanCompress()

{

if (pHuffmanTree != NULL)

delete []pHuffmanTree; // 释放空间

}

void HuffmanCompress::Write(unsigned int bit) // 操作结果：向目标文件中写入一个比特

{

buf.bits++; // 缓存比特数自增

buf.ch = (buf.ch << 1) | bit; // 将bit加入到缓存字符中

if (buf.bits == 8)

{ // 缓存区已满,写入目标文件

fputc(buf.ch, outfp); // 写入目标文件

buf.bits = 0; // 初始化bits

buf.ch = 0; // 初始化ch

}

void HuffmanCompress::WriteToOutfp() // 操作结果：强行将字符缓存写入目标文件{

unsigned int len = buf.bits; // 缓存实际比特数

if (len > 0)

{ // 缓存非空, 将缓存的比特个数增加到, 自动写入目标文件

for (unsigned int i = 0; i < 8 - len; i++) Write(0);

}

void HuffmanCompress::Compress()// 操作结果：用哈夫曼编码压缩文件

{

char infName[256], outfName[256]; // 输入(源)/出(目标)文件名

cout << "请输入源文件名(文件小于GB):"; // 被压缩文件小于GB

cin >> infName; // 输入源文件名

if ((infp = fopen(infName, "r+b")) == NULL)

throw Error("打开源文件失败!"); // 抛出异常fgetc(infp); // 取出源文件第一个字符if (feof(infp))

throw Error("空源文件!"); // 抛出异常

cout << "请输入目标文件:";

cin >> outfName;

if ((outfp = fopen(outfName, "w+b")) == NULL)

throw Error("打开目标文件失败!"); // 抛出异常

cout << "正在处理，请稍候..." << endl;

const unsigned long n = 256; // 字符个数

char ch[n]; // 字符数组

unsigned long w[n]; // 字符出现频度(权) unsigned long i, size = 0;

char cha;

for (i = 0; i < n; i++)

{ // 初始化ch[]和w[]

ch[i] = (char)i; // 初始化ch[i]

w[i] = 0; // 初始化w[i]

}

rewind(infp); // 使源文件指针指向文件开始处

cha = fgetc(infp); // 取出源文件第一个字符

while (!feof(infp))

{ // 统计字符出现频度

w[(unsigned char)cha]++; // 字符cha出现频度自加

size++; // 文件大小自加

cha=fgetc(infp); // 取出源文件下一个字符

}

if (pHuffmanTree != NULL) delete []pHuffmanTree; // 释放空间

pHuffmanTree = new HuffmanTree(ch, w, n); // 生成哈夫曼树rewind(outfp); // 使目标文件指针指向文件开始处

fwrite(&size, sizeof(unsigned long), 1, outfp); // 向目标文件写入源文件大小

for (i = 0; i < n; i++)

{ // 向目标文件写入字符出现频度

fwrite(&w[i], sizeof(unsigned long), 1, outfp);

}

buf.bits = 0; // 初始化bits

buf.ch = 0; // 初始化ch

rewind(infp); // 使源文件指针指向文件开始处

cha = fgetc(infp); // 取出源文件的第一个字符

while (!feof(infp))

{ // 对源文件字符进行编码，并将编码写入目标文件

String strTmp = pHuffmanTree->Encode(cha);// 字符编码

for (i = 0; i

{ // 向目标文件写入编码

if (strTmp[i] == '0')

Write(0); // 向目标文件写入

else Write(1); // 向目标文件写入

}

cha = fgetc(infp); // 取出源文件的下一个字符

}

WriteToOutfp(); // 强行写入目标文件

fclose(infp); fclose(outfp); // 关闭文件

cout << "处理结束." << endl;

}

void HuffmanCompress::Decompress()// 操作结果：解压缩用哈夫曼编码压缩的文件

{

char infName[256], outfName[256]; // 输入(压缩)/出(目标)文件名

cout << "请输入压缩文件名:";

cin >> infName;

if ((infp = fopen(infName, "r+b")) == NULL)

throw Error("打开压缩文件失败!"); // 抛出异常

fgetc(infp); // 取出压缩文件第一个字符

if (feof(infp))

throw Error("压缩文件为空!"); // 抛出异常

cout << "请输入目标文件名:";

cin >> outfName;

if ((outfp = fopen(outfName, "w+b")) == NULL)

throw Error("打开目标文件失败!"); // 抛出异常

cout << "正在处理，请稍候..." << endl;

const unsigned long n = 256; // 字符个数

char ch[n]; // 字符数组

unsigned long w[n]; // 权

unsigned long i, size = 0;

char cha;

rewind(infp); // 使源文件指针指向文件开始处fread(&size, sizeof(unsigned long), 1, infp); // 读取目标文件的大小

for (i = 0; i < n; i++)

{

ch[i] = (char)i; // 初始化ch[i]

fread(&w[i], sizeof(unsigned long), 1, infp); // 读取字符频度

}

if (pHuffmanTree != NULL)

delete []pHuffmanTree; // 释放空间

pHuffmanTree = new HuffmanTree(ch, w, n); // 生成哈夫曼树unsigned long len = 0; // 解压的字符数

cha = fgetc(infp); // 取出源文件的第一个字符

while (!feof(infp))

{ // 对压缩文件字符进行解码，并将解码的字符写入目标文件

String strTmp = ""; // 将cha转换二进制形式的串

unsigned char c = (unsigned char)cha; // 将cha转换成unsigned char类型

for (i = 0; i < 8; i++)

{ // 将c转换成二进制串

if (c < 128) Concat(strTmp, "0"); // 最高位为

else Concat(strTmp, "1"); // 最高位为

c = c << 1; // 左移一位

}

LinkList lkText = pHuffmanTree->Decode(strTmp);// 译码

String strTemp(lkText);

for (i = 1; i<=strTemp.Length(); i++)

{ // 向目标文件写入字符

len++; // 目标文件长度自加

fputc(strTemp[i-1], outfp); // 将字符写入目标文件中

if (len == size) break; // 解压完毕退出内循环

}

if (len == size) break; // 解压完毕退出外循环

cha = fgetc(infp); // 取出源文件的下一个字符}

fclose(infp); fclose(outfp); // 关闭文件

cout << "处理结束." << endl;

}

// 操作结果：由哈夫曼压缩类对象copy构造新哈夫曼压缩类对象——复制构造函数

{

if (copy.pHuffmanTree != NULL)

{ // copy为非空哈夫曼压缩类对象

pHuffmanTree = new HuffmanTree(*copy.pHuffmanTree);// 生成哈夫曼树

}

else pHuffmanTree = NULL; // 生成空哈夫曼压缩类对象

}

HuffmanCompress &HuffmanCompress::operator=(const HuffmanCompress& copy)

// 操作结果：将哈夫曼压缩类对象copy赋值给当前哈夫曼压缩类对象——赋值运算符

{

if (© != this)

{

if (pHuffmanTree != NULL) delete []pHuffmanTree; // 释放空间

if (copy.pHuffmanTree != NULL)

{ // copy为非空哈夫曼压缩类对象

pHuffmanTree = new HuffmanTree(*copy.pHuffmanTree);

// 生成哈夫曼树

}

else pHuffmanTree = NULL; // 生成空哈夫曼压缩类对象

}

return *this;

}

#endif

#include"utility.h"// 实用程序软件包

#include"huffman_compress.h"// 哈夫曼压缩算法

int main(void)

{

try// 用try封装可能出现异常的代码

{

HuffmanCompress obj;

int select = 0;

while (select != 3)

{

cout << endl << "1. 压缩";

cout << endl << "2. 解压";

cout << endl << "3. 退出";

cout << endl << "请选择:";

cin >> select; // 输入选择

while (cin.get() != '\n'); // 忽略用户输入的其他字符

switch (select)

{

case 1:

https://www.360docs.net/doc/3a4381914.html,press(); // 压缩

break;

case 2:

obj.Decompress(); // 解压缩

}

catch (Error err) // 捕捉并处理异常

{

err.Show(); // 显示异常信息

}

system("PAUSE"); // 调用库函数system()

return 0; // 返回值, 返回操作系统

}

3.测试结果

由于测试的文件字符太多，则就截下来一部分的压缩后的编码。下面的是压缩过后的编码截图。

下面为解压后的文件截图，解压后与压缩前的是一样的。