哈希表之单词查找并输出其上下文

哈希表之单词查找

英文文章，写入一个文本文档，作为基础测试数据。建立一个待查关键字文件，存储待查询的单词。
输入查询的单词，输出该单词的出现次数，及每个出现位置的上下文（前一句，单词/短语所在的句子，下一句）。
目前只支持查询一个单词。

以下为代码：

#include #include #include#include #include #include#include #include const int HASHNUM = 29989;typedef struct Hashtable{            //哈希表结构体    char *word;    int count;    struct Hashtable * next;}Hashnode;Hashnode* hashtable[HASHNUM];       //HASHNUMBER大小的指针数组(储存指针的数组) 作为hash表std::vectorlines;std::map<std::string,std::set> wordline;   //需要使用到迭代器匹配行号所以使用map嵌套setunsigned int Hashfuntion(const char *str);void Hashinsert(char *str);void readintohash(char *path);void findquery(char *path);void getlinecplusplus(char *path,char *str);void readintohash(char *path){    char ch[1024] = {NULL};              //存储一行字符串    char *p;    FILE *fp = fopen(path,"r");    if(fp == NULL)        exit(-1);    while(( fgets(ch,sizeof (ch),fp) != NULL ))                           //读取到换行符时，或者到达文件末尾时停止    {        if(strcmp( " " , ch ) == 0)                                       //空行继续            continue;        const char *ch1 = ", \n\t.";                                      //要分割的字符(逗号、空格、换行、制表符、句号)        p = strtok(ch,ch1);                                               //用strtok函数从一行字符串中分离出每个单词，分隔符设置为(空格、逗号、换行、制表符)        while(p != NULL)        {            Hashinsert(p);            p = strtok(NULL, ch1);                              //每次找到一个分隔符后，一个空（NULL）就被放到分隔符处，strtok函数用这种方法来连续查找该字符串，此处的NULL是一个指向后面还未分割的单词的指针，因此，首次调用时，第一个元素必须指向要分解的字符串，随后调用要把元素设成NULL。        }    }    fclose(fp);}void Hashinsert(char *str){    int hashcode = Hashfuntion(str);    Hashnode *newnode;    for (newnode = hashtable[hashcode]; newnode != NULL ; ++newnode) {        if(strcmp(str, newnode->word) == 0){       //查找单词是否已经在hash表中了            (newnode->count)++;            return;        }    }    //如果上面没返回，也就是说hash表中没有这个单词，添加新节点，加入这个单词    newnode = new Hashnode;    newnode->count = 1;    newnode->word = (char *)malloc(strlen(str) + 1);    strcpy(newnode->word,str);    newnode->next = hashtable[hashcode];        //将新生成的节点插入到hashcode为下标的链表中去    hashtable[hashcode] = newnode;}unsigned int Hashfuntion(const char *str){    unsigned int index = 0;    for (; *str != '\0' ; str++) {        index = index * 31 + *str;    }    return index % HASHNUM;}void findquery(char *path){    char ch[50] = {NULL};    FILE *fp = fopen(path,"r");    std::ofstream outfile1("E:\\answer.txt");    while(fgets(ch,sizeof (ch),fp)){                    //把待查关键字读入        int hashcode2 = Hashfuntion(ch);        outfile1 <word << "出现了" <count << "次" << std::endl;    }    getlinecplusplus("E:\\database.txt",ch);    fclose(fp);}void getlinecplusplus(char *path,char *str){    std::ifstream readfile(path);                                        //用ifstream读入txt文件path并与readfile相连    std::ofstream outfile("E:\\answer.txt",std::ofstream::app);         //用ofstream将答案输出到answer.txt    int linecount = 0;    std::string line;    std::string word;    while(std::getline(readfile, line)){        lines.push_back(line);                                        //将一行放入vector中        std::istringstream iss(line);                                 //获取一行        ++linecount;                                                  //行号加1        while(iss >> word){              wordline[word].insert(linecount);                        //加入词的行号        }    }    std::set::const_iterator itset;    std::vector::const_iterator itvector;              //新建两个迭代器    int i;    for(itset = (wordline.find(str)->second).begin();itset != (wordline.find(str)->second).end(); itset++)    {        outfile << "出现在第 "<< *itset <<"行" << std::endl;        i = 0;        for (itvector = lines.begin(); itvector != lines.end(); itvector++) {       //前一句，单词所在的句子，下一句            if (*itset == ++i)          //行号与vector循环次数匹配上了            {                if(itvector != lines.begin()){                                      //第一句不能打印前一句                    outfile << *(itvector-1) << std::endl;                }                if(itvector != lines.end()-1){                                      //最后一句不能打印后一句                    outfile << *(itvector+1) << std::endl;                }                outfile << *itvector << std::endl;                                  //单词本身句            }        }    }    outfile.close();}int main(){    SetConsoleOutputCP(65001);              //避免控制台中文乱码    readintohash("E:\\database.txt");      //自行更改路径    findquery("E:\\query.txt");    return 0;}

哈希表之单词查找并输出其上下文

最新关注

热文推荐

BrokerChain: A Cross-Shard Blockchain Protocol for Account/Balance-based State Sharding 阅读笔记

web期末作业设计网页：动漫网站设计——哆啦A梦(10页) HTML+CSS+JavaScript 学生DW网页设计作业成品动漫网页设计作业 web网页设计与开发 html实训大作业

RabbitMQ面试题大全含答案

数据库开发常见面试题

重庆高校平台刷课软件（小鱼高校平台助手）相关说明

100天精通Python（可视化篇）——第84天：matplotlib绘制不同种类炫酷直方图参数说明+代码实战（普通、多变量、堆叠、分组、多个子图、折线、曲线直方图）

哈希表之单词查找并输出其上下文

相关文章

最新关注

热文推荐