CS144-Lab1实验笔记

实验简介

Lab0中完成了有序字节流的写入和读出,这个实验需要完成乱序字节流的重组、写入和读出。也就是说,Lab1实验手册中的这幅图,蓝色和绿色部分我们已经在Lab0中完成了,我们在Lab1中只需要完成红色部分的重组即可。

思路

(这个思路有很大的性能优化空间)

  1. 使用map存储乱序字节流,key=(起始下标,结束下标),均从0开始,value=(字符串)pair<int,int>作为key的时候不需要自己写比较函数。
  2. 每当有新字符串push进来后,在map中确定要插入的位置(应该插入到low_bound返回的之前一个),不断判断当前字符串是否能合并之前一个键值对或者之后的一个。
  3. 如果当前map的第一个元素可以插入到有序字节流中,弹出完整的第一个元素,或者部分第一个元素。
  4. 只有当“当前不存在乱序字节,且已经收到了对端发来的EOF标志”,有序字节流的end_of_input才有效。有一种情况是,收到了对端发来的EOF标志(收到了FIN包),但是中间仍有未收到的片段(空洞),这时候有序字节流不能结束。
  5. 注意一些conner case,比如写入有序字节流的时候当前字符串不一定能全部写入。

实现

类内新增变量

1
2
3
4
5
6
7
8
9
10
11
12
// libsponge/stream_reassembler.hh  
size_t _unassembled_bytes{0};
size_t _first_unassembled_index{0};
bool _eof{false};
// key-> <start_idx, end_idx>, value -> str_to_assembled
using recv_bytes_t = pair<pair<size_t, size_t>, std::string>;
map<pair<size_t, size_t>, std::string> _str_to_assemble;

ByteStream _output; //!< The reassembled in-order byte stream
size_t _capacity; //!< The maximum number of bytes

recv_bytes_t merge_two_unassembled_strs(const recv_bytes_t &a, const recv_bytes_t &b) const;

push_string 方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
//! \details This function accepts a substring (aka a segment) of bytes,
//! possibly out-of-order, from the logical stream, and assembles any newly
//! contiguous substrings and writes them into the output stream in order.
void StreamReassembler::push_substring(const string &data, const size_t index, const bool eof) {
size_t cur_win_max_idx = _first_unassembled_index + _capacity - _output.buffer_size();
if (index >= cur_win_max_idx)
return;

if (eof) {
_eof = true;
}
// cut the chars which are out of window or have been assembled
size_t data_start_idx = max(index, _first_unassembled_index);
size_t data_end_idx = min(cur_win_max_idx, index + data.size());
if (data_end_idx >= data_start_idx) {
pair<size_t, size_t> cur_data_start_end_index = make_pair(data_start_idx, data_end_idx);
string cur_data = data.substr(data_start_idx - index, data_end_idx - data_start_idx + 1);
// insert current data to str_to_assemble map
recv_bytes_t cur_recv_bytes = {cur_data_start_end_index, cur_data};
while (!this->empty()) {
auto iter = _str_to_assemble.lower_bound(cur_data_start_end_index);
bool cur_data_could_merge_right =
(iter != _str_to_assemble.end()) && (cur_recv_bytes.first.second >= iter->first.first);
if (cur_data_could_merge_right) {
cur_recv_bytes = merge_two_unassembled_strs(cur_recv_bytes, *iter);
_unassembled_bytes -= iter->second.size();
_str_to_assemble.erase(iter);
iter = _str_to_assemble.lower_bound(cur_data_start_end_index);
}

bool cur_data_could_merge_left =
(iter != _str_to_assemble.begin()) && ((--iter)->first.second >= cur_recv_bytes.first.first);
if (cur_data_could_merge_left) {
cur_recv_bytes = merge_two_unassembled_strs(*iter, cur_recv_bytes);
_unassembled_bytes -= iter->second.size();
_str_to_assemble.erase(iter);
}
if (!cur_data_could_merge_right && !cur_data_could_merge_left) {
break;
}
}
// insert the unassembled str to map
_str_to_assemble.insert(cur_recv_bytes);
_unassembled_bytes += cur_recv_bytes.second.size();
}

// if the first chunk start index is smaller than _first_unassembled_index
// write the first chunk to output bytestream
auto iter = _str_to_assemble.begin();
if (!_str_to_assemble.empty() && (iter->first.first <= _first_unassembled_index)) {
auto temp_map_head = *iter;
_str_to_assemble.erase(iter);
size_t written_len = _output.write(temp_map_head.second);
_unassembled_bytes -= written_len;
if (written_len == temp_map_head.second.size()) {
// The first chunk was all written to the output stream
_first_unassembled_index = temp_map_head.first.first + temp_map_head.second.size();
} else {
// Part of first chunk was written to the output stream
size_t new_data_start_index = temp_map_head.first.first + written_len;
_str_to_assemble.insert(
{{new_data_start_index, temp_map_head.first.second}, temp_map_head.second.substr(written_len)});
_first_unassembled_index = new_data_start_index;
}
}
if (empty() && _eof) {
_output.end_input();
}
}

//! \details This function merge two recv_bytes_type pairs, the start index of a is always smaller than b.
StreamReassembler::recv_bytes_t StreamReassembler::merge_two_unassembled_strs(
const StreamReassembler::recv_bytes_t &a,
const StreamReassembler::recv_bytes_t &b) const {
recv_bytes_t res;
res.first.first = a.first.first;
// choose the bigger one for the end index of merged string.
if (a.first.second > b.first.second) {
res.first.second = a.first.second;
res.second = a.second;
} else {
res.first.second = b.first.second;
res.second = a.second.substr(0, b.first.first - a.first.first) + b.second;
}
return res;
}

CS144-Lab1实验笔记
https://gwzlchn.github.io/202205/cs144-lab1/
作者
Zelin Wang
发布于
2022年5月18日
更新于
2022年10月23日
许可协议