# 295. Find Median from Data Stream
Design a data structure that supports the following two operations:
void addNum(int num) - Add a integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.
Example:
addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2
Follow up:
If all integer numbers from the stream are between 0 and 100, how would you optimize it?
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
# Solution
Approach 1: Sorting.
Approach 2: use 2 heaps.
Alternative approaches:
- keep a sorted list: O(N) to add (copy smaller portion, add new num, copy larger portion), O(1) to find median, O(N) space
- use a self balancing BST: O(logN) to add, O(1) to find median (the root), O(N) space
- can use reservior sampling to estimate the median
Follow ups:
- If all nums in range(0, 100), then we can bucket them and keep the cumulative frequencies. O(1) to add (increment count for every bucket >= current num), O(logN) to find median (binary search), O(1) space.
- if 99% of all nums are in range(0, 100), we can have buckets plus 1 heap per side (smaller than 0 and larger than 100).
# Code (Python)
Approach 2:
class MedianFinder:
def __init__(self):
"""
initialize your data structure here.
"""
# 2 heaps: min_heap stores the larger 1/2 of the data, max_heap stores the smaller 1/2
self.min_heap, self.max_heap = [], []
def addNum(self, num: int) -> None:
# O(logN) time, O(N) space
if self.min_heap and self.min_heap[0] <= num:
heapq.heappush(self.min_heap, num)
else:
heapq.heappush(self.max_heap, -num)
if len(self.min_heap) - len(self.max_heap) > 1:
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
elif len(self.max_heap) - len(self.min_heap) > 1:
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
def findMedian(self) -> float:
# O(1) time
if len(self.max_heap) == len(self.min_heap):
return (self.min_heap[0] - self.max_heap[0]) / 2
elif len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0]
else:
return self.min_heap[0]
# Code (C++)
Approach 1:
// Sorting.
class MedianFinder {
private:
vector<int> buf;
public:
// initialize your data structure here.
MedianFinder() {}
void addNum(int num) {
int i = 0;
for (; i < buf.size(); ++i)
{
if (buf[i] >= num)
break;
}
buf.insert(buf.begin()+i, num);
}
double findMedian() {
int mid = buf.size() / 2;
if (buf.size() % 2 != 0)
return buf[mid];
else
return (double)(buf[mid] + buf[mid-1]) / 2;
}
};
/**
* Your MedianFinder object will be instantiated and called as such:
* MedianFinder* obj = new MedianFinder();
* obj->addNum(num);
* double param_2 = obj->findMedian();
*/
Approach 2:
class MedianFinder {
private:
priority_queue<int> lo; // max heap.
priority_queue<int, vector<int>, greater<int>> hi; // min heap.
public:
/** initialize your data structure here. */
MedianFinder() {}
void addNum(int num) {
lo.push(num);
hi.push(lo.top());
lo.pop();
if (lo.size() < hi.size())
{
lo.push(hi.top());
hi.pop();
}
}
double findMedian() {
if (lo.size() > hi.size())
return lo.top();
else
return (lo.top() + hi.top()) / 2.0;
}
};