## # 295. Find Median from Data Stream

Design a data structure that supports the following two operations:

void addNum(int num) - Add a integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.

Example:

``````addNum(1)
findMedian() -> 1.5
findMedian() -> 2
``````

If all integer numbers from the stream are between 0 and 100, how would you optimize it?
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

### # Solution

Approach 1: Sorting.

Approach 2: use 2 heaps.

Alternative approaches:

• keep a sorted list: O(N) to add (copy smaller portion, add new num, copy larger portion), O(1) to find median, O(N) space
• use a self balancing BST: O(logN) to add, O(1) to find median (the root), O(N) space
• can use reservior sampling to estimate the median

• If all nums in range(0, 100), then we can bucket them and keep the cumulative frequencies. O(1) to add (increment count for every bucket >= current num), O(logN) to find median (binary search), O(1) space.
• if 99% of all nums are in range(0, 100), we can have buckets plus 1 heap per side (smaller than 0 and larger than 100).

### # Code (Python)

Approach 2:

``````class MedianFinder:
def __init__(self):
"""
"""
# 2 heaps: min_heap stores the larger 1/2 of the data, max_heap stores the smaller 1/2
self.min_heap, self.max_heap = [], []

def addNum(self, num: int) -> None:
# O(logN) time, O(N) space
if self.min_heap and self.min_heap <= num:
heapq.heappush(self.min_heap, num)
else:
heapq.heappush(self.max_heap, -num)
if len(self.min_heap) - len(self.max_heap) > 1:
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
elif len(self.max_heap) - len(self.min_heap) > 1:
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))

def findMedian(self) -> float:
# O(1) time
if len(self.max_heap) == len(self.min_heap):
return (self.min_heap - self.max_heap) / 2
elif len(self.max_heap) > len(self.min_heap):
return -self.max_heap
else:
return self.min_heap
``````

### # Code (C++)

Approach 1:

``````// Sorting.
class MedianFinder {
private:
vector<int> buf;
public:
// initialize your data structure here.
MedianFinder() {}

int i = 0;
for (; i < buf.size(); ++i)
{
if (buf[i] >= num)
break;
}
buf.insert(buf.begin()+i, num);
}

double findMedian() {
int mid = buf.size() / 2;
if (buf.size() % 2 != 0)
return buf[mid];
else
return (double)(buf[mid] + buf[mid-1]) / 2;
}
};
/**
* Your MedianFinder object will be instantiated and called as such:
* MedianFinder* obj = new MedianFinder();
* double param_2 = obj->findMedian();
*/
``````

Approach 2:

``````class MedianFinder {
private:
priority_queue<int> lo; // max heap.
priority_queue<int, vector<int>, greater<int>> hi; // min heap.
public:
/** initialize your data structure here. */
MedianFinder() {}

lo.push(num);
hi.push(lo.top());
lo.pop();
if (lo.size() < hi.size())
{
lo.push(hi.top());
hi.pop();
}
}

double findMedian() {
if (lo.size() > hi.size())
return lo.top();
else
return (lo.top() + hi.top()) / 2.0;
}
};
``````