Monday, November 24, 2008

faults with measuring

Preface: TV's : The Biggest Loser
A TV show that pushes contestants not to land in the bottom rung each week, i.e., among those who lost the smallest percentage of their body weight.

Fault: unit of measurement != results
Televising that someone goes from 170 lbs to 163 lbs one week (delta -4.12%) and then from 163 lbs to 159 lbs the next (delta -2.45%) doesn't mean that person has slowed their fat loss (the goal of becoming healthy), just that their weekly weight loss has slowed (the weekly goal of the television show contest).
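Those weekly deltas are just percent change between consecutive weigh-ins; a quick sketch of the arithmetic (to two decimals, the first week rounds to -4.12%):

```python
# Weekly weight-loss percentage, as the show scores it.
def pct_delta(before, after):
    """Percent change from one weigh-in to the next (negative = loss)."""
    return (after - before) / before * 100

week1 = pct_delta(170, 163)   # -7 lbs on a 170 lb base
week2 = pct_delta(163, 159)   # -4 lbs on a 163 lb base

print(f"week 1: {week1:.2f}%")   # -4.12%
print(f"week 2: {week2:.2f}%")   # -2.45%
```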

Instead, measure each person's %body_fat at each time interval, and compare the delta %body_fat each week. Why this is better, from experience: I can't lose or gain weight (beyond a ~10 pound fluctuation), but I can gain or lose body fat percentage.
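To make the point concrete, here's a sketch of tracking both numbers side by side; the weigh-in values are made up for illustration, but they show how weight can stay flat (or rise) while body fat keeps dropping:

```python
# Hypothetical weigh-in log: the interesting delta is body fat, not weight.
weigh_ins = [
    {"week": 1, "weight_lbs": 200, "body_fat_pct": 28.0},
    {"week": 2, "weight_lbs": 200, "body_fat_pct": 26.5},  # weight flat, fat down
    {"week": 3, "weight_lbs": 202, "body_fat_pct": 25.0},  # weight UP, fat still down
]

for prev, cur in zip(weigh_ins, weigh_ins[1:]):
    d_weight = cur["weight_lbs"] - prev["weight_lbs"]
    d_fat = cur["body_fat_pct"] - prev["body_fat_pct"]
    print(f"week {cur['week']}: weight {d_weight:+d} lbs, body fat {d_fat:+.1f} points")
```

Scored by weight alone, week 3 looks like failure; scored by body fat, it's progress.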

My story:
I weigh ~(195-205)lbs.
I work out, I convert fat into leg, core, and arm strength.
I slouch and become an enslaved cubicle person who goes out to lunch daily; muscle deteriorates and becomes fat.

Lastly, muscle is denser than fat (the same weight takes up less volume), and keeping a healthy balance of muscle over fat is the bound that actually matters.

Now, to compare this with Google's Official Google Blog: Sorting 1PB with MapReduce

Holy Damn!!
Our cluster admin would have killed my job before it got anywhere close to tying up 1 PB, or before it ran for 6 hours.

Preface: Google has shown that 1PB of data can be sorted in ~6 hours
Fault: human time is not the same as cycle time.
Solution: show that their algorithm for building a 1 PB dataset and sorting it takes fewer cycles than any other mechanism.

If one wants to, one can ask for more mathematical rigor in this public demonstration.

As a student, this rigor has killed my love of computers as just plain fun; even so, building up the science of computing is quite an amazing feat, as evidenced by an industry that has grown some 100B% in the past 50 years.

My questions are: what is the time to
insert something into this 1PB dataset,
remove something,
retrieve the first element,
sort it
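For a single sorted in-memory collection those questions have textbook answers; here's a minimal sketch using Python's `bisect` module as a stand-in for "the dataset" (at 1 PB the real costs are dominated by disk and network, but the asymptotics are the question being asked):

```python
import bisect

# A sorted list as a toy model of the dataset.
data = [3, 7, 12, 25, 40]

bisect.insort(data, 18)   # insert: O(log n) search + O(n) shift
data.remove(25)           # remove: O(n) scan and shift
first = data[0]           # retrieve first element: O(1) once sorted
data.sort()               # (re)sort: O(n log n) comparisons
print(data, first)
```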

In terms of, I guess, cycle time, and Θ or O of an input n.
To be congruent with all other sorting mechanisms, I can't accept that this feat beats the comparison-sort bound of Θ( n·log2(n) ).

And I would like to see what the cycle time is to create a data set of 1PB, and the cycle time to sort 1PB.
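A back-of-envelope version of that cycle-time question: if we assume 100-byte records (an assumption on my part, just to get a record count), 1 PB is n = 10^13 records, and an n·log2(n) sort needs on the order of 10^14 comparisons:

```python
import math

# Assumption: 100-byte records, so 1 PB ~ 10^13 records.
n = 10**13
comparisons = n * math.log2(n)        # n log2 n, ~4.3e14
print(f"~{comparisons:.2e} comparisons")

# Spread over ~6 hours of wall-clock time:
per_second = comparisons / (6 * 3600)
print(f"~{per_second:.2e} comparisons/second across the whole cluster")
```

The cluster-wide throughput this implies is exactly the "cycle time vs. human time" gap: the wall clock says 6 hours, but the total work is unchanged.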

The ability to process 1 PB in a "timely" fashion pushes the capabilities of computing even further: we can finally approach extremely large datasets without their being simply too large to use, putting awesome tools in the hands of programmers who perform tasks for others.

But cycle time is cycle time, and this is just saying that we have so many fast computers connected to each other that we have the power to do what we need. MapReduce(1PB) < Θ(awesome)
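That's the heart of the point, and a toy sketch makes it visible (this is my own simplification of the MapReduce-sort idea, not Google's implementation): partition records by key range, sort each partition independently, and concatenate. Parallelism divides the wall-clock time across workers, but the total cycles spent are still Θ(n log n).

```python
# Toy MapReduce-style sort: range-partition ("map"), sort per worker
# ("reduce"), concatenate. Total work stays Theta(n log n); only the
# wall-clock time is divided by the number of workers.
def mapreduce_sort(records, n_workers):
    lo, hi = min(records), max(records)
    width = (hi - lo) / n_workers + 1e-9   # key-range width per worker
    partitions = [[] for _ in range(n_workers)]
    for r in records:                      # "map": route record by key range
        partitions[min(int((r - lo) / width), n_workers - 1)].append(r)
    # "reduce": each partition sorted locally; ranges are already ordered
    return [r for part in partitions for r in sorted(part)]

print(mapreduce_sort([42, 7, 99, 13, 58, 21], 3))   # [7, 13, 21, 42, 58, 99]
```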
