A model for readership dropoff

Kindle has popular highlights and a pseudo-page-number system called locations, so a 200-page book might have 5,000 locations. A book might have 15 popular highlights, with maybe 20 people highlighting the first, 10 the second, and 2 the 15th. The highlights sit at various locations, say 1% in, 5% in, with the last at 20% in.

The quality of a sentence affects the number of highlights. The location in the book also affects the number of highlights, because some people have stopped reading by that point.

readers(x) = a + b x
– where x is the position in the book (as a fraction of its length) and the intercept a is 100%: all readers are present on page 1.
– the slope b is negative and represents the drop-off rate.

highlights = quality of sentence  * readers(x)

The number of highlights depends on the quality of a sentence, which is constant for that sentence but unknown. So a good sentence will be highlighted by, say, 1% of the readers who reach it.
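Here is a minimal sketch of the two formulas in Python, mostly to make the units explicit. The function names and the `total_readers` parameter are my own assumptions: the formulas above work with the fraction of readers, so turning that into a highlight count needs the (unknown) number of people who started the book. I also clamp readership at zero, since a straight line would otherwise go negative.

```python
def readers(x, b, a=1.0):
    """Fraction of readers still reading at position x (0.0 = start, 1.0 = end).

    a is the intercept (1.0 means everyone is present on page 1) and
    b is the negative drop-off slope. The result is clamped at zero
    because a straight line would otherwise go negative.
    """
    return max(0.0, a + b * x)


def expected_highlights(x, quality, total_readers, b, a=1.0):
    """Expected number of highlights for a sentence at position x.

    quality is the share of readers reaching the sentence who highlight it
    (e.g. 0.01 for 1%); total_readers is a hypothetical count of people
    who started the book.
    """
    return quality * total_readers * readers(x, b, a)
```

For example, `expected_highlights(0.01, 0.01, 2000, b=-4.0)` asks how many highlights a 1%-quality sentence at the 1% mark would get if 2,000 people started the book and the slope were -4; all three numbers are made up for illustration.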

So someday, when I have time, I want to see if I can establish confidence intervals for these curves. Because there are so many constraints (the intercept is fixed at 100%, the slope must be negative, and readership can't drop below zero), it seems like we should be able to get good estimates of the drop-off rate despite relatively few data points.
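One rough way to try this, sketched below, is a least-squares line plus a percentile bootstrap. This is not a worked-out method, just a starting point, and it leans on a strong simplifying assumption: every highlighted sentence has the same quality. Under that assumption highlights = c0 + c1 x, and the drop-off slope is b = c1 / c0, so the unknown quality and reader count cancel out. The function names are hypothetical, and the three data points are only the illustrative numbers from the top of the post (20 highlights at 1%, 10 at 5%, 2 at 20%), not real data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct positions")
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - c1 * mx, c1


def dropoff_rate(xs, ys):
    """Slope b in readers(x) = 1 + b*x, ASSUMING equal sentence quality."""
    c0, c1 = fit_line(xs, ys)
    return c1 / c0


def bootstrap_interval(xs, ys, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the drop-off rate."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    estimates = []
    while len(estimates) < n_boot:
        # Resample (position, highlights) pairs with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        sx, sy = zip(*sample)
        if len(set(sx)) < 2:
            continue  # a line cannot be fitted to a single position
        estimates.append(dropoff_rate(sx, sy))
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Illustrative numbers from the example above, not real data.
positions = [0.01, 0.05, 0.20]
highlights = [20, 10, 2]
b_hat = dropoff_rate(positions, highlights)
b_low, b_high = bootstrap_interval(positions, highlights)
```

With only three points the bootstrap interval is crude, and dropping the equal-quality assumption would need a richer model. Still, it gives a concrete way to check whether the constraints really do pin down the slope.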
