Posts Tagged ‘data mining’

VC Firms Use Big Data to Seek Out the Next Big Thing

April 25, 2017

Venture capital firms, that is, firms that fund start-ups, early-stage companies, and companies with good growth potential, are always on the hunt for the next great opportunity.  The article below describes the trend away from people who are expert at spotting such opportunities and toward computer-based analytics, which are believed to be faster and better at finding the “next big thing.”

Venture-Capital Firms Use Big Data to Seek Out the Next Big Thing – WSJ


“At the current rate…” and other assumptions

March 30, 2017

In order to make any predictions in any area of study, one must make assumptions about the data, and what is likely to affect it going forward.  That is why statistical studies often make use of such statements as:

“At the current rate…”

“If things continue as in the past…”

“Based on what we know now…”

“It is reasonable to assume…”

“If history is a guide…”

“If things don’t change…”

But things do change, and that is what makes predicting the future so difficult, and so vulnerable to known and unknown biases.  While these caveats are unavoidable, and need to be stated, they require the consumer of the statistics to have a reasonable amount of skepticism and a large amount of knowledge.

A quote in The Great Race by Levi Tillemann states that “The available supply of gasoline, as is well known, is quite limited and it behooves the farseeing men of the motor car industry to look for likely substitutes.”  This quote is attributed to Thomas J. Fay, writing in 1905 in a magazine called Horseless Age.

Statistically Counting Lost Luggage as Not Lost

March 30, 2017

“An innovation aiming to streamline how air passengers reconnect with their lost luggage comes with a major asterisk: airlines would no longer count the luggage as lost.”  Using text messaging, airlines could advise you that your bags did not make your flight, and you could tell them by text where to send the lost bags.

DOT statisticians will not count those bags as lost, so airlines can maintain good stats while perhaps not performing so well.  You say, “They’re not lost, because we know where they are.”  I say, “They’re lost because they are not where they belong.”

Whoever said statistics was an exact science?

For the full article in the Wall Street Journal, click here:

counting Lost Baggage

When “Big Data” gets it wrong

January 4, 2017

Big Data could provide a big advantage to investors, but when the data is wrong or misinterpreted, it can be catastrophic.

“Credit card data sold to investors is making shares of retailers behave strangely, especially when the data gets things wrong.”  So begins an article in the Wall Street Journal, ‘Big Data Adds Up to Trading Distortions.’  The first example is about Tailored Brands (owner of Men’s Wearhouse and Jos. A. Bank), whose stock shot up nearly 40% in one day when investors realized that the data they were basing their decisions on was inaccurate.

For the full article:  how-credit-card-data-might-be-distorting-retail-stocks-wsj

Popular graph types

March 26, 2015

Seven of the most common graphs in statistics are listed below:

  1. Pareto Diagram or Bar Graph– A bar graph contains a bar for each category of a set of qualitative data. The bars are arranged in order of frequency, so that more important categories are emphasized.
  2. Pie Chart or Circle Graph– A pie chart displays qualitative data in the form of a pie. Each slice of pie represents a different category.
  3. Histogram– A histogram is another kind of graph that uses bars in its display. This type of graph is used with quantitative data. Ranges of values, called classes, are listed at the bottom, and the classes with greater frequencies have taller bars.
  4. Stem and Leaf Plot– A stem and leaf plot breaks each value of a quantitative data set into two pieces, a stem, typically for the highest place value, and a leaf for the other place values. It provides a way to list all data values in a compact form.
  5. Dot plot – A dot plot is a hybrid between a histogram and a stem and leaf plot. Each quantitative data value becomes a dot or point that is placed above the appropriate class values.
  6. Scatterplots – A scatterplot displays data that is paired by using a horizontal axis (the x axis), and a vertical axis (the y axis). The statistical tools of correlation and regression are then used to show trends on the scatterplot.
  7. Time-Series Graphs– A time-series graph displays data at different points in time, so it is another kind of graph to be used for certain kinds of paired data. The horizontal axis shows the time and the vertical axis is for the data values. These kinds of graphs can be used to show trends as time progresses.
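
To make the stem and leaf description in item 4 concrete, here is a minimal Python sketch. The function names and the sample scores are made up for illustration, and the sketch assumes non-negative whole numbers, with the ones digit as the leaf and everything above it as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each non-negative integer into a stem (tens and above) and a leaf (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


def format_plot(plot):
    """Render one text row per stem, including stems that have no leaves."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical test scores, invented for this example.
scores = [52, 58, 61, 64, 64, 67, 70, 73, 75, 88]
print(format_plot(stem_and_leaf(scores)))
```

Each printed row shows a stem on the left and its leaves on the right, so every original value can still be read back from the plot, which is what sets it apart from a histogram.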

What Online Social Networks May Know About Non-Members

May 18, 2012

ScienceDaily (Apr. 30, 2012) — What can social networks on the internet know about persons who are friends of members, but have no user profile of their own? Researchers from the Interdisciplinary Center for Scientific Computing of Heidelberg University studied this question. Their work shows that, using network-analysis and machine-learning tools, the relationships between members and their connection patterns to non-members can be evaluated to infer relationships among non-members. Using simple contact data, it is possible, under certain conditions, to correctly predict that two non-members know each other with approximately 40 percent probability.

Until now, studies of this type were restricted to users of social networks, i.e. persons with a posted user profile who agreed to the given privacy terms. “Non-members, however, have no such agreement. We therefore studied their vulnerability to the automatic generation of so-called shadow profiles,” explains Prof. Dr. Katharina Zweig, who until recently worked at the Interdisciplinary Center for Scientific Computing (IWR) of Heidelberg University.
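
As a rough illustration of how contact data alone can hint at ties between non-members, the Python sketch below counts how many members list the same two non-members. This is a simple shared-contact heuristic chosen for illustration, not the researchers' method, which the article describes only as combining network analysis and machine learning. All names, the sample data, and the `min_shared` threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical contact data: each member's address book lists non-members.
contacts = {
    "member_a": {"nina", "omar", "pete"},
    "member_b": {"nina", "omar"},
    "member_c": {"omar", "pete"},
    "member_d": {"quinn"},
}


def shared_member_counts(contacts):
    """Count, for each pair of non-members, how many members list both of them."""
    counts = {}
    for listed in contacts.values():
        for pair in combinations(sorted(listed), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def likely_acquainted(contacts, min_shared=2):
    """Return the pairs listed together by at least min_shared members (assumed threshold)."""
    counts = shared_member_counts(contacts)
    return sorted(pair for pair, n in counts.items() if n >= min_shared)


print(likely_acquainted(contacts))
```

None of the non-members supplied any data themselves; every guess comes from what members uploaded, which is the vulnerability the study examines.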

Data Mining for improved on-line tutoring.

February 26, 2012

Bruce Upbin, Forbes Staff

TECH | 2/22/2012 @ 10:51AM

Knewton Is Building The World’s Smartest Tutor

Facebook and Google are two of technology’s great data projects. Love them or hate them, they spend all day mining their users’ activity. They harvest a few dozen bits of usable personal information per user per day. All in the interest of serving you ads.

That’s nothing, as data projects go, next to the amount of info students produce when they work online. “Education is the world’s largest data industry, by far,” says Jose Ferreira, who knows this because he runs a firm called Knewton that’s building what could become the world’s most valuable repository of the ways people learn.

Knewton, started four years ago in New York City by the 43-year-old Ferreira, builds its software into online classes that watch students’ every move: scores, speed, accuracy, delays, keystrokes, click-streams and drop-offs. “We’re physically collecting thousands of data points per student per day,” says Ferreira. Students go at their own pace, and the software continuously adapts to challenge and cajole them to learn based on their individual learning style. As individual students are correlated to the behaviors of thousands of other students, Knewton can make between 5 million and 10 million refinements to its data model every day. Psychometricians use similar principles to build standardized exams, but Knewton harvests way more data than testmakers ever will. Someday it will know what kids will get on the SAT, so they won’t have to take it.

“Online education,” says Ferreira, “is on the cusp of massive change, and only 100 cognoscenti know about it.”

A passel of other tech companies, such as Aleks, Grockit, Blackboard, Coursekit and 2tor, are also working to speed up education’s shift online. In higher ed, growth is already apace. The number of college students enrolled in at least one online course has risen from 1.7 million to 6 million since 2002. In the first nine months of 2011 Pearson, the world’s biggest educational publisher, had 8 million U.S. students register for its online homework and assessment programs and almost 5 million enroll in its remedial online college courses. These numbers were up 23% and 33%, respectively, over last year.

Data mining in education can be a far more powerful tool than mining consumer purchasing or social profiles and search because virtually every math concept correlates to all the others. Strong correlations allow a computer to quickly model a student’s performance and adjust an entire online math course based on even a limited data sample.

Five thousand freshmen at Arizona State University enrolled in Knewton-powered Pearson courses for remedial math in August. Half of the students finished four weeks early. Another quarter finished early enough to move into regular freshman math. The portion of students withdrawing from the courses fell from 13% to 6%, and pass rates rose from 66% to 75%. The best part for the school, says ASU executive vice provost Phil Regier, is that teachers now have a dashboard to track who’s falling behind and needs help.

In October Knewton raised $33 million from Pearson and Peter Thiel’s Founders Fund, reportedly at a valuation north of $150 million. The company had already raised $21 million from venture firms such as Accel Partners, Bessemer Venture Partners and First Mark. Says Thiel, an impassioned advocate for shaking up the college model: “We like companies that have breakthrough technologies but not disruptive technologies, which typically don’t work. Knewton tries to make the existing system better with a very powerful tool.”

There are limits. A computer can’t teach that which is up for interpretation, such as the causes of the War of 1812. Nor will it ever replace teachers. What Knewton offers is a way to automate the drudgery of delivering basic skills, the lack of which is epidemic at colleges. Between 10% and 15% of incoming students at ASU are not ready for freshman math. In the University of California system a quarter of 2010’s freshmen were unprepared for college-level writing.

Knewton founder Ferreira was never a big fan of traditional education. His parents moved their three kids from Mozambique to the U.S. to ensure they had a better education. Unlike his two brainy sisters, Ferreira struggled in class but had a particular skill at standardized tests. He scored well enough on his SATs (1540 out of 1600) to get into Carleton College, where he graduated in 1990 with a philosophy degree. He moved to San Francisco without a job and did practice SATs over breakfast the way other people do crossword puzzles. A friend suggested he work as a tutor at Stanley H. Kaplan Educational Centers. Ferreira became a star teacher.

In 1993 he was brought to New York to run Kaplan’s GRE prep service just as the Educational Testing Service was converting to computerized tests. Within a few months, “almost by accident,” says Ferreira, he came up with a foolproof strategy to answer a new type of question on the computer tests. ETS ended up having to destroy hundreds of thousands of test books. The next year Ferreira figured out the GRE scoring algorithm and uncovered duplicate questions across tests, something ETS denied would be a problem. Kaplan went public with the findings, and ETS sued for breach of contract, fraud and copyright infringement. E-mail exchanges dug up during discovery labeled Ferreira as the “Antichrist.”

Ferreira left Kaplan for Harvard Business School and did a stint at Goldman and a startup before returning to Kaplan in 2001. He badly wanted to launch an online test prep course that would adapt to students. After four years, though, he got frustrated with Kaplan’s delays and left for venture capital. But the idea stuck with him. In 2008 he decided to do it himself.

He raised $2.5 million from Accel, First Round Capital and angels such as Reid Hoffman. That funded the build-out of adaptive GMAT, SAT and LSAT test-prep courses, which proved the technology could succeed. A money-back guarantee to raise scores paid out in only 3% of cases. Test prep led to math-readiness courses for college freshmen, which in turn led to the Pearson deal in 2011. The plan is to convert a whole shelf of Pearson material to the adaptive format. Knewton will share revenue on every product it powers and plans to add more publishers in K–12 and overseas. The company doesn’t release its financial info, but it is estimated to have finished 2011 with $6 million in revenue. Based on expansion of its deal with Pearson and new business growth, rough estimates suggest that within four years Knewton’s revenue could surpass $100 million.

Ferreira’s long-term goal is having a global corpus of educational content fully tagged and loaded in the Knewton system. Anyone will be able to make an adaptive course from piece parts. Developed countries could subsidize a free version for underserved education markets in Africa, Asia and Latin America.

“I like it when entrepreneurs have that sort of vision but are focused on what needs to be done next,” says angel investor Reid Hoffman.

“One billion school-age children will grow up with very minimal reading, writing and math,” says Ferreira. “People should be marching in the streets with pitchforks about this.”

This story appears in the March 12, 2012 edition of Forbes magazine.

This article is available online at: