Mathematics – A New Domain for Datamining?
Simon Colton
simonco@cs.york.ac.uk
http://www.dai.ed.ac.uk/~simonco
Universities of Edinburgh & York
United Kingdom
Mathematics is the new Biology
Many databases of math information
Massive potential for datamining
This talk
Overview of mathematics databases
Hurdles to overcome for datamining
Suggested Methods
Potential Rewards
Mathematical Databases
Mathworld encyclopedia
8974 entries, 153958 cross-references, 1400 pages
MathSciNet citation service
10843 reviews, 151350 articles, 358104 authors
Mizar library of formalised maths
666 articles, 2000 concept definitions
Mathematica CAS functions
Tens of thousands of computer algebra functions
Mathematical Databases (continued)
Encyclopedia of Integer Sequences
60,000 sequences with terms, definitions, etc.
Inverse Symbolic Calculator
50 million constants, 400 tables
GAP library (CAS)
6 million groups
Ad hoc databases everywhere
Geometry junkyard, My favourite constants
Problems with the Data
Highly heterogeneous
No agreed-upon format for concepts or conjectures
Distributed
Hundreds of websites
Dynamic
E.g. 50 new integer sequences daily
Really need to impose homogeneity
Suggestions for Datamining
Conjectures: simple relationships between concepts
Equivalence, implication, nonexistence, moonshine
Need to worry about interestingness
Plausibility, complexity, surprisingness
Concept formation to get correct statements
Composition, tweaking, monster-barring
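The relationship types listed above can be sketched as set comparisons over the examples of two concepts. This is only an illustration, not the method of any particular program: the function name `relate`, the bound `LIMIT`, and the toy concepts are all assumptions made for the example, and a relationship found on finite data is a conjecture, not a theorem.

```python
# Hypothetical sketch: classify the empirical relationship between two
# concepts, each represented by the finite set of its known examples.

def relate(a, b):
    """Return the simplest relationship that the example sets support."""
    a, b = set(a), set(b)
    if a == b:
        return "equivalence"
    if a <= b:
        return "implication (A => B)"
    if b <= a:
        return "implication (B => A)"
    if not (a & b):
        return "nonexistence"
    return "none"

LIMIT = 100  # assumed bound on the data; conjectures hold only up to here

primes = [n for n in range(2, LIMIT)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))]
odd_primes = [p for p in primes if p > 2]
odds = list(range(1, LIMIT, 2))
squares = [n * n for n in range(1, 10)]

print(relate(odd_primes, odds))  # implication (A => B)
print(relate(primes, squares))   # nonexistence
print(relate(primes, odds))      # none
```

Filtering the output for interestingness (plausibility, complexity, surprisingness) would be a separate step and is not attempted here.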
Potential Rewards - Example
NumbersWithNames program
http://machine-creativity.com/programs/nwn
Datamining the Encyclopedia of Integer Sequences
Perfect numbers are pernicious
Perfect: sum of divisors is twice the number
Pernicious: prime number of 1s in binary
6, 28, 496, …
Found by looking for subsequences
Many more similar examples
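The perfect/pernicious observation can be checked empirically with a short script. This is a minimal sketch using the two definitions from the slide; the helper names and the bound `LIMIT` are assumptions for illustration, and it is not the NumbersWithNames implementation.

```python
# Minimal sketch: test whether the perfect numbers below LIMIT form a
# subsequence of the pernicious numbers.

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_perfect(n):
    # Sum of all divisors (including n itself) is twice the number.
    return sum(d for d in range(1, n + 1) if n % d == 0) == 2 * n

def is_pernicious(n):
    # The count of 1s in the binary representation is prime.
    return is_prime(bin(n).count("1"))

LIMIT = 1000  # assumed search bound

perfect = [n for n in range(1, LIMIT) if is_perfect(n)]
pernicious = {n for n in range(1, LIMIT) if is_pernicious(n)}

print(perfect)                                # [6, 28, 496]
print(all(n in pernicious for n in perfect))  # True
```

In binary, 6 is 110, 28 is 11100 and 496 is 111110000, with 2, 3 and 5 ones respectively.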
Potential Rewards: Money & Fame
Money
EPSRC-funded large project: e-science
E-maths initiative being discussed
Fame
Monstrous Moonshine Conjectures
Found by accident (numbers 196883 & 196884)
Led to Fields Medal (see paper)
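For reference, the numerical coincidence behind the moonshine conjectures can be written out. 196884 is the first nontrivial coefficient in the q-expansion of the modular function j, and 196883 is the dimension of the smallest nontrivial representation of the Monster group:

```latex
j(\tau) = q^{-1} + 744 + 196884\,q + 21493760\,q^{2} + \cdots,
\qquad q = e^{2\pi i \tau}

196884 = 1 + 196883
```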
Conclusions and Future Work
Consider mathematics as a datamining domain
Much data available, but there are problems
Techniques required are simple
Possible to make important conjectures
Cross domain/database sharing of data
Projects like NumbersWithNames
http://machine-creativity.com/programs/nwn