Hierarchical Bayesian models

Description
Hierarchical Bayesian models can represent and reason about knowledge at multiple levels of abstraction. They have been used by statisticians for many years.

Presentation Transcript

Part III : Hierarchical Bayesian Models

Slide2 : Levels, from most to least abstract: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal. Grammar level: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG).

Vision : (Han and Zhu, 2006)

Word learning : Levels: Principles, Structure, Data. Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias.

Hierarchical Bayesian models : Can represent and reason about knowledge at multiple levels of abstraction. Have been used by statisticians for many years. Have been applied to many cognitive problems: causal reasoning (Mansinghka et al., 06); language (Chater and Manning, 06); vision (Fei-Fei, Fergus, Perona, 03); word learning (Kemp, Perfors, Tenenbaum, 06); decision making (Lee, 06).

Outline : A high-level view of HBMs. A case study: semantic knowledge.

Slide8 : Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal, with one conditional distribution per link: P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), P(speech | utterance). Grammar level: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG).

Slide9 : Hierarchical Bayesian model. Universal Grammar U → Grammar G → phrase structures s1, ..., s6 → utterances u1, ..., u6, with distributions P(G|U), P(s|G), P(u|s).

Slide10 : A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy: P({ui}, {si}, G | U) = P({ui} | {si}) P({si} | G) P(G | U).
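The factorization of the joint distribution can be made concrete with a toy example. The sketch below assumes two hypothetical grammars, two phrase structures, and two utterances, and it assumes that utterances are independent given their phrase structures; none of the names or numbers come from the talk.

```python
from math import prod

# Hypothetical toy tables (illustrative numbers, not from the talk).
P_G = {"G1": 0.7, "G2": 0.3}                    # P(G | U)
P_s_given_G = {"G1": {"sA": 0.9, "sB": 0.1},    # P(s | G)
               "G2": {"sA": 0.2, "sB": 0.8}}
P_u_given_s = {"sA": {"u1": 0.6, "u2": 0.4},    # P(u | s)
               "sB": {"u1": 0.1, "u2": 0.9}}


def joint(us, ss, G):
    """P({u_i}, {s_i}, G | U) = prod_i P(u_i | s_i) * prod_i P(s_i | G) * P(G | U)."""
    p_u = prod(P_u_given_s[s][u] for u, s in zip(us, ss))
    p_s = prod(P_s_given_G[G][s] for s in ss)
    return p_u * p_s * P_G[G]


p = joint(["u1", "u2"], ["sA", "sB"], "G1")
```

Because each factor is a normalized conditional distribution, summing `joint` over every grammar, structure assignment, and utterance sequence of a fixed length gives 1.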

Knowledge at multiple levels : Top-down inferences: How does abstract knowledge guide inferences at lower levels? Bottom-up inferences: How can abstract knowledge be acquired? Simultaneous learning at multiple levels of abstraction.

Slide12 : Top-down inferences. Given grammar G and a collection of utterances, construct a phrase structure for each utterance.

Slide13 : Top-down inferences. Infer {si} given {ui} and G: P({si} | {ui}, G) ∝ P({ui} | {si}) P({si} | G).
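A minimal sketch of this top-down step, assuming a fixed grammar G, hypothetical probability tables, and utterances that are independent given their phrase structures (so the posterior can be computed one utterance at a time):

```python
# Hypothetical tables for one fixed, known grammar G (illustrative numbers only).
P_s_given_G = {"sA": 0.9, "sB": 0.1}             # P(s | G)
P_u_given_s = {"sA": {"u1": 0.6, "u2": 0.4},     # P(u | s)
               "sB": {"u1": 0.1, "u2": 0.9}}


def posterior_structure(u):
    """P(s | u, G) is proportional to P(u | s) * P(s | G); normalize over s."""
    scores = {s: P_u_given_s[s][u] * P_s_given_G[s] for s in P_s_given_G}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


post = posterior_structure("u2")
```

Here the grammar's preference for `sA` outweighs the fact that `u2` is more likely under `sB`, which is the sense in which abstract knowledge guides the lower-level inference.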

Slide14 : Bottom-up inferences. Given a collection of phrase structures, learn a grammar G.

Slide15 : Bottom-up inferences. Infer G given {si} and U: P(G | {si}, U) ∝ P({si} | G) P(G | U).
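The bottom-up step can be sketched the same way. The example assumes two hypothetical candidate grammars and treats the observed phrase structures as independent draws given G; the numbers are illustrative only.

```python
from math import prod

# Hypothetical tables (illustrative numbers, not from the talk).
P_G = {"G1": 0.7, "G2": 0.3}                    # P(G | U)
P_s_given_G = {"G1": {"sA": 0.9, "sB": 0.1},    # P(s | G)
               "G2": {"sA": 0.2, "sB": 0.8}}


def posterior_grammar(ss):
    """P(G | {s_i}, U) is proportional to prod_i P(s_i | G) * P(G | U)."""
    scores = {G: prod(P_s_given_G[G][s] for s in ss) * P_G[G] for G in P_G}
    z = sum(scores.values())
    return {G: v / z for G, v in scores.items()}


post = posterior_grammar(["sB", "sB", "sA"])
```

Although `G1` has the higher prior, two observations of `sB` are enough for the posterior to favor `G2`.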

Slide16 : Simultaneous learning at multiple levels. Given a set of utterances {ui} and innate knowledge U, construct a grammar G and a phrase structure for each utterance.

Slide17 : Simultaneous learning at multiple levels. A chicken-or-egg problem: given a grammar, phrase structures can be constructed; given a set of phrase structures, a grammar can be learned.

Slide18 : Simultaneous learning at multiple levels. Infer G and {si} given {ui} and U: P(G, {si} | {ui}, U) ∝ P({ui} | {si}) P({si} | G) P(G | U).
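For a toy problem, the joint posterior over the grammar and all phrase structures can be computed by brute-force enumeration, which sidesteps the chicken-or-egg problem by scoring every (G, {si}) pair at once. This is a sketch under assumed tables and independence of utterances given structures; realistic grammars would need approximate inference instead.

```python
from itertools import product
from math import prod

# Hypothetical toy tables (illustrative numbers, not from the talk).
P_G = {"G1": 0.7, "G2": 0.3}                    # P(G | U)
P_s_given_G = {"G1": {"sA": 0.9, "sB": 0.1},    # P(s | G)
               "G2": {"sA": 0.2, "sB": 0.8}}
P_u_given_s = {"sA": {"u1": 0.6, "u2": 0.4},    # P(u | s)
               "sB": {"u1": 0.1, "u2": 0.9}}


def joint_posterior(us):
    """P(G, {s_i} | {u_i}, U) by enumerating every grammar and structure assignment."""
    structures = list(P_u_given_s)
    scores = {}
    for G in P_G:
        for ss in product(structures, repeat=len(us)):
            p_u = prod(P_u_given_s[s][u] for u, s in zip(us, ss))
            p_s = prod(P_s_given_G[G][s] for s in ss)
            scores[(G, ss)] = p_u * p_s * P_G[G]
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


post = joint_posterior(["u2", "u2", "u2"])
best = max(post, key=post.get)   # most probable (grammar, structures) pair
```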

Outline : A high-level view of HBMs. A case study: semantic knowledge.

Folk Biology : R: principles (the relationships between living kinds are well described by tree-structured representations). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data (“Gorillas have hands”).

Folk Biology : R: principles (structural form: tree). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data.

Outline : A high-level view of HBMs. A case study: semantic knowledge, covering (1) property induction, (2) learning structured representations, (3) learning the abstract organizing principles of a domain.

Property Induction : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data. Approach: work with the distribution P(D|S,R).

Property Induction : Previous approaches: Rips (75), Osherson et al. (90), Sloman (93), Heit (98).

Bayesian Property Induction : Hypotheses

Choosing a prior

Bayesian Property Induction : A challenge: we have to specify the prior, which typically includes many numbers. An opportunity: the prior can capture knowledge about the problem.
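To show why the prior involves many numbers, here is a minimal sketch of hypothesis-based property induction. It assumes a common setup that the transcript does not spell out: each hypothesis is a candidate set of species that have the property, and a hypothesis is consistent with the data when it contains every species observed to have the property. The prior values are hypothetical.

```python
species = ["mouse", "squirrel", "chimp", "gorilla"]

# Hypothetical prior over candidate extensions of the property.
# One number per hypothesis: this is the "many numbers" problem.
prior = {
    frozenset({"chimp", "gorilla"}): 0.3,
    frozenset({"mouse", "squirrel"}): 0.3,
    frozenset(species): 0.2,
    frozenset({"gorilla"}): 0.1,
    frozenset({"gorilla", "mouse"}): 0.1,
}


def generalize(observed, target):
    """P(target has the property | every species in `observed` has it)."""
    consistent = {h: p for h, p in prior.items() if observed <= h}
    z = sum(consistent.values())
    return sum(p for h, p in consistent.items() if target in h) / z


p_chimp = generalize({"gorilla"}, "chimp")
p_mouse = generalize({"gorilla"}, "mouse")
```

With this prior, learning that gorillas have a property supports generalization to chimps more strongly than to mice, because the prior puts more mass on hypotheses that group the primates together.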

Property Induction : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data.

Biological properties : Structure: living kinds are organized into a tree. Stochastic process: nearby species in the tree tend to share properties.

Slide35 : Structure:

Slide36 : Structure:

Stochastic Process : Nearby species in the tree tend to share properties. In other words, properties tend to be smooth over the tree. (Figure panels: Smooth; Not smooth.)
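One rough way to quantify "smooth over the tree" is to count the fewest tree edges across which a binary property has to change. The sketch below uses that count as a crude proxy; it is not the diffusion process itself, and the tree shape and internal node names (`root`, `rodent`, `primate`) are assumptions made for illustration.

```python
from itertools import product

# Hypothetical tree: root -> {rodent, primate}, rodent -> {mouse, squirrel},
# primate -> {chimp, gorilla}.
EDGES = [("root", "rodent"), ("root", "primate"),
         ("rodent", "mouse"), ("rodent", "squirrel"),
         ("primate", "chimp"), ("primate", "gorilla")]
INTERNAL = ["root", "rodent", "primate"]


def roughness(leaf_values):
    """Fewest edges with differing endpoint values, over all labelings of internal nodes."""
    best = None
    for vals in product([0, 1], repeat=len(INTERNAL)):
        y = dict(leaf_values, **dict(zip(INTERNAL, vals)))
        cost = sum((y[a] - y[b]) ** 2 for a, b in EDGES)
        best = cost if best is None else min(best, cost)
    return best


smooth = {"mouse": 0, "squirrel": 0, "chimp": 1, "gorilla": 1}
not_smooth = {"mouse": 1, "squirrel": 0, "chimp": 1, "gorilla": 0}
r_smooth = roughness(smooth)
r_not_smooth = roughness(not_smooth)
```

The property shared by the two primates needs only one change along the tree, while the property that alternates within each pair needs two, so a prior that favors smoothness would prefer the first.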

Stochastic process : Hypotheses

Generating a property : Generate y, which tends to be smooth over the tree, then threshold y to obtain the property h.

The diffusion process : y is drawn from a Gaussian with covariance K, and the property is obtained by thresholding, hi = Θ(yi), where Θ(yi) is 1 if yi ≥ 0 and 0 otherwise. The covariance K encourages y to be smooth over the graph S.

p(y|S,R): Generating a property : Let yi be the feature value at node i. (Zhu, Lafferty, Ghahramani 03)

Biological properties : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data. Approach: work with the distribution P(D|S,R).

Results : (Osherson et al.) Figure labels: Model, Human.

Results : Example argument: Cows have property P. Elephants have property P. Horses have property P. Conclusion: All mammals have property P. Figure labels: Model, Human.

Spatial model : R: principles (structural form: 2D space; stochastic process: diffusion). S: structure (over mouse, squirrel, chimp, gorilla). D: data.

Slide48 : Structure:

Slide49 : Structure:

Tree vs 2D : Conclusion categories “horse” and “all mammals”. Models compared: Tree + diffusion, 2D + diffusion.

Biological Properties : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data.

Three inductive contexts : R: tree + diffusion process (“has T4 cells”); chain + drift process (“can bite through wire”); network + causal transmission (“carries E. Spirus bacteria”). S: in each context, a structure over Classes A through G.

Threshold properties : “can bite through wire”; “has skin that is more resistant to penetration than most synthetic fibers”. Animals: Hippo, Cat, Lion, Camel, Elephant, Poodle, Collie, Doberman. (Osherson et al.; Blok et al.)

Threshold properties : Structure: the categories can be organized along a single dimension. Stochastic process: categories towards one end of the dimension are more likely to have the novel property.

Results : “has skin that is more resistant to penetration than most synthetic fibers” (Blok et al., Smith et al.). Models compared: 1D + drift, 1D + diffusion.

Causally transmitted properties : (Medin et al.; Shafto and Coley) Example species: Salmon, Grizzly bear.

Causally transmitted properties : Structure: the categories can be organized into a directed network. Stochastic process: properties are generated by a noisy transmission process.
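A noisy transmission process can be sketched on the simplest directed network, a chain. The parameterization is an assumption, not something the transcript states: each species acquires the property on its own with background probability `b`, and otherwise catches it from its single predecessor with transmission probability `t`. The chain of species is hypothetical.

```python
def chain_marginals(chain, b, t):
    """Marginal probability that each species in a directed chain has the property.

    Assumed model: a species has the property if it arises spontaneously
    (probability b) or if its predecessor has it and transmits it (probability t).
    """
    probs = {}
    p_prev = 0.0   # the first species has no predecessor
    for name in chain:
        p = 1.0 - (1.0 - b) * (1.0 - t * p_prev)
        probs[name] = p
        p_prev = p
    return probs


# Hypothetical food chain, with prey listed before predator.
marginals = chain_marginals(["herring", "salmon", "grizzly bear"], b=0.1, t=0.5)

# Forward prediction: if salmon are known to have the property, the predator's
# probability is the noisy-OR of background and transmission.
p_bear_given_salmon = 1.0 - (1.0 - 0.1) * (1.0 - 0.5)
```

Under this model the property flows along the direction of the edges, so knowing that a prey species has it raises the predator's probability well above its marginal value.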

Experiment: disease properties : Domains: Island, Mammals. (Shafto et al.)

Results: disease properties : Domains: Mammals, Island. Model: Web + transmission.

Conclusions: property induction : Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction.

Structure learning : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data.

Structure learning : R: principles (structural form: tree; stochastic process: diffusion). S: structure (unknown). D: data. Goal: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R), where P(D|S,R) is the distribution previously used for property induction.

Generating features over the tree : mouse, squirrel, chimp, gorilla

P(S|R): Generating structures : Figure panels: Consistent with R; Inconsistent with R.

P(S|R): Generating structures : Figure panels: Complex; Simple.

P(S|R): Generating structures : P(S|R) is 0 if S is inconsistent with R. Otherwise, each structure is weighted according to the number of nodes it contains.

Structure Learning : R: principles. S: structure. D: data. Aim: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R). P(S|D,R) will be high when the features in D vary smoothly over S and when S is a simple graph (a graph with few nodes).
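The trade-off between fit and simplicity can be sketched as a log-score over candidate structures. The prior used here, a weight of THETA raised to the number of nodes for structures consistent with R, is an assumption standing in for the formula that is missing from the transcript. The candidate structures and their log-likelihoods are hypothetical.

```python
import math

THETA = 0.5   # assumed per-node weight, 0 < THETA < 1 (hypothetical value)


def log_prior(n_nodes, consistent):
    """Unnormalized log P(S | R): zero probability if S is inconsistent with R."""
    return n_nodes * math.log(THETA) if consistent else -math.inf


# name: (log P(D | S, R), number of nodes, consistent with R?)
candidates = {
    "small tree": (-12.0, 5, True),
    "large tree": (-11.5, 9, True),
    "ring": (-9.0, 4, False),
}


def log_score(name):
    """log P(D | S, R) + log P(S | R), equal to log P(S | D, R) up to a constant."""
    log_lik, n_nodes, consistent = candidates[name]
    return log_lik + log_prior(n_nodes, consistent)


best = max(candidates, key=log_score)
```

The larger tree fits the data slightly better, but not by enough to pay for its extra nodes, and the ring is ruled out because it is inconsistent with a tree form no matter how well it fits.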

Structure learning example : Participants rated the goodness of 85 features for 48 animals (Osherson et al.). E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group.

Biological Data : Matrix of Features by Animals.

Slide79 : Tree:

Spatial model : R: principles (structural form: 2D space; stochastic process: diffusion). S: structure (over mouse, squirrel, chimp, gorilla). D: data.

Slide81 : 2D space:

Conclusions: structure learning : Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations.

Learning structural form : R: principles (structural form: tree; stochastic process: diffusion). S: structure (a tree over mouse, squirrel, chimp, gorilla). D: data.

Which form is best? : Two candidate structures over the same categories: Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle.

Structural forms : Order, Chain, Ring, Partition, Hierarchy, Tree, Grid, Cylinder.

Learning structural form : R: principles (structural form: F, which could be tree, 2D space, ring, ...; stochastic process: diffusion). S: structure (unknown). D: data. Aim: find S, F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(D|S) is the distribution used for property induction, P(S|F) is the distribution used for structure learning, and P(F) is a uniform distribution on the set of forms.

P(S|F): Generating structures from forms : P(S|F) is 0 if S is inconsistent with F. Otherwise, each structure is weighted according to the number of nodes it contains.

Slide92 : P(S|F): Generating structures from forms. Simpler forms are preferred. Figure: P(S|F) over all possible graph structures S (nodes A, B, C, D) for the Chain and Grid forms.

Learning structural form : F: form (unknown). S: structure (unknown). D: data. Goal: find S, F that maximize P(S,F|D).

Learning structural form : F: form. S: structure. D: data. Aim: find S, F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F). P(S,F|D) will be high when the features in D vary smoothly over S, when S is a simple graph (a graph with few nodes), and when F is a simple form (a form that can generate only a few structures).
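The preference for simple forms falls out of normalization: a form that can generate only a few structures gives each of them a larger share of P(S|F). The sketch below assumes an unnormalized weight of THETA raised to the number of nodes, a uniform P(F), and hypothetical structures and log-likelihoods; the chain structures are also treated as grid structures for illustration.

```python
import math

THETA = 0.5   # assumed per-node weight (hypothetical value)

# Hypothetical: structures each form can generate, as name -> number of nodes.
FORMS = {
    "chain": {"c1": 4, "c2": 5},
    "grid": {"c1": 4, "c2": 5, "g1": 4, "g2": 6, "g3": 6},
}
# Hypothetical log P(D | S) for each structure.
LOG_LIK = {"c1": -10.0, "c2": -10.5, "g1": -9.8, "g2": -10.2, "g3": -11.0}


def log_score(S, F):
    """log P(D|S) + log P(S|F) + log P(F), with P(F) uniform over the forms."""
    z = sum(THETA ** n for n in FORMS[F].values())   # normalizer of P(S|F)
    log_p_s = math.log(THETA ** FORMS[F][S] / z)
    return LOG_LIK[S] + log_p_s - math.log(len(FORMS))


pairs = [(S, F) for F in FORMS for S in FORMS[F]]
best = max(pairs, key=lambda sf: log_score(*sf))
```

The same structure `c1` scores higher under the chain form than under the grid form, and it beats the better-fitting `g1`, because the grid form spreads its probability over more structures.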

Form learning: Biological Data : 33 animals, 110 features (matrix of Features by Animals).

Supreme Court (Spaeth) : Votes on 1600 cases (1987-2005).

Color (Ekman)

Where do priors come from?

Slide102 : Structure over mouse, squirrel, chimp, gorilla. Stochastic process: diffusion.

Slide103 : Structure over mouse, squirrel, chimp, gorilla. Structural form: tree. Stochastic process: diffusion.

Where do structural forms come from? : Order, Chain, Ring, Partition, Hierarchy, Tree, Grid, Cylinder.

Where do structural forms come from? : Figure columns: Form, Process.

Node-replacement graph grammars : Production (Chain); Derivation.
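A derivation under a chain production can be sketched as repeated node splitting. The rule used here is an assumption about what the slide's figure shows: a chosen node is replaced by two nodes joined by an edge, the first inheriting the incoming edge and the second the outgoing edge. The function names are hypothetical.

```python
import random


def apply_chain_production(nodes, edges, target, new_id):
    """Split `target` into target -> new_id; edges leaving `target` now leave new_id."""
    new_edges = [(new_id if a == target else a, b) for a, b in edges]
    new_edges.append((target, new_id))
    return nodes + [new_id], new_edges


def derive_chain(n_steps, seed=0):
    """Start from a single node and apply the production n_steps times."""
    rng = random.Random(seed)
    nodes, edges = [0], []
    for step in range(1, n_steps + 1):
        target = rng.choice(nodes)   # any node may be rewritten
        nodes, edges = apply_chain_production(nodes, edges, target, step)
    return nodes, edges


nodes, edges = derive_chain(4)
```

Whichever nodes are chosen, every intermediate graph in the derivation is itself a chain, which is what lets a single production define the whole family of chain structures.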

The complete space of grammars : 1 ... 4096

When can we stop adding levels? : When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.

Conclusions : Hierarchical Bayesian models provide a unified framework which can explain how abstract knowledge is used for induction and explain how abstract knowledge can be acquired.

Learning abstract knowledge : Applications of hierarchical Bayesian models at this conference: Semantic knowledge: Schmidt et al., Learning the M-constraint. Syntax: Perfors et al., Learning that language is hierarchically organized. Word learning: Kemp et al., Learning the shape bias.
