What did we do so far?
Learned how to summarize our data using pivot tables, histograms, VLOOKUP, box plots, etc. (Chapters 1-4)
Learned how to optimize our model’s outcomes by making decisions under uncertainty using the Precision Tree tool (Chapter 7)
Learned how to do system simulation to estimate risks using @RISK (Chapter 16)
Learned how to optimize our model’s outcomes using the Solver tools (Chapters 14 and 15)
Learned how to install and use highly recommended statistical packages (videos are available)
Learned about the limitations of technology in real-life use and modeling (limitations of virtual class tools and the traditional Excel tool).
11/12/2009 1 George Towfic, Ph.D.
The class technology factor and technology limitations in business
Blackboard is not ready for the “Business type” of online classes.
Solutions were sought fast, and everyone at Clarke worked to provide better technology. These should be considered learning lessons for the class.
We not only improved the Blackboard tools, but also improved THE best software tool in the field of online teaching (WizIQ).
Sample emails to improve WizIQ
Dear George,
Thank you for contacting WiZiQ support.
1. The screen sharing that you perform during the session is only available in the recording of public classes at the moment.
2. The feature of editing recordings is not available as of yet but is in our roadmap.
If you have any further questions or comments, please let us know.
Regards,
Sid
WiZiQ
Two weeks after the above email, I received the following announcement yesterday: New Screen Sharing in WiZiQ Virtual Classroom
Dear George,
Thank you for contacting WiZiQ support.
We are making changes in the recordings of private classes so that the screen sharing activity during the session is available in the recording as well.
For premium members, the new recording version of private classes will be available within a day or so, and for non-premium members, the new version of private class recordings will be available in a few weeks.
If you have any further questions or comments, please let us know.
Regards,
Sid
General Comments
This class is about using technology in real-life system modeling and simulation.
This book is normally taught in two semesters (read preface page xiii). If it is taught in one semester, one has either to teach chapters 2, 3, 5, 6, and 8-12 only, or to assume that class members are familiar with reasonable statistical tools and concepts, just skim through chapters 2, 3, 5, 6, and 8-10, and go directly to the later chapters.
Having questioned you in our first session about your use of basic statistics (pivot tables, VLOOKUP, histograms, Chi-square, ANOVA, etc.), I noticed that you either didn’t know much about these tools or had forgotten how to use them (which makes sense, since you don’t necessarily use statistics in your work).
Since I didn’t want this class to cover only part of the book, I decided to speed up the learning process by providing you with video demos that explain how to install and use the tools, so that I can discuss more material.
This class is not run for the convenience of the instructor. In fact, it is the other way around.
I noticed that two class members are taking the attitude that they know all the class materials, and hence they think that this class will not add much to their knowledge. I want these members to reconsider, since I think that the Excel add-in technology that we are using is not available in local industry. These software add-ins normally cost a lot of money, so please make sure you understand their use and how to develop your statistical models. Just because you have statistical packages other than Excel does not mean that you know how to use statistics in business. Please try to look positively at things so that you can learn.
One or two class members are not viewing the class materials (including the videos that I am preparing), and one or two are not submitting their assignments on time. Please make sure to spend the time to learn. I watch class members’ participation and hard work very carefully and take this into account in my final grading.
Upcoming tests
1. A test will be posted on Monday the 16th. It will be a multiple-choice test that covers the materials and assignments we have discussed so far (including today's lecture and the use and advantages of the Palisade tools). The test will be online and will be available for three days. You can take the test at a time convenient to you. There will be 30 questions to be answered in 10 minutes. You should study so that you can use the time effectively.
On Tuesday the 17th you need to submit your “Using the book's tools in your project” assignment. On Thursday the 19th and Monday the 23rd you will present your outcomes using WizIQ. Notice that this assignment is graded for 700 points and that your presentation will be graded for 150 points.
In the discussion link at Blackboard, you have a discussion about this lecture’s material. Please make sure to attend the open discussion and try to answer each other’s questions. Questions could include how to install and use the software tools and how to understand the basic statistics discussed in this lecture. The link is in the discussion area under the title “Review and 11-12 lecture”.
Project’s Technical Guidelines
Know your data size, know your variables, know your categories
Know your independent and dependent variables
Summarize your data by: categories (pivot tables), frequencies (histograms), distributions and outliers (box plots), historical changes (time series)
Calculate simple statistics such as mean, median, mode, standard deviation, z-score, correlation values, and a historical data summary (for example, the mean of each year of sales or each year of class enrolment, etc.)
If your data is not large (fewer than 100 records), select random sample sets from the data and calculate the mean, standard deviation, and z-score for each set.
Build a relation between one or two dependent variables and related independent variables. Use these relations to optimize your model using the Solver tool
Study the effect of changes in the independent variables on the dependent variables using @RISK
Estimate the probability of occurrence of different events in your model (such as the probability of bad car design, bad car parts, high car prices, etc.) and use Precision Tree to study the optimum path to follow to reach specific goals (such as low car price, low car maintenance, etc.)
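The summary-statistics and random-sample-set guidelines above can be tried outside Excel as well. The following is only a minimal Python sketch, assuming a small made-up list of yearly sales figures; the variable names, the set size of 6, and the number of sets are illustrative assumptions, not part of the project requirements.

```python
import random
import statistics

# Hypothetical yearly sales figures; replace with your own project data.
sales = [120, 135, 150, 110, 160, 145, 155, 130, 140, 125, 165, 150]

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # n-1 denominator, like Excel's STDEV
    }


overall = summarize(sales)

# Draw three random sample sets of 6 records each and summarize each set.
sample_sets = [rng.sample(sales, 6) for _ in range(3)]
set_summaries = [summarize(s) for s in sample_sets]

print("all data:", overall)
for i, summary in enumerate(set_summaries, start=1):
    print(f"set {i}:", summary)
```

Comparing the per-set means and standard deviations with the overall values gives a first feel for how much a sample can differ from the full dataset.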
Basic Statistics
Statistical Inference
Problem: How to reach a conclusion about a population given information about samples?
Why it is important: Because sometimes you don’t have access to a complete dataset (for security or privacy reasons). Also, sometimes it is impossible to get the entire dataset (as in the case of obtaining all cancer patients’ datasets).
Tool: @RISK, when you don’t know the population data but can obtain the data distribution. For example, if your sample is large (say, the total number of patients is more than 3000), then the Central Limit Theorem tells you that the sample mean is approximately normally distributed, even if the individual values are not. So you can say: since the mean of my data is approximately normally distributed, and since I have part of the data, I can use @RISK with a normal distribution function to predict my output variables.
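To see what the Central Limit Theorem does and does not promise, a small simulation helps. The sketch below is not @RISK; it is a plain Python illustration that assumes a made-up, strongly skewed population (exponential values), a sample size of 50, and 2000 repeated samples. It checks that the sample means cluster around the population mean with a spread close to SD/sqrt(n).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for repeatability

# Hypothetical, strongly skewed population (not normally distributed).
population = [rng.expovariate(1.0) for _ in range(20_000)]

SAMPLE_SIZE = 50
# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.mean(rng.choices(population, k=SAMPLE_SIZE))
    for _ in range(2_000)
]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
predicted_spread = pop_sd / SAMPLE_SIZE ** 0.5  # CLT: SD of the sample mean

print("population mean:       ", round(pop_mean, 3))
print("mean of sample means:  ", round(statistics.mean(sample_means), 3))
print("SD of sample means:    ", round(statistics.stdev(sample_means), 3))
print("predicted SD/sqrt(n):  ", round(predicted_spread, 3))
```

The individual values stay skewed; only the distribution of the sample means becomes approximately normal.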
Optimization analysis
Problem: How to obtain maximum or minimum performance parameters.
Why it is important: Because sometimes you need to put your model in a mathematical form in order to optimize it. For example, given your company’s constraints, what would be the best working environment that satisfies the given constraints and maximizes the profit (or minimizes health insurance payments)?
Tools: Solver tool
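The idea of maximizing an objective under constraints can be illustrated with a tiny product-mix model. The sketch below does not use Solver or its algorithms; it simply tries every whole-number combination (brute force), which only works for very small models. All profits, resource requirements, and limits are made-up numbers.

```python
# Hypothetical product-mix model: every number here is invented for illustration.
PROFIT_X, PROFIT_Y = 30, 50          # profit per unit of product x and y
LABOR_X, LABOR_Y, LABOR_LIMIT = 2, 4, 100        # labor hours
MATERIAL_X, MATERIAL_Y, MATERIAL_LIMIT = 3, 2, 90  # material units


def feasible(x, y):
    """True when the plan (x, y) respects both resource constraints."""
    return (LABOR_X * x + LABOR_Y * y <= LABOR_LIMIT
            and MATERIAL_X * x + MATERIAL_Y * y <= MATERIAL_LIMIT)


def profit(x, y):
    """Objective function to maximize."""
    return PROFIT_X * x + PROFIT_Y * y


# Enumerate all whole-number plans and keep the most profitable feasible one.
candidates = [(x, y) for x in range(51) for y in range(51) if feasible(x, y)]
best_plan = max(candidates, key=lambda plan: profit(*plan))

print("best plan (x, y):", best_plan)
print("profit:", profit(*best_plan))
```

In Excel, the same model would be set up with the profit formula as the objective cell, x and y as the changing cells, and the two resource limits as constraints.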
Decision making
Problem: You have partial data for the model under consideration, but you have percentage estimates for other data. You want to use the real data and the probabilistic data to reach a conclusion about which direction to follow to obtain the best model performance.
Why it is important: Because sometimes you have many options to follow to reach a given goal. You have historical data or expert data that provide probabilistic estimates for different paths to reach one or many goals. You want to use this information to estimate which path to follow, what would be the best goal, and what would be the cost (in terms of money, human resources, risks, etc.) to reach each goal.
Tools: Precision Tree
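A decision tree of this kind is usually evaluated by computing the expected monetary value (EMV) of each option and choosing the best one. The sketch below is not Precision Tree; it is a minimal Python illustration with one decision and made-up options, probabilities, and payoffs.

```python
# Each option leads to a chance node: a list of (probability, payoff) outcomes.
# All option names, probabilities, and payoffs are hypothetical.
options = {
    "launch new design": [(0.6, 200_000), (0.4, -100_000)],
    "improve current design": [(0.8, 90_000), (0.2, 10_000)],
    "do nothing": [(1.0, 0)],
}


def expected_value(outcomes):
    """Expected monetary value of a chance node."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities of a chance node must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


emv = {name: expected_value(outcomes) for name, outcomes in options.items()}
best_option = max(emv, key=emv.get)

for name, value in emv.items():
    print(f"{name}: EMV = {value:,.0f}")
print("best option:", best_option)
```

Note that the option with the highest EMV can still carry the largest possible loss, which is why the risk profile of each path matters as well as its expected value.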
Random Sampling
Problem: You want to know if your samples have the same mean. You can select random samples (with replacement), then compute the mean of each sample and use a t-test, Chi-Square test, or ANOVA to test the hypothesis that two or more means are the same.
Why it is important: To know if a given population is homogeneous or not. If a population is homogeneous, then its samples obey similar statistical rules; otherwise, you can say that a given sample did not come from this population, and hence a hypothesis about the population might not apply to the sample.
Tools: StatTools and the Data Analysis tools
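As a rough illustration of comparing two sample means, the sketch below computes Welch's t statistic by hand in Python. It is not StatTools, it stops short of a p-value, and the test-score population, the sample size of 40, and the shift of 15 points are all made-up assumptions. As a common rule of thumb, an absolute t value well above about 2 suggests the means differ.

```python
import random
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = (var_a / len(sample_a) + var_b / len(sample_b)) ** 0.5
    return (mean_a - mean_b) / standard_error


rng = random.Random(7)  # fixed seed for repeatability
population = [rng.gauss(70, 10) for _ in range(1000)]  # hypothetical test scores

# Two random samples drawn with replacement from the same population.
sample_1 = rng.choices(population, k=40)
sample_2 = rng.choices(population, k=40)

# A sample that behaves as if it came from a different population.
shifted = [score + 15 for score in sample_2]

t_same = welch_t(sample_1, sample_2)
t_diff = welch_t(sample_1, shifted)
print("same population:      t =", round(t_same, 2))
print("different population: t =", round(t_diff, 2))
```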
Median, Mode, and Mean
Median
One way to start understanding how the numbers are positioned between the minimum and maximum is to find the median. This is the middle number. To get it, you need to sort the data into order and count how many values there are. If there is an odd number of values, it's easy to find the middle one. If there is an even number, then you must find the mean of the two middle numbers.
Mode
When you collect data, you often find that some of it is the same. If you measure how tall a group of people are to the nearest centimeter, you are likely to find several people of the same height. Then there will be fewer people a centimeter lower, or higher, and so on. Obviously, the number that occurs more often than any other in the set is important. This is the mode. When you make a pictogram, the mode will stick out most.
Mean
Both the median and the mode are single numbers which exist in the set (unless you have to calculate the median). The median is only interested in having the same number of values below it and above it. It isn't interested in what those values are. The mode is only really interested in the values that are the same. It totally ignores all single values. So while these are valuable, it would be useful to have a number which describes the set, uses every number in the set, and takes the same interest in the value of each. This is the mean.
Try them in Excel using the “median”, “mode”, and “average” functions
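The same three measures can be checked outside Excel. Here is a minimal Python sketch using the standard `statistics` module on a made-up list of heights (an assumption for illustration only); it has an even number of values, so the median is the mean of the two middle numbers.

```python
import statistics

# Hypothetical heights in centimeters (already sorted for readability).
heights_cm = [160, 165, 165, 168, 170, 170, 170, 180]

median_height = statistics.median(heights_cm)  # mean of the two middle values
mode_height = statistics.mode(heights_cm)      # most frequent value
mean_height = statistics.mean(heights_cm)      # uses every value equally

print("median:", median_height)
print("mode:  ", mode_height)
print("mean:  ", mean_height)
```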
Skewness and normal distribution
The equation for skewness is defined as:
skewness = [n / ((n-1)*(n-2))] * Sum[((x-mean)/SD)^3]
Positive skewness indicates a longer tail of values above the mean. Negative skewness indicates a longer tail of values below the mean. We use this to test the symmetry of a given dataset. If the skew value of a set of numbers is near zero (like +0.0123 or -0.0212), then the data is roughly symmetric, which is consistent with (but does not prove) a normal distribution. If the sign of the skew is positive, then the data is positively skewed; otherwise it is negatively skewed.
Try using the “skew” function in Excel (on, for example, 10 numbers).
Another way to check skewness and normal distribution is a rule of thumb based on the averages:
If mode = median = mean, then the data is symmetric (as a normal distribution is)
If mean > median, then the data is positively skewed
If mean < median, then the data is negatively skewed
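To check a skew value by hand, the sketch below implements the sample skewness formula n/((n-1)(n-2)) times the sum of cubed standardized deviations, which is the form I understand Excel's SKEW function to use (treat that as an assumption and compare against Excel on your own data). The two small datasets are made up.

```python
import statistics


def sample_skewness(values):
    """Sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/SD)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n-1 denominator)
    cubed_sum = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum


symmetric = [1, 2, 3, 4, 5]      # evenly spread around the mean
right_tail = [1, 2, 3, 4, 10]    # one large value stretches the upper tail

print("symmetric: ", round(sample_skewness(symmetric), 4))
print("right tail:", round(sample_skewness(right_tail), 4))
```

In the second dataset the mean (4) is larger than the median (3), which matches the positive skew value.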
Variance, Co-variance, and Standard deviations
Why Standard Deviation
The standard deviation is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data.
Let's say Hempstead HS has a higher mean test score than Wahlert HS. Your first reaction might be to say that the kids at Hempstead are smarter.
But a bigger standard deviation for one school tells you that there are relatively more kids at that school scoring toward one extreme or the other. By asking a few follow-up questions you might find that, say, Hempstead's mean was skewed up because the school district sends all of the gifted education kids to Hempstead. Or that Wahlert’s scores were dragged down because students who recently have been "mainstreamed" from special education classes have all been sent to Wahlert.
In this way, looking at the standard deviation can help point you in the right direction when asking why information is the way it is.
The standard deviation can also help you evaluate the worth of all those so-called "studies" that seem to be released to the press every day. A large standard deviation in a study that claims to show a relationship between eating Twinkies and killing politicians, for example, might tip you off that the study's claims aren't all that trustworthy.
Of course, you'll want to seek the advice of a trained statistician whenever you try to evaluate the worth of any scientific research. But if you know at least a little about standard deviation going in, that will make your interview much more productive.
Try to use the “stdev” function in Excel to test the standard deviation
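A short example shows why the mean alone can hide important differences. The sketch below uses made-up test scores for two hypothetical schools with the same mean but very different spreads; `statistics.stdev` uses the n-1 denominator, the same convention as Excel's sample standard deviation.

```python
import statistics

# Hypothetical test scores: both schools have a mean of 75.
school_a = [70, 72, 75, 78, 80]     # scores clustered near the mean
school_b = [50, 60, 75, 90, 100]    # scores spread toward both extremes

for name, scores in (("School A", school_a), ("School B", school_b)):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```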
Normal Distribution
One standard deviation away from the mean in either direction on the horizontal axis (the red area on the graph) accounts for about 68 percent of the people in this group. Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the people. And three standard deviations (the red, green and blue areas) account for about 99.7 percent of the people.
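These percentages can be verified from the normal distribution itself. The following minimal Python sketch uses `NormalDist` from the standard `statistics` module; the helper name `share_within` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```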
Z-Score
Z = (x-mean)/SD
Unlike SD, which tells us how an ENTIRE dataset clusters around the mean, the z-score tells us how many standard deviations each individual data point is from the mean. Remember that skewness describes how asymmetrically the ENTIRE dataset is spread around the mean. We use the z-score a lot in grading students’ tests. If you score 95/100, that doesn’t mean that you are an A student. Maybe the test was exceptionally easy and everyone else scored above 90. The z-score will tell how many standard deviations you are away from the mean, and you will be graded accordingly.
Try the “standardize” function in Excel
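The grading example can be worked through in a few lines. This is a minimal Python sketch with made-up scores from an easy test on which everyone scored above 90; like Excel's “standardize” function, the helper takes the value, the mean, and the standard deviation as inputs.

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical scores on an exceptionally easy test.
scores = [91, 93, 95, 97, 99]
class_mean = statistics.mean(scores)
class_sd = statistics.stdev(scores)  # sample SD (n-1 denominator)

for score in scores:
    print(score, "->", round(z_score(score, class_mean, class_sd), 2))
```

Here a score of 95 has a z-score of 0: it is exactly average for this class, even though 95/100 looks like an A on its own.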