Assignment and Discussion

Assignment:

1. Obtain one of the data sets available at the UCI Machine Learning Repository and apply as many of the different visualization techniques described in the chapter as possible. The bibliographic notes and book Web site provide pointers to visualization software.

2. Identify at least two advantages and two disadvantages of using color to visually represent information.

3. What are the arrangement issues that arise with respect to three-dimensional plots?

4. Discuss the advantages and disadvantages of using sampling to reduce the number of data objects that need to be displayed. Would simple random sampling (without replacement) be a good approach to sampling? Why or why not?

5. Describe how you would create visualizations to display information that describes the following types of systems.

a) Computer networks. Be sure to include both the static aspects of the network, such as connectivity, and the dynamic aspects, such as traffic.

b) The distribution of specific plant and animal species around the world for a specific moment in time.

c) The use of computer resources, such as processor time, main memory, and disk, for a set of benchmark database programs.

d) The change in occupation of workers in a particular country over the last thirty years. Assume that you have yearly information about each person that also includes gender and level of education.

Be sure to address the following issues:

· Representation. How will you map objects, attributes, and relationships to visual elements?

· Arrangement. Are there any special considerations that need to be taken into account with respect to how visual elements are displayed? Specific examples might be the choice of viewpoint, the use of transparency, or the separation of certain groups of objects.

· Selection. How will you handle a large number of attributes and data objects?

Decision Tree Assignment

Play now? Play later?

You can become a millionaire! That’s what the junk mail said. But then there was the fine print:

If you send in your entry before midnight tonight, then here are your chances:

· 0.1% that you win $1,000,000

· 75% that you win nothing

· Otherwise, you must PAY $1,000

But wait, there’s more! If you don’t win the million AND you don’t have to pay on your first attempt, then you can choose to play one more time. If you choose to play again, then here are your chances:

· 2% that you win $100,000

· 20% that you win $500

· Otherwise, you must PAY $2,000

What is your expected outcome for attempting this venture? Solve this problem using a decision tree and clearly show all calculations and the expected monetary value at each node.

Use maximization of expected value as your decision criterion.
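
The roll-back procedure described above can be sketched in code as a check on hand calculations. This is only a minimal illustration, not a required method: the node layout, the helper names emv and payoff, and the "stop" and "do not play" options worth $0 are assumptions made for the example. It also assumes that each "Otherwise" branch carries the remaining probability (24.9% on the first play, 78% on the second).

```python
# Roll back a small decision tree by expected monetary value (EMV).
# The dictionary-based node layout is hypothetical, chosen for illustration.

def emv(node):
    """Return the expected monetary value of a node."""
    kind = node["kind"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":
        # Probability-weighted average over the branches.
        return sum(p * emv(child) for p, child in node["branches"])
    if kind == "decision":
        # Maximization criterion: take the best available option.
        return max(emv(child) for _, child in node["options"])
    raise ValueError(f"unknown node kind: {kind}")


def payoff(value):
    """Leaf node holding a net dollar amount."""
    return {"kind": "payoff", "value": value}


# Assumption: "otherwise" is the remaining 100% - 2% - 20% = 78%.
second_play = {"kind": "chance", "branches": [
    (0.02, payoff(100_000)),
    (0.20, payoff(500)),
    (0.78, payoff(-2_000)),
]}

# After winning nothing on the first play, choose whether to play again.
play_again = {"kind": "decision", "options": [
    ("play again", second_play),
    ("stop", payoff(0)),
]}

# Assumption: "otherwise" is the remaining 100% - 0.1% - 75% = 24.9%.
first_play = {"kind": "chance", "branches": [
    (0.001, payoff(1_000_000)),
    (0.75, play_again),
    (0.249, payoff(-1_000)),
]}

root = {"kind": "decision", "options": [
    ("play", first_play),
    ("do not play", payoff(0)),
]}

# Print the EMV at each node so it can be compared with a hand-drawn tree.
for name, node in [("second play", second_play),
                   ("play-again decision", play_again),
                   ("first play", first_play),
                   ("root decision", root)]:
    print(f"{name}: {emv(node):,.2f}")
```

Chance nodes average their branches by probability and decision nodes take the maximum, which is the same bottom-up order used when annotating a tree by hand.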

 

Answer these questions:

1) Should you play at all? (5%) If you play, what is your expected (net) monetary value? (15%)

2) If you play and don’t win at all on the first try (but don’t lose money), should you try again? (5%) Why? (10%)

3) Clearly show the decision tree (40%) and expected net monetary value at each node (25%).

Discussion:

What are the drawbacks of data mining? Describe some of the drawbacks that an organization may face, and provide examples.

You must make at least two substantive responses to your classmates’ posts. Respond to these posts in any of the following ways:

· Build on something your classmate said.

· Explain why and how you see things differently.

· Ask a probing or clarifying question.

· Share an insight from having read your classmates’ postings.

· Offer and support an opinion.

· Validate an idea with your own experience.

· Expand on your classmates’ postings.

· Ask for evidence that supports the post.

Discussion Length (word count): At least 150 words

References: At least one peer-reviewed, scholarly journal reference.