
Glossary and Index

Stopwords: In text mining, these are small words that are necessary for grammatical correctness but carry little meaning or power in the message of the text being mined. These are often articles, prepositions or conjunctions, such as ‘a’, ‘the’, ‘and’, etc., and are usually removed in the Process Document operator’s sub-process. (Page 199)
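As a rough illustration of what stopword removal does, the Python sketch below filters a list of tokens against a small stopword list. The list and the names `STOPWORDS` and `remove_stopwords` are hypothetical and chosen for illustration only; RapidMiner's stopword filter uses its own built-in list, which is not reproduced here.

```python
# Hypothetical, hand-picked stopword list (not RapidMiner's built-in list).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is"}


def remove_stopwords(tokens):
    """Return only the tokens that are not stopwords (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


tokens = "The cat sat on the mat and purred".split()
print(remove_stopwords(tokens))  # ['cat', 'sat', 'mat', 'purred']
```

The words that survive are the ones likely to carry meaning, which is why they, rather than the stopwords, become the attributes that are mined.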

Stream: This is the string of operators in a data mining model, connected through the operators’ ports via splines, that represents all actions that will be taken on a data set in order to mine it. (Page 41)

Structured Query Language (SQL): The set of codes, reserved keywords and syntax defined by the American National Standards Institute used to create, manage and use relational databases. (Page 17)

Sub-process: In RapidMiner, this is a stream of operators set up to apply a series of actions to all inputs connected to the parent operator. (Page 197)

Support Percent: In an association rule data mining model, this is the percentage of all observations in which the antecedent and the consequent are found together. Since it is calculated as the number of times the two are found together divided by the total number of observations in which they could have been found together, the Support Percent is the same for reciprocal rules. (Page 84)
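A minimal sketch of the calculation, assuming support is the number of observations containing every item of both sides of the rule divided by the total number of observations. The transactions and the `support` helper are hypothetical; RapidMiner computes this inside its association rule operators, which this sketch does not reproduce.

```python
def support(transactions, antecedent, consequent):
    """Share of all transactions that contain every item of both sides of the rule."""
    items = set(antecedent) | set(consequent)
    together = sum(1 for t in transactions if items <= set(t))
    return together / len(transactions)


# Hypothetical transactions: one set of items per observation.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# A rule and its reciprocal: bread -> milk and milk -> bread.
forward = support(transactions, {"bread"}, {"milk"})
reverse = support(transactions, {"milk"}, {"bread"})
print(f"{forward:.0%} {reverse:.0%}")  # 50% 50%
```

Because the numerator only asks whether both itemsets appear together and the denominator is the same total, swapping antecedent and consequent cannot change the result.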

Table: In data collection, a table is a grid of columns and rows where, in general, the columns are individual attributes in the data set and the rows are observations across those attributes. Tables are the most elemental entity in relational databases. (Page 16)

Target Attribute: See Label; Dependent Variable. (Page 108)

Technology: Any tool or process invented by mankind to do or improve work. (Page 11)

Text Mining: The process of data mining unstructured, text-based data such as essays, news articles, speech transcripts, etc., to discover patterns of word or phrase usage that reveal deeper or previously unrecognized meaning. (Page 190)

Data Mining for the Masses

Token (Tokenize): In text mining, this is the process of turning words in the input document(s) into attributes that can be mined. (Page 197)
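A minimal Python sketch of tokenizing, assuming a simple regular-expression tokenizer and plain term-occurrence counts. `tokenize` and `to_attributes` are hypothetical helpers, not RapidMiner's Tokenize operator. Each distinct word becomes an attribute (column) and each document becomes an observation (row).

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def to_attributes(documents):
    """Build one row per document and one column (attribute) per distinct token."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for tokens a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


docs = ["Data mining is fun.", "Text mining mines text."]
vocab, rows = to_attributes(docs)
print(vocab)  # ['data', 'fun', 'is', 'mines', 'mining', 'text']
print(rows)   # [[1, 1, 1, 0, 1, 0], [0, 0, 0, 1, 1, 2]]
```

Real text-mining pipelines usually also filter stopwords and may stem words before counting, so 'mines' and 'mining' would typically collapse into one attribute.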

Training Data: In a predictive model, this data set already has the label, or dependent variable, defined, so that it can be used to create a model which can then be applied to a scoring data set in order to generate predictions for the latter. (Page 108)
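The sketch below shows the training/scoring split in Python. A 1-nearest-neighbour model is swapped in purely because it is the shortest predictive model to write; it is not the technique the text has in mind. All data values and the `train` and `predict` helpers are hypothetical.

```python
import math

# Hypothetical training data: attribute values plus a known label.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Hypothetical scoring data: same attributes, but the label is unknown.
scoring = [(2.0, 1.5), (8.5, 8.0)]


def train(rows):
    """'Training' a 1-nearest-neighbour model simply stores the labeled rows."""
    return list(rows)


def predict(model, observation):
    """Return the label of the closest training observation (Euclidean distance)."""
    _, label = min(model, key=lambda row: math.dist(row[0], observation))
    return label


model = train(training)
print([predict(model, obs) for obs in scoring])  # ['low', 'high']
```

The key point is the direction of flow: labels exist only in the training data, and the model built from it produces labels for the scoring data.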

Tuple: See Observation. (Page 16)

Variable: See Attribute. (Page 16)

View: A type of pseudo-table in a relational database which is actually a named, stored query. This query runs against one or more tables, retrieving a defined number of attributes that can then be referenced as if they were in a table in the database. Views can limit users’ ability to see attributes to only those that are relevant and/or approved for those users to see. They can also speed up the query process because, although they may contain joins, the key columns for the joins can be indexed and cached, making the view’s query run faster than it would if it were not stored as a view. Views can be useful in data mining because data miners can be given read-only access to the view, upon which they can build data mining models, without needing broader administrative rights on the database itself. (Page 27)
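A minimal sketch using Python's built-in `sqlite3` module. The tables, columns and the view name `customer_spending` are hypothetical. It shows two aspects of a view: it is a named, stored query that can be queried like a table, and it exposes only approved attributes (the `email` column is hidden). SQLite has no user accounts or `GRANT` statements and does not cache view results, so neither the read-only access nor the speed aspect is demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical base tables; 'email' stands in for an attribute analysts should not see.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "ana@example.com", "East"), (2, "ben@example.com", "West"),
                  (3, "cy@example.com", "East")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5), (4, 3, 7.5)])

# Index the join key so the join inside the view can use it.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# The view: a named, stored query exposing only the approved attributes.
conn.execute("""
    CREATE VIEW customer_spending AS
    SELECT c.region AS region, o.amount AS amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""")

# The view is queried exactly as if it were a table.
columns = [d[0] for d in conn.execute("SELECT * FROM customer_spending").description]
rows = conn.execute(
    "SELECT region, SUM(amount) FROM customer_spending GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(columns)  # ['region', 'amount']
print(rows)     # [('East', 32.5), ('West', 12.5)]
```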

ABOUT THE AUTHOR

Dr. Matthew North is Associate Professor of Computing and Information Studies at Washington & Jefferson College in Washington, Pennsylvania, USA. He has taught data management and data mining for more than a decade, and previously worked in industry as a data miner, most recently at eBay.com. He continues to consult with various organizations on data mining projects as well.

Dr. North holds a Bachelor of Arts degree in Latin American History and Portuguese from Brigham Young University; a Master of Science in Business Information Systems from Utah State University; and a Doctorate in Technology Education from West Virginia University. He is the author of the book Life Lessons & Leadership (Agami Press, 2011), and numerous papers and articles on technology and pedagogy. His dissertation, on the topic of teaching models and learning styles in introductory data mining courses, earned him a New Faculty Fellows award from the Center for Advancement of Scholarship on Engineering Education (CASEE); and in 2010, he was awarded the Ben Bauman Award for Excellence by the International Association for Computer Information Systems (IACIS). He lives with his wife, Joanne, and their three daughters in southwestern Pennsylvania.

To contact Dr. North regarding this text, consulting or training opportunities, or for speaking engagements, please access this book’s companion web site at: https://sites.google.com/site/dataminingforthemasses/
