When Big Data Leads To Lost Data
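Research field: Formation and evolution processes and mechanisms of the ocean material and energy center in the Indo-Pacific convergence region
Resource type: Artificial intelligence and ocean big data / Ocean big data
Authors: Megler, V. M.; Maier, David
Publication year: 2012
Journal: PROCEEDINGS OF THE 5TH PH.D. WORKSHOP ON INFORMATION AND KNOWLEDGE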
Abstract: For decades, scientists bemoaned the scarcity of observational data to analyze and against which to test their models. Exponential growth in data volumes from ever-cheaper environmental sensors has provided scientists with the answer to their prayers: "big data". Now, scientists face a new challenge: with terabytes, petabytes or exabytes of data at hand, stored in thousands of heterogeneous datasets, how can scientists find the datasets most relevant to their research interests? If they cannot find the data, then they may as well never have collected it; that data is lost to them. Our research addresses this challenge, using an existing scientific archive as our test-bed. We approach this problem in a new way: by adapting Information Retrieval techniques, developed for searching text documents, into the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and "semi-curated" methods to extract metadata from large archives of scientific data. We then perform searches over the extracted metadata, returning results ranked by similarity to the query terms. We briefly describe an implementation performed at an ocean observatory to validate the proposed approach. We propose performance and scalability research to explore how continued archive growth will affect our goal of interactive response, no matter the scale.
Document type: Proceedings Paper
Language: English
Keywords: Scientific data; ranked data search
Author address: [Megler, V. M.; Maier, David] Portland State Univ, Dept Comp Sci, Portland, OR 97207 USA
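Illustrative sketch: The abstract describes extracting metadata from numeric datasets and ranking search results by similarity to the query terms. The record gives no implementation details, so the following Python sketch is only one possible reading and is not the authors' method. It assumes each dataset is reduced to a small summary (a variable name plus the observed minimum and maximum) and that similarity is a Jaccard-style interval overlap. The names `DatasetSummary`, `summarize`, `range_similarity`, and `search`, and the sample catalog, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSummary:
    """Metadata extracted once per dataset (all fields are hypothetical)."""
    name: str
    variable: str
    lo: float  # minimum observed value
    hi: float  # maximum observed value


def summarize(name, variable, values):
    """Reduce a numeric series to the small summary that gets indexed."""
    return DatasetSummary(name, variable, min(values), max(values))


def range_similarity(lo, hi, q_lo, q_hi):
    """Jaccard-style overlap of two closed intervals, in [0, 1]."""
    overlap = min(hi, q_hi) - max(lo, q_lo)
    if overlap <= 0:
        return 0.0
    union = max(hi, q_hi) - min(lo, q_lo)
    return overlap / union


def search(catalog, variable, q_lo, q_hi):
    """Return (score, name) pairs for datasets of one variable, best first."""
    scored = [
        (range_similarity(d.lo, d.hi, q_lo, q_hi), d.name)
        for d in catalog
        if d.variable == variable
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample data, used only to exercise the functions above.
catalog = [
    summarize("station_a", "salinity", [28.0, 30.5, 32.0]),
    summarize("station_b", "salinity", [5.0, 9.0, 12.0]),
    summarize("cruise_c", "temperature", [8.0, 11.0, 14.0]),
]
results = search(catalog, "salinity", 29.0, 33.0)
```

Because the search touches only the compact summaries and never the raw observations, query cost in this sketch grows with the number of datasets and not with their size, which is one plausible route to the interactive response times the abstract aims for.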
