Journal Articles
243,513 articles found
A New Method for Data Completeness Assessment Based on Linked Data
1
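Authors: 袁满, 胡超, 仇婷婷 《吉林大学学报:工学版》 (Journal of Jilin University, Engineering and Technology Edition) EI CAS CSCD PKU Core 2020, No. 5, pp. 1826-1831 (6 pages)
To assess data completeness more efficiently and accurately, and building on a review of completeness-assessment techniques and methods in China and abroad, this paper exploits the characteristics of Linked data to propose the β algorithm for data completeness assessment and the Dam algorithm for mining implicit data, and proves the effectiveness and accuracy of the algorithms theoretically. Finally, the academic-affairs data of Northeast Petroleum University were published as Linked data and used as validation data in experiments comparing the proposed algorithms with two completeness-assessment algorithms from the literature. The results show that assessed completeness improved by about 6% and assessment efficiency improved by about 40 times on average, confirming the accuracy and efficiency of the proposed algorithms. The proposed Linked-data-based completeness-assessment algorithm not only ensures accurate assessment but also greatly improves computational efficiency.
Keywords: computer software; data quality; data completeness; Linked data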
Big Geodata Mining: Objective, Connotations and Research Issues
2
Authors: 裴韬, 宋辞, 郭思慧, 舒华, 刘亚溪, 杜云艳, 马廷, 周成虎 《地理学报:英文版》 (Journal of Geographical Sciences) SCIE CSCD 2020, No. 2, pp. 251-266 (16 pages)
This paper discusses the objective, connotations and research issues of big geodata mining to address its significance to geographical research. Big geodata may be categorized into two domains: big earth observation data and big human behavior data. A description of big geodata includes, in addition to the “5Vs” (volume, velocity, value, variety and veracity), a further five features, that is, granularity, scope, density, skewness and precision. Based on this approach, the essence of mining big geodata includes four aspects. First, flow space, where flows replace points in traditional space, will become the new presentation form for big human behavior data. Second, the objectives for mining big geodata are spatial patterns and spatial relationships. Third, the spatiotemporal distributions of big geodata can be viewed as overlays of multiple geographic patterns, and the characteristics of the data, namely heterogeneity and homogeneity, may change with scale. Fourth, data mining can be seen as a tool for discovering geographic patterns, and the patterns revealed may be attributed to human-land relationships. In view of the mining objective, big geodata mining methods may be categorized into two types, i.e., classification mining and relationship mining. Future research will face a number of issues, including the aggregation and connection of big geodata, the effective evaluation of mining results, and the challenge of mining to reveal “non-trivial” knowledge.
Keywords: big earth observation data; big human behavior data; geographical spatiotemporal pattern; spatiotemporal heterogeneity; knowledge discovery
Correlates of Parental Choice of Child Discipline Methods in Ghana: A Multilevel Modeling Approach
3
Authors: Stephen Kwaku Amoah, Ezekiel Nii Noye Nortey, Abukari Alhassan 《应用科学(英文)》 (Applied Sciences, English edition) 2020, No. 3, pp. 78-99 (22 pages)
This study applied multilevel modeling to investigate the impact of observed predictors, and of the different levels or groups to which households belong, on parents’ choice of discipline methods, using data on 8156 households from a nationwide survey conducted by the Ghana Statistical Service (GSS) in 2011. The aim of the study is to provide in-depth information on why parents choose particular discipline methods as corrective measures to reduce unwanted child behaviour in the present and to increase desirable behaviour in the future. The results show that the religion and age group of household heads have a significant effect on a household’s likelihood of choosing physical discipline methods, whereas the wealth index of a household and the ethnicity of the household head have a significant effect on households’ likelihood of choosing non-physical and psychological aggression methods. The results further show significant contextual effects on the differences in parents’ choices at the household and regional levels. The choice of physical discipline methods was consistent across household and regional levels, unlike non-physical and psychological aggression methods, whose application varied across regions. Households in the Northern, Eastern and Volta regions mostly chose physical discipline methods, whereas in the Upper West, Western and Northern regions the most chosen methods were non-physical discipline methods. Psychological aggression discipline methods were predominantly applied in the Upper East, Central and Northern regions of the country.
Keywords: child discipline; physical discipline; non-physical discipline; psychological aggression discipline; multilevel multinomial logit model; hierarchically structured data; nested data
Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
4
Authors: Collins Odhiambo 《数据分析和信息处理(英文)》 (Journal of Data Analysis and Information Processing) 2020, No. 3, pp. 110-133 (24 pages)
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting missing data in a TB/HIV co-infection setting. We employ both empirical data analysis and an extensive simulation study to examine the effects of missing data and the accuracy, sensitivity, specificity, and training and test errors of the different approaches. The novelty of this work hinges on the use of modern statistical learning algorithms to treat missingness. In the empirical analysis, both HIV data and TB-HIV co-infection data were imputed, with the missing values filled in using different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50% and 80% of the data were drawn at random and replaced with missing values. Results show that complete cases alone gave a co-infection rate (95% confidence interval) of 29% (25%, 33%), the weighted method 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%), and the multiple imputation (MI) approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data, and failing to apply it results in overestimating the HIV/TB co-infection rate by 8 percentage points (29% versus 21%).
Keywords: missing data; HIV/TB co-infection; imputation; missing at random; count data
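The abstract does not say how the multiple-imputation (MI) estimates are combined. A common choice is Rubin's rules, sketched below in Python as an assumption rather than the author's code: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the numbers in the usage line are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    estimates: point estimates (e.g. a co-infection rate) from m imputed datasets
    variances: squared standard errors of those estimates
    Returns (pooled_estimate, total_variance, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, t, math.sqrt(t)


# Hypothetical numbers for illustration only, not results from the study.
est, total_var, se = pool_rubin([0.20, 0.22, 0.21], [0.0001, 0.0001, 0.0001])
print(f"pooled = {est:.3f}, SE = {se:.4f}")
```

The between-imputation term is what makes MI intervals wider than single imputation: it carries the extra uncertainty caused by not knowing the missing values.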
On the Application of, and Reflections on, Data Management in Television Lighting
5
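Authors: 姚葳澎 《演艺科技》 (Performing Arts Technology) 2020, No. 3, pp. 40-43 (4 pages)
By examining the sources and characteristics of television lighting data, this article explains the role and significance of data management in lighting work, with the aim of prompting industry practitioners to pay attention to and reflect on the management of television lighting data.
Keywords: television lighting; data; data management; data application; data collection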
An Improvement of Data Cleaning Method for Grain Big Data Processing Using Task Merging
6
Authors: Feiyu Lian, Maixia Fu, Xingang Ju 《电脑和通信(英文)》 (Journal of Computer and Communications) 2020, No. 3, pp. 1-19 (19 pages)
Data quality strongly influences the application of grain big data, so data cleaning is necessary and important work. In the MapReduce framework, parallel techniques are often used to execute data cleaning in a highly scalable mode, but owing to a lack of effective design, the cleaning process contains a large amount of redundant computation, which lowers performance. In this research, we found that during data cleaning some tasks are often carried out multiple times on the same input files, or require the same operation results. To address this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of loop computations in MapReduce can be greatly reduced. Experiments show that this significantly reduces overall system runtime, which demonstrates that the data cleaning process is optimized. In this paper, we optimized several data cleaning modules, such as entity identification, inconsistent-data repair, and missing-value filling. Experimental results show that the proposed method increases the efficiency of grain big data cleaning.
Keywords: grain big data; data cleaning; task merging; Hadoop; MapReduce
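The abstract describes merging redundant computations that run on the same input files. The Python sketch below illustrates that general idea on in-memory data, as an assumption rather than the authors' Hadoop MapReduce implementation: tasks are grouped by input file, so each file is scanned once and all of its cleaning functions are applied in that single pass. All function names, the file name, and the sample records are hypothetical.

```python
from collections import defaultdict


def merge_tasks(tasks):
    """Group (input_name, cleaning_function) tasks so each input is scanned once."""
    merged = defaultdict(list)
    for input_name, func in tasks:
        merged[input_name].append(func)
    return dict(merged)


def run_merged(inputs, merged):
    """Scan each input once, applying every cleaning function registered for it.

    inputs: dict mapping input name -> list of records (dicts)
    Returns (results, scans): results[(input_name, func_name)] holds that
    function's output records; scans counts full passes over the inputs.
    """
    results = {}
    scans = 0
    for input_name, funcs in merged.items():
        outputs = {f.__name__: [] for f in funcs}
        scans += 1  # one pass over this input, however many tasks read it
        for record in inputs[input_name]:
            for f in funcs:
                cleaned = f(record)
                if cleaned is not None:  # None means "drop this record"
                    outputs[f.__name__].append(cleaned)
        for name, recs in outputs.items():
            results[(input_name, name)] = recs
    return results, scans


# Hypothetical cleaning steps on hypothetical grain records.
def strip_whitespace(record):
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_missing_yield(record):
    return record if record.get("yield") is not None else None


records = {"grain.csv": [{"id": " a1 ", "yield": 5.2}, {"id": "a2", "yield": None}]}
tasks = [("grain.csv", strip_whitespace), ("grain.csv", drop_missing_yield)]
results, scans = run_merged(records, merge_tasks(tasks))
print(scans)  # two tasks, but only one pass over grain.csv
```

Run unmerged, the two tasks would each read `grain.csv`; after merging, the number of passes equals the number of distinct inputs rather than the number of tasks.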
Hybrid Warehouse Model and Solutions for Climate Data Analysis
7
Authors: Hasan Hashim 《电脑和通信(英文)》 (Journal of Computer and Communications) 2020, No. 10, pp. 75-98 (24 pages)
Recently, owing to the rapid growth in the number of data sensors, a massive volume of data is being generated from different sources. Administering such data, that is, storing, managing, analyzing, and extracting insightful information from it, is a challenging task. Big data analytics is becoming a vital research area in domains such as climate data analysis, which demands fast access to data. Nowadays, MapReduce, an open-source distributed computing framework, is widely used in many domains of big data analysis. In our work, we have developed a conceptual data-modeling framework that is useful for implementing a hybrid data warehouse model to store the features of National Climatic Data Center (NCDC) climate data. The hybrid data warehouse model for climate big data enables the identification of weather patterns applicable to agricultural and other climate-change-related studies, which will play a major role in recommending actions to domain experts and in making contingency plans for extreme cases of weather variability.
Keywords: data warehouse; Hadoop; NCDC data set; weather
Fifty-Six Big Data V’s Characteristics and Proposed Strategies to Overcome Security and Privacy Challenges (BD2)
8
Authors: Abou_el_ela Abdou Hussein 《信息安全(英文)》 (Journal of Information Security) 2020, No. 4, pp. 304-328 (25 pages)
The amount of data traveling across the internet today is enormous, comprising very large sets of raw facts that are not only large but also complex, noisy, heterogeneous, and longitudinal. Companies, institutions, healthcare systems, mobile applications, capturing devices and sensors, traffic management, banking, retail, education, etc., use piles of data, which are further used to create reports that ensure the continuity of the services they offer. Recently, big data has become one of the most important topics in the IT industry. Managing big data requires new techniques because traditional security and privacy mechanisms are inadequate and unable to manage complex distributed computing over different types of data. New types of data also bring different and new challenges. Over the previous two decades, starting from Doug Laney’s landmark paper, much research has dealt with big data challenges; the big challenge is how to handle a huge volume of data that must be delivered securely through the internet and reach its destination intact. The present paper highlights important concepts of the fifty-six Big Data V’s characteristics. It also highlights the security and privacy challenges that big data faces and proposes technological solutions that help avoid these problems.
Keywords: big data; Big Data V’s characteristics; security; privacy; challenges; technological solutions
The Dilemma of Confirming Data Rights and How to Resolve It
9
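Authors: 韩旭至 《东方法学》 (Oriental Law) CSSCI PKU Core 2020, No. 1, pp. 97-107 (11 pages)
There are various theories of data rights confirmation, which can be examined within existing institutional norms. These may involve property law, contract law, intellectual property law, competition law, personal information protection law, and many other bodies of rules. However, whether applied individually or in combination, existing norms cannot fully answer the questions of data ownership, protection, and use. Theories of data as a new type of right likewise fail to achieve data rights confirmation. Such theories mostly rest on a simple labor-based justification for granting rights and do not examine in depth the mechanism by which data rights are produced; as a result, the subjects and objects of data rights are hard to determine, and the theories cannot respond to data governance problems in practice. The dilemma of data rights confirmation is rooted in the inability of traditional rights theories, represented by the will theory and the interest theory, to explain new data problems. An examination of the mechanism that produces data rights shows that algorithms are central to the formation of both data value and data rights. Therefore, data rights confirmation can be achieved in reverse through the regulation of algorithms.
Keywords: data; non-personal data; data rights; new types of rights; personal information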
Research on the Storage and Application of Chinese Domestic Satellite Imagery Based on ODC
10
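Authors: 李俊杰, 陈舒博, 张文, 余长慧, 张志远, 孟令奎 《地球信息科学学报》 (Journal of Geo-information Science) CSCD PKU Core 2020, No. 9, pp. 1860-1867 (8 pages)
As Earth observation enters the big data era, traditional data management techniques struggle to meet big data requirements. Open Data Cube (ODC), a new open-source platform for Earth observation data management and analysis, is well suited to high-performance computing and exploratory analysis of time-series data, and is an important supporting technology platform for AOGEOSS, the Asia-Oceania regional integrated Earth observation system. However, ODC currently offers poor support for Chinese domestic satellite imagery and lacks automated management and data organization tools, so the techniques for managing domestic satellite imagery with ODC are immature. Therefore, taking Gaofen-1 (GF-1) satellite imagery as an example, this paper explores an automated workflow for managing domestic imagery within the ODC framework by developing the ODC_GFTool middleware and defining a custom Gaofen imagery product format. Finally, with Poyang Lake as the test area, a water-body extraction experiment based on the ODC framework was conducted, demonstrating the feasibility of storing and accessing domestic satellite data within the ODC framework. The results show that ODC has a clear efficiency advantage over traditional data-processing tools and can provide a reference for AOGEOSS infrastructure construction and for the management of domestic satellite imagery.
Keywords: Open Data Cube; ODC; domestic imagery; data management; image storage; Gaofen-1; Poyang Lake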
A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis
11
Authors: Mohammad Sultan Mahmud, Joshua Zhexue Huang, Salman Salloum, Tamer Z. Emara, Kuanishbay Sadatdiynov 《大数据挖掘与分析(英文)》 (Big Data Mining and Analytics) 2020, No. 2, pp. 85-101 (17 pages)
Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed, with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
Keywords: big data analysis; data partitioning; data sampling; distributed and parallel computing; approximate computing
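The survey names three classical horizontal partitioning schemes (range, hash, and random) and reservoir sampling. The single-machine Python sketch below shows each one on an in-memory list, as an assumption; it is not the survey's code and ignores the distributed Hadoop setting the survey focuses on. Function names are hypothetical. The hash scheme uses CRC32 instead of Python's built-in `hash()` because string hashing is randomized per process, so the built-in would not send a key to the same partition across runs.

```python
import random
import zlib
from bisect import bisect_right


def range_partition(records, key, boundaries):
    """Range partitioning: keys < b0 go to partition 0, b0 <= key < b1 to 1, and so on."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect_right(boundaries, key(r))].append(r)
    return parts


def hash_partition(records, key, n):
    """Hash partitioning: a stable hash (CRC32) of the key picks one of n partitions."""
    parts = [[] for _ in range(n)]
    for r in records:
        parts[zlib.crc32(str(key(r)).encode()) % n].append(r)
    return parts


def random_partition(records, n, rng):
    """Random partitioning: each record goes to a uniformly chosen partition."""
    parts = [[] for _ in range(n)]
    for r in records:
        parts[rng.randrange(n)].append(r)
    return parts


def reservoir_sample(stream, k, rng):
    """Reservoir sampling (Algorithm R): uniform k-item sample from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:              # item i survives with probability k / (i + 1)
                reservoir[j] = item
    return reservoir


rng = random.Random(42)
data = list(range(100))
by_range = range_partition(data, key=lambda x: x, boundaries=[25, 50, 75])
by_hash = hash_partition(data, key=lambda x: x, n=4)
by_random = random_partition(data, n=4, rng=rng)
sample = reservoir_sample(iter(data), k=10, rng=rng)
print([len(p) for p in by_range])  # [25, 25, 25, 25]
```

Range partitioning keeps neighbouring keys together (good for range queries), while hash and random partitioning spread keys more evenly; reservoir sampling needs only one pass and O(k) memory, which is why it suits streams.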
Super Resolution Perception for Improving Data Completeness in Smart Grid State Estimation
12
Authors: Gaoqi Liang, Guolong Liu, Junhua Zhao, Yanli Liu, Jinjin Gu, Guangzhong Sun, Zhaoyang Dong 《工程(英文)》 (Engineering) SCIE EI 2020, No. 7, pp. 789-800 (12 pages)
The smart grid is an evolving critical infrastructure that combines renewable energy with the most advanced information and communication technologies to provide more economical and secure power supply services. To cope with the intermittency of ever-increasing renewable energy and ensure the security of the smart grid, state estimation, which serves as a basic tool for understanding the true states of a smart grid, should be performed at high frequency. More complete system state data are needed to support high-frequency state estimation. This paper therefore studies the data completeness problem for smart grid state estimation. The problem of improving data completeness by recovering high-frequency data from low-frequency data is formulated as a super resolution perception (SRP) problem, and a novel machine-learning-based SRP approach is then proposed. The proposed method, namely the Super Resolution Perception Net for State Estimation (SRPNSE), consists of three steps: feature extraction, information completion, and data reconstruction. Case studies demonstrate the effectiveness and value of the proposed SRPNSE approach in recovering high-frequency data from low-frequency data for state estimation.
Keywords: state estimation; low-frequency data; high-frequency data; super resolution perception; data completeness
Research and Practice on an Open Data Platform for Agriculture and Rural Affairs
13
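Authors: 李新, 梁栋, 贾昕为, 陈慧金, 毕涛 《中国农业资源与区划》 (Chinese Journal of Agricultural Resources and Regional Planning) CSSCI CSCD PKU Core 2020, No. 7, pp. 216-223 (8 pages)
[Purpose] Data have been written into central government documents as a new factor of production, which helps release the value of underlying data, accelerate the assetization of data, and speed the emergence of new forms and models of the digital economy. As an important part of government information disclosure, open data is significant for realizing the value of data and for providing essential data support to the broad application of big data. Agriculture is China's foundational industry, and agricultural and rural data play a key role in government decision-making and in the participation of social actors in agricultural production. [Method] Addressing the opening of agricultural and rural data, this article explains the important role of such data as a factor of agricultural production, summarizes the significance of opening them for agricultural management and services, studies typical domestic and international cases of agricultural and rural open data, analyzes the problems of current agricultural and rural open data platforms, and proposes an approach to upgrading the public-facing platform for agricultural and rural statistics. Based on the official website, a new open data platform was built that enriches the open data sources, integrates existing data, and expands the ways data are presented. [Results] The operational effectiveness of the platform was analyzed and summarized: indicators of its effectiveness, such as data views, searches, and downloads, increased substantially. [Conclusion] The public has great demand for high-quality open agricultural and rural data, and opening data through the official website can provide scientific, authoritative, and comprehensive agricultural and rural data, offering a platform for adding value to data.
Keywords: data; factors of production; agriculture and rural areas; open data; data channel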
Mining Conditional Functional Dependency Rules on Big Data
14
Authors: Mingda Li, Hongzhi Wang, Jianzhong Li 《大数据挖掘与分析(英文)》 (Big Data Mining and Analytics) 2020, No. 1, pp. 68-84 (17 pages)
Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume issue of big data, we develop sampling algorithms that obtain a small, representative training set. To address the low-quality issue of big data, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable period.
Keywords: data mining; conditional functional dependency; big data; data quality
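The abstract does not define a CFD. Under the standard definition, a CFD pairs an embedded functional dependency X → Y with a pattern tuple whose entries are constants or a wildcard `_`, and it constrains only tuples that match the pattern on X. The Python sketch below checks one given CFD against a small table. It is an illustrative checker under that definition, not the paper's sampling-based, fault-tolerant discovery algorithm; all names and the sample rows are hypothetical.

```python
WILDCARD = "_"


def matches(value, pattern):
    """A pattern entry is either a constant or the wildcard '_' (matches anything)."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(tuples, lhs, rhs, pattern):
    """Return the tuples that violate the CFD (lhs -> rhs, pattern).

    tuples:  list of dicts (attribute -> value)
    lhs/rhs: lists of attribute names X and Y
    pattern: dict attribute -> constant or '_', covering every attribute in lhs and rhs
    """
    groups = {}
    violations = []
    for t in tuples:
        if not all(matches(t[a], pattern[a]) for a in lhs):
            continue  # the CFD only constrains tuples that match the LHS pattern
        if not all(matches(t[a], pattern[a]) for a in rhs):
            violations.append(t)  # single-tuple violation of an RHS constant
        groups.setdefault(tuple(t[a] for a in lhs), []).append(t)
    for group in groups.values():
        # tuples that agree on X but disagree on Y violate the embedded FD
        if len({tuple(t[a] for a in rhs) for t in group}) > 1:
            violations.extend(t for t in group if t not in violations)
    return violations


rows = [
    {"CC": "44", "zip": "EH4 1DT", "street": "Mayfield"},
    {"CC": "44", "zip": "EH4 1DT", "street": "Crichton"},
    {"CC": "01", "zip": "07974", "street": "Tree Ave."},
    {"CC": "01", "zip": "07974", "street": "Elm Str."},
]
# "Where the country code is 44, zip determines street."
bad = cfd_violations(rows, lhs=["CC", "zip"], rhs=["street"],
                     pattern={"CC": "44", "zip": "_", "street": "_"})
print(len(bad))  # 2: only the CC = 44 rows are constrained
```

The pattern tuple is what makes the dependency "conditional": the two CC = 01 rows also share a zip code with different streets, but the CFD deliberately does not apply to them.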
A review of systematic evaluation and improvement in the big data environment
15
Authors: Feng Yang, Manman Wang 《工程管理前沿:英文版》 (Frontiers of Engineering Management) 2020, No. 1, pp. 27-46 (20 pages)
The era of big data brings unprecedented opportunities and challenges to management research. As one of the important functions of management decision-making, evaluation has taken on more functions and a wider scope of application. Exploring evaluation methods applicable in the big data environment has therefore become an important research subject. The purpose of this paper is to provide an overview and discussion of systematic evaluation and improvement in the big data environment. We first review evaluation methods based on the main analytic techniques of big data, such as data mining, statistical methods, optimization and simulation, and deep learning. Focusing on the characteristics of big data (association features, data loss, data noise, and visualization), we then present the relevant evaluation methods. Furthermore, we explore studies of systematic improvement and their application fields. Finally, we analyze new application areas of evaluation methods and give future directions for evaluation-method research in the big data environment from six aspects. We hope our research provides meaningful insights for subsequent research.
Keywords: big data; evaluation methods; systematic improvement; big data analytic techniques; data mining
A survey of uncertain data management
16
Authors: Lingli Li, Hongzhi Wang, Jianzhong Li, Hong Gao 《中国计算机科学前沿:英文版》 (Frontiers of Computer Science) SCIE EI CSCD 2020, No. 1, pp. 162-190 (29 pages)
Uncertain data are data with uncertainty information, which exist widely in database applications. In recent years, uncertainty in data has brought challenges to almost all areas of database management, such as data modeling, query representation, query processing, and data mining. There is no doubt that uncertain data management has become a hot research topic in the field of data management. In this study, we explore problems in managing uncertain data, present state-of-the-art solutions, and provide future research directions in this area. The uncertain data management techniques discussed include data modeling, query processing, and data mining for uncertain data in relational, XML, graph, and stream forms.
Keywords: uncertain data; probabilistic database; probabilistic XML; semi-structured data; data stream
Dynamic data auditing scheme for big data storage
17
Authors: Xingyue Chen, Tao Shang, Feng Zhang, Jianwei Liu, Zhenyu Guan 《中国计算机科学前沿:英文版》 (Frontiers of Computer Science) SCIE EI CSCD 2020, No. 1, pp. 219-229 (11 pages)
When users store data in big data platforms, the integrity of outsourced data is a major concern for data owners because they lack direct control over the data. However, existing remote data auditing schemes for big data platforms are applicable only to static data. To verify the integrity of dynamic data in a Hadoop big data platform, we present a dynamic auditing scheme that meets the special requirements of Hadoop. Concretely, a new data structure, namely the Data Block Index Table, is designed to support dynamic data operations on HDFS (Hadoop Distributed File System), including appending, inserting, deleting, and modifying. Then, combined with the MapReduce framework, a dynamic auditing algorithm is designed to audit data on HDFS concurrently. Analysis shows that the proposed scheme is secure enough to resist forgery, replacement, and replay attacks on big data platforms. It is also efficient in both computation and communication.
Keywords: big data; data security; remote data auditing; dynamic update; privacy protection
A Study of Legal Issues Concerning the Nature of Data Rights
18
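Authors: 张钦润, 傅晓媚 《燕山大学学报:哲学社会科学版》 (Journal of Yanshan University, Philosophy and Social Science Edition) 2020, No. 1, pp. 46-51 (6 pages)
The vigorous development of big data has given rise to many legal problems, among which leaks of personal information and unclear data rights are particularly prominent. On the basis of de-identification, data can be protected by law under separate regimes: before de-identification, data should fall within the scope of personal information; after de-identification, data can be brought into the property-rights system for protection. Because of the special attributes of data, they fit neither within intellectual property nor within real rights; establishing a data property right is more conducive to the development and circulation of big data and reduces disputes over data ownership. A data property right can comprise specific rights such as the disposal and transfer of data and the income derived from it, with remedies available through claims for breach of contract or tort. As China's big data development matures, laws and regulations on data can be better established in the future to promote data trading, circulation, and development.
Keywords: data; nature of rights; data property right; personal information; data rights confirmation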
Preserving Data Privacy in Speech Data Publishing
19
Authors: 孙佳鑫, 蒋进, 赵萍 《东华大学学报:英文版》 (Journal of Donghua University, English Edition) EI CAS 2020, No. 4, pp. 293-297 (5 pages)
Speech data publishing breaches users' data privacy, thereby causing further privacy disclosure. Existing work sanitizes the content, voice, and voiceprint of speech data without considering the consistency among these three features, and is thus susceptible to inference attacks. To address the problem, we design a privacy-preserving protocol for speech data publishing (P3S2) that takes the correlations among the three features into consideration. Concretely, we first propose a three-dimensional sanitization that uses feature learning to capture the characteristics of each dimension, and then sanitize speech data using the learned features. As a result, the correlations among the three dimensions of the sanitized speech data are guaranteed. Furthermore, (ε,δ)-differential privacy is used to theoretically prove both the data privacy preservation and the data utility guarantee of P3S2, filling the gap between algorithm design and performance evaluation. Finally, simulations on two real-world datasets demonstrate both the data privacy preservation and the data utility guarantee.
Keywords: speech data publishing; data privacy; data utility; differential privacy
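The abstract proves (ε, δ)-differential privacy for P3S2 but does not say which noise mechanism achieves it. As a generic illustration only, the Python sketch below implements the classical Gaussian mechanism, a standard way to obtain (ε, δ)-differential privacy for numeric outputs: Gaussian noise with σ = √(2 ln(1.25/δ)) · Δ₂ / ε, a bound stated for 0 < ε < 1. Applying it to a learned speech-feature vector, and the feature values and L2 sensitivity used below, are assumptions, not the paper's method.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (bound assumes 0 < epsilon < 1)."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("classical bound assumes 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def gaussian_mechanism(values, l2_sensitivity, epsilon, delta, rng):
    """Release a numeric vector with (epsilon, delta)-DP by adding i.i.d. Gaussian noise."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical learned features for one speaker, assumed to have L2 sensitivity 1.0.
features = [0.12, -0.40, 0.87]
noisy = gaussian_mechanism(features, l2_sensitivity=1.0, epsilon=0.5, delta=1e-5,
                           rng=random.Random(0))
print(noisy)
```

Smaller ε or δ means larger σ, which is the privacy/utility trade-off the paper says it analyses formally; a real deployment would also need a justified sensitivity bound (e.g. by clipping feature norms).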
Empirical Likelihood Based Longitudinal Data Analysis
20
Authors: Tharshanna Nadarajah, Asokan Mulayath Variyath, J Concepción Loredo-Osti 《统计学期刊(英文)》 (Journal of Statistics, English edition) 2020, No. 4, pp. 611-639 (29 pages)
In longitudinal data analysis, our primary interest is in estimating the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging to specify, particularly because the responses are correlated. Marginal models, such as generalized estimating equations (GEEs), have received much attention; they rely on assumptions about the first two moments of the data and a working correlation structure, and their confidence regions and hypothesis tests are constructed from asymptotic normality. This approach is sensitive to misspecification of the variance function and the working correlation structure, which may yield inefficient and inconsistent estimates and lead to wrong conclusions. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for estimating the regression parameters and constructing their confidence region. We apply the proposed method to two case examples.
Keywords: longitudinal data; generalized estimating equations; empirical likelihood; adjusted empirical likelihood; extended empirical likelihood
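The paper builds EL on estimating equations for regression parameters, and those equations are not given in the abstract. As a simplified, assumed illustration of the underlying principle, the Python sketch below computes Owen's empirical likelihood ratio statistic for a single mean, whose estimating equation is E[X − μ] = 0; it is not the authors' longitudinal procedure. With zᵢ = xᵢ − μ, the Lagrange multiplier λ solves Σ zᵢ / (1 + λzᵢ) = 0 (found here by bisection), and −2 log R(μ) = 2 Σ log(1 + λzᵢ) is compared with the χ²₁ 95% quantile (about 3.84) to decide whether μ lies in the confidence region. The function name and data are hypothetical.

```python
import math

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-square distribution with 1 df


def el_log_ratio_mean(xs, mu, tol=1e-12, max_iter=200):
    """-2 log empirical likelihood ratio for the mean (Owen's EL), via bisection on lambda."""
    z = [x - mu for x in xs]
    if not (min(z) < 0 < max(z)):
        return math.inf  # mu outside the convex hull of the data: ratio is 0
    # Every implied weight must stay positive: 1 + lam * z_i > 0 for all i.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)

    def score(lam):
        return sum(zi / (1 + lam * zi) for zi in z)  # strictly decreasing in lam

    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = (lo + hi) / 2
    return 2 * sum(math.log1p(lam * zi) for zi in z)


xs = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 2.9]
xbar = sum(xs) / len(xs)
for mu in (xbar, 2.3, 2.9):
    stat = el_log_ratio_mean(xs, mu)
    print(f"mu = {mu:.3f}  -2logR = {stat:.4f}  in 95% region: {stat <= CHI2_1_95}")
```

Note that no variance function or working correlation structure is specified anywhere: the region's shape comes from the data through the weights, which is the robustness argument the abstract makes for EL over GEE-based normal approximations.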