Keyword: data-analysis
Paper Title Other Keywords Page
WED3O05 Big Data Analysis and Analytics with MATLAB database, framework, software, controls 1
 
  • D.S. Willingham
    ASCo, Clayton, Victoria, Australia
 
  Overview using Data Analytics to turn large volumes of complex data into actionable information can help you improve design and decision-making processes. In today's world, there is an abundance of data being generated from many different sources. However, developing effective analytics and integrating them into existing systems can be challenging. Big data represents an opportunity for analysts and data scientists to gain greater insight and to make more informed decisions, but it also presents a number of challenges. Big data sets may not fit into available memory, may take too long to process, or may stream too quickly to store. Standard algorithms are usually not designed to process big data sets in reasonable amounts of time or memory. There is no single approach to big data. Therefore, MATLAB provides a number of tools to tackle these challenges. In this paper 2 case studies will be presented: 1. Manipulating and doing computations on big datasets on light weight machines; 2. Visualising big, multi-dimensional datasets Developing Predictive Models High performance computing with clusters and Cloud Integration with Databases, HADOOP and Big Data Environments.  
slides icon Slides WED3O05 [12.369 MB]  
 
WEPGF037 Data Lifecycle in Large Experimental Physics Facilities: The Approach of the Synchrotron ELETTRA and the Free Electron Laser FERMI operation, experiment, synchrotron, electron 1
 
  • F. Billè, R. Borghes, F. Brun, V. Chenda, A. Curri, V. Duic, D. Favretto, G. Kourousias, M. Lonza, M. Prica, R. Pugliese, M. Scarcia, M. Turcinovich
    Elettra-Sincrotrone Trieste S.C.p.A., Basovizza, Italy
 
  Often the producers of Big Data face the emerging problem of Data Deluge. Nevertheless experimental facilities such as synchrotrons and free electron lasers may have additional requirements, mostly related to the necessity of managing the access for thousands of scientists. A complete data lifecycle describes the seamless path that joins distinct IT tasks such as experiment proposal management, user accounts, data acquisition and analysis, archiving, cataloguing and remote access. This paper presents the data lifecycle of the synchrotron ELETTRA and the free electron laser FERMI. With the focus on data access, the Virtual Unified Office (VUO) is presented. It is a core element in scientific proposal management, user information DB, scientific data oversight and remote access. Eventually are discussed recent developments of the beamline software, that holds the key role to data and metadata acquisition but also requires integration with the rest of the system components in order to provide data cataloging, data archiving and remote access. The scope of this paper is to disseminate the current status of a complete data lifecycle, discuss key issues and hint on the future directions.  
poster icon Poster WEPGF037 [1.110 MB]  
 
WEPGF043 Metadatastore: A Primary Data Store for NSLS-2 Beamlines experiment, database, GUI, interface 1
 
  • A. Arkilic, D.B. Allan, T.A. Caswell, L.R. Dalesio, W.K. Lewis
    BNL, Upton, Long Island, New York, USA
 
  Funding: Department of Energy, Brookhaven National Lab
The beamlines at NSLS-II are among the highest instrumented, and controlled of any worldwide. Each beamline can produce unstructured data sets in various formats. This data should be made available for data analysis and processing for beamline scientists and users. Various data flow systems are in place in numerous synchrotrons, however these are very domain specific and cannot handle such unstructured data. We have developed a data flow service, metadatastore, that manages experimental data in NSLS-II beamlines. This service enables data analysis and visualization clients to access this service either directly or via databroker api in a consistent and partition tolerant fashion, providing a reliable and easy to use interface to our state-of-the-art beamlines.
 
 
WEPGF044 Filestore: A File Management Tool for NSLS-II Beamlines experiment, EPICS, operation, interface 1
 
  • A. Arkilic, T.A. Caswell, D. Chabot, L.R. Dalesio, W.K. Lewis
    BNL, Upton, Long Island, New York, USA
 
  Funding: Brookhaven National Lab, Departmet of Energy
NSLS-II beamlines can generate 72,000 data sets per day resulting in over 2 M data sets in one year. The large amount of data files generated by our beamlines poses a massive file management challenge. In response to this challenge, we have developed filestore, as means to provide users with an interface to stored data. By leveraging features of Python and MongoDB, filestore can store information regarding the location of a file, access and open the file, retrieve a given piece of data in that file, and provide users with a token, a unique identifier allowing them to retrieve each piece of data. Filestore does not interfere with the file source or the storage method and supports any file format, making data within files available for NSLS-II data analysis environment.
 
poster icon Poster WEPGF044 [0.849 MB]  
 
WEPGF046 Towards a Second Generation Data Analysis Framework for LHC Transient Data Recording framework, operation, hardware, extraction 1
 
  • S. Boychenko, C. Aguilera-Padilla, M. Dragu, M.A. Galilée, J.C. Garnier, M. Koza, K.H. Krol, R. Orlandi, M.C. Poeschl, T.M. Ribeiro, K.S. Stamos, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
 
  During the last two years, CERNs Large Hadron Collider (LHC) and most of its equipment systems were upgraded to collide particles at an energy level twice higher compared to the first operational period between 2010 and 2013. System upgrades and the increased machine energy represent new challenges for the analysis of transient data recordings, which have to be both dependable and fast. With the LHC having operated for many years already, statistical and trend analysis across the collected data sets is a growing requirement, highlighting several constraints and limitations imposed by the current software and data storage ecosystem. Based on several analysis use-cases, this paper highlights the most important aspects and ideas towards an improved, second generation data analysis framework to serve a large variety of equipment experts and operation crews in their daily work.  
poster icon Poster WEPGF046 [0.497 MB]  
 
WEPGF053 Monitoring and Cataloguing the Progress of Synchrotron Experiments, Data Reduction, and Data Analysis at Diamond Light Source From a User's Perspective experiment, software, detector, interface 1
 
  • J. Aishima
    SLSA, Clayton, Australia
  • A. Ashton, S. Fisher, K. Levik, G. Winter
    DLS, Oxfordshire, United Kingdom
 
  The high data rates produced by the latest generation of detectors, more efficient sample handling hardware and ever more remote users of the beamlines at Diamond Light Source require improved data reduction and data analysis techniques to maximize their benefit to scientists. In this paper some of the experiment data reduction and analysis steps are described, including real time image analysis with DIALS, our Fast DP and xia2-based data reduction pipelines, and Fast EP phasing and Dimple difference map calculation pipelines that aim to rapidly provide feedback about the recently completed experiment. SynchWeb, an interface to an open source laboratory information management system called ISPyB (co-developed at Diamond and the ESRF), provides a modern, flexible framework for managing samples and visualizing the data from all of these experiments and analyses, including plots, images, and tables of the analysed and reduced data, as well as showing experimental metadata, sample information.  
 
WEPGF068 Formalizing Expert Knowledge in order to Analyse CERN's Control Systems controls, monitoring, software, operation 1
 
  • A. Voitier, M. Gonzalez-Berges, F.M. Tilaro
    CERN, Geneva, Switzerland
  • M. Roshchin
    Siemens AG, Corporate Technology, München, Germany
 
  The automation infrastructure needed to reliably run CERN's accelerator complex and its experiments produces large and diverse amounts of data, besides physics data. Over 600 industrial control systems with about 45 million parameters store more than 100 terabytes of data per year. At the same time a large technical expertise in this domain is collected and formalized. The study is based on a set of use cases classified into three data analytics domains applicable to CERN's control systems: online monitoring, fault diagnosis and engineering support. A known root cause analysis concerning gas system alarms flooding was reproduced with Siemens' Smart Data technologies and its results were compared with a previous analysis. The new solution has been put in place as a tool supporting operators during breakdowns in a live production system. The effectiveness of this deployment suggests that these technologies can be applied to more cases. The intended goals would be to increase CERN's systems reliability and reduce analysis efforts from weeks to hours. It also ensures a more consistent approach for these analyses by harvesting a central expert knowledge base available at all times.  
poster icon Poster WEPGF068 [1.468 MB]