Data Management, Analytics & Visualisation
WED3O01 MASSIVE: an HPC Collaboration to Underpin Synchrotron Science
  • W.J. Goscinski
    Monash University, Faculty of Science, Clayton, Victoria, Australia
  • K. Bambery, C.J. Hall, A. Maksimenko, S. Panjikar, D. Paterson, C.G. Ryan, M. Tobin
    ASCo, Clayton, Victoria, Australia
  • C.U. Felzmann
    SLSA, Clayton, Australia
  • C. Hines, P. McIntosh
    Monash University, Clayton, Australia
  • D.A. Thompson
    CSIRO ATNF, Epping, Australia
  MASSIVE is the Australian specialised High Performance Computing facility for imaging and visualisation. The project is a collaboration between Monash University, the Australian Synchrotron and CSIRO. MASSIVE underpins a range of advanced instruments, with a particular focus on Australian Synchrotron beamlines. This paper reports on the outcomes of the MASSIVE project since 2011, focusing in particular on instrument integration and interactive access. MASSIVE has developed a unique capability that supports an increasing number of researchers generating and processing instrument data. The facility runs an instrument integration program that helps facilities move data to an HPC environment and provides in-experiment data processing. This capability is best demonstrated at the Imaging and Medical Beamline, where fast CT reconstruction and visualisation are now essential to performing effective experiments. The MASSIVE Desktop provides an easy way for researchers to begin using HPC and is now an essential tool for scientists working with large datasets, including large images and other types of instrument data.
WED3O02 Databroker: An Interface for NSLS-II Data Management System
  • A. Arkilic, D.B. Allan, D. Chabot, L.R. Dalesio, W.K. Lewis
    BNL, Upton, Long Island, New York, USA
  Funding: Brookhaven National Lab, U.S. Department of Energy
A typical experiment involves not only the raw data from a detector, but also additional data from the beamline. To date, this information has largely been kept separate and manipulated individually. A much more effective approach is to integrate these different data sources and make them easily accessible to data analysis clients. The NSLS-II data flow system contains multiple backends with varying data types. Leveraging the features of these backends (metadatastore, filestore, channel archiver, and Olog), this library provides users with the ability to access experimental data. It acts as a single interface for time series, data attributes, frame data access and other experiment-related information.
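The single-interface idea can be illustrated with a small facade that joins several backends on a run identifier and its time window. This is a minimal sketch, not the databroker API. The classes `Header` and `DataBroker`, the method `get_run`, and the in-memory lists standing in for metadatastore, filestore, the channel archiver and Olog are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Run-level metadata (hypothetical stand-in for a metadatastore record)."""
    uid: str
    start_time: float
    stop_time: float
    scan_type: str


@dataclass
class DataBroker:
    """Hypothetical facade joining several backends on run uid and time window."""
    headers: dict = field(default_factory=dict)   # uid -> Header
    frames: dict = field(default_factory=dict)    # uid -> list of frame tokens
    archiver: list = field(default_factory=list)  # (timestamp, pv_name, value)
    logbook: list = field(default_factory=list)   # (timestamp, text)

    def get_run(self, uid):
        """Return everything known about one run as a single dictionary."""
        header = self.headers[uid]

        def in_run(t):
            return header.start_time <= t <= header.stop_time

        return {
            "header": header,
            "frames": self.frames.get(uid, []),
            # Beamline values archived while the run was in progress
            "archived": [(t, pv, v) for t, pv, v in self.archiver if in_run(t)],
            "log_entries": [text for t, text in self.logbook if in_run(t)],
        }


# Usage with made-up values
db = DataBroker()
db.headers["run-1"] = Header("run-1", 100.0, 200.0, "ascan")
db.frames["run-1"] = ["tok-a", "tok-b"]
db.archiver += [(50.0, "ring:current", 399.8), (150.0, "ring:current", 400.1)]
db.logbook.append((120.0, "sample aligned"))
run = db.get_run("run-1")
print(run["archived"])  # [(150.0, 'ring:current', 400.1)]
```

The archiver and the logbook know nothing about runs. Joining on the run's time window is what lets them contribute to a run-centric view.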
WED3O03 MADOCA II Data Logging System Using NoSQL Database for SPring-8
  • A. Yamashita, M. Kago
    JASRI/SPring-8, Hyogo-ken, Japan
  The data logging system for SPring-8 was upgraded to a new system based on NoSQL databases, as part of the MADOCA II framework. Since the upgrade, it has collected all the log data required for accelerator control without any trouble. Previously, a system based on a relational database management system (RDBMS) had been in operation since 1997 and had grown with the development of the accelerators. However, the RDBMS-based system struggled to handle new requirements such as variable-length data storage, data mining of large data volumes and fast data acquisition. New software technologies offered a solution to these problems. The new system adopts two NoSQL databases, Apache Cassandra and Redis, for data storage. Apache Cassandra, a scalable and highly available column-oriented database well suited to time series data, is used for the perpetual archive. Redis, a very fast in-memory key-value store, is used as the real-time data cache. The data acquisition part of the new system was built on ZeroMQ messaging, with messages serialized by MessagePack. Operation of the new system started in January 2015 after a long-term evaluation lasting over a year.
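The division of labour between the real-time cache and the perpetual archive can be sketched with plain Python containers. This is a minimal sketch under assumptions, not MADOCA II code:

- a dict plays the role of the Redis latest-value cache;
- a dict of lists mimics a Cassandra-style layout in which each partition holds one signal's samples for one UTC day.

The class name, method names and signal name are hypothetical, and the ZeroMQ/MessagePack transport is not modelled.

```python
from collections import defaultdict
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


class LoggingStore:
    """Two-tier store: a latest-value cache plus a day-partitioned archive."""

    def __init__(self):
        self.latest = {}                  # signal -> (ts, value), like a key-value cache
        self.archive = defaultdict(list)  # (signal, day) -> [(ts, value), ...]

    @staticmethod
    def partition_key(signal, ts):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        return (signal, day)

    def write(self, signal, ts, value):
        self.latest[signal] = (ts, value)  # cache: overwrite with newest value
        self.archive[self.partition_key(signal, ts)].append((ts, value))  # archive: append

    def read_range(self, signal, t0, t1):
        """Return samples in [t0, t1], touching only the day partitions involved."""
        samples = []
        for day in range(int(t0 // SECONDS_PER_DAY), int(t1 // SECONDS_PER_DAY) + 1):
            key = self.partition_key(signal, day * SECONDS_PER_DAY)
            samples.extend((t, v) for t, v in self.archive.get(key, []) if t0 <= t <= t1)
        return sorted(samples)


store = LoggingStore()
sig = "sr/magnet_ps/current"  # hypothetical signal name
store.write(sig, 10 * SECONDS_PER_DAY + 5, 1.0)
store.write(sig, 11 * SECONDS_PER_DAY + 5, 2.0)
store.write(sig, 11 * SECONDS_PER_DAY + 9, 3.0)
print(store.latest[sig])  # (950409, 3.0)
print(store.read_range(sig, 10 * SECONDS_PER_DAY, 11 * SECONDS_PER_DAY + 6))  # [(864005, 1.0), (950405, 2.0)]
```

Bounding partitions by day keeps each one at a manageable size. It also lets a time-range query touch only the partitions it needs, which is the usual reason for this kind of key design in column-oriented stores.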
WED3O04 HDB++: A New Archiving System for TANGO
  • L. Pivetta, C. Scafuri, G. Scalamera, G. Strangolino, L. Zambon
    Elettra-Sincrotrone Trieste S.C.p.A., Basovizza, Italy
  • R. Bourtembourg, J.L. Pons, P.V. Verdier
    ESRF, Grenoble, France
  TANGO release 8 led to several enhancements, including the adoption of the ZeroMQ library for faster, lightweight event-driven communication. Exploiting these improved capabilities, a high performance, event-driven archiving system written in C++ has been developed. It inherits the database structure from the existing TANGO Historical Data Base (HDB) and introduces new storage architecture possibilities, better internal diagnostic capabilities and an optimized API. Its design allows data to be stored in traditional database management systems such as MySQL or in NoSQL databases such as Apache Cassandra. This paper describes the software design of the new HDB++ archiving system and the current state of the implementation, and gives some performance figures and use cases.
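HDB++ separates the event-driven archiver from the storage layer so that MySQL or Cassandra can be plugged in. The sketch below shows that separation with three pieces:

- an abstract backend interface;
- an SQLite backend standing in for MySQL;
- a dictionary-of-rows backend standing in for Cassandra.

All class, method, table and attribute names are hypothetical. This is not the HDB++ C++ API.

```python
import sqlite3
from abc import ABC, abstractmethod


class ArchiveBackend(ABC):
    """Storage interface the archiver talks to; backends are interchangeable."""

    @abstractmethod
    def insert_event(self, attr, ts, value): ...

    @abstractmethod
    def history(self, attr): ...


class SQLBackend(ArchiveBackend):
    """Relational backend; an in-memory SQLite database stands in for MySQL."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE scalar_events (att TEXT, ts REAL, value REAL)")

    def insert_event(self, attr, ts, value):
        self.db.execute("INSERT INTO scalar_events VALUES (?, ?, ?)", (attr, ts, value))

    def history(self, attr):
        cur = self.db.execute(
            "SELECT ts, value FROM scalar_events WHERE att = ? ORDER BY ts", (attr,))
        return cur.fetchall()


class WideRowBackend(ArchiveBackend):
    """NoSQL-style backend: one time-ordered row per attribute, Cassandra-like."""

    def __init__(self):
        self.rows = {}

    def insert_event(self, attr, ts, value):
        self.rows.setdefault(attr, []).append((ts, value))

    def history(self, attr):
        return sorted(self.rows.get(attr, []))


class EventSubscriber:
    """Archives a value only when a change event arrives: no polling."""

    def __init__(self, backend):
        self.backend = backend

    def on_change_event(self, attr, ts, value):
        self.backend.insert_event(attr, ts, value)


results = []
for backend in (SQLBackend(), WideRowBackend()):
    subscriber = EventSubscriber(backend)
    # Inserted out of time order to show that both backends sort on read
    subscriber.on_change_event("sr/ps/1/current", 2.0, 10.5)
    subscriber.on_change_event("sr/ps/1/current", 1.0, 10.0)
    results.append(backend.history("sr/ps/1/current"))
print(results[0] == results[1])  # True
```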
WED3O05 Big Data Analysis and Analytics with MATLAB
  • D.S. Willingham
    ASCo, Clayton, Victoria, Australia
  Overview: using data analytics to turn large volumes of complex data into actionable information can help you improve design and decision-making processes. In today's world, an abundance of data is being generated from many different sources. However, developing effective analytics and integrating them into existing systems can be challenging. Big data represents an opportunity for analysts and data scientists to gain greater insight and make more informed decisions, but it also presents a number of challenges. Big data sets may not fit into available memory, may take too long to process, or may stream too quickly to store. Standard algorithms are usually not designed to process big data sets in reasonable amounts of time or memory. Because there is no single approach to big data, MATLAB provides a number of tools to tackle these challenges. In this paper two case studies are presented: 1. manipulating and computing on big datasets on lightweight machines; 2. visualising big, multi-dimensional datasets. Further topics include developing predictive models, high performance computing with clusters and the cloud, and integration with databases, Hadoop and big data environments.
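A common technique for data that does not fit in memory is to process it in chunks and merge partial results. The sketch below shows this for mean and standard deviation, using the standard pairwise (parallel) variance update. It is written in Python for consistency with the other examples and does not reproduce any MATLAB API. The chunked data source is hypothetical.

```python
import math


def chunks_from_source(n_total, chunk_size):
    """Hypothetical data source yielding chunks, e.g. blocks read from a large file."""
    for start in range(0, n_total, chunk_size):
        yield [float(x) for x in range(start, min(start + chunk_size, n_total))]


def streaming_mean_std(chunks):
    """Merge per-chunk (count, mean, M2) statistics so that only one chunk
    is ever held in memory; returns (count, mean, sample standard deviation)."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        k = len(chunk)
        if k == 0:
            continue
        c_mean = sum(chunk) / k
        c_m2 = sum((x - c_mean) ** 2 for x in chunk)
        delta = c_mean - mean
        total = n + k
        mean += delta * k / total
        m2 += c_m2 + delta ** 2 * n * k / total
        n = total
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, std


n, mean, std = streaming_mean_std(chunks_from_source(10007, 128))
print(n, mean)  # 10007 5003.0
```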
WED3O06 Data Streaming - Efficient Handling of Large and Small (Detector) Data at the Paul Scherrer Institute
  • S.G. Ebner
    PSI, Villigen, Villigen, Switzerland
  • H.R. Billich, H. Brands, E.H. Panepucci, L. Sala
    PSI, Villigen PSI, Switzerland
  For the latest generation of detectors, the transmission, persistence and reading of data become a bottleneck. Following the traditional acquisition-persistence-analysis pattern leads to a long delay before any information about the data is available, which prevents users from making efficient use of their beamtime. In addition, single nodes sometimes cannot keep up with receiving and persisting the data. To address these issues, PSI is breaking with the traditional data acquisition paradigm for its detectors and focusing on data streaming. Data is streamed out immediately after acquisition. The resulting stream is either retrieved by a node next to the storage, which persists the data, or split up to enable parallel persistence as well as online processing and monitoring. The paper presents the concepts, designs and software of the current implementation for the Pilatus, Eiger, PCO Edge and Gigafrost detectors at SLS, as well as the approach planned for the Jungfrau detector and the whole beam-synchronous data acquisition system at SwissFEL. It shows how load balancing, scalability, extensibility and immediate feedback are achieved while reducing overall software complexity.
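The split-stream idea can be sketched with standard-library threads and queues. Frames are dealt out round-robin to several writer workers, so no single node has to persist the full rate, while a decimated copy goes to online monitoring. This is a minimal sketch under stated assumptions, not PSI's implementation. The function name, the round-robin policy, the `monitor_every` decimation and the in-memory lists standing in for files are all hypothetical.

```python
import queue
import threading


def split_stream(frames, n_writers, monitor_every=10):
    """Deal frames round-robin to writer threads and keep every
    `monitor_every`-th frame for online monitoring."""
    writer_queues = [queue.Queue() for _ in range(n_writers)]
    persisted = [[] for _ in range(n_writers)]  # stand-in for per-writer files
    monitored = []

    def writer(idx):
        while True:
            item = writer_queues[idx].get()
            if item is None:  # end-of-stream marker
                break
            persisted[idx].append(item)  # a real writer would write to disk here

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(n_writers)]
    for t in threads:
        t.start()
    for frame_id, frame in enumerate(frames):
        writer_queues[frame_id % n_writers].put((frame_id, frame))
        if frame_id % monitor_every == 0:
            monitored.append((frame_id, frame))
    for q in writer_queues:
        q.put(None)
    for t in threads:
        t.join()
    return persisted, monitored


persisted, monitored = split_stream([f"frame-{i}" for i in range(25)], n_writers=3)
print([len(p) for p in persisted])  # [9, 8, 8]
```

Round-robin is the simplest static load-balancing policy. A pull-based scheme, in which idle writers request the next frame, would adapt better to writers of unequal speed. Because every frame carries its id, the persisted parts can be reassembled in order later.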
WEM310 How Cassandra Improves Performances and Availability of HDB++ Tango Archiving System
  • R. Bourtembourg, J.L. Pons, P.V. Verdier
    ESRF, Grenoble, France
  TANGO release 8 led to several enhancements, including the adoption of the ZeroMQ library for faster, lightweight event-driven communication. Exploiting these improved capabilities, a high performance, event-driven archiving system named Tango HDB++* has been developed. Its design makes it possible to store archived data in Apache Cassandra, a high performance, scalable, distributed NoSQL database that provides high availability and replication with no single point of failure. HDB++ with Cassandra will open up new perspectives for TANGO in the era of big data and will be the starting point for new big data analytics and data mining applications, breaking the limits of archiving systems based on traditional relational databases. This paper describes the current state of the implementation and our experience with Apache Cassandra in the scope of the Tango HDB++ project. It also gives some performance figures and use cases where using Cassandra with Tango HDB++ is a good fit.
* The HDB++ project is the result of a collaboration between the Elettra synchrotron (Trieste) and the European Synchrotron Radiation Facility (Grenoble).
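The "no single point of failure" property comes from consistent hashing plus replication: each partition is stored on several nodes of a token ring. The sketch below illustrates SimpleStrategy-like placement in simplified form, where replicas go to the owning node and the next nodes clockwise. It also checks whether a read can still meet a given consistency level when nodes are down.

Assumptions, all hypothetical:

- node tokens are derived by hashing node names (real clusters assign tokens, usually with virtual nodes);
- MD5 is used only for simplicity;
- the class, methods and keys are invented for illustration.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified consistent-hashing ring with SimpleStrategy-like placement:
    a partition is stored on the node owning its token plus the next
    replication_factor - 1 nodes clockwise round the ring."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Hypothetical: node tokens derived from node names for the demo
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(key):
        # MD5 only for simplicity; not the default partitioner of a real cluster
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, partition_key):
        """Nodes holding copies of the given partition, primary first."""
        start = bisect.bisect_left(self.tokens, self._token(partition_key))
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def available(self, partition_key, down, consistency=1):
        """True if at least `consistency` replicas of the partition are up."""
        alive = [n for n in self.replicas(partition_key) if n not in down]
        return len(alive) >= consistency


ring = TokenRing([f"node{i}" for i in range(5)], replication_factor=3)
replicas = ring.replicas("sr/ps/1/current")
quorum = ring.rf // 2 + 1
# With RF=3, losing any single replica still leaves a quorum of 2
print(ring.available("sr/ps/1/current", down={replicas[0]}, consistency=quorum))  # True
```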
WEPGF036 Data Categorization And Storage Strategies At RHIC
  • S. Binello, K.A. Brown, T. D'Ottavio, R.A. Katz, J.S. Laster, J. Morris, J. Piacentino
    BNL, Upton, Long Island, New York, USA
  Funding: Work supported by Brookhaven Science Associates, LLC under Contract No. DE-SC0012704 with the U.S. Department of Energy.
This past year the Controls group within the Collider Accelerator Department at Brookhaven National Laboratory replaced the Network Attached Storage (NAS) system that stores software and data critical to the operation of the accelerators. The NAS also serves as the initial repository for all logged data. The purchase was used as an opportunity to categorize the data we store and to review and evaluate our storage strategies. This was done in the context of an existing policy that places no explicit limits on the amount of data users can log or on how long data is retained at its original resolution, and that requires all logged data to be available in real time. This paper describes how the data was categorized and the storage strategies used for each category.
WEPGF037 Data Lifecycle in Large Experimental Physics Facilities: The Approach of the Synchrotron ELETTRA and the Free Electron Laser FERMI
  • F. Billè, R. Borghes, F. Brun, V. Chenda, A. Curri, V. Duic, D. Favretto, G. Kourousias, M. Lonza, M. Prica, R. Pugliese, M. Scarcia, M. Turcinovich
    Elettra-Sincrotrone Trieste S.C.p.A., Basovizza, Italy
  Producers of big data often face the emerging problem of the data deluge. Experimental facilities such as synchrotrons and free electron lasers may, however, have additional requirements, mostly related to the need to manage access for thousands of scientists. A complete data lifecycle describes the seamless path that joins distinct IT tasks such as experiment proposal management, user accounts, data acquisition and analysis, archiving, cataloguing and remote access. This paper presents the data lifecycle of the synchrotron ELETTRA and the free electron laser FERMI. With the focus on data access, the Virtual Unified Office (VUO) is presented. It is a core element in scientific proposal management, the user information database, scientific data oversight and remote access. Finally, recent developments in the beamline software are discussed: this software plays the key role in data and metadata acquisition, but it also requires integration with the other system components in order to provide data cataloguing, data archiving and remote access. The aim of this paper is to present the current status of a complete data lifecycle, discuss key issues and outline future directions.
WEPGF038 A Flexible System for End-User Data Visualisation, Analysis Prototyping and Experiment Logbook
  • R. Borghes, V. Chenda, G. Kourousias, M. Lonza, M. Prica, M. Scarcia
    Elettra-Sincrotrone Trieste S.C.p.A., Basovizza, Italy
  Experimental facilities like synchrotrons and free electron lasers often aim at well-defined data workflows tightly integrated with their control systems. However, such facilities are also service providers to visiting scientists, and these hosted researchers often have requirements different from those covered by the established processes. The most evident needs are for i) flexible experimental data visualisation, ii) rapid prototyping of analysis methods, and iii) electronic logbook services. This paper reports on the development of a software system, collectively referred to as DonkiTools, that aims to satisfy these needs for the synchrotron ELETTRA and the free electron laser FERMI. The design strategy is outlined, covering dynamic data visualisation, Python scripting of analysis methods, integration with the TANGO distributed control system, an electronic logbook with automated metadata reporting, usability, customization, and extensibility. Finally, a use case presents a full deployment of the system, integrated with the FermiDAQ data collection system, at the free electron laser beamline EIS-TIMEX.
WEPGF041 Monitoring Mixed-Language Applications with Elastic Search, Logstash and Kibana (ELK)
  • O.Ø. Andreassen, C. Charrondière, A. De Dios Fuente
    CERN, Geneva, Switzerland
  Application logging and system diagnostics are nothing new. Ever since the first computers, scientists and engineers have been storing information about their systems, making it easier to understand what is going on and, in case of failures, what went wrong. Unfortunately, there are as many different standards as there are file formats, storage types, locations, operating systems, etc. Recent developments in web technology and storage have made it much simpler to gather all the different information in one place and dynamically adapt the display. Using Logstash with Elasticsearch as a backend, we store, index and query data, making it possible to display and manipulate it in whatever form one wishes. With Kibana as a generic and modern web interface on top, the presentation of the information can be adapted at will. In this paper we show how almost any type of structured or unstructured data source can be processed. We also show how data can be visualised and customised on a per-user basis, and how the system scales as the data volume grows.
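Before indexing, each raw log line has to become a structured document. In a Logstash pipeline this step is normally done by filter plugins such as grok. The sketch below reproduces the idea in plain Python rather than in Logstash's configuration language. The two line formats, the `_unparsed` tag and the `source` field are assumptions for illustration.

```python
import json
import re

# Hypothetical patterns for two log styles found in a mixed-language system:
# a Java-style "date time,millis LEVEL [logger] message" line and Python's
# default logging format "LEVEL:logger:message".
PATTERNS = [
    re.compile(r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
               r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]+)\] (?P<message>.*)$"),
    re.compile(r"^(?P<level>[A-Z]+):(?P<logger>[\w.]+):(?P<message>.*)$"),
]


def to_document(line, source):
    """Turn one raw log line into a JSON document ready for indexing.
    Lines matching no pattern are kept as unstructured text, not dropped."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            doc = match.groupdict()
            break
    else:
        doc = {"message": line, "tags": ["_unparsed"]}
    doc["source"] = source
    return json.dumps(doc, sort_keys=True)


print(to_document("WARNING:root:disk almost full", "py-app"))
# {"level": "WARNING", "logger": "root", "message": "disk almost full", "source": "py-app"}
```

Keeping unmatched lines, tagged rather than discarded, means a new format shows up in the index immediately. A pattern for it can then be added later.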
WEPGF042 Scalable Web Broadcasting for Historical Industrial Control Data
  • B. Copy, O.Ø. Andreassen, Ph. Gayet, M. Labrenz, H. Milcent, F. Piccinelli
    CERN, Geneva, Switzerland
  With the widespread use of asynchronous web communication mechanisms like WebSockets and WebRTC, it has now become possible to distribute industrial controls data originating in field devices or SCADA software in a scalable, event-based manner to a large number of web clients in the form of rich interactive visualizations. However, there is as yet no simple, secure and performant way to query large amounts of aggregated historical data. This paper presents an implementation of a tool that makes massive quantities of pre-indexed historical data stored in ElasticSearch available to a large number of web-based consumers through asynchronous web protocols. It also presents a simple, OpenSocial-based dashboard architecture that allows users to configure and organize rich data visualizations (based on the Highcharts JavaScript library) and create navigation flows in a responsive, mobile-friendly user interface. Such techniques are used at CERN to display interactive reports about the status of the LHC infrastructure (e.g. vacuum or cryogenics installations) and to give access to fine-grained historical data stored in the LHC Logging database in a matter of seconds.
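The key scalability property is that a historical query is executed once and its result is fanned out to many clients over asynchronous channels. The following is a minimal asyncio sketch of that fan-out. It assumes each WebSocket connection is served by its own send loop, modelled here by `client`. The `Broadcaster` class and the bucket values are hypothetical, and no ElasticSearch or WebSocket library is used.

```python
import asyncio


class Broadcaster:
    """Fan-out hub: every published message is queued for every subscriber.
    Each client has its own bounded queue, so one slow client cannot stall
    the others; its messages are dropped and counted instead."""

    def __init__(self, maxsize=100):
        self.subscribers = set()
        self.maxsize = maxsize
        self.dropped = 0

    def subscribe(self):
        queue = asyncio.Queue(maxsize=self.maxsize)
        self.subscribers.add(queue)
        return queue

    def publish(self, message):
        for queue in self.subscribers:
            try:
                queue.put_nowait(message)
            except asyncio.QueueFull:
                self.dropped += 1


async def client(queue, received):
    """Stand-in for a per-client WebSocket send loop; None ends the stream."""
    while (message := await queue.get()) is not None:
        received.append(message)


async def main(n_clients=3):
    hub = Broadcaster()
    received = [[] for _ in range(n_clients)]
    tasks = [asyncio.create_task(client(hub.subscribe(), r)) for r in received]
    # Hypothetical pre-aggregated history buckets, queried once from the store
    for bucket in ({"t": 0, "avg": 1.5}, {"t": 60, "avg": 1.7}):
        hub.publish(bucket)
    hub.publish(None)
    await asyncio.gather(*tasks)
    return received


results = asyncio.run(main())
print(len(results))  # 3
```

Dropping messages for a full queue is only one possible policy for slow consumers. Alternatives include disconnecting the client or sending it a coarser aggregate instead.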
WEPGF043 Metadatastore: A Primary Data Store for NSLS-2 Beamlines
  • A. Arkilic, D.B. Allan, T.A. Caswell, L.R. Dalesio, W.K. Lewis
    BNL, Upton, Long Island, New York, USA
  Funding: Department of Energy, Brookhaven National Lab
The beamlines at NSLS-II are among the most highly instrumented and controlled of any worldwide. Each beamline can produce unstructured data sets in various formats. This data should be made available to beamline scientists and users for analysis and processing. Various data flow systems are in place at numerous synchrotrons; however, these are very domain-specific and cannot handle such unstructured data. We have developed a data flow service, metadatastore, that manages experimental data in NSLS-II beamlines. It allows data analysis and visualization clients to access the data either directly or via the databroker API in a consistent and partition-tolerant fashion, providing a reliable and easy-to-use interface to our state-of-the-art beamlines.
WEPGF044 Filestore: A File Management Tool for NSLS-II Beamlines
  • A. Arkilic, T.A. Caswell, D. Chabot, L.R. Dalesio, W.K. Lewis
    BNL, Upton, Long Island, New York, USA
  Funding: Brookhaven National Lab, Department of Energy
NSLS-II beamlines can generate 72,000 data sets per day, resulting in over 2 M data sets in one year. The large number of data files generated by our beamlines poses a massive file management challenge. In response, we have developed filestore as a means of providing users with an interface to stored data. By leveraging features of Python and MongoDB, filestore can store information regarding the location of a file, access and open the file, retrieve a given piece of data in that file, and provide users with a token, a unique identifier that allows them to retrieve each piece of data. Filestore does not interfere with the file source or the storage method and supports any file format, making data within files available to the NSLS-II data analysis environment.
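The token mechanism can be modelled with two record types and a handler registry:

- a "resource" records where a file is and which reader understands its format;
- a "datum" maps a unique token to one piece of data inside that resource.

This is a minimal sketch in which dicts stand in for the MongoDB collections. The class, method names, signatures and the `LINES` format are illustrative only and should not be read as filestore's actual API.

```python
import os
import tempfile
import uuid


class FileStore:
    """In-memory model of the resource/datum idea (hypothetical names)."""

    def __init__(self):
        self.resources = {}  # resource id -> where the file is, how to read it
        self.datums = {}     # token -> which piece of which resource
        self.handlers = {}   # file format spec -> reader class

    def register_handler(self, spec, handler_cls):
        self.handlers[spec] = handler_cls

    def insert_resource(self, spec, path, **resource_kwargs):
        resource_id = str(uuid.uuid4())
        self.resources[resource_id] = {"spec": spec, "path": path, "kwargs": resource_kwargs}
        return resource_id

    def insert_datum(self, resource_id, **datum_kwargs):
        token = str(uuid.uuid4())  # unique identifier handed to the user
        self.datums[token] = {"resource": resource_id, "kwargs": datum_kwargs}
        return token

    def retrieve(self, token):
        datum = self.datums[token]
        resource = self.resources[datum["resource"]]
        handler = self.handlers[resource["spec"]](resource["path"], **resource["kwargs"])
        return handler(**datum["kwargs"])


class LineHandler:
    """Hypothetical reader for a text format holding one record per line."""

    def __init__(self, path):
        with open(path) as f:
            self.lines = f.read().splitlines()

    def __call__(self, line_no):
        return self.lines[line_no]


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("frame0\nframe1\nframe2\n")
fs = FileStore()
fs.register_handler("LINES", LineHandler)
resource_id = fs.insert_resource("LINES", f.name)
token = fs.insert_datum(resource_id, line_no=1)
value = fs.retrieve(token)  # 'frame1'
os.remove(f.name)
```

Only the handler knows the file format. Supporting a new format therefore means registering one more reader, and the files themselves are never modified.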
WEPGF045 Large Graph Visualization of Millions of connections in the CERN Control System Network Traffic: Analysis and Design of Routing and Firewall Rules with a New Approach
  • L. Gallerani
    CERN, Geneva, Switzerland
  The CERN Technical Network (TN) was intended to be a network for accelerator and infrastructure operations. Today, however, more than 60 million IP packets are routed every hour between the General Purpose Network (GPN) and the TN, involving more than 6000 different hosts. To improve the security of the accelerator control system, it is fundamental to understand the network traffic between the two networks in order to define appropriate routing and firewall rules without impacting operations. The complexity and huge size of the infrastructure and the number of protocols and services involved have for years discouraged any attempt to understand and control the network traffic between the GPN and the TN. In this talk, we show a new way to solve the problem graphically. Combining network traffic analysis with large graph visualization algorithms, we produce comprehensible and usable large 2D colour topology graphs that map the complex network relations of the control system machines and services with a level of detail and clarity never seen before. The talk includes pictures and videos of the graphical analysis.
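A first step in such an analysis is to reduce raw flow records to a weighted graph of host-to-host relations that cross the network boundary. A large-graph layout tool can then draw that graph. The following is a minimal sketch of the reduction, with these assumptions:

- hypothetical address ranges stand in for the GPN and TN;
- flow records have the form (source, destination, service, packets);
- Graphviz DOT is chosen as an example output format.

The paper's own tools and layout algorithms are not reproduced here.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical address ranges standing in for the two networks
TN = ip_network("10.1.0.0/16")
GPN = ip_network("10.2.0.0/16")


def crosses_boundary(src, dst):
    s, d = ip_address(src), ip_address(dst)
    return (s in TN and d in GPN) or (s in GPN and d in TN)


def cross_network_edges(flows, min_packets=1):
    """Sum packets per (src, dst, service) for flows crossing the TN/GPN
    boundary and drop edges below `min_packets` to keep the graph readable."""
    edges = Counter()
    for src, dst, service, packets in flows:
        if crosses_boundary(src, dst):
            edges[(src, dst, service)] += packets
    return {edge: w for edge, w in edges.items() if w >= min_packets}


def to_dot(edges):
    """Emit a Graphviz DOT digraph that a large-graph layout tool can read."""
    lines = ["digraph traffic {"]
    for (src, dst, service), w in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{service}", weight={w}];')
    lines.append("}")
    return "\n".join(lines)


flows = [
    ("10.1.0.5", "10.2.3.4", "ssh", 100),
    ("10.1.0.5", "10.2.3.4", "ssh", 50),
    ("10.1.0.5", "10.1.0.6", "middleware", 999),  # internal to TN: ignored
    ("10.2.3.4", "10.1.0.7", "http", 3),          # below threshold: dropped
]
edges = cross_network_edges(flows, min_packets=10)
print(to_dot(edges))
```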
WEPGF046 Towards a Second Generation Data Analysis Framework for LHC Transient Data Recording
  • S. Boychenko, C. Aguilera-Padilla, M. Dragu, M.A. Galilée, J.C. Garnier, M. Koza, K.H. Krol, R. Orlandi, M.C. Poeschl, T.M. Ribeiro, K.S. Stamos, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
  During the last two years, CERN's Large Hadron Collider (LHC) and most of its equipment systems were upgraded to collide particles at twice the energy of the first operational period between 2010 and 2013. System upgrades and the increased machine energy represent new challenges for the analysis of transient data recordings, which have to be both dependable and fast. With the LHC having operated for many years already, statistical and trend analysis across the collected data sets is a growing requirement, highlighting several constraints and limitations imposed by the current software and data storage ecosystem. Based on several analysis use cases, this paper highlights the most important aspects and ideas towards an improved, second generation data analysis framework to serve a large variety of equipment experts and operation crews in their daily work.
WEPGF047 Smooth Migration of CERN Post Mortem Service to a Horizontally Scalable Service
  • J.C. Garnier, C. Aguilera-Padilla, S. Boychenko, M. Dragu, M.A. Galilée, M. Koza, K.H. Krol, T. Martins Ribeiro, R. Orlandi, M.C. Poeschl, M. Zerlauth
    CERN, Geneva, Switzerland
  The Post Mortem service for CERN's accelerator complex stores and analyses transient data recordings of various equipment systems following certain events, like a beam dump or magnet quenches. The main purpose of this framework is to provide fast and reliable diagnostics to the equipment experts and operation crews so that they can decide whether accelerator operation can continue safely or whether an intervention is required. While the Post Mortem System was initially designed to serve CERN's Large Hadron Collider (LHC), its scope was rapidly extended to also include External Post Operational Checks and Injection Quality Checks in the LHC and its injector complex. These new use cases impose more stringent time constraints on the storage and analysis of data, calling for the system to be migrated towards better scalability in terms of both storage capacity and I/O throughput. This paper presents an overview of the current service, the ongoing investigations and plans towards a scalable data storage solution and API, as well as the proposed strategy to ensure an entirely smooth transition for the current Post Mortem users.
WEPGF049 The Unified ANKA Archiving System - a Powerful Wrapper to SCADA Systems Like Tango and WinCC OA
  • D. Haas, S.A. Chilingaryan, A. Kopmann, W. Mexner, D. Ressmann
    KIT, Eggenstein-Leopoldshafen, Germany
  ANKA realized a new unified archiving system for the typical synchrotron control systems by integrating their logging databases into the "Advanced Data Extraction Infrastructure" (ADEI). ANKA's control system environment is heterogeneous: some devices are integrated into the Tango archiving system, while other sensors are logged by the Supervisory Control and Data Acquisition (SCADA) system WinCC OA. For both systems, modules exist to configure the pool of sensors to be archived in the individual control system databases. ADEI has been developed to provide a unified data access layer for large time-series data sets. It supports internal data processing, caching, data aggregation and fast visualization on the web. Intelligent caching strategies ensure fast access even to huge data sets stored in the attached data sources, such as SQL databases. With its data abstraction layer, the new ANKA archiving system is the foundation for automated monitoring while keeping the freedom to integrate nearly any control system flavor. The ANKA archiving system has been introduced successfully at three beamlines. It has been operating stably for about one year, and it is intended to extend it to the whole facility.
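The abstract attributes fast access to huge data sets to caching and data aggregation. One common way to realise this is to pre-aggregate each time series at several fixed resolutions. A plot request is then answered from the finest resolution that keeps the number of points bounded. This approach is an assumption here, not taken from ADEI, and the class and parameter names are hypothetical.

```python
from collections import defaultdict


class AggregationCache:
    """Multi-resolution cache: each sample updates a (min, max, sum, count)
    bucket at every configured bucket width, so a plot of any time span can be
    served from the finest level that yields at most `max_points` buckets."""

    def __init__(self, widths=(1, 60, 3600)):
        self.widths = sorted(widths)
        self.levels = {
            w: defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
            for w in self.widths
        }

    def add(self, ts, value):
        for w, buckets in self.levels.items():
            b = buckets[int(ts // w)]
            b[0] = min(b[0], value)
            b[1] = max(b[1], value)
            b[2] += value
            b[3] += 1

    def query(self, t0, t1, max_points=1000):
        """Return (bucket width, [(bucket start, min, max, mean), ...])."""
        for w in self.widths:
            if (t1 - t0) / w <= max_points:
                break  # finest level that is small enough; otherwise the coarsest
        buckets = self.levels[w]
        points = []
        for k in sorted(buckets):
            if t0 <= k * w < t1:
                lo, hi, total, n = buckets[k]
                points.append((k * w, lo, hi, total / n))
        return w, points


cache = AggregationCache(widths=(1, 60))
for t in range(120):
    cache.add(float(t), float(t))
width, points = cache.query(0, 120, max_points=10)
print(width, points)  # 60 [(0, 0.0, 59.0, 29.5), (60, 60.0, 119.0, 89.5)]
```

Keeping min and max alongside the mean means short spikes remain visible on a coarse plot even though they are averaged away in the mean.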
WEPGF050 Integrated Detector Control and Calibration Processing at the European XFEL
  • A. Münnich, S. Hauf, B.C. Heisen, F. Januschek, M. Kuster, P.M. Lang, N. Raab, T. Rüter, J. Sztuk, M. Turcato
    XFEL.EU, Hamburg, Germany
  The European X-ray Free Electron Laser is a high-intensity X-ray light source currently under construction in the Hamburg area, which will provide spatially coherent X-rays in the energy range between 0.25 keV and 25 keV. The machine will deliver 10 trains/s, each consisting of up to 2700 pulses at a 4.5 MHz repetition rate. The LPD, DSSC and AGIPD detectors are being developed to provide high dynamic-range Mpixel imaging capabilities at these repetition rates. As a consequence of these detector characteristics, they generate raw data volumes of up to 15 Gbyte/s. In addition, the detectors' on-sensor memory-cell and multi-/non-linear gain architectures pose unique challenges in data correction and calibration, requiring online access to operating conditions and control settings. We present how these challenges are addressed within XFEL's control and analysis framework Karabo, which integrates access to hardware conditions, acquisition settings (also using macros) and distributed computing. The control and calibration software is implemented mainly in Python, using self-optimizing (Py)CUDA code, NumPy and IPython parallel to achieve near-real-time performance for calibration applications.
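Correcting such data means selecting calibration constants per pixel by two keys: the memory cell that stored the frame and the gain stage the pixel switched to. The following is a generic NumPy sketch. It assumes a simple linear model (subtract a dark offset, then multiply by a relative gain) with constants indexed by gain stage and memory cell. The function name, array layout and values are hypothetical. Neither Karabo nor the detector-specific calibration algorithms are reproduced.

```python
import numpy as np


def correct_frames(raw, cell_ids, gain_stage, offset, rel_gain):
    """Offset and gain correction selected per pixel by memory cell and gain stage.

    raw:        (n_frames, ny, nx) raw ADC values
    cell_ids:   (n_frames,) memory cell that stored each frame
    gain_stage: (n_frames, ny, nx) gain stage chosen by each pixel (0, 1, 2)
    offset:     (n_stages, n_cells, ny, nx) dark offsets
    rel_gain:   (n_stages, n_cells, ny, nx) factors to a common signal scale
    """
    _, ny, nx = raw.shape
    cells = cell_ids[:, None, None]
    yy = np.arange(ny)[None, :, None]
    xx = np.arange(nx)[None, None, :]
    # Advanced indexing broadcasts all four index arrays to (n_frames, ny, nx)
    off = offset[gain_stage, cells, yy, xx]
    gain = rel_gain[gain_stage, cells, yy, xx]
    return (raw.astype(np.float32) - off) * gain


# Tiny made-up example: 2 frames of 2x3 pixels, 3 gain stages, 4 memory cells
offset = np.full((3, 4, 2, 3), 10.0)
offset[1, 2] = 50.0
rel_gain = np.ones((3, 4, 2, 3))
rel_gain[1] = 2.0
raw = np.full((2, 2, 3), 100.0)
cell_ids = np.array([0, 2])
gain_stage = np.zeros((2, 2, 3), dtype=int)
gain_stage[1, 0, 0] = 1  # one pixel of frame 1 switched to gain stage 1
corrected = correct_frames(raw, cell_ids, gain_stage, offset, rel_gain)
```

The whole batch is corrected with one vectorised gather, with no Python loop over pixels. This is the kind of array code that maps naturally onto GPU execution.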
Data Management and Visualization with Acquaman  
  • D. Hunter, D.K. Chevrier, R. Feng, I. Workman
    CLS, Saskatoon, Saskatchewan, Canada
  The Acquaman framework, developed at the Canadian Light Source, provides high-level user interfaces and experiment control with a scientific focus. It is currently the primary interface on the SGM, VESPERS and IDEAS beamlines and the interface for the REIXS XES and SXRMB microprobe endstations. Synchrotron scientists collect large amounts of data, which can become unmanageable, particularly for repeat users. The Acquaman user interfaces offer many tools for data management, visualization and accessibility. This poster will show how these systems work together to visualize data at run time, organize collected data after the fact, inspect previous scan configurations, and export data into relevant output formats. A focal point will be demonstrating how the system visualizes data in the same manner as it was collected, enabling previous scans to be rerun or new scans to be configured.
WEPGF052 Development of the J-PARC Time-Series Data Archiver using a Distributed Database System, II
  • N. Kikuzawa, A. Yoshii
    JAEA/J-PARC, Tokai-Mura, Naka-Gun, Ibaraki-Ken, Japan
  • H. Ikeda, Y. Kato
    JAEA, Ibaraki-ken, Japan
  The linac and the RCS in J-PARC (Japan Proton Accelerator Research Complex) have over 64000 EPICS records, which provide an enormous amount of data for controlling a large amount of equipment. The data has been collected in PostgreSQL, but we are planning to replace it with HBase and Hadoop, a well-known distributed database and the distributed file system on which HBase depends. At the previous conference, we reported that we had constructed an archive system with new versions of HBase and Hadoop that eliminate the single point of failure, although we realized there were some issues to resolve before moving into a practical phase. To resolve these issues, we have been reconstructing the system by replacing the master nodes with more robust hardware, creating a kickstart file and scripts to set up a node automatically, and introducing a monitoring tool to detect faults early and reliably. This paper reports these methods, together with performance tests of the new system after tuning some HBase and Hadoop parameters accordingly.
WEPGF053 Monitoring and Cataloguing the Progress of Synchrotron Experiments, Data Reduction, and Data Analysis at Diamond Light Source From a User's Perspective
  • J. Aishima
    SLSA, Clayton, Australia
  • A. Ashton, S. Fisher, K. Levik, G. Winter
    DLS, Oxfordshire, United Kingdom
  The high data rates produced by the latest generation of detectors, more efficient sample handling hardware and ever more remote users of the beamlines at Diamond Light Source require improved data reduction and data analysis techniques to maximize their benefit to scientists. In this paper some of the experiment data reduction and analysis steps are described, including real-time image analysis with DIALS, our Fast DP and xia2-based data reduction pipelines, and the Fast EP phasing and Dimple difference map calculation pipelines, which aim to rapidly provide feedback about the recently completed experiment. SynchWeb, an interface to an open source laboratory information management system called ISPyB (co-developed at Diamond and the ESRF), provides a modern, flexible framework for managing samples and visualizing the data from all of these experiments and analyses, including plots, images, and tables of the analysed and reduced data, as well as experimental metadata and sample information.
WEPGF056 Flyscan: a Fast and Multi-technique Data Acquisition Platform for the SOLEIL Beamlines
  • N. Leclercq, J. Bisou, F. Blache, F. Langlois, S. Lê, K. Medjoubi, C. Mocuta, S. Poirier
    SOLEIL, Gif-sur-Yvette, France
  SOLEIL is continuously optimizing its 29 beamlines in order to provide its users with state-of-the-art synchrotron radiation based experimental techniques. Among the topics addressed by the related transversal projects, the enhancement of the computing tools has been identified as a high priority task. In this area, the aim is to optimize beam time usage by providing users with a fast, simultaneous and multi-technique scanning platform. The concrete implementation of this general concept allows users to acquire more data in the same amount of beam time. This paper gives an overview of the so-called 'Flyscan' project, currently being deployed at SOLEIL. In particular, it details a solution in which an unbounded number of distributed actuators and sensors share a common trigger clock and deliver their data into temporary files. These files are immediately merged into common file(s) in order to make the whole experiment data available for online processing and visualization. Some application examples are also discussed to illustrate the advantages of the Flyscan approach.
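Merging the per-device temporary files amounts to aligning every value on its index in the shared trigger clock. The following is a minimal sketch. It assumes each device's temporary output can be read back as (trigger index, value) pairs. In-memory lists stand in for the temporary files, a CSV string stands in for the merged file, and the device names are hypothetical. Flyscan's actual file formats are not described in the abstract.

```python
import csv
import io


def merge_partial_buffers(partials, n_triggers):
    """Align per-device (trigger_index, value) pairs into one row per trigger.
    Values missing for a trigger stay None so gaps remain visible."""
    devices = sorted(partials)
    table = [[None] * len(devices) for _ in range(n_triggers)]
    for col, device in enumerate(devices):
        for trigger, value in partials[device]:
            table[trigger][col] = value
    return devices, table


def to_csv(devices, table):
    """Write the merged table as the 'common file' (CSV here for simplicity)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["trigger", *devices])
    for trigger, row in enumerate(table):
        writer.writerow([trigger, *("" if v is None else v for v in row)])
    return buf.getvalue()


# Made-up partial buffers; the encoder buffer arrives out of order
partials = {
    "counter": [(0, 120), (1, 118), (2, 125)],
    "encoder": [(0, 0.00), (2, 0.02), (1, 0.01)],
}
devices, table = merge_partial_buffers(partials, n_triggers=3)
csv_text = to_csv(devices, table)
print(csv_text)
```

Keying on the trigger index rather than on arrival time is what makes an unbounded number of independently buffered devices mergeable. A device that reports late or out of order still lands in the right row, and a missing value stays visible as an empty cell.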
WEPGF059 The Australian Store.Synchrotron Data Management Service for Macromolecular Crystallography
  • G.R. Meyer, S. Androulakis, P.J. Bertling, A.M. Buckle, W.J. Goscinski, D. Groenewegen, C. Hines, A. Kannan, S. McGowan, S.M. Quenette, J. Rigby, P. Splawa-Neyman, J.M. Wettenhall
    Monash University, Clayton, Australia
  • D. Aragao, T. Caradoc-Davies, N. Mudie
    SLSA, Clayton, Australia
  • C.S. Bond
    University of Western Australia, Crawley, Australia
  Store.Synchrotron is a service for the management and publication of diffraction data from the macromolecular crystallography (MX) beamlines of the Australian Synchrotron. Since development started in 2013, the service has handled over 51.8 TB of raw data (~4.1 million files). Raw data and autoprocessing results are made available securely via the web and SFTP, so experimenters can sync them to their labs for further analysis. With the goal of becoming a large public repository of raw diffraction data, a guided publishing workflow that optionally captures discipline-specific information was built. The MX-specific workflow links coordinates deposited in the PDB to the corresponding raw data. An optionally embargoed DOI is created for convenient citation. This repository will be a valuable tool for crystallography software developers. To support complex projects, integration of other instruments such as microscopes is underway. We developed an application that captures any data from instrument computers, enabling centralised data management without the need for custom ingestion workflows. The next step is to integrate the hosted data with interactive processing and analysis tools on virtual desktops.
Poster WEPGF059 [2.109 MB]
WEPGF060 A Data Management Infrastructure for Neutron Scattering Experiments in J-PARC/MLF 1
  • K. Moriyama, T. Nakatani
    JAEA/J-PARC, Tokai-Mura, Naka-Gun, Ibaraki-Ken, Japan
  Data management plays a key role in the research workflow of scientific experiments such as neutron scattering. The facility must manage a huge amount of data safely and efficiently over long periods, and provide effective data access that helps facility users produce scientific results. To meet these requirements, we operate and continue to update a data management infrastructure at J-PARC/MLF. It consists of a web-based integrated data management system called the MLF Experimental Database (MLF EXP-DB), a hierarchical raw data repository built from distributed storage, and an integrated authentication system. The MLF EXP-DB creates experimental data catalogues in which raw data, measurement logs, and other contextual information on the sample, experimental proposal, investigator, etc. are interrelated. The system handles the deposition, archiving and on-demand retrieval of raw data in the repository. Facility users can access the experimental data via a web portal. This contribution presents an overview of our data management infrastructure and the recently updated features of the MLF EXP-DB for high availability, scale-out, and flexible data retrieval.
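The archive and on-demand retrieval cycle of a hierarchical repository can be sketched as a toy two-tier store. This is an illustrative Python model under stated assumptions (an in-memory "disk" tier and "archive" tier, age-based migration); the class and method names are hypothetical and do not describe the MLF EXP-DB implementation.

```python
class TieredRepository:
    """Toy two-tier store: recent files on fast disk, older files in an archive tier."""

    def __init__(self):
        self.disk = {}     # name -> (timestamp, payload) on the fast tier
        self.archive = {}  # name -> (timestamp, payload) on the archive tier

    def store(self, name, timestamp, payload):
        # New raw data always lands on the fast tier first.
        self.disk[name] = (timestamp, payload)

    def archive_older_than(self, cutoff):
        # Migrate files written before `cutoff` off the fast tier.
        for name in [n for n, (ts, _) in self.disk.items() if ts < cutoff]:
            self.archive[name] = self.disk.pop(name)

    def retrieve(self, name):
        # On-demand recall: copy back from the archive if needed (KeyError if unknown).
        if name not in self.disk:
            self.disk[name] = self.archive[name]
        return self.disk[name][1]
```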
Poster WEPGF060 [1.017 MB]
WEPGF061 Beam Trail Tracking at Fermilab 1
  • D.J. Nicklaus, L.R. Carmichael, R. Neswold, Z.Y. Yuan
    Fermilab, Batavia, Illinois, USA
  This paper presents a system for acquiring and sorting data from selected devices according to the destination of each beam pulse in the Fermilab accelerator chain. The 15 Hz beam that begins in the Fermilab Linac can be directed to a variety of additional accelerators, beam lines, beam dumps, and experiments. We have implemented a data acquisition system that senses the destination of each pulse and reads the appropriate beam intensity devices, so that beam profiles can be stored and analyzed for each type of beam trail. These data are intended to be used over the long term to identify trends in accelerator performance.
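Sorting readings by pulse destination is essentially a group-by on the destination tag. The Python sketch below assumes each pulse arrives as a (destination, intensity) pair; the destination labels in the usage are hypothetical placeholders, not Fermilab device or beamline names.

```python
from collections import defaultdict
from statistics import mean


def sort_by_destination(pulses):
    """Group per-pulse intensity readings by the destination of each pulse."""
    by_destination = defaultdict(list)
    for destination, intensity in pulses:
        by_destination[destination].append(intensity)
    return dict(by_destination)


def destination_summary(pulses):
    """Per destination: (number of pulses, mean intensity)."""
    return {
        dest: (len(values), mean(values))
        for dest, values in sort_by_destination(pulses).items()
    }
```

For example, destination_summary([("dest_A", 4.0), ("dest_B", 5.0), ("dest_A", 6.0)]) groups the two dest_A pulses together, so trends can be tracked per beam trail rather than across a mixed stream.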
Poster WEPGF061 [2.194 MB]
WEPGF062 Processing High-Bandwidth Bunch-by-Bunch Observation Data from the RF and Transverse Damper Systems of the LHC 1
  • M. Ojeda Sandonís, P. Baudrenghien, A.C. Butterworth, J. Galindo, W. Höfle, T.E. Levens, J.C. Molendijk, D. Valuch
    CERN, Geneva, Switzerland
  • F. Vaga
    University of Pavia, Pavia, Italy
  The radiofrequency and transverse damper feedback systems of the Large Hadron Collider digitize beam phase and position measurements at the bunch repetition rate of 40 MHz. Embedded memory buffers allow a few milliseconds of full-rate bunch-by-bunch data to be retrieved over the VME bus for diagnostic purposes, but experience during LHC Run I has shown that much longer data records are desirable for beam studies. A new "observation box" diagnostic system is being developed that parasitically captures data streamed directly out of the feedback hardware into a Linux server through an optical fiber link, and permits processing and buffering of full-rate data for around one minute. The system will be connected to an LHC-wide trigger network for the detection of beam instabilities, allowing efficient capture of signals from the onset of beam instability events. The data will be made available for analysis by client applications through interfaces exposed as standard equipment devices within CERN's controls framework. It is also foreseen to perform online Fourier analysis of transverse position data inside the observation box using GPUs, with the aim of extracting betatron tune signals.
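Capturing the onset of an instability relies on a buffer that always holds recent history and, once a trigger arrives, records a fixed number of further samples before freezing. The Python class below is a minimal sketch of that idea with a tiny capacity; TriggeredBuffer is a hypothetical name, and the real observation box handles roughly a minute of 40 MHz data rather than a Python deque.

```python
from collections import deque


class TriggeredBuffer:
    """Ring buffer that keeps recent samples and freezes a window around a trigger."""

    def __init__(self, capacity, post_trigger):
        self._buf = deque(maxlen=capacity)  # oldest samples fall off automatically
        self._post_trigger = post_trigger   # samples still recorded after the trigger
        self._remaining = None              # None until a trigger has been received
        self.frozen = None                  # snapshot once the capture is complete

    def trigger(self):
        # Ignore repeated triggers while a capture is pending or already frozen.
        if self._remaining is not None or self.frozen is not None:
            return
        if self._post_trigger <= 0:
            self.frozen = list(self._buf)
        else:
            self._remaining = self._post_trigger

    def push(self, sample):
        if self.frozen is not None:
            return  # capture complete; a real system would hand the snapshot off here
        self._buf.append(sample)
        if self._remaining is not None:
            self._remaining -= 1
            if self._remaining == 0:
                self.frozen = list(self._buf)
```

Because the pre-trigger history is already in the buffer, the frozen snapshot contains the samples leading up to the trigger as well as those just after it.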
Poster WEPGF062 [4.408 MB]
WEPGF063 Developing HDF5 for the Synchrotron Community 1
  • N.P. Rees
    DLS, Oxfordshire, United Kingdom
  • H.R. Billich
    PSI, Villigen PSI, Switzerland
  • A. Götz
    ESRF, Grenoble, France
  • Q. Koziol, E. Pourmal
    The HDF Group, Champaign, Illinois, USA
  • M. Rissi
    DECTRIS Ltd., Baden, Switzerland
  • E. Wintersberger
    DESY, Hamburg, Germany
  HDF5 and NeXus (which normally uses HDF5 as its underlying format) have been widely promoted as a standard for storing photon and neutron data. They offer many advantages over other common formats and are widely used at many facilities. However, the existing implementations of these standards have been found to limit the performance of some recent detector systems. This paper describes how the synchrotron light source community has worked closely with The HDF Group to drive changes to the HDF5 software that make it more suitable for this environment. These include developments managed by a detector manufacturer (DECTRIS, for direct chunk writes) as well as by synchrotrons (DESY, ESRF and Diamond, for pluggable filters, Single Writer/Multiple Reader and Virtual Data Sets).
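One way to picture a virtual data set is as index arithmetic: frames written to several source files by separate writers appear as one contiguous dataset. The Python sketch below shows only that mapping under the assumption that frames are split into consecutive blocks, one block per file; it does not use the HDF5 library, where the real feature is set up through the HDF5 API (for example h5py's VirtualLayout and VirtualSource). make_frame_locator is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate


def make_frame_locator(frames_per_file):
    """Map a global frame index onto (file_number, local_index).

    frames_per_file lists how many frames each source file holds, in order
    (assumed layout: consecutive blocks, e.g. one file per acquisition block).
    """
    starts = [0] + list(accumulate(frames_per_file))[:-1]  # first global index per file
    total = sum(frames_per_file)

    def locate(global_index):
        if not 0 <= global_index < total:
            raise IndexError(global_index)
        # Rightmost file whose first index is <= global_index; skips empty files.
        file_number = bisect_right(starts, global_index) - 1
        return file_number, global_index - starts[file_number]

    return locate
```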
Poster WEPGF063 [0.718 MB]
Developing the Neutron Event Data Infrastructure for a Greenfield Site  
  • T.S. Richter, M.E. Hagen, T. Holm Rod, J.W. Taylor
    ESS, Copenhagen, Denmark
  The European Spallation Source (ESS) is a neutron facility being built on a greenfield site, with no existing host organisation but with contributions from 17 partner nations. Within it, the Data Management and Software Centre (DMSC) is responsible for delivering an integrated package covering the scientific control interface for data acquisition, data readout, processing, visualisation, analysis and data management. ESS will generate data almost exclusively in event mode, recording every neutron detection individually with its spatial and time coordinates. This offers the greatest flexibility for later processing, but requires an extensive infrastructure to meet the goals of online visualisation. This paper presents an overview of what can be solved with existing technology and where new developments are needed, highlighting in particular the data readout, streaming and file-writing aspects.
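The flexibility of event mode comes from deferring histogramming: the binning can be chosen, and changed, after the measurement. The Python sketch below assumes events are simple (pixel_id, time_of_flight) pairs, which is an illustrative simplification and not the ESS event format.

```python
from bisect import bisect_right
from collections import Counter


def histogram_events(events, bin_edges):
    """Count events per (pixel, time-of-flight bin).

    events: iterable of (pixel_id, tof) pairs (hypothetical format).
    bin_edges: sorted edges; bin i covers the half-open interval
    [bin_edges[i], bin_edges[i + 1]). Events outside the edges are dropped.
    """
    counts = Counter()
    n_bins = len(bin_edges) - 1
    for pixel, tof in events:
        i = bisect_right(bin_edges, tof) - 1
        if 0 <= i < n_bins:
            counts[(pixel, i)] += 1
    return counts
```

The same event list can be rebinned with different edges at any time, which is impossible once data have been stored only as fixed histograms.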
WEPGF065 Illustrate the Flow of Monitoring Data through the MeerKAT Telescope Control Software 1
  • M.J. Slabber, M.T. Ockards
    SKA South Africa, National Research Foundation of South Africa, Cape Town, South Africa
  Funding: SKA-SA National Research Foundation (South Africa)
The MeerKAT telescope, under construction in South Africa, comprises a large set of elements. The elements expose various sensors to the Control and Monitoring (CAM) system, and the sampling strategy that CAM sets for each sensor ranges from several samples per second to infrequent updates. This creates a substantial volume of sensor data that must be stored and made available for analysis. We describe the flow of sensor data through the CAM system, showing the various memory buffers, the temporary disk storage, and the mechanisms used to store the data permanently in HDF5 format on network attached storage (NAS).
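How a sampling strategy controls data volume can be illustrated with two simplified strategies: periodic sampling and sampling on change. The Python sketch below is an assumption-laden illustration; the strategy names "period" and "event" and their semantics here are simplified and are not taken from the CAM implementation.

```python
def apply_strategy(samples, strategy, period=1.0):
    """Reduce a stream of (time_s, value) readings according to a sampling strategy.

    "period": keep a reading only if at least `period` seconds have passed
              since the last kept reading.
    "event":  keep a reading only when its value differs from the last kept one.
    """
    kept = []
    last_time = None
    last_value = object()  # sentinel that never equals a real reading
    for t, v in samples:
        if strategy == "period":
            if last_time is None or t - last_time >= period:
                kept.append((t, v))
                last_time = t
        elif strategy == "event":
            if v != last_value:
                kept.append((t, v))
                last_value = v
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return kept
```

A slowly changing sensor under an "event" strategy produces very few records, while a fast periodic strategy produces a steady stream; the mix across thousands of sensors determines the total volume to be stored.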
Poster WEPGF065 [1.229 MB]
WEPGF066 A Systematic Measurement Analyzer for LHC Operational Data 1
  • G. Valentino, X. Buffat, D. Kirchner, S. Redaelli
    CERN, Geneva, Switzerland
  The CERN Accelerator Logging Service stores data from hundreds of thousands of parameters and measurements, mostly from the Large Hadron Collider (LHC). The systematic measurement analyzer is a Java-based tool used to visualize and analyze beam measurement data over multiple fills and over time intervals of the operational cycle, such as the ramp or squeeze. Statistical analysis and various data manipulations are possible, including correlation with machine parameters such as β* and energy. Examples of analyses performed include checks of collimator positions, beam losses throughout the cycle, and tune stability during the squeeze, the latter being used for feed-forward purposes.
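Correlating a measurement with a machine parameter across fills typically starts from a correlation coefficient. The Python sketch below computes the Pearson coefficient for two equal-length series, for example one value per fill; it is a generic illustration, not code from the analyzer, which is written in Java.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # A constant series has zero variance, and the coefficient is then undefined.
    return sxy / sqrt(sxx * syy)
```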
Poster WEPGF066 [2.270 MB]
WEPGF068 Formalizing Expert Knowledge in order to Analyse CERN's Control Systems 1
  • A. Voitier, M. Gonzalez-Berges, F.M. Tilaro
    CERN, Geneva, Switzerland
  • M. Roshchin
    Siemens AG, Corporate Technology, München, Germany
  The automation infrastructure needed to reliably run CERN's accelerator complex and its experiments produces large and diverse amounts of data in addition to physics data. Over 600 industrial control systems with about 45 million parameters store more than 100 terabytes of data per year. At the same time, a large body of technical expertise in this domain is being collected and formalized. The study is based on a set of use cases classified into three data analytics domains applicable to CERN's control systems: online monitoring, fault diagnosis and engineering support. A known root cause analysis of gas system alarm flooding was reproduced with Siemens' Smart Data technologies, and its results were compared with those of a previous analysis. The new solution has been put in place as a tool supporting operators during breakdowns in a live production system. The effectiveness of this deployment suggests that these technologies can be applied to more cases. The intended goals are to increase the reliability of CERN's systems and to reduce analysis effort from weeks to hours. The approach also ensures more consistent analyses by drawing on a central expert knowledge base that is available at all times.
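A root cause analysis of alarm flooding first needs the flood episodes themselves. The Python sketch below finds the times at which the number of alarms in a sliding window first exceeds a threshold. The defaults (more than 10 alarms in 600 s) follow a commonly cited alarm-management guideline and are an assumption here; this is not the Siemens Smart Data method.

```python
from collections import deque


def flood_onsets(timestamps, window=600.0, threshold=10):
    """Return the times at which an alarm flood starts.

    A flood is assumed to be more than `threshold` alarms within the last
    `window` seconds; each returned time marks the transition into a flood.
    """
    recent = deque()
    onsets = []
    in_flood = False
    for t in sorted(timestamps):
        recent.append(t)
        # Drop alarms that have left the sliding window.
        while t - recent[0] >= window:
            recent.popleft()
        flooding = len(recent) > threshold
        if flooding and not in_flood:
            onsets.append(t)
        in_flood = flooding
    return onsets
```

The alarms inside each detected episode are then the input for correlation or causal analysis against the expert knowledge base.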
Poster WEPGF068 [1.468 MB]
WEPGF070 A New Data Acquiring and Query System With Oracle and EPICS in the BEPCII 1
  • C.H. Wang, L.F. Li
    IHEP, Beijing, People's Republic of China
The old historical Oracle database in BEPCII, put into operation in 2006, suffered from problems such as program instability and loss of EPICS PVs. A new data acquisition and query system based on Oracle and EPICS has therefore been developed with Eclipse and JCA. On the one hand, the authors adopted tablespace and table-partitioning techniques to build a dedicated database schema in Oracle. On the other hand, an EPICS data acquisition system based on RCP and Java has been developed successfully, with a very user-friendly interface that makes it easy to check the connection status of each PV and to manage and maintain the system. The authors have also developed a data query system that provides many functions, including data query, plotting, exporting and zooming. The new system has been running for three years and can also be applied to any EPICS control system.
*supported by NSFC(1137522)
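Table partitioning helps queries because a time-range query only has to touch the partitions that overlap the range (partition pruning). The Python sketch below lists the partitions a query would visit, assuming monthly range partitions with hypothetical names such as P201511; the actual partition granularity and naming used at BEPCII are not stated in the abstract.

```python
from datetime import datetime


def monthly_partitions(start, end):
    """Names of the (hypothetical) monthly partitions a query over [start, end] touches."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"P{year:04d}{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names
```

A query over a few weeks therefore scans one or two partitions instead of the whole multi-year table.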
Poster WEPGF070 [0.876 MB]
WEPGF071 Python Scripting for Instrument Control and Online Data Treatment 1
  • N. Xiong, N. Hauser, D. Mannicke
    ANSTO, Menai, New South Wales, Australia
  Scripting is an important feature of instrument control software. It allows scientists to execute a sequence of tasks to run complex experiments, and it makes software developers' lives easier when testing and deploying new features. Modern instrument control applications require scripting support that is easy to develop with and reliable. At ANSTO we provide a Python scripting interface for Gumtree. Gumtree is an application that provides three features: instrument control, data treatment and visualisation for neutron scattering instruments. The scripting layer is used to coordinate these three features. The language is simple and well documented, so scientists need only minimal programming experience. The scripting engine has a web interface so that users can run scripts remotely from a web browser. The script interface includes a NumPy-like library that makes data treatment easier, and a GUI library that automatically generates control panels for scripts. The same script can be loaded in both the workbench (desktop) application and the web service application for online data treatment; in both cases, a GUI with a similar look and feel is generated.
* GumTree: T. Lam, N. Hauser, A. Gotz, P. Hathaway, F. Franceschini, H. Rayner, "GumTree. An integrated scientific experiment environment", Physica B 385-386, 1330-1332 (2006)
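Automatic control-panel generation can be sketched by introspecting a script's parameters and choosing a widget from each default value's type. The Python example below uses the standard inspect module; describe_panel, the widget names and the example script are hypothetical and do not reproduce Gumtree's GUI library.

```python
import inspect

# Hypothetical mapping from default-value type to widget kind.
WIDGET_FOR_TYPE = {bool: "checkbox", int: "spinner", float: "number field", str: "text field"}


def describe_panel(script_function):
    """List (parameter name, widget kind, default) for each script parameter."""
    panel = []
    for name, param in inspect.signature(script_function).parameters.items():
        default = param.default
        widget = WIDGET_FOR_TYPE.get(type(default), "text field")
        panel.append((name, widget, None if default is inspect.Parameter.empty else default))
    return panel


def scan_temperature(start=10.0, stop=300.0, steps=30, save=True, sample="unknown"):
    """Hypothetical experiment script; the body would drive the instrument."""
    pass
```

Because the panel is derived from the script itself, the desktop and web front ends can render the same description, which is one way to obtain a similar look and feel in both.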
Poster WEPGF071 [2.727 MB]
WEPGF072 Parameters Tracking and Fault Diagnosis Based on NoSQL Database at SSRF 1
  • Y.B. Yan, Z.C. Chen, L.W. Lai, Y.B. Leng
    SINAP, Shanghai, People's Republic of China
  For a user facility, reliability and stability are very important. Besides the use of high-reliability hardware, rapid fault diagnosis, data mining and predictive analytics are also effective ways to improve the efficiency of the accelerator. A beam data logging system based on a NoSQL database was built at SSRF. The logging system stores beam parameters under a set of predefined conditions. The details of the system are reported in this paper.
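Condition-based logging can be sketched as filtering parameter snapshots through named predicates and storing the matching ones as document-style records. The Python example below assumes that "predefined conditions" means triggers evaluated on each snapshot; the condition names and thresholds are hypothetical, and no particular NoSQL database is used.

```python
def log_on_conditions(snapshots, conditions):
    """Keep only snapshots for which at least one named condition fires.

    snapshots: iterable of dicts of beam parameters (document-style records).
    conditions: mapping of condition name -> predicate(snapshot) -> bool.
    Each stored record is a copy tagged with the names of the conditions that fired.
    """
    stored = []
    for snap in snapshots:
        fired = [name for name, predicate in conditions.items() if predicate(snap)]
        if fired:
            stored.append({**snap, "triggers": fired})
    return stored


# Hypothetical example conditions; thresholds are illustrative, not SSRF settings.
EXAMPLE_CONDITIONS = {
    "low_current": lambda s: s["current_mA"] < 200,
    "orbit_excursion": lambda s: abs(s["orbit_um"]) > 50,
}
```

Tagging each record with the conditions that fired makes later fault diagnosis queries (for example "all orbit excursions last week") straightforward in a document store.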
WEPGF134 Applying Sophisticated Analytics to Accelerator Data at BNL's Collider-Accelerator Complex: Bridging to Repositories, Tools of Choice, and Applications 1
  • K.A. Brown, P. Chitnis, T. D'Ottavio, J. Morris, S. Nemesure, S. Perez, D.J. Thomas
    BNL, Upton, Long Island, New York, USA
  Funding: Work supported by Brookhaven Science Associates, LLC under Contract No. DE-SC0012704 with the U.S. Department of Energy.
Analysis of accelerator data has traditionally been done using custom tools, developed either locally or at other laboratories. The data repositories are openly available to all users, but mining the desired data can take significant effort, especially as the repositories grow to hundreds of terabytes or more. Much of the data analysis is done in real time as the data are logged. However, users sometimes wish to apply improved algorithms, look for data correlations, or perform more sophisticated analysis. There is a wide spectrum of desired analytics for this small percentage of the problem domains. To address this, tools have been built that let users efficiently pull data out of the repositories, but post-processing the data is then left to the users. In recent years, tools that bridge standard analysis systems, such as Matlab, R, or SciPy, to the controls data repositories have been investigated. This paper discusses the tools used to extract data from the repositories, the tools used to bridge the repositories to standard analysis systems, and directions being considered for the future.
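A recurring step after pulling logged signals out of a repository is aligning series recorded at different times before correlating them. The Python sketch below performs an "as-of" alignment (last value at or before each reference time), similar in spirit to pandas.merge_asof; align_asof is a hypothetical helper, and it assumes the logged timestamps are sorted in ascending order.

```python
from bisect import bisect_right


def align_asof(reference_times, times, values):
    """For each reference time, the most recent value logged at or before it.

    times must be sorted ascending and parallel to values; None is returned
    for reference times earlier than the first logged sample.
    """
    aligned = []
    for t in reference_times:
        i = bisect_right(times, t) - 1
        aligned.append(values[i] if i >= 0 else None)
    return aligned
```

With both signals sampled onto a common time base, the result can be handed directly to tools such as SciPy or R for correlation studies.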
Poster WEPGF134 [2.709 MB]