United States Environmental Protection Agency
Great Lakes Monitoring

 

Lake Michigan Mass Balance


Peer Review

Peer Review of the LMMB QA and Data Management Process
April 29-30, 1999


According to the EPA's Peer Review Policy issued June 7, 1994, 

“Major scientifically and technically based work products
related to Agency decisions normally should be peer-reviewed.”

The EPA is mandated to subject all major research projects and reports with policy implications to external peer review by independent experts in each subject field. In April 1999, the GLNPO LMMB Technical Coordinating Committee conducted a two-day peer review to evaluate the verification, data collection, storage, and quality assurance activities associated with the LMMB project. A panel of five international experts was selected for each of the following areas:


Review Panel

Nicolas S. Bloom, Frontier Geosciences Inc.

Brian Fowler, Axys Analytical Services Ltd.

Jeffrey Newcomer, Raytheon Information Technologies & Scientific Services

Craig J. Palmer, University of Nevada, Las Vegas

Carl J. Watras, University of Wisconsin, Madison


Lake Michigan Mass Balance (LMMB) Program 
Quality Assurance and Database Management Review
 

June 8, 1999

Outline

General Comments

I. Project Management

II. Quality Assurance Process
A. Data Verification
B. Data Standard
C. Statistical Assessment

III. Data Management and Release
A. Timing of Data Access
B. Formats for Data Access
C. Separate QA Management and Data Distribution Teams

IV. Particular QC issues
A. Methyl Mercury
B. in vivo Chlorophyll
C. Total Metals
D. PCB congener analyses

V. Jeff Newcomer's Comments
A. Project Management
B. Quality Assurance Process
C. Data Management
D. Data Release/Distribution

General Comments

The Great Lakes National Program Office (GLNPO) and its LMMB data management team have shown an unparalleled commitment to creating a database that combines scientific excellence, system flexibility, and documentable data integrity. A number of management decisions early in the program have contributed to making this project both a scientific and a data management success. First, the EPA and cooperating partners were clearly willing to commit all necessary financial and personnel resources to ensure the project’s success. We saw little evidence of a need to “cut corners,” as so often happens on projects of this magnitude. Second, the decision to secure the services of top researchers and QA validators in their respective fields assured that the state-of-the-art methodologies needed to conduct an ultra-trace mass balance would be employed and verified creatively and successfully. Finally, the zeal with which the database and QA teams approached their tasks was nothing short of amazing.

In our collective experience, the panel could not recount another case where the data management personnel were so willing to work interactively and flexibly with the data collection groups to make the project a holistic success. The data management group demonstrated excellence in identifying important QA issues, then instituting creative solutions, without (as is so common) throwing all of the issues back on the data originators. This served to make the entire LMMB much more of a cooperative venture than the typical “dictatorial” top-down model of data/QA management.

Although some difficult issues relating to sample identification and QA intercomparability could have been avoided or minimized if addressed early in the planning process (see below), the fact that the management team was willing and able to learn rapidly from past mistakes and adapt the system to meet project realities has made the project a singular success. The panel recommends that, in addition to disseminating the excellent science that resulted from this project, the lessons learned in developing the data and QA management approach be communicated widely to the project management community through both presentations and appropriate publications. The US EPA Norfolk Conference would be an excellent venue.

The choice not to collect methyl mercury data in a project of this nature was considered to be a very significant omission, which, if not corrected in some way, will seriously compromise the ability to perform the food chain bioaccumulation portion of the study for mercury. Although the panel recognizes the thoughtfulness that went into the 1992 decision not to collect this data, due to QA concerns, it strongly recommends that sufficient methyl mercury data now be collected from archived samples or even from new spot sampling to allow appropriate parameterization of the food chain model. Additionally, if the mass balance model is to be accurate, a value for volatile Hg in the lake water column is required. Although some methyl Hg measurements apparently have recently been authorized (e.g., tributary samples), it does not appear as if plans exist to obtain methyl Hg fractions for all critical model components (particularly phytoplankton, zooplankton, precipitation, and lake water).

I. Project Management

To be successful, a program of this magnitude must provide adequate resources to the Quality Assurance and Data Management components of the program. The current level of effort for both of these components is considered appropriate. However, it is apparent that during the initial phases of the program, the resources that were needed to adequately address these components were not fully recognized by management. This resulted in delays in the progress of data verification and in the development of a database. By the time that data verification procedures were undertaken, some of the key personnel at participating laboratories had graduated from their universities or left for other jobs. A consequence was that QA staff reported difficulties in their ability to resolve QA issues.

A unique feature of this program is that many innovative techniques were developed to collect data for a wide variety of media. New procedures require extensive QA in order to improve and eventually validate the methods. The panel noted that some of the new methods may not have been adequately validated prior to implementation. Program management needs to recognize the additional costs associated with methods development and ensure that adequate funding is provided to cover the additional QA activities associated with new measurement methods. 

The overall QA approach to the program began with the development of a QA management plan, followed by the preparation of individual QA project plans for each activity in the program. Although these QA project plans were reviewed and approved, it is apparent that the Principal Investigators (PIs) used a number of different approaches to QA procedures, such as the frequency and type of quality control samples. The overall program would have benefited from the development of a more standardized approach for use by all PIs. Perhaps a meeting of the PIs and QA staff early in the program could have been held to resolve differences in QA approaches and develop a consistent overall approach for the program.

The panel felt that the approach of investigator-defined methods and associated data standards was appropriate to the goals of the project. However, management should require that project-wide standardization of methods be used, and management should clearly identify any methods that are deemed inappropriate. Experience indicates that top-quality researchers and labs often do not readily accept data QA, collection, and analysis methods defined by management, even if the defined items are what they would have proposed themselves. It is much better to have management require the definition and standardization of methods that the investigators first define individually and then collaboratively. This allows the researcher groups to use the knowledge and creativity for which they were selected, to learn more about each other through the standardization requirement, and to lead the scientific aspects of the project. Defining these items needs to be done early in the effort. Management should also consider using brief trial data collection and processing efforts and subsequent reviews to improve the anticipated results.

A consequence of the investigator-defined methods approach was a wide variety in data reporting formats. Many of the problems identified by the QA staff related to data reporting issues. Early involvement of data management staff with PI’s may have prevented many of these problems, and would have fostered consistency in the use of codes across the program. In future data collection programs, it is recommended that some consideration be given to the development of data collection programs that could be used on portable computers or field data recorders. These could be provided by the program to PI’s to encourage the use of common codes and consistent sample identifiers, as well as common QA approaches.

An important QA activity is the documentation of the quality of data, along with the processes and procedures used to collect the data. The methods compendium prepared for the program is comprehensive and well organized. The approach of developing a quality assurance chapter to accompany the data report for each program component is considered by the panel to be very useful. This approach should help data users evaluate the quality and potential utility of data in a timely fashion.

Program management noted, during the review, that they are planning to include a synthesis of QA results in an executive summary of the program. The panel agrees that some type of overall summary of the QA program, data quality results, and lessons learned would be beneficial, in addition to the individual QA chapters planned for data summary reports. We also encourage staff to prepare a journal paper discussing these topics, as these results are important to the general scientific and monitoring communities.

The QA staff reported a number of instances where the timing of sample collection was not adequately recorded. It is assumed that this was the result of not using some type of sample tracking or chain-of-custody during sample collection. During future data collection programs, managers are encouraged to consider requiring some level of chain-of-custody, as well as more coordination of sampling times and locations between focus groups, to prevent these problems from occurring and to improve the documentation associated with each sample.

Two different contractors (Grace Analytical Services and DynCorp Inc.) participated in QA activities, and a third contractor (AMS Inc.) developed the database. This approach provided EPA with highly qualified staff to address the various tasks. A natural consequence of using several organizations is the need to encourage communication between the groups. It was apparent that the QA manager had been able to assist with this communication, and with the development of a team approach. Some differences in the use of terminology (e.g. focus groups vs. projects) were noted, but these are considered minor issues.

The panel agrees with the approach used, of encouraging the PI’s to provide finalized and selected data -- as opposed to raw data -- for verification. It is considered inappropriate to require the QA staff to routinely review raw instrument output, such as chromatograms. The selection of top quality labs, and then accepting that these personnel are the best qualified to calculate, select, and flag their data was most significant in building a sense of trust and common purpose between the PIs and the QA/data management groups (usually playing the role of mutual antagonists in projects such as this).  

See Jeff Newcomer's Comments on Project Management

II.  Quality Assurance Process

A. Data Verification

The panel feels that given the state-of-the-art nature of the analytical methods, and the requirement for accurate quantification at ultra-trace ambient concentrations, the approach of PI-developed measurement quality objectives (MQOs) and flagging was largely appropriate. Rechecking flagging conditions by the QA reviewers and by the database functions was very helpful in identifying further discrepancies -- the resolution of which made the finalized database very robust and well-characterized. Having said this, however, it appears that there was not good initial coordination between all groups concerning how to prepare appropriate MQOs and flagging conditions, nor even what types and quantities of QC data were needed to documentably assess data quality. The panel feels that a lesson here is that all PIs, as well as the QA and database teams, should have met for several days, prior to any sample collection, to coordinate QC data, acceptance criteria, and flagging conventions. This would have gone a long way toward avoiding confusion, minimizing the very large number of flags, and providing more robust intercomparability of data quality between focus groups.

An appropriate role of the QA team, if no common ground could be quickly reached, would have been to dictate a minimum common set of QC parameters and frequencies that each focus group would be required to collect. Common definitions for what counts as less-than-detect, estimated, high bias, and low bias should have been agreed upon in advance. If it is possible to append high and low bias data with a documented, PI-estimated bias factor (e.g., HIGH +85%), that would be an extremely valuable addition for future users of the data. The panel believes that while some data may truly require an INV flag, this is a measure of last resort, and should be applied with the agreement of the PI. In any case, no data -- even that flagged INV -- should be eliminated from the database, and extra care should be taken to explain the reasons for INV data, in case a final user wishes to still employ those results in some way. The invalidated chlorophyll data is clearly a case where an end user might wish to still use it, after applying some type of scaling factor.

The panel believes the auditing criteria and number of audits were appropriate for this project. The audit criteria (at least for the primary analytes) were very comprehensive indeed, and it is assumed that they were applied flexibly, as even extremely good research labs would have difficulty escaping numerous findings if the audit criteria were applied rigidly. The requirement that all labs have approved QAPPs, SOPs, and (usually) 40 CFR 136 MDL studies at the outset of the study provided a solid foundation for the ultimate validity and documentation of the project. The methods compendia are clear, well organized, and can serve as useful references for similar future studies.

Overall, the panel agreed with the appropriateness of using blank and surrogate corrected data, as well as of reporting all final numerical values (no “less than” results) in the final database. All blanks and surrogate correction factors appear to be readily available (to those who can navigate the GLENDA database) with each data point, so that uncorrected data can be recovered if necessary. Several panel members, although agreeing that blank corrected data was appropriate for the study, did qualify that opinion with a sense of caution, especially with regard to organics data. One panel member (NB) enthusiastically embraced the approach used as the one and only closest approximation of true real-world concentrations. He further hopes that other EPA groups, such as the Superfund Program, the Office of Water Research, and NPDES will learn a lesson from this wise and enlightened example. Data reported below detection limits were flagged in meaningful ways, and for the purposes of finishing out the project, the current flagging system is OK. However, ideally, on future projects, a common set of flags, such as “less than MDL (40 CFR 136),” “less than daily estimated MDL,” and “less than quantitation limit, but above MDL,” should be developed and employed by all focus groups. This information, together with what the respective MDLs and PQLs are for each data point, and numerical “bias high” and “bias low” flags, would provide much clarity and added value to future users of the data.
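As an illustration of the flagging recommendation above, the following is a minimal Python sketch, not a description of GLENDA. The record layout, the field names, and the flag strings LT_MDL and LT_PQL_GE_MDL are hypothetical. It assumes that each result is stored with its blank value, MDL, and PQL, so that the blank-corrected value is always reported numerically and the uncorrected value remains recoverable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """One analytical result (hypothetical layout, not the GLENDA schema)."""
    raw: float    # measured value before blank correction
    blank: float  # blank value subtracted from the raw measurement
    mdl: float    # method detection limit applying to this data point
    pql: float    # practical quantitation limit applying to this data point

    def __post_init__(self) -> None:
        if self.mdl > self.pql:
            raise ValueError("MDL must not exceed PQL")

    @property
    def corrected(self) -> float:
        # Always a number, never a "less than" entry; raw and blank are
        # stored alongside it, so the uncorrected value can be recovered.
        return self.raw - self.blank

    @property
    def flag(self) -> Optional[str]:
        # Hypothetical common flags shared by all focus groups.
        if self.corrected < self.mdl:
            return "LT_MDL"
        if self.corrected < self.pql:
            return "LT_PQL_GE_MDL"
        return None  # at or above the quantitation limit: no flag


if __name__ == "__main__":
    r = Result(raw=0.30, blank=0.05, mdl=0.10, pql=0.50)
    print(r.corrected, r.flag)
```

Carrying the MDL and PQL with every data point, rather than only the flag, lets a later user re-derive the flags under a different convention.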

B. Data Standard

The panel agrees that the codes and flags used in the project are not, in and of themselves, intuitively or even easily understood (some became downright cryptic, when an appropriate letter was already used up on another code). This stems from the very large number of codes required, which makes many 8-character codes seem similar. Despite this necessary drawback, the database team has done an excellent job of defining the codes, and making these translations available. Placing the definition itself in the tables next to the codes is very helpful. Overall, the panel thought the dynamic nature of the data standard, and the many nuances of codes, strengthened the QA process on this study. This is particularly the case when using cutting-edge (and even developing) methods, and highly regarded published research labs. The panel agreed that some degree of standardization up front, particularly of the minimum numbers and types of QC samples collected, as well as of overarching definitions of detection and quantification limits, bias, etc., would have helped produce a more elegant, coherent evaluation of the overall data quality. Allowing the PIs to determine acceptance criteria and apply flagging was excellent, because it requires data gatherers to carefully scrutinize the quality of their data, and because the PIs are the most qualified to understand the quality of their own data. Having said this, it appears that the largely academic PIs were sometimes not very well versed in this type of data quality assessment, and so a pre-project training session (given by the QA leaders) on the rationale and principles of setting DQOs, acceptance criteria, and flagging would have been very helpful.

C. Statistical Assessment

The use of comprehensive overall QC data summary reports for each focus group is excellent. It will give users of the data what they need to address data quality and uncertainty issues, without requiring tedious appeal to the raw QC data itself. The panel feels that the methods used and the level of statistical explanation were appropriate for the QC summary reports. Where sufficient QC data was collected, each statistical attribute was well represented. Unfortunately, some analyte groups collected far fewer of the QC data that allow separating laboratory and field variability, and assessing bias, than did the atrazine and mercury groups -- meaning that some sections of the QC analysis will not be quantitative for those analytes. A lesson for the future is to have all groups buy off on producing key lab and field QC data at pre-established frequencies, so that full numerical treatment of QC information is possible. More attention should have been paid to repeated analysis of like-matrix CRMs (where possible), or well-characterized real-world intercomparison samples, to allow a further -- and more realistic -- assessment of accuracy than can be obtained from matrix spike recoveries. All results from inter-lab intercomparison exercises, breakthrough studies, and any other secondary validation studies should be summarized in the QC summary reports.

The QA assessment teams calculated the “percent variability due to sampling and analytical measurement uncertainty” wherever enough QC data was available to do so. This is a useful, if somewhat qualitative, measure for quickly assessing geographical and temporal variation in data sets. Ideally, future data collections should require minimal collection of field and lab blanks and replicates to allow this parameter to be calculated for every focus group. The analysis should make clear however -- either through stratified calculations (if enough data exists), or by caveat -- that this analysis typically only applies to data where the variability is no longer substantially affected by the absolute concentration (i.e., above the PQL). For this reason, variability information of samples near the MDL should not be included in the calculation of this parameter unless it is being calculated explicitly for <PQL samples.  
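The report does not give the formula the QA assessment teams used for this parameter. The Python sketch below shows one plausible form, under stated assumptions: measurement variance is pooled from field duplicate pairs as sum(d^2) / (2n), total variance is the sample variance of the routine results, and duplicate pairs with either member below the PQL are excluded, following the panel's caveat. The function names are hypothetical.

```python
from statistics import variance
from typing import List, Sequence, Tuple


def duplicate_variance(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pooled measurement variance from duplicate pairs: sum(d^2) / (2n)."""
    if not pairs:
        raise ValueError("at least one duplicate pair is required")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def percent_measurement_variability(
    pairs: Sequence[Tuple[float, float]],
    samples: List[float],
    pql: float,
) -> float:
    """Percent of total variance attributed to sampling and analysis.

    Assumed form (not taken from the LMMB QC reports):
        100 * s2_measurement / s2_total
    Only duplicate pairs with both members at or above the PQL are used,
    because variability near the MDL depends strongly on concentration.
    """
    kept = [(a, b) for a, b in pairs if min(a, b) >= pql]
    s2_measurement = duplicate_variance(kept)
    s2_total = variance(samples)  # sample variance of routine results
    if s2_total == 0:
        raise ValueError("routine samples show no variability")
    return 100.0 * s2_measurement / s2_total


if __name__ == "__main__":
    dup_pairs = [(10.0, 12.0), (20.0, 18.0), (0.1, 0.3)]  # last pair is < PQL
    routine = [10.0, 20.0, 30.0, 40.0]
    print(percent_measurement_variability(dup_pairs, routine, pql=1.0))
```

A low percentage suggests that the observed geographical and temporal differences are mostly real rather than measurement noise. A stratified version for samples below the PQL would call the same function on that subset only.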

See Jeff Newcomer's Comments on Quality Assurance

III.  Data Management and Release

A. Timing of Data Access

Although portions of the LMMB data are available to individuals and groups upon request, the panel believes that the data should be more openly available to the general science community at this point in the project. We understand clearly that each data set requires a certain period of time to get it to a reasonable level of quality before it can be released to the general scientific community. Experience on other large projects indicates that this period should be one to two years, depending on the type and volume of data. This period gives researchers and graduate students the proper amount of time to calibrate and analyze the data, explain and/or remove the majority of the problems, and publish their first results. After this time, the scientific community becomes aware of the research by way of the publications, and is eager to obtain and use some of the data. From what we heard in the presentations, outside research groups can potentially get the data through the 'buddy system' if LMMB management does not release the data. NASA's perspective has been that it is better for outside groups to obtain the 'not quite perfect' data that is properly (but not necessarily completely) documented through the central system in order to minimize the complaints about restricted access to data collected with public funds.

For data distribution, we would encourage GLNPO to take a staged approach that follows the natural progression of the research and mimics the publication of scientific results. For distribution of data early in the project, implement a World Wide Web-site (WWW-site) with access to a simple set of structured directories that follow the structure of the project. By the end of the first year, implement pages on the WWW-site that provide up-to-date information for the general science community on project status and results. In addition, supply links that provide project participants with easy, password-protected access to data that are categorized by spatial, temporal, and data type/parameter dimensions. During the second year, use a relational database to encourage full integration/standardization of the more complete data sets (i.e., complete in terms of correctness, completeness, and documentation).

By the end of the second year, provide public access to the best data sets via an archive center/facility that will start to take ownership of the final data and documentation. From our experience, this sort of transition with the project data management and science teams works well to make the data generally available. Realize that for some data sets, multiple versions will be exchanged between the project and archive data management groups. The most important element in this transition is that the data given to the archive center is properly documented.

B. Formats for Data Access

If the information is integrated properly, the LMMB database should be very useful for current and future research efforts, and will provide a good basis for use and improvement in future projects. The data access interface will be an important component of the system, and will be critical in determining how much the system is accessed and which data are used. The demonstration of the Microsoft Windows interface under development showed that the developers are working to provide the needed functionality for LMMB data users. The interface will be useful and provide access for a large number of users, but it will greatly restrict access by users of other computer systems, such as Macintosh and UNIX workstations.

A World Wide Web (WWW) interface would provide access to just about everyone. It would have been difficult to anticipate the current breadth and depth of WWW technologies when the plans for the GLENDA database and its interface were started. However, EPA and LMMB management should temporarily stop interface development efforts to assess the benefits that would be realized from a WWW interface and determine if the likely increase in costs for the interface change can be accommodated. If the Oracle relational database is properly constructed and is data dictionary driven (as it appeared to be in the limited demonstration), there should not be any increased costs for that component. The additional costs should only occur in the design and implementation of the WWW-based interface software. If funds are not available from the LMMB study and/or there is not sufficient time to develop the WWW interface for LMMB, the reviewers encourage GLNPO to initiate a separate development effort to design and develop a functional WWW interface.

At the completion of the project, plan to publish the best data on a semi-permanent medium like CD-ROM. Publishing and distributing the best data on CD-ROM ensures that the data will continue to be available for a long time even if consistent funding of an on-line WWW-site is terminated. It will also provide access to the data even if network connections fail. In addition, publish the individual data set documentation and QA summaries in some sort of Technical Memorandum set. Although the Technical Memoranda are 'gray literature', they do provide graduate students with a means of obtaining an important first publication. Publication of the data also gives the data QA and management personnel due credit as the hard-working editors of the final data collection, and gives the researchers recognition for their proper handling and documentation of the data.

C. Separate QA Management and Data Distribution Teams

GLNPO should use the LMMB effort to define a process and structure where groups like LMMB function as the project-specific data management group and data publisher. The goals of the LMMB data QA and management personnel are to obtain and rapidly distribute the data to LMMB researchers, and to consistently integrate, edit, and document the data for archive purposes. Once the data are sufficiently documented and checked, they should be transferred to a more permanent GLNPO or EPA entity that will function as the data archive and public data dissemination point. From completion of data collection, the amount of time to get a data set sufficiently documented varies from 1 to 2 years. Experience with this process on several NASA efforts has shown that a focused, project-specific data QA and management staff is able to meet the specific needs of the project better than having such support come from the staff of an archive center, which has a distinctly different purpose. Also, public release of properly documented (but not necessarily perfect) data must occur within 1 or 2 years in order for outside researchers reading published journal articles about the research to also obtain and use the data. This transition of data to the archive center over time also helps the archive center personnel to become familiar with the project data over a one to two year period. Defining the role of the project data QA and management staff to be that of data editors (similar to scientific journal editors) seems to be well understood and appreciated by the scientific researchers.  

See Jeff Newcomer's Comments on Data Management

See Jeff Newcomer's Comments on Data Release/Distribution

IV. Particular QC issues

A. Methyl Mercury

The review panel recommends that GLNPO try to include information on meHg in the LMMB. From a human health and wildlife perspective, meHg is the chemical species of interest in studies of environmental mercury. It is the form of mercury that biomagnifies along aquatic food chains and it poses the greatest health risk to piscivorous vertebrates. Although determinations of meHg were not included in the original LMMB work plan, some focus groups independently collected samples for meHg determination. At least two focus groups have already published meHg data collected during the LMMB project (Mason and Sullivan, 1997; Hurley et al., 1998). GLNPO should determine the extent of data on meHg potentially available from archived samples or from independent databases and consider ways to include them in the QA/QC assessment and modeling phases of the project.

References:

Mason, RP and KA Sullivan. 1997. Mercury in Lake Michigan. ES&T 31:942-947.

Hurley, JP et al. 1998. Partitioning and transport of total and methyl mercury in the lower Fox River, WI. ES&T 32:1424-1432.

B. in vivo Chlorophyll

In vivo fluorescence (IVF) techniques were used to determine the depth distribution of chlorophyll in open lake waters for the eutrophication module of LMMB. The speed and sensitivity of IVF make it an attractive alternative to standard methods which involve the collection of suspended particulate matter by filtration or centrifugation, extraction of pigments in an organic solvent, and determination of pigment concentration by molecular absorbance or fluorescence (e.g. APHA, 1985; Parsons et al., 1984). IVF was introduced to biological oceanography several decades ago (Lorenzen, 1966) and since that time numerous studies in lakes and oceans have used the IVF technique to quantify chlorophyll biomass.

Within constraints described below, IVF is a powerful and useful tool. It allows investigators to determine the spatial distribution of phytoplankton communities in real time at low cost. Discrete sampling depths may then be targeted relative to zones of high biomass or productivity. A certain degree of taxonomic discrimination is also possible using the ratio of fluorescence signals generated by different excitation and/or emission wavelengths (e.g. Falkowski and Kiefer, 1985; Yentsch and Phinney, 1985; Wood et al., 1985; Watras and Baker, 1988). Investigating the interactions between physico-chemical and biological processes is thus greatly simplified.

Since the strict quantification of chlorophyll biomass using IVF is limited by the variability of R, the fluorescence yield or ratio of chlorophyll fluorescence to chlorophyll concentration (Kiefer, 1973; Loftus and Seliger, 1975; Heany, 1978; Harris 1980; Cullen, 1982), the review committee recommends a more rigorous statistical assessment of the IVF data collected for the LMMB. R is known to depend on several factors, including phytoplankton species composition, time of day and depth. Fee (1976) found that R was constant for a given lake at a given time, but it varied widely from lake to lake on any day and between days on a single lake. As a result, frequent calibration against extracted samples is required. Since the relationship between IVF and extracted chlorophyll can be strongly influenced by high values from metalimnetic chlorophyll maxima, and since a small error in slope can produce a relatively large error in estimates at low values, accuracy may be reduced in the epilimnetic waters of oligotrophic systems.
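The leverage effect described above can be illustrated with a small Python sketch. The numbers are invented for illustration and are not LMMB data. It assumes a simple ordinary least-squares calibration of extracted chlorophyll against the IVF signal; the function names are hypothetical. Fitting the line with and without a single high point from a metalimnetic chlorophyll maximum shows how the estimate at a low epilimnetic signal shifts.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(ivf: float, slope: float, intercept: float) -> float:
    """Estimated extracted chlorophyll for a given IVF signal."""
    return slope * ivf + intercept


if __name__ == "__main__":
    # Invented calibration pairs: (IVF signal, extracted chlorophyll).
    ivf = [1.0, 2.0, 3.0, 4.0, 20.0]  # last point: metalimnetic maximum
    chl = [0.5, 1.0, 1.5, 2.0, 6.0]   # fluorescence yield differs at depth

    low_only = fit_line(ivf[:4], chl[:4])
    with_max = fit_line(ivf, chl)

    # Compare estimates at a low, epilimnetic IVF signal.
    for label, (m, b) in (("low only", low_only), ("with maximum", with_max)):
        print(label, round(m, 3), round(b, 3), round(predict(1.0, m, b), 3))
```

In this invented case the single high point changes both slope and intercept, and the relative change in the estimate is largest at the lowest IVF values. A statistical assessment of real IVF data would also need to consider calibrating separately by depth stratum, date, and time of day, since R varies with all three.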

References:

American Public Health Association. 1985. Standard Methods for the Examination of Water and Wastewater.

Cullen, JJ. 1982. The deep chlorophyll maximum: comparing vertical profiles of chlorophyll a. Can. J. Fish. Aquat. Sci. 39:791-803.

Falkowski, P & DA Kiefer. 1985. Chlorophyll-a fluorescence in phytoplankton: relationship to photosynthesis and biomass. J. Plank. Res. 7:715-731.

Fee, EJ. 1976. The vertical and seasonal distribution of chlorophyll a in lakes of the ELA, northwestern Ontario: implications for primary production estimates. Limnol. Oceanogr. 21: 767-783.

Harris, GP. 1980. The relationship between chlorophyll a fluorescence, diffuse attenuation changes, and photosynthesis in natural phytoplankton populations. J. Plank. Res. 2: 109-127.

Heany, SL. 1978. Some observations on the use of the in vivo fluorescence technique to determine chlorophyll a in natural populations and cultures of freshwater phytoplankton. Freshw. Biol. 8:115-126.

Kiefer, DA. 1973. Fluorescence properties of natural phytoplankton populations. Mar. Biol. 22:263-269.

Loftus, ME & HH Seliger. 1975. Some limitations of the in vivo fluorescence technique. Chesapeake Sci. 16:79-92.

Lorenzen, C. 1966. A method for the continuous measurement of in vivo chlorophyll concentration. Deep Sea Res. 13:223-227.

Parsons, TR et al. 1984. A manual of chemical and biological methods for seawater analysis. Pergamon.

Watras, CJ & AL Baker. 1988. Detection of planktonic cyanobacteria by tandem in vivo fluorometry. Hydrobiologia 169: 77-84.

Wood, AM et al. 1985. Discrimination between types of pigments in marine Synechococcus spp by scanning spectroscopy, epifluorescence microscopy, and flow cytometry. Limnol. Oceanogr. 30: 1303-1315.

Yentsch, CS & DA Phinney. 1985. Spectral fluorescence: an ataxonomic tool for studying the structure of phytoplankton communities. J. Plank. Res. 7: 617-632.

C. Total Metals

Although the total metals data are not explicitly part of the mass balance project, and so are not held to the same QA standards, a concern was raised regarding the intercomparability of metals data sets, due to the use of differing and incomplete digestion procedures. All dissolved metals data appear to be excellent, but unfiltered and air particulate samples are likely biased low, in a way that is difficult to assess quantitatively, because weak acid digestions were used rather than total metals digestions. Mercury in these samples was released as total Hg by BrCl oxidation, but the ICP metals in unfiltered water were only leached with 1% HNO3 (very weak; see Bloom, N.S. and Gauthier, M. 1998. “Lower MDLs and Better Accuracy for ‘Total Recoverable Metals’ in Water through the use of HF/HNO3 Digestion at 85°C in Sealed Teflon Bottles,” 20th US EPA Norfolk Conference), while the air particulate samples were microwave bomb digested with 15% HNO3, which is an arbitrarily stronger but still incomplete digestion. Although these methods probably meet particular PI-driven DQOs, they cannot be considered total metals analyses, and could not be used for any future metals mass balance of the lake. Thus, they should be labeled “weak acid leachable unfiltered” metals concentrations, rather than “metals,” which carries the implicit prefix of “total.”

D. PCB congener analyses

Given the state of the art of PCB analysis when the program began, the specific congener method was a bold but very commendable step. However, problems with validation of congener calibration standards and the selection of the ECD method cast some uncertainty on the accuracy of the congener data. The quality of PCB congener standards has steadily improved, and it may be timely to re-determine consensus values for the calibration standard; revised concentrations should be verified by additional intercalibration against reference standards and instrumental methods. The systematic biases in PCB data noted earlier from analysis of a congener performance suggest that a simple correction of concentrations using correction factors should be considered.
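One way such a correction could work is sketched below. It assumes that results were quantified against a nominal calibration standard concentration and that a revised consensus value later becomes available for some congeners, so each reported result is rescaled by the ratio revised/nominal. The congener labels, concentrations, and function names are hypothetical and do not come from the LMMB data.

```python
def correction_factor(nominal_conc, revised_conc):
    """Ratio that rescales results quantified against the nominal standard value."""
    return revised_conc / nominal_conc


def correct_results(reported, nominal, revised):
    """Apply per-congener factors; congeners without a revised value are unchanged."""
    corrected = {}
    for congener, value in reported.items():
        if congener in revised:
            factor = correction_factor(nominal[congener], revised[congener])
            corrected[congener] = value * factor
        else:
            corrected[congener] = value
    return corrected


# Hypothetical standard concentrations (ng/mL) and sample results (ng/L).
nominal = {"congener_A": 10.0, "congener_B": 20.0}
revised = {"congener_A": 8.0}  # only congener_A has a revised consensus value
reported = {"congener_A": 0.50, "congener_B": 1.20}

corrected = correct_results(reported, nominal, revised)
print(corrected)
```

Keeping the factors in a separate table, rather than overwriting the reported values, would let the original and corrected concentrations be distributed side by side.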

Bias due to the electron capture detection (ECD) method builds in more subtle errors, as the ECD response of PCB congeners is a function of the number of chlorine substituents, and the detector response due to co-eluting groups will be method dependent. Such co-elutions present problems with ECD data, as co-elution of components with different response factors leads to some uncertainty as to which response factor should apply. The method also limits determination of toxicity from PCB data, for which toxic congeners should ideally be reported individually. An MS-based method would have provided substantially higher specificity.
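The size of the co-elution uncertainty can be bounded with a short sketch. It assumes a linear detector, so that the area of a single peak is the sum of each co-eluting congener's response factor times its concentration. Under that assumption the total concentration can only be said to lie between the area divided by the largest response factor and the area divided by the smallest. The peak area and response factors are hypothetical.

```python
def coelution_bounds(peak_area, response_factors):
    """Range of total concentration consistent with one co-eluting ECD peak.

    Assumes area = sum(rf_i * conc_i) with every conc_i >= 0.
    """
    low = peak_area / max(response_factors)
    high = peak_area / min(response_factors)
    return low, high


# Hypothetical peak area and response factors (area units per ng/L).
low, high = coelution_bounds(peak_area=1200.0, response_factors=[300.0, 500.0])
print(f"total concentration lies between {low:.2f} and {high:.2f} ng/L")
```

The wider the spread between the response factors of the co-eluting congeners, the wider this interval becomes.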

V. Jeff Newcomer's Comments

A. Project Management

1. From what I read in the briefing materials and heard from the presentations and in the off-line discussions, it appears that a good balance was struck between the available resources and that put into the Quality Assurance and Data Management Phases of the project. From my experience in similar efforts conducted by NASA, 15 to 20% of the funds of a project of this nature are required to properly support Data QA, Documentation, and Data Management activities. The LMMB Data QA and Management efforts were properly focused on all the key aspects such as compiling common definitions of terms and parameters and agreeing on proper measurement units. The only item that could have benefited the QA effort was more early emphasis on QAPP standardization. Getting agreement on the QAPPs early in the effort could have reduced the amount of effort that was needed later in the project to properly understand and integrate the data. 

2. At this point of the project, I do believe that there has been sufficient management involvement in the QA and Data Management aspects of the project. It appears that management involvement in these areas has been quite high over the last year or so after Lou Blume took over, but that there was less involvement in the early stages.  If there had been more management involvement earlier in the effort, it is likely that some of the recent struggles in the QA and Data Management could have been reduced or eliminated. 

3. I believe that the approach of having investigators define their data quality and methods is the best for projects of this nature; however, management must be firm in requiring that standard methods be used and in clearly identifying any methods that are deemed to be inappropriate. My experience is that top-quality researchers and labs will not readily accept data QA, collection, and analysis methods defined by management even if the defined items are what they would have proposed themselves. It is much better to have management require the definition and use of standard methods that the investigators first define individually and then collaboratively. This allows the researcher groups to use the knowledge and creativity for which they were selected, to learn more about each other through the standardization requirement, and allows them to lead the scientific aspects of the project. Defining these items needs to be done early in the effort with management considering the use of brief trial data collection and processing efforts and subsequent reviews to improve the anticipated results. 

4. Overall, the documentation, data quality, data process, and procedures seemed to be properly defined and appropriately scaled for the LMMB effort. Although portions of the LMMB data are available to individuals and groups that request them, I believe that the data should be more openly available to the general science community at this point in the project. I clearly understand that each data set requires a certain amount of time to get it to a reasonable level of quality before it can be released to the general scientific community. From my experience, this period is one to two years depending on the type and volume of data. This period gives researchers and graduate students the proper amount of time to calibrate and analyze the data, explain and/or remove the majority of the problems, and publish their first results. At this point, the scientific community becomes aware of the research by way of the publications and is eager to obtain and use some of the data. From my experience and what I heard in the presentations, outside research groups could potentially get the data through the 'buddy system' if LMMB management did not release the data. NASA's perspective has been that it is better for outside groups to obtain the 'not quite perfect' data that is properly (but not necessarily completely) documented through the central system in order to minimize the complaints about restricted access to data collected with public funds. 

5. The approaches used for data verification and data management are very appropriate for future projects of this type. For data distribution, I would encourage GLNPO to take a staged approach that follows the natural progression of the research and mimics the publication of scientific results. 

a) For distribution of data early in the project, implement a WWW-site with access to a simple set of structured directories that follow the structure of the project. 

b) By the end of the first year, implement pages on the WWW-site that provide up-to-date information for the general science community on project status and results. In addition, supply links that provide project participants with easy and password protected access to data that are categorized by spatial, temporal, and data type/parameter dimensions. 

c) During the second year, use a relational database to encourage full integration/ standardization of the more complete data sets (i.e., complete in terms of correctness, completeness, and documentation). 

d) By the end of the second year, provide public access to the best data sets via an archive center/facility that will start to take ownership of the final data and documentation. From our experience, this sort of transition with the project data management and science teams works well to make the data generally available. Realize that for some data sets, multiple versions will be exchanged between the project and archive data management groups. The most important element in this transition is that the data given to the archive center is properly documented. 

e) At the completion of the project, plan to publish the best data on a semi-permanent medium such as CD-ROM. In addition, publish the individual data set documentation in some sort of Technical Memorandum set. Although the Technical Memoranda are 'gray literature', they do provide graduate students with a means of obtaining an important first publication. Publication of the data also gives the data QA and management personnel due credit as the hard-working editors of the final data collection and gives the researchers proper recognition for their proper handling and documentation of the data. 
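As a minimal sketch of stages (a) and (b), a small catalog can index files held in project-structured directories by spatial, temporal, and parameter dimensions. The station labels, dates, file paths, and the names `DataSetEntry` and `find` are all hypothetical and are not taken from the LMMB system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataSetEntry:
    station: str     # spatial dimension (hypothetical station label)
    start: date      # temporal dimension: first sampling date covered
    end: date        # temporal dimension: last sampling date covered
    parameter: str   # data type/parameter dimension
    path: str        # location within the structured directories


def find(catalog, station=None, parameter=None, on=None):
    """Return the entries that match every criterion that was supplied."""
    hits = []
    for entry in catalog:
        if station is not None and entry.station != station:
            continue
        if parameter is not None and entry.parameter != parameter:
            continue
        if on is not None and not (entry.start <= on <= entry.end):
            continue
        hits.append(entry)
    return hits


# Hypothetical entries; the paths mirror a project-structured directory tree.
catalog = [
    DataSetEntry("station_01", date(1994, 4, 1), date(1994, 10, 31),
                 "pcb", "water/station_01/pcb_1994.csv"),
    DataSetEntry("station_01", date(1994, 4, 1), date(1994, 10, 31),
                 "mercury", "water/station_01/mercury_1994.csv"),
    DataSetEntry("station_02", date(1995, 4, 1), date(1995, 10, 31),
                 "pcb", "water/station_02/pcb_1995.csv"),
]

pcb_sets = find(catalog, parameter="pcb")
june_1994 = find(catalog, on=date(1994, 6, 15))
print([entry.path for entry in pcb_sets])
```

An index of this kind could later be loaded into the relational database of stage (c) without changing the directory layout.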

6. No comment in particular on the data collection methods. 

B. Quality Assurance Process

Data Verification
1. The best thing that the LMMB data management and QA group can do for the data users is to clearly document what is meant by invalid data. It is certainly appropriate to give the users explicit warnings about the limitations of the data. This includes both quantitative and qualitative information about the limitations. My experience with researchers providing data is that they might admit to scientific peers that certain parts of their data should not be used, but they do not want QA and data management people making those statements. Users also seem to want to make their own determination of whether or not data should be used based on the information that is provided. 

2. No particular comment on the auditing criteria and number of audits. 

3. No particular comment on the blank- and surrogate-corrected data. 

4. The approach of flagging data values below detection limits has been the best in my experience. 
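A flagging approach of this kind might look like the sketch below. It assumes that the measured value is kept as reported and that a flag is attached when the value falls under the method detection limit, so users can decide for themselves how to treat it. The `Result` record and the flag text `BELOW_MDL` are hypothetical and are not the LMMB codes.

```python
from dataclasses import dataclass


@dataclass
class Result:
    value: float     # reported concentration, kept as measured
    mdl: float       # method detection limit, same units as value
    flag: str = ""   # hypothetical flag field; empty means no flag


def flag_below_mdl(results, flag="BELOW_MDL"):
    """Return copies of the results, flagging values under the MDL without altering them."""
    return [
        Result(r.value, r.mdl, flag if r.value < r.mdl else r.flag)
        for r in results
    ]


# Hypothetical measurements with a common detection limit of 0.05.
results = [Result(0.12, 0.05), Result(0.03, 0.05), Result(0.05, 0.05)]
flagged = flag_below_mdl(results)

for r in flagged:
    print(r.value, r.flag or "(no flag)")
```

Because the value is retained next to the flag, no information is lost to substitution or censoring at this step.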

Data Standard 
1. The use of codes for anything will always lead to debate among a group of people. However, the LMMB group did address the most important item in using codes by creating a structure for the codes that is clearly documented.

2.  From what I read and heard, the codes were not any more dynamic than what I have experienced. Once the LMMB effort is over, it will be important for the group to review what was done and make a recommendation for future work. 

Statistical Assessment
1. The statistical assessment reports provide the needed method explanations. 

2. Yes, the statistical attributes were represented sufficiently. 

3. The analyses conducted on the data seem to be appropriate and complete. 

4. The percent of variability due to sampling and analytical measurement uncertainty is meaningful and useful. It provides current and future researchers with quantitative information that can be used to judge new instrumentation and techniques.
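One common way to obtain such percentages is sketched below. It assumes that laboratory duplicates capture analytical variance, that field duplicates capture sampling plus analytical variance, and that the variance of a set of duplicate pairs is estimated as the sum of squared differences divided by 2n. This is an illustration with hypothetical numbers, not necessarily the procedure used in the LMMB statistical assessment reports.

```python
import statistics


def duplicate_variance(pairs):
    """Variance estimated from duplicate pairs: sum(d^2) / (2n)."""
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Hypothetical duplicate pairs and routine sample results (same units).
lab_duplicates = [(1.00, 1.02), (2.00, 1.98), (3.00, 3.02)]
field_duplicates = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.1)]
routine_samples = [1.0, 2.0, 3.0, 4.0, 5.0]

analytical_var = duplicate_variance(lab_duplicates)
measurement_var = duplicate_variance(field_duplicates)  # sampling + analytical
sampling_var = max(measurement_var - analytical_var, 0.0)
total_var = statistics.variance(routine_samples)

pct_measurement = 100 * measurement_var / total_var
pct_analytical = 100 * analytical_var / total_var
pct_sampling = 100 * sampling_var / total_var

print(f"measurement uncertainty: {pct_measurement:.3f}% of total variance")
print(f"  analytical: {pct_analytical:.3f}%   sampling: {pct_sampling:.3f}%")
```

A low measurement percentage indicates that most of the observed variability reflects real differences in the system, which is the kind of benchmark against which new instrumentation and techniques can be judged.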

C. Data Management

1. Not much information was presented or provided in the written materials about the details of the database. From my experience, the data management personnel requested the needed data standards and information. 

2. If the information is integrated properly, the LMMB database should be very useful for current and future research efforts and provide a good basis for use and improvement in future projects. The data access interface will be a very important component of the system and will be important in determining how much the system is accessed and the data are used.  The demonstration of the Microsoft Windows interface being developed showed that the developers are working to provide the needed functionality for LMMB data users.  The interface will be useful and provide access for a large number of users, but it will greatly restrict access by users of other computer systems such as Macintosh and UNIX workstations. A World Wide Web (WWW) interface would provide access to just about everyone.  It would have been difficult to anticipate the current breadth and depth of WWW technologies when the plans for the GLENDA database and its interface were started. However, EPA and LMMB management should temporarily stop interface development efforts to assess the benefits that would be realized from a WWW interface and determine if the likely increase in costs for the interface change can be accommodated. If the Oracle relational database is properly constructed and is data dictionary driven (as it appeared to be in the limited demonstration), there should not be any increased costs for that component. The additional costs should only occur in the design and implementation of the WWW-based interface software. If funds are not available from the LMMB study and/or there is not sufficient time to develop the WWW interface for LMMB, the reviewers encourage GLNPO to initiate a separate development effort to design and develop a functional WWW interface.

D. Data Release/Distribution

1. GLNPO should use the LMMB effort to define a process and structure where groups like LMMB function as the project specific data management group and data publisher. The goals of the LMMB data QA and management personnel are to obtain and rapidly distribute the data to LMMB researchers and to consistently integrate, edit, and document the data for archive purposes. Once the data are sufficiently documented and checked, they should be transitioned to a more permanent GLNPO or EPA entity that will function as the data archive and public data dissemination point. The amount of time that it takes to get a data set sufficiently documented varies from 1 to 2 years from the completion of data collection efforts. In addition, distribution of as much of the data as is reasonable on CD-ROM or other semi-permanent media should be seriously considered. Experience with this process on several NASA efforts has shown that:  

a) A dedicated, project-specific data QA and management staff is better able to focus on and meet the specific needs of the project than the staff of an archive center, which has a distinctly different purpose. 

b) Public release of properly documented (but not necessarily perfect) data must occur within 1 or 2 years so that outside researchers reading published journal articles about the research can also obtain and use the data. This transition of data to the archive center over time also helps the archive center personnel to become familiar with the project data over a one- to two-year period. 

c) Defining the role of the project data QA and management staff to be that of data editors (similar to scientific journal editors) seems to be well understood and appreciated by the scientific researchers. 

d) Publishing and distributing the best data on a semi-permanent medium such as CD-ROM ensures that the data will continue to be available for a long time even if consistent funding of an on-line WWW-site is terminated, and provides access to the data even if network connections fail.

Please direct any questions about the LMMB QA program to:
Louis Blume, QA Manager, 312-353-2317

 