Interobserver agreement in the diagnosis of canine hip dysplasia using the standard ventrodorsal hip-extended radiographic method

PAPER

Interobserver agreement in the diagnosis of canine hip dysplasia using the standard ventrodorsal hip-extended radiographic method

OBJECTIVES: To determine the agreement between observers and to investigate the effect of observer experience in diagnosing canine hip dysplasia and providing final scoring of hips using the standard ventrodorsal hip-extended radiographic method. The agreement of the final scoring with a presumed correct assessment based on the Norberg angle is also investigated.

METHODS: Thirty observers were asked to read 50 ventrodorsal hip-extended radiographs of 25 dogs according to Federation Cynologique International criteria. Groups of experienced (nine members) and inexperienced (21 members) observers were used.

RESULTS: For distinguishing dysplastic from non-dysplastic dogs, the average interobserver agreement was 72 per cent, significantly higher (P<0.0001) than the score expected by chance without any agreement between observers. For the final score (A, B, C, D or E), an average interobserver agreement of 43.6 per cent was found. In the experienced group, an agreement score of 76 per cent was found for the distinction between AB and non-AB, and an agreement score of 81 per cent for the distinction between C and non-C. The agreement score was significantly higher (P<0.0001) for the experienced group than for the inexperienced group in all cases. Agreement between the presumed correct assessment based on the Norberg angle and the observers' evaluations was low and did not differ significantly between experienced observers (71.8 per cent correct assessments) and inexperienced observers (69 per cent correct assessments; P=0.35).

CLINICAL SIGNIFICANCE: Although interobserver agreement is low, observer experience increases agreement.

G. VERHOEVEN, F. COOPMAN, L. DUCHATEAU*, J. H. SAUNDERS, B. VAN RIJSSEN AND H. VAN BREE

Journal of Small Animal Practice (2007) 48, 387-393
DOI: 10.1111/j.1748-5827.2007.00364.x

Department of Medical Imaging and *Department of Physiology, Biochemistry and Biometry, Ghent University, Salisburylaan 133, 9130 Merelbeke, Belgium

INTRODUCTION

Canine hip dysplasia (CHD) is a debilitating developmental disease first described by Schnelle (1935). The breed prevalence in the USA, as estimated by the Orthopedic Foundation for Animals (OFA), varies between 10 and 48 per cent (Keller 2003). These rates are comparable to those found in Belgium and Europe (Coopman and others 2004). In North America, CHD is diagnosed by radiographic evaluation of the hips according to the protocol established by the American Veterinary Medical Association, using the subjective standard ventrodorsal hip-extended view and a 7-point grading system (Keller 2003). In continental Europe, the recommendations of the Federation Cynologique International (FCI), which use a five-grade scale from A to E (Table 1), are followed to a large extent (Flückiger 1993).

In this scoring system, dogs' hips are classified as A, B, C, D or E. Dogs with A (no signs of hip dysplasia) and B (near-normal hip joints) hips are considered non-dysplastic and are therefore recommended for use in breeding programmes. Dogs with C hips are considered mildly affected and can be used for breeding in certain instances, whereas D and E dogs are considered clearly dysplastic and are therefore not considered breeding candidates. However, in some countries and for some breeds, breeding with dogs that have D and E hips is not explicitly forbidden.

In the UK, the British Veterinary Association/Kennel Club (BVA/KC) uses a scheme with nine radiographic parameters, each given a point value between 0 and 6 depending on the degree of severity, except the caudal acetabular edge, which is scored 0 to 5 (Gibbs 1997). These radiographic evaluations have been used to reduce the incidence of CHD (Brass 1989).

Journal of Small Animal Practice Vol 48 July 2007 © 2007 British Small Animal Veterinary Association 387
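The BVA/KC arithmetic above fixes the range of a single hip score. The following is a minimal sketch, assuming the nine parameter scores are already available. The text does not name the parameters, so the code only distinguishes the caudal acetabular edge (scored 0 to 5) from the other eight (scored 0 to 6); the function name and data layout are hypothetical. The maximum per hip is 8 × 6 + 5 = 53.

```python
from typing import Sequence

STANDARD_MAX = 6      # eight parameters are scored 0-6
CAUDAL_EDGE_MAX = 5   # the caudal acetabular edge is scored 0-5

def bva_kc_hip_score(standard_scores: Sequence[int], caudal_edge_score: int) -> int:
    """Total BVA/KC score for one hip.

    Hypothetical data layout: the eight parameters scored 0-6 in any order,
    plus the caudal acetabular edge score passed separately.
    """
    if len(standard_scores) != 8:
        raise ValueError("expected eight parameters scored 0-6")
    if any(not 0 <= s <= STANDARD_MAX for s in standard_scores):
        raise ValueError("a parameter score lies outside 0-6")
    if not 0 <= caudal_edge_score <= CAUDAL_EDGE_MAX:
        raise ValueError("caudal acetabular edge score lies outside 0-5")
    return sum(standard_scores) + caudal_edge_score

# The worst possible hip: every parameter at its maximum.
worst = bva_kc_hip_score([STANDARD_MAX] * 8, CAUDAL_EDGE_MAX)
print(worst)  # 53
```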

Table 1. Scoring system according to the Federation Cynologique International regulations

Grade A: No signs of hip dysplasia
- Femoral head and acetabulum are congruent, and the joint space is narrow
- The cranial acetabular edge is congruent with the femoral head
- The NA measures at least 100°
- The FHC is clearly medial to the dorsal acetabular edge
- There are no signs of osteoarthritis

Grade B: Near-normal hip joints
- Femoral head and acetabulum are slightly incongruent, but the NA measures at least 100°
- The cranial acetabular edge is curvilinear and smooth, and the cranial effective acetabular rim is horizontal
- The FHC is just medial to or on the dorsal acetabular edge
- There are no signs of osteoarthritis

Grade C: Mild hip dysplasia
- Femoral head and acetabulum are incongruent
- The NA measures at least 95°
- There is mild receding of the cranial acetabular rim
- The FHC is on the dorsal acetabular edge or slightly lateral to it
- Mild osteoarthritis is present

Grade D: Moderate hip dysplasia
- Femoral head and acetabulum are obviously incongruent
- The NA measures at least 90°
- The FHC lies clearly lateral to the dorsal acetabular edge
- There may be receding of the cranial effective acetabular rim
- Signs of osteoarthritis are usually present

Grade E: Severe hip dysplasia
- The femoral head is markedly subluxated or luxated
- The NA measures less than 90°
- The FHC lies obviously lateral to the dorsal acetabular edge
- The craniolateral acetabular rim is obviously receding
- Obvious signs of osteoarthritis are usually present

FHC Femoral head centre, NA Norberg angle

Despite the extensive use of various protocols based on the subjective hip-extended method over the past 30 years, progress in decreasing the incidence of CHD remains slow (Powers and others 2004).
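Read as constraints, the NA entries in Table 1 narrow the possible grades for a hip without determining one. The sketch below encodes only those NA thresholds; the mapping and function name are illustrative, not an FCI tool. For example, an NA of 97° rules out A and B (which require at least 100°) and E (which requires less than 90°), leaving C or D to be decided on the other criteria.

```python
# NA criterion for each FCI grade, transcribed from Table 1 (degrees).
NA_CRITERIA = {
    "A": lambda na: na >= 100,
    "B": lambda na: na >= 100,
    "C": lambda na: na >= 95,
    "D": lambda na: na >= 90,
    "E": lambda na: na < 90,
}

def na_compatible_grades(na_degrees: float) -> list:
    """Grades whose NA criterion the measured angle satisfies.

    Only the NA row of Table 1 is checked; congruence, FHC position and
    osteoarthritis still have to be assessed to assign a single grade.
    """
    return [grade for grade, meets in NA_CRITERIA.items() if meets(na_degrees)]

for na in (104, 97, 92, 85):
    print(na, na_compatible_grades(na))
# 104 ['A', 'B', 'C', 'D']
# 97 ['C', 'D']
# 92 ['D']
# 85 ['E']
```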
Slow progress in reducing CHD may be attributed to several factors, such as the insensitivity of the method in detecting hip joint laxity, high interobserver variation, degenerative joint disease that is often not yet visible at the age the radiographs are made, and breeders continuing to use dysplastic dogs (Brass 1989, Morgan and others 2000, Keller 2003). The effect of breeding affected dogs that have not been assessed by any method is unclear and difficult to investigate. To date, few studies have assessed the variation among radiologists in assigning hip scores. One study found that the level of agreement between observers using a subjective method was very low (Smith and others 1996). Paster and others (2005) stated that between- and within-radiologist grading for CHD varies significantly when standard ventrodorsal hip-extended radiography is used. Saunders and others (1999) found a significant difference between radiologists evaluating ventrodorsal or dorsoventral hip-extended views.

The main objective of this study was to investigate interobserver agreement in distinguishing dysplastic from non-dysplastic hips and in final scoring using the FCI grading system. We further investigated the agreement of the dysplastic/non-dysplastic result with the FCI-proposed standard based on the Norberg angle (NA). The effect of experience on agreement between observers was also investigated.

MATERIALS AND METHODS

Data collection

Original radiographs of 25 dogs were obtained from the database of the National Committee for Inherited Skeletal Disorders (NCISD) of Belgium. For each dog, one pair of radiographs was available; the two radiographs of the same dog differed in positioning and/or exposure. All radiographs were numbered from 1 to 50, taking care that radiographs of the same dog were not filed consecutively. Care was also taken to avoid radiographs with exuberant signs of degenerative joint disease of the hip joint.
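The ordering constraint above (radiographs of the same dog not filed consecutively) can be met with a simple rejection-sampling shuffle. The paper does not say how its order was produced; this is one hypothetical way to do it, with dogs identified by the integers 1 to 25. A random shuffle of 25 pairs contains on average one adjacent same-dog pair, so a valid order usually appears within a few attempts.

```python
import random
from typing import Optional

def reading_order(n_dogs: int = 25, films_per_dog: int = 2,
                  seed: Optional[int] = None) -> list:
    """Dog ID for each film position 1..n_dogs*films_per_dog, such that no two
    consecutive positions hold films of the same dog (rejection sampling)."""
    if films_per_dog > 1 and n_dogs < 2:
        raise ValueError("constraint cannot be met with a single dog")
    rng = random.Random(seed)
    films = [dog for dog in range(1, n_dogs + 1) for _ in range(films_per_dog)]
    while True:
        rng.shuffle(films)
        if all(a != b for a, b in zip(films, films[1:])):
            return films

order = reading_order(seed=2007)
for position, dog in enumerate(order[:5], start=1):
    print(f"film {position}: dog {dog}")
```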
In total, 30 observers were recruited from the universities of Bern (one), Ghent (11), Giessen (five), Utrecht (one) and Zurich (one), and from private practitioners who are members of the Flanders Orthopaedic Working Group (FOWG) (11). Two groups were created. The experienced group (nine members) consisted of board-certified radiologists (ECVDI), surgeons (ECVS) and full professors in radiology or surgery. In addition, to be categorised as experienced, each member had to have been active in the NCISD for at least 10 years. One member of the experienced group was neither board certified nor a full professor but had been active in the NCISD for 23 years. The inexperienced group (21 members) contained residents in diagnostic imaging and surgery and private practitioners active in the FOWG.

Each observer was asked to evaluate the original radiographs individually and was not told that two radiographs were presented for every dog. As far as possible, films were read under similar conditions for each observer, in a darkened room. Viewing boxes with shutters were used to exclude extraneous light; because film reading took place at different universities, not all boxes had the same light bulbs. The observers were asked to indicate clearly whether the right and left hip joints were dysplastic or non-dysplastic and to give a final score (FS) for the right and left hip joints according to the FCI criteria (A, B, C, D or E), irrespective of whether they accepted or rejected the radiograph for its technical quality. The NA was not provided on the radiographs, and observers were free to decide whether or not to measure it. Observers who decided to measure the NA used a transparent template of their own choice. Examples of two of these templates are shown in Figs 1 and 2.
Observers were not allowed to mark the femoral head centre (FHC), defined as the centre of the best-fitting circle on the template overlying the femoral head, or to write other markings on the films. The observers were not asked to report the NA to the authors.

FIG 1. Transparent HD ruler; Eickemeyer
FIG 2. Transparent HD ruler; Eickemeyer

It must be emphasised that the authors chose not to standardise the reading procedure. The observers were asked to read the films in the same manner as in their official screening procedure. In this way, the investigation of interobserver agreement reflects the situation currently present in the different screening committees.

After the films had been read by the different observers, the NA was measured on each film by the second author (F. C.). Using the HD ruler shown in Fig 1, the FHC was marked on the film. A line was drawn through both FHCs, and another between the FHC and the craniolateral acetabular edge. The angle between these lines was measured using a standard goniometer with 1° accuracy. With this procedure, a coefficient of variation of 4 per cent can be obtained (Coopman and others 2006). Based on this NA alone, each hip joint was scored as dysplastic or non-dysplastic: according to the FCI criteria (Table 1), a hip joint is considered dysplastic if the NA is less than 100°. This dysplastic/non-dysplastic classification, based on the NA measured by the second author, was presumed to be the correct assessment for the purposes of statistical analysis.

Statistical analysis

An agreement score was derived for each radiograph by determining the percentage of pairs of observers who made the same diagnosis: dysplasia or no dysplasia, AB (no breeding restriction) or not, and C (mild dysplasia) or not. Without any interobserver agreement, this agreement score will on average be 50 per cent (as many agreeing as disagreeing pairs). Therefore, the data were first analysed with a linear mixed model with a normally distributed error term and dog as random effect, to test whether the agreement score was significantly higher than 50 per cent.
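The two per-radiograph quantities described above, the NA-based reference classification and the pairwise agreement score, can be sketched as follows under stated assumptions. Landmarks are taken as (x, y) film coordinates in arbitrary units (a hypothetical convention; the study measured the angle with a goniometer), and readings are stored as one value per observer. With 30 observers there are 30 × 29 / 2 = 435 pairs per radiograph. The mixed models used for testing are not reproduced here.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) landmark position on the film, arbitrary units

def norberg_angle(fhc: Point, other_fhc: Point, craniolateral_edge: Point) -> float:
    """Angle in degrees at `fhc` between the line to the other femoral head centre
    and the line to the craniolateral acetabular edge of the same hip."""
    ax, ay = other_fhc[0] - fhc[0], other_fhc[1] - fhc[1]
    bx, by = craniolateral_edge[0] - fhc[0], craniolateral_edge[1] - fhc[1]
    cos_angle = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp so floating-point rounding cannot push acos outside its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def na_dysplastic(na_degrees: float) -> bool:
    """FCI rule used as the presumed correct assessment: dysplastic if NA < 100 degrees."""
    return na_degrees < 100.0

def pairwise_agreement(readings: Sequence[object]) -> float:
    """Percentage of observer pairs giving the same reading for one radiograph
    (e.g. True/False for dysplastic, or "A"-"E" for the FCI score)."""
    pairs = list(combinations(readings, 2))
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical coordinates: FHCs 100 units apart, edge 5 units lateral and 20 cranial.
na = norberg_angle((0.0, 0.0), (100.0, 0.0), (-5.0, 20.0))
print(round(na, 1), na_dysplastic(na))  # 104.0 False

# Hypothetical radiograph: 21 of 30 observers call it dysplastic.
print(round(pairwise_agreement([True] * 21 + [False] * 9), 1))  # 56.6
```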
A mixed model with dog as random effect and group as fixed effect was then used to test whether the agreement score differed between the experienced and inexperienced groups of observers. Additionally, it was tested whether the agreement score differed between experienced observers who measured the NA and experienced observers who did not. An agreement score based on the FCI score was also derived for each radiograph by determining the percentage of pairs of observers who assigned the same FCI score, and a mixed model with dog as random effect and group as fixed effect was used to test whether this FCI agreement score differed between the experienced and inexperienced groups. Finally, each hip was assessed as dysplastic or non-dysplastic based on the NA, and a generalised mixed model with a binomially distributed error term, observer as random effect and group as fixed effect was used to test whether the percentage of presumed correct assessments differed between the experienced and inexperienced groups.

RESULTS

Dysplastic/non-dysplastic

The average agreement between observers in distinguishing dysplasia from non-dysplasia was 72 per cent, which was significantly higher (P<0.0001) than 50 per cent, the score expected if there were no agreement. The agreement score was significantly higher (P<0.0001) for the experienced group (71.3 per cent) than for the inexperienced group (63.1 per cent).

AB (no breeding restriction) or not

The agreement between experienced observers in distinguishing AB from non-AB was 76 per cent, significantly higher (P<0.0001) than the 50 per cent expected if there were no agreement. The agreement score was significantly higher (P<0.0001) for the experienced group (76 per cent) than for the inexperienced group (67 per cent).
C (mild dysplasia) or not

The agreement between experienced observers in distinguishing mild dysplasia from not mild dysplasia was 81 per cent, significantly higher (P<0.0001) than the 50 per cent expected if there were no agreement. The agreement score was significantly higher (P<0.0001) for the experienced group (81 per cent) than for the inexperienced group (66 per cent).

FCI scoring (A to E)

Observers agreed on the FCI score in only 43.6 per cent of cases. Agreement was significantly higher (P<0.0001) for experienced observers (52.49 per cent) than for inexperienced observers. Tables 2 and 3 show the FCI scoring and dysplasia/non-dysplasia results for experienced and inexperienced observers, respectively.

Discrepancy of non-dysplasia/dysplasia versus AB/CDE scoring

Not all observers applied the FCI criteria consistently: some discrepancies were observed between non-dysplasia and AB scores and between dysplasia and CDE scores. Such a discrepancy was seen in four out of nine experienced observers and in eight out of 21 inexperienced observers. Three experienced and three inexperienced observers had a discrepancy of 5 per cent or more (Tables 2 and 3). Most discrepancies were found in determining B- and C-scored hips.

Norberg angle

Using the NA as the correct assessment for determining dysplasia, no significant difference (P=0.35) was observed between the experienced group, with 71.8 per cent correct assessments, and the inexperienced group, with 69 per cent correct assessments.

Table 2. Relative agreement between experienced observers in FCI scoring, dysplasia/non-dysplasia determination, dogs considered suitable for breeding (per cent AB) and dysplastic dogs (per cent CDE), and observers who measured the NA for every hip joint

Observer     A   B   C   D   E  Non-dysplastic   AB  Dysplastic   CDE   NA
1           40  22  29   9   0              62   62          38    38    Y
2           13  39  24  24   0              52   52          48    48    Y
3           31  34  30   5   0              65   65          35    35    N
4            2  46  43   9   0              15   48          85    52    Y
5           14  46  34   6   0              60   60          40    40    N
6           21  56  18   5   0              77   77          23    23    Y
7            3  30  50  15   2              29   33          71    67    Y
8           14  48  27  11   0              44   62          56    38    N
9           16  35  26  13   0              56   51          44    39    N
Mean        17  40  31  11   0              51   57          49    42

A to E: FCI grades (per cent of hips). FCI Federation Cynologique International, NA Norberg angle measured for every hip joint, Y Yes, N No

NA and agreement in the experienced group

Because the experienced group showed better overall interobserver agreement than the inexperienced group, the influence of NA measurement on agreement was studied only in the former group. The most important distinctions in current breeding practice are AB or not and C or not, so we focused on interobserver agreement for these scores. In the experienced group, four observers decided not to measure the NA for every hip joint, while five observers did measure the NA for every hip joint (Table 2). There was no significant difference in agreement for AB or not (P=0.08) or for C or not (P=0.57) between experienced observers who measured the NA and those who did not.

DISCUSSION

To decrease the incidence of inherited disorders, all affected animals should be excluded from breeding. The distinction between dysplastic and non-dysplastic dogs is therefore far more important than the classification of dogs into five (FCI) or seven (OFA) categories. However, where a disease has a high prevalence, a breeding strategy that temporarily permits breeding with mildly affected animals may be justified.
Permitting such breeding prevents the breeding population from becoming too small, which would reduce genetic variability and could increase the prevalence of other genetic disorders. Where mildly affected animals are allowed to breed, it is important not only to distinguish between normal to near-normal hips, mildly affected animals and moderately to severely affected animals, but also to know which animals have optimal (A) hips, because these animals are the preferred partners for mildly affected animals (C hips).

Table 3. Relative agreement between inexperienced observers in FCI scoring, dysplasia/non-dysplasia determination, dogs considered suitable for breeding (per cent AB) and dysplastic dogs (per cent CDE), and observers who measured the NA for every hip joint

Observer     A   B   C   D   E  Non-dysplastic   AB  Dysplastic   CDE   NA
10          25  37  37   1   0              62   62          38    38    N
11          27  34  28  11   0              61   61          39    39    N
12          25  49  25   1   0              74   74          26    26    Y
13          11  40  43   6   0              59   51          41    49    N
14          47  25  12  16   0              75   72          25    28    Y
15          52  33  14   1   0              85   85          15    15    Y
16          29  20  30  21   0              53   49          47    51    N
17          14  37  11  30   8              54   51          46    49    N
18           8  16  42  33   1              24   24          76    76    N
19           2  22  40  30   6              24   24          76    76    N
20          22  34  33  11   0              56   56          44    44    Y
21           8  31  43  16   2              39   39          61    61    N
22          32  37  28   3   0              69   69          32    32    Y
23          22  33  39   2   4              55   55          45    45    N
24          17  50  20  13   0              67   67          33    33    Y
25          18  53  29   0   0              71   71          29    29    N
26          16  47  28   6   3              68   63          32    37    N
27          21  41  24  11   2              63   62          37    37    N
28          28  31  31  10   0              63   59          37    41    Y
29          23  31  37   9   0              56   54          44    46    Y
30          36  40  17   7   0              71   76          29    24    Y
Mean        23  35  29  11   1              59   58          41    42

A to E: FCI grades (per cent of hips). FCI Federation Cynologique International, NA Norberg angle measured for every hip joint, N No, Y Yes

A correct assessment between observers and across borders is important to demonstrate the credibility of the screening procedure, and it is also of psychological importance to dog owners to have a dog with optimal hips: owners of unaffected dogs prefer an A score to a B score, and owners of affected dogs prefer a C score. For all these reasons, logical and psychological, it is important to classify hip joints correctly at all stages, within and between screening committees. Furthermore, careful attention to refined phenotypic classification reduces a primary source of heterogeneity and directly benefits genome-wide association studies (Cardon 2006).

Our study shows that observers classify dogs as non-dysplastic/dysplastic, AB or not, C or not, or into the refined classes (A, B, C, D and E) with insufficient agreement. Furthermore, some observers do not classify A (no signs of hip dysplasia; Table 1) and B (near-normal hip joints; Table 1) hip joints as free of dysplasia (Tables 2 and 3), as prescribed by the FCI criteria and as is commonly presumed. Incorrect classification means that some dysplastic dogs receive a CHD-free score and are used in breeding programmes. Accurate classification of B-scored hip joints (near-normal or transitional cases according to some observers) is particularly problematic. These results and observations might explain the scepticism that exists in breeding organisations towards routine screening methods for CHD. Disagreement between observers inevitably leads to a considerable number of false-positive dogs (loss of genetic variation) and false-negative dogs (genetically affected).
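In screening terms, a false positive is a hip called dysplastic that is in fact unaffected, and a false negative is the reverse. True genetic status cannot be observed directly, so the sketch below compares one observer's calls with a reference classification, such as the NA-based presumed correct assessment used in this study. The function name and the example data are hypothetical.

```python
from typing import Sequence

def compare_with_reference(calls: Sequence[bool], reference: Sequence[bool]) -> dict:
    """Compare one observer's calls (True = dysplastic) with a reference classification.

    A false positive is a hip called dysplastic that the reference classes as
    non-dysplastic; a false negative is the reverse.
    """
    if len(calls) != len(reference) or not calls:
        raise ValueError("calls and reference must cover the same, non-empty set of hips")
    fp = sum(1 for c, r in zip(calls, reference) if c and not r)
    fn = sum(1 for c, r in zip(calls, reference) if r and not c)
    return {
        "false_positive": fp,
        "false_negative": fn,
        "percent_correct": 100.0 * (len(calls) - fp - fn) / len(calls),
    }

# Hypothetical data for 10 hips; the reference could be the NA rule (NA < 100 degrees).
reference = [False, False, True, False, True, False, False, True, False, False]
observer = [False, True, True, False, False, False, False, True, False, True]
print(compare_with_reference(observer, reference))
# {'false_positive': 2, 'false_negative': 1, 'percent_correct': 70.0}
```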
Allowing false-negative dogs to breed maintains hip dysplasia in the population, whereas false-positive dogs, which could decrease the susceptibility to hip dysplasia, are excluded from the breeding pool. This may explain the slow progress in decreasing hip dysplasia over the past few decades (Powers and others 2004).

Our results show an agreement of 72 per cent between pairs of observers as to whether an animal should be classified as having a normal or abnormal phenotype. This is significantly higher than the 50 per cent expected if there were no agreement, but much lower than the 100 per cent that an ideal screening situation would require. These findings contradict the excellent results of the OFA, which found agreement in 93.4 per cent (Corley and Hogan 1985) to 94.9 per cent (Keller 2003) of cases as to whether an animal should be classified as having a normal, borderline or abnormal phenotype. This would indicate that the OFA uses an almost perfect screening procedure. In contrast to the excellent results presented by the OFA, Smith and others (1996) used a weighted kappa analysis to quantify the agreement of hip scores on ventrodorsal radiographic projections from 125 large- and giant-breed dogs older than two years. Three board-certified veterinary radiologists evaluated the hips using the OFA system. For the 2-point scoring scheme (normal versus dysplastic) used in that study, kappa values ranged from 0.31 to 0.68, which is broadly in agreement with our result. When the more refined 7-score system of the OFA was used, all three radiologists agreed on the same hip phenotype in 73.5 per cent of cases (Keller 2003). These percentages of agreement are high considering the subjective nature of the evaluation (Smith and others 1996, Smith 1997, Corley and others 1997, Paster and others 2005).
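Kappa statistics such as those cited above correct raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected from each rater's marginal frequencies. Smith and others (1996) used a weighted kappa; the sketch below is the simpler unweighted Cohen's kappa for two observers, shown only to illustrate the chance correction. The example grades are hypothetical.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(rater1: Sequence[str], rater2: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters who scored the same items."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("both raters must score the same, non-empty set of items")
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal frequencies (missing keys count as 0).
    p_expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical FCI grades given by two observers to the same ten hips.
obs_1 = ["A", "B", "B", "C", "C", "A", "B", "D", "C", "B"]
obs_2 = ["A", "B", "C", "C", "B", "B", "B", "D", "C", "C"]
print(round(cohens_kappa(obs_1, obs_2), 2))  # 0.42
```

Here the two observers agree on 6 of 10 hips (p_o = 0.60), but their grade frequencies alone would produce p_e = 0.31, so kappa is about 0.42 rather than 0.60.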
In the study by Smith and others (1996), the level of agreement between the three radiologists using the 7-score system was poor, with kappa values ranging from 0.04 (almost no agreement) to 0.20. Within-radiologist agreement was slightly better, with kappa values ranging from 0.38 to 0.45. This poor level of agreement seems to be in accordance with our results, where an average agreement of 43.6 per cent was found for the five-score system. Unpublished data from the Belgian committee reveal that the kappa value for agreement between two different observers on the FS can be as low as 0.27 or as high as 0.7. Observers who have read hip radiographs together over an extended period seem to develop more consistent agreement (agreement harmonisation).

The exact reasons for the discrepancy between our results and those of the OFA remain hypothetical. A first hypothesis is the difference in observer experience: our results indicate that experience has a significant influence on the agreement between observers. Secondly, the OFA evaluates dogs that are at least two years of age, increasing the chance of detecting degenerative changes, which make it much easier to classify dogs as dysplastic or non-dysplastic. The OFA also gives the observer the additional option of a borderline score (between B and C) in cases where no clear distinction between normal and abnormal is possible. The FCI allows final reading of hips in animals one year of age and allows the observer to evaluate degenerative joint disease (DJD) subjectively if a dog is older than two years, giving the opportunity to score a mildly affected dog as B rather than C. Thirdly, the OFA is a centralised organisation, whereas the FCI allows local and national committees and even individuals to read hips. Finally, the agreement scores were estimated differently.
Our agreement score is the result of a statistical analysis, whereas Keller (2003) counted the number of radiographs that received three identical scores. In our study, we compared 30 observers and determined the number of agreeing pairs of observers. If there were no agreement between observers and scores were given at random, this agreement score would be 50 per cent on average. It should be stated that the observers in our study were required to read radiographs of suboptimal quality. Permitting the observers to reject radiographs of poor quality might or might not improve interobserver agreement. We intentionally prevented observers from rejecting radiographs on the grounds of radiographic quality in order to study the impact of radiographic quality on interobserver agreement; this research is currently under way at this institution. Quality control of radiographs for CHD screening is not mentioned, or only insufficiently, in the other reports concerning interobserver agreement (Corley and Hogan 1985, Smith and others 1996, Smith 1997, Corley and others 1997, Keller 2003, Paster and others 2005).

Journal of Small Animal Practice Vol 48 July 2007 © 2007 British Small Animal Veterinary Association 391

G. Verhoeven and others

An additional finding of our study is that there is no agreement between the presumed correct assessment, based solely on the NA, and the evaluation of the experienced observers. Observers do not seem to interpret the NA values as prescribed in the FCI criteria. This is in agreement with a study by van der Velden (1983), who claimed that the NA, used as the sole objective parameter, fails to meet the requirements to replace subjective final scoring. The fact that no improved agreement was found between observers who measured the NA during scoring confirms the questionable value of the NA in diagnosing CHD. The validity of the NA has also been questioned by Culp and others (2006), who stated that using an NA threshold of 105° (OFA) resulted in a high degree of false-positive and false-negative diagnoses. This is in accordance with other reports that studied the NA. Tomlinson and Johnson (2000) investigated the NA threshold of 105° in golden retrievers, Rottweilers and German shepherd dogs and concluded that it was not a reliable predictor of normal hip status. Smith and others (1993) found DJD in dogs with an NA of more than 105°. It should be noted that, according to FCI regulations, an NA of 100° or more corresponds to grades B and A hips, respectively, indicating that the FCI uses even less stringent requirements than the OFA for determining hip quality status. Our results show disagreement in 28 per cent of pairs of observers for determining dysplasia/non-dysplasia, and disagreement between the observers' evaluations and the presumed correct assessment based on the NA. According to the experienced group, the proportion of dogs that were dysplastic ranged from 23 to 85 per cent, while the inexperienced group found a range of 15 to 76 per cent. Determination of A/B or not and of C or not is comparable to the determination of dysplasia/non-dysplasia.
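The pairwise agreement score described earlier in this discussion, and the statement that random scoring yields about 50 per cent, can be illustrated with a short simulation. The sketch below assumes that every (observer pair, dog) comparison counts once and that a random observer calls a dog dysplastic with probability p; the function names, the 200 simulated dogs and the random seed are hypothetical choices. With p = 0.5 the expected agreement is 0.5² + 0.5² = 50 per cent, whereas with p = 0.2 it rises to 0.2² + 0.8² = 68 per cent, so the 50 per cent baseline holds only when random calls are evenly split.

```python
import random
from itertools import combinations


def pairwise_agreement(score_table):
    """Fraction of agreeing (observer pair, dog) comparisons.

    score_table[o][d] is observer o's call (True = dysplastic) for dog d.
    """
    agree = total = 0
    for calls_a, calls_b in combinations(score_table, 2):
        for a, b in zip(calls_a, calls_b):
            agree += (a == b)
            total += 1
    return agree / total


def random_observers(n_observers, n_dogs, p_dysplastic=0.5, seed=0):
    """Observers who ignore the radiograph and call each dog dysplastic with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p_dysplastic for _ in range(n_dogs)]
            for _ in range(n_observers)]


# 30 random observers, as in the number of observers compared in this study.
print(round(pairwise_agreement(random_observers(30, 200)), 2))  # approximately 0.5
print(round(pairwise_agreement(random_observers(30, 200, p_dysplastic=0.2)), 2))  # approximately 0.68
```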
No consensus was found on the relationship between the refined scoring and classifying dogs as normal or abnormal. An even higher disagreement was found when assessing agreement for the different FCI scores. Observers seem to have their own reading style: some used a template to measure the NA, while others chose not to measure it, and different templates were used. The templates shown in Figs 1 and 2 only help to determine NAs of 105°, 100° and 90°. Even when measuring with an accuracy of 1°, measuring errors of 4 per cent can occur (Coopman and others 2006); NA measurement is therefore not a highly reproducible technique. The higher variability seen in the inexperienced group cannot be attributed to whether or not the NA was measured, because in both groups equal numbers of observers chose to measure the NA as chose not to. The observers were not allowed to draw on the radiographs, which may have had some impact on the accuracy of their NA measurements. However, it must be emphasised that the observers were asked not to determine the exact NA but to determine dysplasia/non-dysplasia. Besides differences in determining the NA, other reasons for the low agreement in our study could be the different viewing conditions, the time needed to read all films, and observers subjectively assessing hip quality status without following the agreed regulations. A limitation of this study is the lack of standardisation of the reading process, which could hinder the recognition of individual causative factors of disagreement. Further studies are needed to identify these factors or to confirm the possible causes of disagreement mentioned in this study. Nevertheless, we chose to investigate interobserver agreement with the reading processes currently used by the different screening committees, with the limitation that we did not allow quality assessment.
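The consequence of a measuring error close to the template lines can be made concrete. The sketch below assumes that the 4 per cent error reported by Coopman and others (2006) can be treated as a symmetric relative error around the measured angle, and it maps angles onto the bands delimited by the 105°, 100° and 90° template lines. The band labels and function names are hypothetical, and the bands are deliberately not mapped onto FCI grades.

```python
# Bands delimited by the 105, 100 and 90 degree template lines (hypothetical labels).
NA_BANDS = [
    (105.0, ">=105"),
    (100.0, "100 to <105"),
    (90.0, "90 to <100"),
    (float("-inf"), "<90"),
]


def na_band(na_deg):
    """Return the template band containing a Norberg angle in degrees."""
    for lower, label in NA_BANDS:
        if na_deg >= lower:
            return label
    raise ValueError("angle must be a real number")  # only reached for NaN


def bands_within_error(measured_deg, rel_error=0.04):
    """Template bands overlapped by measured_deg +/- rel_error (assumed symmetric)."""
    low = measured_deg * (1 - rel_error)
    high = measured_deg * (1 + rel_error)
    hits, upper = [], float("inf")
    for lower, label in NA_BANDS:
        # Each band covers [lower, upper); keep it if it overlaps [low, high].
        if lower <= high and upper > low:
            hits.append(label)
        upper = lower
    return hits


print(na_band(102.0))             # 100 to <105
print(bands_within_error(102.0))  # ['>=105', '100 to <105', '90 to <100']
print(bands_within_error(110.0))  # ['>=105']
```

Under this assumption, a hip measured at 102° could lie anywhere from just under 98° to just over 106°, spanning three template bands, whereas a measurement of 110° stays within a single band.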
Furthermore, the radiographic quality of the films used in other reports investigating agreement between observers is never clearly stated. We still have no idea how quality assessment is performed in different screening committees. In the current study, quality assessment was intentionally not performed in order to avoid preselection of the presented radiographs. This will allow the influence of a quality assessment on interobserver agreement to be evaluated. Another limitation is the large number of observers used in this study, which could be responsible for the wide range of scores that may influence the interobserver agreement. However, the number of observers should not affect agreement in a sound CHD screening procedure. Because no exact copies of radiographs were used for the same dog, intra-observer agreement could not be investigated. In this study, observers classified dogs as dysplastic/non-dysplastic, or into the different categories, with insufficient agreement.

Conclusion

Agreement between observers from different European countries who score hips following the FCI criteria is too low. There is no consensus on whether near-normal hip joints should be considered dysplastic or not. Experience improves agreement between observers. There is insufficient agreement between the presumed correct assessment of dysplasia/non-dysplasia and the evaluation of the experienced observers, indicating that the NA is not reproducibly used as the sole parameter in CHD evaluation by experienced observers. Standardisation of film reading is highly recommended. The currently used hip-extended radiographic method for CHD screening may have to be re-evaluated. A screening system that consistently allows the distinction between dysplasia and non-dysplasia, and between different levels of hip quality status, is needed to improve interobserver agreement.

Acknowledgements

The authors thank the following persons for evaluating the radiographs: M. Flückiger, H. Hazewinkel, M. Kramer, J. Lang and B. Tellhelm.

References

BRASS, W. (1989) Hip dysplasia in dogs. Journal of Small Animal Practice 30, 166-170
CARDON, L. (2006) Delivering new disease genes. Science 314, 1403-1405
COOPMAN, F., PAEPE, D., VAN BREE, H. & SAUNDERS, J. H. (2004) The prevalence and evolution of canine hip dysplasia in Belgium (Abstract). Annual Scientific Conference Proceedings, The European Association of Veterinary Diagnostic Imaging and The European College of Veterinary Diagnostic Imaging. November 18, Ghent, Belgium. p 68
COOPMAN, F., COMHAIRE, F., SCHOONJANS, F. & DE BRABANDER, K. (2006) Hips dysplasia research at Ghent University; towards a new approach to assess hip quality? (Abstract). Proceedings of the Third International Conference: Advances in Canine and Feline Genomics and Inherited Disorders. August 2 to 5, Davis, CA, USA. p 43
CORLEY, E. A. & HOGAN, P. M. (1985) Trends in hip dysplasia control: analysis of radiographs submitted to the Orthopedic Foundation for Animals, 1974 to 1984. Journal of the American Veterinary Medical Association 187, 805-809
CORLEY, E. A., KELLER, G. G., LATTIMER, J. C. & ELLERSIECK, M. R. (1997) Reliability of early radiographic evaluations for canine hip dysplasia obtained from the standard ventrodorsal radiographic projection. Journal of the American Veterinary Medical Association 211, 1142-1146
CULP, W. T. N., KAPITAN, A. S., GREGOR, T. P., POWERS, M. Y., MCKELVIE, P. J. & SMITH, G. K. (2006) Evaluation of the Norberg Angle threshold: a comparison of Norberg Angle and Distraction Index as measures of coxofemoral degenerative joint disease susceptibility in seven breeds of dogs. Veterinary Surgery 35, 453-459
FLÜCKIGER, M. (1993) The standardized analysis of radiographs for hip dysplasia in dogs. Objectifying a subjective process. Kleintierpraxis 38, 693-702
GIBBS, C. (1997) The BVA/KC scoring scheme for control of hip dysplasia: interpretation of criteria. Veterinary Record 13, 275-284
KELLER, G. (2003) The Use of Health Databases and Selective Breeding: A Guide for Dog and Cat Breeders and Owners. 4th edn. Orthopedic Foundation of Animals Inc., Columbia, MO, USA
MORGAN, J. P., WIND, A. & DAVIDSON, A. T. (2000) Hip dysplasia. In: Hereditary Bone and Joint Diseases in the Dogs. Schlütersche, Hannover, Germany. pp 131-171
PASTER, E. R., LAFOND, E., BIERY, D. N., IRIYE, A., GREGOR, T. P., SHOFER, F. S. & SMITH, G. K. (2005) Estimates of prevalence of hip dysplasia in Golden Retrievers and Rottweilers and the influence of bias on published prevalence figures. Journal of the American Veterinary Medical Association 226, 387-392
POWERS, M. Y., BIERY, D. N., LAWLER, D. F., EVANS, R. H., SHOFER, F. S., MAYHEW, P., GREGOR, T. P., KEALY, R. D. & SMITH, G. K. (2004) Use of the caudolateral curvilinear osteophytes as an early marker for future development of osteoarthritis associated with hip dysplasia in dogs. Journal of the American Veterinary Medical Association 225, 233-237
SAUNDERS, J. H., GODEFROID, T., SNAPS, F. R., FRANCOIS, A., FARNIR, F. & BALLIGAND, M. (1999) Comparison of ventrodorsal and dorsoventral radiographic projections for hip dysplasia diagnosis. Veterinary Record 145, 109-110
SCHNELLE, G. B. (1935) Some new diseases in dog. American Kennel Gazette 52, 25-26
SMITH, G. K. (1997) Advances in diagnosing canine hip dysplasia. Journal of the American Veterinary Medical Association 210, 1451-1457
SMITH, G. K., GREGOR, T. P. & RHODES, W. H. (1993) Coxofemoral joint laxity from distraction radiography and its contemporaneous and progressive correlation with laxity, subjective score, and evidence of degenerative joint disease from conventional hip-extended radiography in dogs. American Journal of Veterinary Research 54, 1021-1042
SMITH, G. K., BIERY, D. N. & RHODES, W. H. (1996) Between- and within-radiologist accuracy of subjective hip scoring of the ventrodorsal hip-extended radiograph (Abstract). Proceedings of the International Symposium on Hip Dysplasia and Osteoarthritis in Dogs. Cornell University, New York, USA. p 20
TOMLINSON, J. L. & JOHNSON, J. C. (2000) Quantification of measurement of femoral head coverage and Norberg Angle within and among four breeds of dogs. American Journal of Veterinary Research 61, 1492-1500
VAN DER VELDEN, N. A. (1983) Hip dysplasia in dogs. Veterinary Quarterly 5, 3-10