5 Basic statistics

This chapter introduces a practical “starter kit” for working in R from a statistician’s perspective. Before we discuss formal statistical concepts, we need a stable workflow: how to create objects, inspect and clean vectors, manipulate data frames, summarize data, and fit a basic model. These fundamentals are what you will repeatedly use in real projects—whether you are cleaning EHR data, summarizing trial endpoints, or building an analysis dataset for modeling.

The goal here is not to memorize functions, but to understand what each operation is doing and why it matters. Most errors in applied statistics are not due to a complex model; they come from small issues early in the pipeline: missing values, incorrect variable types, accidental coercion to character, or merges that silently duplicate rows. This chapter makes those risks explicit and gives you a set of reliable patterns.


5.1 The essentials of R

R is built around objects. You create an object (vector, matrix, list, data frame), inspect it, transform it, and then use it as input to another function. When you become comfortable with object types and common manipulations, statistical workflows become much faster and safer.

5.1.1 Manipulating vectors

A vector is the simplest data structure in R: an ordered collection of values. However, vectors can hide common pitfalls—especially when they contain mixed types (numbers + characters + missing values). In practice, mixed-type vectors appear when importing data (e.g., a numeric column contains "O" due to a data entry issue).

The code below demonstrates several key diagnostics:

  • unique() and length() are used to quickly inspect distinct values and count how many unique entries exist—useful when checking a categorical variable, or spotting unexpected values.
  • as.numeric() converts the vector to numeric, but any non-numeric values become NA. This is one of the most common sources of “silent data loss” in analyses.
  • log() illustrates that once coercion introduces NA, downstream transformations may produce missing results.
  • sum(..., na.rm=TRUE) shows a safe pattern for aggregation in the presence of missing values.
  • sort(decreasing=TRUE) is a quick way to inspect extremes and potential outliers.
  • is.na() and indexing (x[!is.na(x)]) demonstrate a standard workflow for filtering out missing values.
  • %in% tests membership (very useful for validation checks).
  • grepl() performs pattern matching and is helpful for detecting problematic strings during cleaning.
library(tidyverse)  # loads dplyr and the other packages used below
vec <- c(3, 5, 2, 1, 5, "O", NA)
length(unique(vec))
## [1] 6
num_vec <- as.numeric(vec)
log(num_vec)
## [1] 1.0986123 1.6094379 0.6931472 0.0000000 1.6094379        NA        NA
sum(c(num_vec, NA), na.rm = TRUE)
## [1] 16
sort(num_vec, decreasing = TRUE)
## [1] 5 5 3 2 1
is.na(num_vec)
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
num_vec[!is.na(num_vec)]
## [1] 3 5 2 1 5
c(5,6) %in% vec
## [1]  TRUE FALSE
grepl("5", vec)
## [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

A practical habit: when you coerce types (e.g., as.numeric()), always check how many NAs were created and why. If a numeric variable suddenly has many NAs, the root cause is usually dirty input values (spaces, commas, symbols, or typos like "O" instead of 0).
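The habit above can be sketched in a few lines; `raw` here is a hypothetical dirty vector, not data from this chapter:

```r
# Count and inspect values that became NA only because of coercion
raw <- c("3", "5", "O", "1,200", "7")      # hypothetical dirty input
num <- suppressWarnings(as.numeric(raw))   # "O" and "1,200" cannot parse
bad <- is.na(num) & !is.na(raw)            # NA created by coercion, not originally missing
sum(bad)
## [1] 2
raw[bad]
## [1] "O"     "1,200"
```

Inspecting the offending values usually tells you whether the fix is a string cleanup (remove commas, trim symbols) or a data-entry correction.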


5.1.2 Generate sequences or repeated sequences

Simulating data, creating index variables, and generating repeated patterns are extremely common tasks in statistics. Two workhorses are:

  • seq() to generate sequences (e.g., time points, dose levels, grid search values).
  • rep() to repeat values by cycles (times) or in blocks (each), often used to build study designs or longitudinal datasets.
seq(from = 0, to = 10, by = 0.5)
##  [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
## [16]  7.5  8.0  8.5  9.0  9.5 10.0
rep(x = 1:3, times = 4)
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3
rep(x = 1:3, each = 4)
##  [1] 1 1 1 1 2 2 2 2 3 3 3 3

Conceptually:

  • times repeats the whole vector multiple times.
  • each repeats each element multiple times before moving to the next.
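As a sketch of how the two arguments combine in practice (a hypothetical 3-subject, 4-visit design):

```r
# Long-format design: subject varies in blocks, visit cycles within subject
subject <- rep(1:3, each = 4)    # 1 1 1 1 2 2 2 2 3 3 3 3
visit   <- rep(1:4, times = 3)   # 1 2 3 4 1 2 3 4 1 2 3 4
design  <- data.frame(subject, visit)
nrow(design)
## [1] 12
```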


5.1.3 Get the working directory and write/read data

A reproducible workflow needs a stable approach to file paths. getwd() tells you the working directory, and setwd() sets it. Writing and reading data are also routine steps when sharing outputs, debugging, or building analysis datasets.

Important practice notes:

  • If your project grows, prefer project-based workflows (e.g., RStudio projects) rather than repeatedly calling setwd().
  • When exporting, keep track of whether row names are included; they can accidentally become a new column on import.

getwd()
## [1] "C:/Users/hed2/Downloads/others/mybook2/mybook2"
setwd(getwd())  # a no-op here; shown only to illustrate setwd()
write.csv(cars, "cars.csv", row.names = FALSE)
dataframe <- read.csv("cars.csv")
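One way to verify the row-name caution above is a quick round trip (a sketch; it uses a temporary file rather than the working directory):

```r
# Write without row names, read back, and confirm no extra column appeared
tmp <- tempfile(fileext = ".csv")
write.csv(cars, tmp, row.names = FALSE)
reimported <- read.csv(tmp)
identical(names(reimported), names(cars))
## [1] TRUE
```

If row names had been written out, read.csv() would have added a leading column (typically named "X"), and the check would return FALSE.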

5.1.4 Functions

Functions let you encapsulate repeated logic and ensure consistency. In applied statistics, functions are often used to:

  • standardize transformations,
  • compute derived variables,
  • generate reports,
  • run simulation loops.

The function below transforms x into a modified value. This is intentionally simple, but the pattern is the same for more complex analysis utilities.

my_func <- function(x){
  x_mod <- (x + 7) * 4
  return(x_mod)
}

my_func(num_vec)
## [1] 40 48 36 32 48 NA NA

Practical note: a function is safest when it handles missing values and validates input types. Even when you don’t add validation now, it helps to remember that your “future self” (or collaborator) will appreciate defensive checks.
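A minimal defensive variant might look like this (a sketch, not a required style):

```r
# Validate input type and make NA handling an explicit choice
my_func_safe <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))          # fail fast on non-numeric input
  if (na.rm) x <- x[!is.na(x)]      # drop NAs only when asked
  (x + 7) * 4
}
my_func_safe(c(3, NA), na.rm = TRUE)
## [1] 40
```

With na.rm = FALSE (the default) the NA would propagate through, matching the behavior of my_func() above; the caller chooses explicitly.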


5.1.5 Plot

Exploratory plots help you understand distributions, detect outliers, and identify relationships before modeling. Base R plotting is fast and lightweight, which is why it remains common in statistical practice.

  • A scatterplot (plot(y ~ x, data=...)) is the basic tool for relationships.
  • A histogram (hist()) checks distribution shape, skewness, and potential anomalies.
plot(dist ~ speed, data=cars)

hist(cars$dist)


5.1.6 Build model and plot

A linear model (lm) is often the first modeling step: it provides a baseline, helps you understand effect size and direction, and reveals whether a relationship is approximately linear.

This section fits a simple model and overlays the fitted regression line on the scatterplot. The additional vertical and horizontal lines serve as reference thresholds (e.g., a clinically meaningful cutoff, or a design constraint).

model <- lm(dist ~ speed, data=cars)
plot(dist ~ speed, data=cars)
abline(model)
abline(v = 25)
abline(h = 15)

In practice, it is common to annotate plots with reference lines—especially when discussing thresholds, eligibility criteria, or operational boundaries.
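A sketch of a more readable annotated version, with styled lines and a legend (the colors and labels are illustrative choices, not from the text above):

```r
# Distinguish the fitted line from reference thresholds
plot(dist ~ speed, data = cars)
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "blue")
abline(v = 25, lty = 2)   # dashed vertical threshold
abline(h = 15, lty = 2)   # dashed horizontal threshold
legend("topleft", legend = c("fitted line", "reference"),
       col = c("blue", "black"), lty = c(1, 2))
```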


5.1.7 Rename columns

Clean variable names are more than aesthetics: they affect model formulas, joining keys, and the readability of analysis code. The code below inspects column names and then renames them.

A caution: introducing spaces (e.g., "speed per hour") makes later coding more cumbersome because you must use backticks in formulas and selection. In many applied projects, analysts prefer names like speed_per_hour for reliability.
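A sketch of the underscore alternative using dplyr::rename (the new names here are illustrative):

```r
# Underscore names need no backticks in formulas or select()
library(dplyr)
cars2 <- rename(cars, speed_per_hour = speed, total_dist = dist)
names(cars2)
## [1] "speed_per_hour" "total_dist"
```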

names(cars)
## [1] "speed" "dist"
names(cars) <- c("speed per hour", "total dist")

5.1.8 Class of dataframe

Understanding classes is crucial because many R functions behave differently depending on the object type.

  • matrix and data.frame look similar but differ in important ways:
    • A matrix is homogeneous (all values must be the same type).
    • A data frame can store different types across columns (numeric, factor, character).

The code below converts cars to a matrix and back to a data frame, then checks classes. It also demonstrates transposition (t()), which is defined for matrices.

matrix <- as.matrix(cars)  # note: the name "matrix" masks base R's matrix() function
df <- as.data.frame(matrix)
class(matrix)
## [1] "matrix" "array"
class(df)
## [1] "data.frame"
# transpose
t(matrix)
## speed per hour: 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
## total dist:     2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46 26 36 60 80 20 26 54 32 40 32 40 50 42 56 76 84 36 46 68 32 48 52 56 64 66 54 70 92 93 120 85

(Output condensed: one row per original column, fifty values each.)

Practical warning: converting a data frame with mixed types to a matrix often forces everything to character. That can break models and summaries if you do not convert back carefully.
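The coercion can be seen directly on a tiny mixed-type example (a sketch):

```r
# One character column forces the whole matrix to character
df_mixed <- data.frame(x = 1:2, y = c("a", "b"))
m <- as.matrix(df_mixed)
typeof(m)
## [1] "character"
m[1, "x"]   # the number 1 is now the string "1"
```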


5.1.9 Generate a new character variable for a data frame

Identifiers and grouping variables are often created using string concatenation. paste0() is a clean way to build IDs without spaces.

The examples below create patterned labels like "raster_1", then attach them to the data frame. These patterns are useful when simulating repeated measures or defining cluster membership.

paste0("raster_", 1:10)
##  [1] "raster_1"  "raster_2"  "raster_3"  "raster_4"  "raster_5"  "raster_6" 
##  [7] "raster_7"  "raster_8"  "raster_9"  "raster_10"
paste0("raster_", rep(x = 1:5, times = 10))
##  [1] "raster_1" "raster_2" "raster_3" "raster_4" "raster_5" "raster_1"
##  [7] "raster_2" "raster_3" "raster_4" "raster_5" "raster_1" "raster_2"
## [13] "raster_3" "raster_4" "raster_5" "raster_1" "raster_2" "raster_3"
## [19] "raster_4" "raster_5" "raster_1" "raster_2" "raster_3" "raster_4"
## [25] "raster_5" "raster_1" "raster_2" "raster_3" "raster_4" "raster_5"
## [31] "raster_1" "raster_2" "raster_3" "raster_4" "raster_5" "raster_1"
## [37] "raster_2" "raster_3" "raster_4" "raster_5" "raster_1" "raster_2"
## [43] "raster_3" "raster_4" "raster_5" "raster_1" "raster_2" "raster_3"
## [49] "raster_4" "raster_5"
df$group <- paste0("raster_", rep(x = 1:5, times = 10))
df$id <-  paste0("raster_",  1:50)
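One refinement worth knowing: sprintf() can zero-pad the numeric part so that IDs sort correctly as strings (a sketch):

```r
# As plain strings, "raster_10" sorts before "raster_2"; zero-padding avoids that
sprintf("raster_%02d", c(1, 5, 10))
## [1] "raster_01" "raster_05" "raster_10"
```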

5.1.10 Create a new dataframe using ‘rnorm’ - random numbers from a distribution

Simulation is a core skill in modern statistical practice. Here we generate:

  • a numeric variable (sample) from a normal distribution,
  • a grouping variable,
  • an ID variable to support merging.

The function rnorm(n, mean, sd) generates normal random variables. Rounding is used for readability. Because the draws are random, calling set.seed() first makes results reproducible across runs.

sample <- round(rnorm(50, 0, 1), 2)  # note: the name "sample" masks base R's sample() function
group <- paste0("raster_", rep(x = 1:5, times = 10))

df_join <- data.frame(sample, group)
df_join$id <-  paste0("raster_",  1:50)
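Because rnorm() is random, each run of the chunk above produces different values. A seed makes the simulation reproducible (a sketch; the seed value is arbitrary):

```r
set.seed(123)                               # any fixed value works
sim <- round(rnorm(5, mean = 0, sd = 1), 2) # same five values on every run
sim
## [1] -0.56 -0.23  1.56  0.07  0.13
```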

5.1.11 Left join two dataframes

Merging tables is one of the most error-prone steps in applied analysis. left_join() keeps all rows from the left table and adds matching columns from the right table.

Key practices:

  • Always confirm uniqueness of the key (id) in each table before joining.
  • After joining, check row counts and inspect for accidental duplication.

library(dplyr)
data_all <- left_join(df, df_join, by="id")
head(data_all)
speed per hour total dist  group.x       id sample  group.y
             4          2 raster_1 raster_1   0.84 raster_1
             4         10 raster_2 raster_2   0.15 raster_2
             7          4 raster_3 raster_3  -1.14 raster_3
             7         22 raster_4 raster_4   1.25 raster_4
             8         16 raster_5 raster_5   0.43 raster_5
             9         10 raster_1 raster_6  -0.30 raster_1
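The two key checks can be sketched on small stand-in tables (left and right are hypothetical, built with the same id scheme as above):

```r
library(dplyr)
left  <- data.frame(id = paste0("raster_", 1:5), value = 1:5)
right <- data.frame(id = paste0("raster_", 1:5), sample = rnorm(5))
stopifnot(anyDuplicated(left$id) == 0,
          anyDuplicated(right$id) == 0)    # keys unique in both tables
joined <- left_join(left, right, by = "id")
stopifnot(nrow(joined) == nrow(left))      # left join added no rows
```

If the right table had duplicate keys, the join would silently multiply rows and the nrow() check would fail, which is exactly the failure mode you want to catch early.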

5.1.12 Select variables

Selecting columns is a common step for building analysis-ready datasets. This also helps reduce clutter when checking intermediate results.

select(data_all, group.x, id)
group.x id
raster_1 raster_1
raster_2 raster_2
raster_3 raster_3
raster_4 raster_4
raster_5 raster_5
raster_1 raster_6
raster_2 raster_7
raster_3 raster_8
raster_4 raster_9
raster_5 raster_10
raster_1 raster_11
raster_2 raster_12
raster_3 raster_13
raster_4 raster_14
raster_5 raster_15
raster_1 raster_16
raster_2 raster_17
raster_3 raster_18
raster_4 raster_19
raster_5 raster_20
raster_1 raster_21
raster_2 raster_22
raster_3 raster_23
raster_4 raster_24
raster_5 raster_25
raster_1 raster_26
raster_2 raster_27
raster_3 raster_28
raster_4 raster_29
raster_5 raster_30
raster_1 raster_31
raster_2 raster_32
raster_3 raster_33
raster_4 raster_34
raster_5 raster_35
raster_1 raster_36
raster_2 raster_37
raster_3 raster_38
raster_4 raster_39
raster_5 raster_40
raster_1 raster_41
raster_2 raster_42
raster_3 raster_43
raster_4 raster_44
raster_5 raster_45
raster_1 raster_46
raster_2 raster_47
raster_3 raster_48
raster_4 raster_49
raster_5 raster_50

5.1.13 Filter observations

Filtering creates analytic subsets, such as:

  • a treatment arm,
  • a subgroup,
  • an eligibility population,
  • a set of observations meeting a condition.

This section shows filtering by a grouping string, and filtering by numeric conditions (with a variable name that contains spaces, requiring backticks).

raster_1 <- filter(data_all, group.x == "raster_1")
raster_1
speed per hour total dist group.x id sample group.y
4 2 raster_1 raster_1 0.84 raster_1
9 10 raster_1 raster_6 -0.30 raster_1
11 28 raster_1 raster_11 0.55 raster_1
13 26 raster_1 raster_16 -0.21 raster_1
14 36 raster_1 raster_21 -0.40 raster_1
15 54 raster_1 raster_26 -0.03 raster_1
17 50 raster_1 raster_31 -1.55 raster_1
19 36 raster_1 raster_36 -0.50 raster_1
20 52 raster_1 raster_41 0.45 raster_1
24 70 raster_1 raster_46 -2.31 raster_1
speed_dist <- filter(data_all, `speed per hour` < 11 & `total dist` >= 10)
speed_dist
speed per hour total dist group.x id sample group.y
4 10 raster_2 raster_2 0.15 raster_2
7 22 raster_4 raster_4 1.25 raster_4
8 16 raster_5 raster_5 0.43 raster_5
9 10 raster_1 raster_6 -0.30 raster_1
10 18 raster_2 raster_7 0.90 raster_2
10 26 raster_3 raster_8 0.88 raster_3
10 34 raster_4 raster_9 0.82 raster_4

5.1.14 Append rows

Row-binding is used when you want to stack two datasets with the same structure. This is common when combining:

  • multiple batches,
  • subsets,
  • cohorts.

rbind() requires matching columns (names and order). In tidyverse workflows, bind_rows() is often more forgiving, but rbind() is fine when structures match exactly.

rbind(raster_1,speed_dist)
speed per hour total dist group.x id sample group.y
4 2 raster_1 raster_1 0.84 raster_1
9 10 raster_1 raster_6 -0.30 raster_1
11 28 raster_1 raster_11 0.55 raster_1
13 26 raster_1 raster_16 -0.21 raster_1
14 36 raster_1 raster_21 -0.40 raster_1
15 54 raster_1 raster_26 -0.03 raster_1
17 50 raster_1 raster_31 -1.55 raster_1
19 36 raster_1 raster_36 -0.50 raster_1
20 52 raster_1 raster_41 0.45 raster_1
24 70 raster_1 raster_46 -2.31 raster_1
4 10 raster_2 raster_2 0.15 raster_2
7 22 raster_4 raster_4 1.25 raster_4
8 16 raster_5 raster_5 0.43 raster_5
9 10 raster_1 raster_6 -0.30 raster_1
10 18 raster_2 raster_7 0.90 raster_2
10 26 raster_3 raster_8 0.88 raster_3
10 34 raster_4 raster_9 0.82 raster_4
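A sketch of why bind_rows() is more forgiving: it tolerates a column that is missing from one table, while rbind() errors out (the tiny frames here are hypothetical):

```r
library(dplyr)
a <- data.frame(x = 1, y = "a")
b <- data.frame(x = 2)   # column y is missing here
bind_rows(a, b)          # fills the gap with NA; rbind(a, b) would error
##   x    y
## 1 1    a
## 2 2 <NA>
```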

5.1.15 Create new variables or overwrite old ones

Data cleaning often involves transforming a variable into a more usable form. Here we round sample to one decimal place. Note that mutate() returns a modified data frame; you typically assign it back if you want to keep the change.

mutate(data_all, 
       sample = round(sample,1))
speed per hour total dist group.x id sample group.y
4 2 raster_1 raster_1 0.8 raster_1
4 10 raster_2 raster_2 0.1 raster_2
7 4 raster_3 raster_3 -1.1 raster_3
7 22 raster_4 raster_4 1.2 raster_4
8 16 raster_5 raster_5 0.4 raster_5
9 10 raster_1 raster_6 -0.3 raster_1
10 18 raster_2 raster_7 0.9 raster_2
10 26 raster_3 raster_8 0.9 raster_3
10 34 raster_4 raster_9 0.8 raster_4
11 17 raster_5 raster_10 0.7 raster_5
11 28 raster_1 raster_11 0.6 raster_1
12 14 raster_2 raster_12 -0.1 raster_2
12 20 raster_3 raster_13 -0.3 raster_3
12 24 raster_4 raster_14 -0.4 raster_4
12 28 raster_5 raster_15 -0.7 raster_5
13 26 raster_1 raster_16 -0.2 raster_1
13 34 raster_2 raster_17 -1.3 raster_2
13 34 raster_3 raster_18 2.2 raster_3
13 46 raster_4 raster_19 1.2 raster_4
14 26 raster_5 raster_20 -1.1 raster_5
14 36 raster_1 raster_21 -0.4 raster_1
14 60 raster_2 raster_22 -0.5 raster_2
14 80 raster_3 raster_23 0.8 raster_3
15 20 raster_4 raster_24 -0.1 raster_4
15 26 raster_5 raster_25 0.2 raster_5
15 54 raster_1 raster_26 0.0 raster_1
16 32 raster_2 raster_27 0.0 raster_2
16 40 raster_3 raster_28 1.4 raster_3
17 32 raster_4 raster_29 -0.2 raster_4
17 40 raster_5 raster_30 1.5 raster_5
17 50 raster_1 raster_31 -1.6 raster_1
18 42 raster_2 raster_32 0.6 raster_2
18 56 raster_3 raster_33 0.1 raster_3
18 76 raster_4 raster_34 0.2 raster_4
18 84 raster_5 raster_35 0.4 raster_5
19 36 raster_1 raster_36 -0.5 raster_1
19 46 raster_2 raster_37 -0.3 raster_2
19 68 raster_3 raster_38 -1.0 raster_3
20 32 raster_4 raster_39 -1.1 raster_4
20 48 raster_5 raster_40 0.3 raster_5
20 52 raster_1 raster_41 0.4 raster_1
20 56 raster_2 raster_42 0.0 raster_2
20 64 raster_3 raster_43 0.9 raster_3
22 66 raster_4 raster_44 2.0 raster_4
23 54 raster_5 raster_45 -0.5 raster_5
24 70 raster_1 raster_46 -2.3 raster_1
24 92 raster_2 raster_47 1.0 raster_2
24 93 raster_3 raster_48 -0.7 raster_3
24 120 raster_4 raster_49 -0.7 raster_4
25 85 raster_5 raster_50 1.0 raster_5
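The assign-back point can be sketched on a toy frame (the values are hypothetical):

```r
library(dplyr)
dd  <- data.frame(sample = c(0.84, -1.14))
dd2 <- mutate(dd, sample = round(sample, 1))
dd$sample    # the original is untouched
## [1]  0.84 -1.14
dd2$sample   # the rounded copy
## [1]  0.8 -1.1
```

To keep the change under the original name, write dd <- mutate(dd, ...).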

5.1.16 Summarise statistics

Summarization produces descriptive statistics and quick QA checks. In practice, it is a good idea to confirm that you are summarizing the intended variables and that the variable types are correct.

A practical note for this code chunk: writing max("total dist") would not compute the maximum of the column; it would take the maximum of a one-element character vector. Column names containing spaces must be wrapped in backticks. In real analyses, always verify that your summary outputs look plausible.

summarise(data_all,
          mean_sample = mean(sample),
          max_dist = max(`total dist`))
## mean_sample max_dist
##      0.1104      120

5.1.17 Group dataframe then summarise statistics

Grouping is essential for stratified summaries (by arm, site, subgroup, visit). The typical pattern is:

  1. group_by()
  2. summarise()

This yields one row per group.

data_all_group <- group_by(data_all, group.x)
summarise(data_all_group,
          mean_sample = mean(sample),
          max_dist = max(`total dist`))

The result contains one row per level of group.x (five rows here). Because sample is random, the per-group means will differ from run to run unless a seed is set.

5.1.18 Ungroup then summarise statistics

After group operations, the data may remain grouped. ungroup() removes grouping, which prevents unexpected behavior in later steps.

This is a common best practice: ungroup after grouped summaries unless you intentionally want grouping to persist.

ungroup_data <- ungroup(data_all_group)
summarise(ungroup_data,
          mean_sample = mean(sample),
          max_dist = max(`total dist`))
## mean_sample max_dist
##      0.1104      120

5.1.19 Summary linear regression model

This section fits a linear regression using the renamed columns. The summary() output provides:

  • coefficient estimates,
  • standard errors,
  • t-tests and p-values (under standard assumptions),
  • R-squared and residual standard error.

Even when you plan to use more advanced models, a simple linear regression is a valuable baseline for interpretation and for detecting obvious data issues.

mod1 <- lm(cars$`total dist` ~ cars$`speed per hour` )
summary(mod1) 
## 
## Call:
## lm(formula = cars$`total dist` ~ cars$`speed per hour`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -17.5791     6.7584  -2.601   0.0123 *  
## cars$`speed per hour`   3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

5.1.20 Create frequency table

Frequency tables help you check distributions across groups, detect empty cells, and validate merges.

Two-way tables are also a quick way to identify whether a categorical variable is unevenly distributed across groups.

table(data_all_group$`speed per hour`, data_all_group$group.x)
##      raster_1 raster_2 raster_3 raster_4 raster_5
##   4         1        1        0        0        0
##   7         0        0        1        1        0
##   8         0        0        0        0        1
##   9         1        0        0        0        0
##   10        0        1        1        1        0
##   11        1        0        0        0        1
##   12        0        1        1        1        1
##   13        1        1        1        1        0
##   14        1        1        1        0        1
##   15        1        0        0        1        1
##   16        0        1        1        0        0
##   17        1        0        0        1        1
##   18        0        1        1        1        1
##   19        1        1        1        0        0
##   20        1        1        1        1        1
##   22        0        0        0        1        0
##   23        0        0        0        0        1
##   24        1        1        1        1        0
##   25        0        0        0        0        1

5.1.21 Value and variable label

Labels are especially useful for reporting, tables, and clinical datasets where you want human-readable metadata. This section shows:

  • inspecting levels of a factor,
  • relabeling factor levels,
  • adding a variable label using Hmisc::label().
table(iris$Species)
## 
##     setosa versicolor  virginica 
##         50         50         50
iris$Species <- factor(iris$Species, labels = c("setosanew", "versicolornew", "virginianew"))
table(iris$Species)
## 
##     setosanew versicolornew   virginianew 
##            50            50            50
library(Hmisc)
label(iris$Species) <- "Species types"
table(iris$Species)
## 
##     setosanew versicolornew   virginianew 
##            50            50            50

In applied work, consistent labeling helps downstream reporting tools and reduces ambiguity when sharing datasets with collaborators.


5.1.22 Recode a variable

Recoding is frequently used to:

  • create categorical versions of continuous variables,
  • define risk groups,
  • implement analysis definitions (e.g., responder/non-responder).

This chunk uses nested ifelse() to create a derived variable based on Sepal.Length. While nested ifelse() works, in complex real projects, case_when() is often clearer and less error-prone. The key concept remains: define rules explicitly and validate results with a frequency table.

irisifelse <- iris %>%
  mutate(Sepal.Length2 = ifelse(Sepal.Length < 6, "level1",
                         ifelse(Sepal.Length < 7, "level2", Sepal.Length)))

table(irisifelse$Sepal.Length2)
##      7    7.1    7.2    7.3    7.4    7.6    7.7    7.9 level1 level2 
##      1      1      3      1      1      1      4      1     83     54
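As the text notes, case_when() expresses the same rules more readably; a sketch that produces the same level counts:

```r
library(dplyr)
iris2 <- iris %>%
  mutate(Sepal.Length2 = case_when(
    Sepal.Length < 6 ~ "level1",
    Sepal.Length < 7 ~ "level2",
    TRUE             ~ as.character(Sepal.Length)   # fall-through keeps the raw value
  ))
table(iris2$Sepal.Length2)[c("level1", "level2")]
## level1 level2 
##     83     54
```

Conditions are evaluated top to bottom and the first match wins, so the rules read in the same order they are applied.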

5.2 Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most important ideas in statistics: it justifies why normal-based inference often works even when the underlying data are not normal, as long as sample sizes are reasonably large and observations are independent.

In practice, the CLT supports:

  • approximate confidence intervals for means,
  • normal approximations for many estimators,
  • reasoning about sampling variability.
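A quick simulation illustrates the theorem: individual exponential draws are strongly skewed, yet their sample means cluster symmetrically around the true mean (a sketch; the sample sizes are illustrative):

```r
set.seed(1)
# 2000 sample means, each from n = 30 exponential(rate = 1) draws
means <- replicate(2000, mean(rexp(30, rate = 1)))
c(mean(means), sd(means))   # near 1, and near 1/sqrt(30) ~= 0.18
hist(means, breaks = 40, main = "Sampling distribution of the mean (n = 30)")
```

Increasing n tightens and further normalizes the histogram, which is the CLT in action.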



5.3 Common statistical distribution

Statistical distributions are the language of uncertainty. In applied work, you encounter them in:

  • modeling outcomes (normal, binomial, Poisson),
  • generating simulations,
  • defining priors and likelihoods,
  • interpreting p-values and confidence intervals.
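In R, each distribution comes as a family of four functions with prefixes d (density), p (CDF), q (quantile), and r (random draws); a sketch using the normal and binomial:

```r
dnorm(0)       # N(0,1) density at 0, about 0.3989
pnorm(1.96)    # P(Z <= 1.96), about 0.975
qnorm(0.975)   # inverse of the line above, about 1.96
set.seed(7)
rbinom(3, size = 10, prob = 0.5)   # three random Binomial(10, 0.5) draws
```

The same pattern holds for other distributions (dbinom/pbinom/qbinom/rbinom, dpois/ppois/qpois/rpois, and so on).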



Chapter takeaways

By the end of this chapter, you should be comfortable with:

  • Inspecting vectors, handling missing values, and diagnosing coercion issues
  • Generating sequences and repeated patterns for indexing and simulation
  • Reading/writing data and understanding the working directory
  • Writing simple functions to standardize repeated steps
  • Making quick exploratory plots
  • Fitting and interpreting a basic linear regression
  • Managing variable names, classes, and joins
  • Building group-wise summaries and validating derived variables

These are not “intro programming trivia”—they are the daily tools of statistical practice. Once these fundamentals are stable, you can scale up to robust workflows: reproducible reporting, simulation-based power analysis, and model-based inference.