Any recommendations for monitoring my servers?

Aux@lemmy.world · 11 months ago

Any recommendations for monitoring my servers?

MaggiWuerze@feddit.de · 11 months ago

The standard solution would be Grafana + Prometheus on one server and a node exporter running on each Pi. You then register the node exporters in Prometheus and use that as a data source for Grafana. There you build a dashboard showing whatever metrics you want. It can also show some information about the Docker socket, like the number of running/stopped containers and such.
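
For anyone wondering what "registering the node exporters" looks like: it is basically a list of targets in prometheus.yml. Here is a rough Python sketch that generates such a file. The hostnames (pi1.local, pi2.local), the job name and the 15s interval are placeholders I made up; 9100 is the node exporter's default port.

```python
def build_scrape_config(hosts, port=9100, job_name="node", interval="15s"):
    """Return prometheus.yml text that scrapes a node exporter on each host.

    hosts, job_name and interval are illustrative assumptions; adjust them
    to match your own machines.
    """
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    static_configs:",
        "      - targets:",
    ]
    # One target entry per Pi, each pointing at its node exporter port.
    lines += [f"          - '{host}:{port}'" for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames; replace with your own.
    print(build_scrape_config(["pi1.local", "pi2.local"]))
```

Write the output to prometheus.yml, restart Prometheus, and the Pis should show up on its targets page, assuming the exporters are reachable from the Prometheus server.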

markr@lemmy.world · 11 months ago

Add in Alertmanager and hook it to Slack. Get notified whenever containers or systems are misbehaving.

Aux@lemmy.world · 11 months ago

Thanks!

snekerpimp@lemmy.world · 11 months ago

Second for Netdata for the temps and load info, Portainer for Docker monitoring. Netdata gives you more real-time info than even Glances. Portainer is an easy way to look at logs and such; I don’t use it to manage, I prefer the command line for that. Netdata will give you some Docker info, but not logs.

Aux@lemmy.world · edit-2 11 months ago

I use Ansible for management; I just want to see nice graphs and maybe get alerts when things go south. Thanks for the recommendation.

Nibodhika@lemmy.world · 11 months ago

I use netdata, it’s quick and easy, but I don’t think it monitors docker containers specifically.

Aux@lemmy.world · 11 months ago

I’ll take a look, thanks!

Mellow@lemmy.world · 11 months ago

Grafana, InfluxDB, Telegraf agents. Easy to set up. Barely any configuration required. Everything you asked for is in the default Telegraf agent config. There are dashboards with plenty of examples on Grafana’s website.

Aux@lemmy.world · 11 months ago

What’s the difference between Prometheus and Telegraf? Why do you prefer Telegraf?

Mellow@lemmy.world · 11 months ago

InfluxDB is a “time series” database for storing metrics: temperatures, RAM usage, CPU usage, all with timestamps. Telegraf is the client-side agent that sends those metrics to the database (in InfluxDB line protocol by default). Prometheus does pretty much the same thing but is a bit too bloated for my liking, so I went back to Influx.

foobaz@lemmy.world · 11 months ago

prometheus is bloated?

keyez@lemmy.world · edit-2 11 months ago

My work environments use Prometheus, node-exporter and Grafana. At home I use Telegraf, InfluxDB and Grafana (and Prometheus for other app-specific metrics). The biggest reason I went with Telegraf and InfluxDB at home is that Prometheus scrapes data from the configured clients (pull), while Telegraf sends the data to InfluxDB on the configured interval (push). When I started my homelab adventure I had 2 VMs in the cloud and 2 Pis at home, and having Telegraf send the data in to my Pis rather than going out and scraping made that remote setup a lot easier. I had InfluxDB set up behind a reverse proxy with auth, so Telegraf was sending data over TLS and only needed to authenticate to that single endpoint. That is the major difference to me, but each one also has its own set of exporters and plugins to tailor the data depending on what you want.
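
To make the push side a bit more concrete: with the InfluxDB output, what the agent sends on each interval is a batch of InfluxDB line protocol records. Below is a minimal Python sketch of that format, not Telegraf's actual code. The measurement, tag and field names are only illustrative, and the escaping covers the common cases rather than every corner of the spec.

```python
def _esc_key(value):
    """Escape commas, equals signs and spaces in tag/field keys and tag values."""
    return str(value).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _esc_measurement(value):
    """Measurement names only need commas and spaces escaped."""
    return str(value).replace(",", r"\,").replace(" ", r"\ ")


def _fmt_field(value):
    """Render a field value: bools bare, ints with an 'i' suffix, strings quoted."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one record: measurement,tag=val field=val timestamp."""
    tag_part = "".join(
        f",{_esc_key(k)}={_esc_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_esc_key(k)}={_fmt_field(v)}" for k, v in fields.items())
    return f"{_esc_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical host name and values, for illustration only.
    print(to_line_protocol("cpu", {"host": "pi1"}, {"usage_idle": 92.5},
                           1700000000000000000))
```

With pull, the equivalent data sits on each client waiting for Prometheus to come and fetch it, which is why the server has to be able to reach every client.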

Aux@lemmy.world · 11 months ago

Ok, great to know, thanks!

ErwinLottemann@feddit.de · 11 months ago

Netdata is easy to set up and detects a lot of things on its own, like databases and ntpd and…

johntash@eviltoast.org · 11 months ago

I didn’t see it recommended yet: Uptime Kuma is really simple if you just want to monitor the basics, like whether a URL works, or ping, TCP, etc., without an agent.

It doesn’t do CPU/memory style metrics, but I find myself checking it more often because of how simple it is.

Aux@lemmy.world · 11 months ago

I need CPU and other metrics because recently one of my Docker containers got infected with DDoS software and the CPU spike was the telltale sign.

TheMurphy@lemmy.world · 11 months ago

Omg I have CPU spikes on my Raspberry Pi. Maybe it’s infected too, and how would I ever find out?

Is there some software I can run to check?

Aux@lemmy.world · 11 months ago

Are they small spikes spread across time or large chunks of heavy load, like 80%+ load for hours? If it’s the first, then probably it’s just normal operation. Otherwise check your running processes and start tracking what’s going on during high loads.
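
If you already collect CPU samples, a rough way to tell the two cases apart is to look at the longest unbroken stretch above a threshold. This is only a sketch: the 80% threshold comes from the numbers above, while the one-hour cutoff and the one-sample-per-minute interval are assumptions you should tune.

```python
def longest_high_run(samples, threshold=80.0):
    """Return the length of the longest consecutive run of samples >= threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest


def classify_load(samples, interval_s=60, threshold=80.0, sustained_s=3600):
    """Label CPU history as 'normal', 'spikes' or 'sustained'.

    samples: CPU usage percentages taken every interval_s seconds.
    sustained_s: how long load must stay high to count as sustained
    (one hour here, an assumed cutoff).
    """
    run_s = longest_high_run(samples, threshold) * interval_s
    if run_s == 0:
        return "normal"
    return "sustained" if run_s >= sustained_s else "spikes"


if __name__ == "__main__":
    # Made-up sample data: a few short spikes vs. 90 minutes pinned at 85%.
    print(classify_load([10, 95, 12, 90, 11]))
    print(classify_load([85.0] * 90))
```

A "sustained" result is only a hint to go and look at the process list during that window, not proof that anything is infected.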

TheMurphy@lemmy.world · 11 months ago

I would say it’s 100% load for maybe 3 minutes, so maybe it’s normal.

It makes my system overload so my PiHole stops processing.

But it sounds like maybe it’s normal and a background service using too much sometimes?

Aux@lemmy.world · 11 months ago

Maybe normal, maybe not. What software do you run there?

auf@lemmy.ml · 11 months ago

I’m suggesting Homepage for someone for the second time.

https://github.com/gethomepage/homepage

Aux@lemmy.world · 11 months ago

I’ll take a look, thanks.

bmcgonag@lemmy.world · 11 months ago

I’m kind of loving Zabbix, but not sure if it’s the right solution for your needs. I’d say it would definitely work, but does take a bit of setup initially. This article is interesting, and seems to have a lot of what you want. Not sure if you want to do all of this. https://opensource.com/article/23/3/build-raspberry-pi-dashboard-appsmith

MigratingtoLemmy@lemmy.world · 11 months ago

Monit

hottari@lemmy.ml · 11 months ago

Glances should be right up your alley.

rettet_die_bilche@feddit.de · 11 months ago

collectd on the host sending data to an OpenWrt Docker container. You can view the graphs in the OpenWrt LuCI UI.

TOR-anon1@lemmy.world · 11 months ago

I don’t use Docker, so this may not help you, but I find bpytop over SSH works just fine. :)

Moonrise2473@feddit.it · 11 months ago

It’s old-fashioned and maybe difficult to set up initially, but I really like Munin.
